Today James and I (Nicola) from the JISC GECO team are at the UK Data Archive, based at the leafy University of Essex campus in Colchester, for the all-day INSPIRE for Social Sciences event. We will be live-blogging the day here and welcome your comments. The UKDA will be making the presentations available after today’s event and we will link to those resources as soon as they go live. We are also hoping to capture some audio and images of the event – those links will also follow. We will be tweeting with the U.Geo tag/hashtag #geoukda today.
Please note that as this is a liveblog it is subject to error, typos, etc. Please do let us know if you spot an important correction, update or link that we should note.
The Programme for the day (which will become headings for this blog as we go through the day) is:
Welcome and introduction – James Reid, EDINA and Veerle Van den Eynden, UK Data Archive
Veerle is welcoming us. The idea of today is to share experience of what we are all doing but also to discuss what we can do going forward. And this came out of the U.Geo project funded by JISC which has been assessing the geo potential of the Data Archive. The UKDA has been looking at INSPIRE for several years and we will talk about this more later on.
James is starting by asking how many of those in the room know what INSPIRE is and roughly what it means – it looks like about half of those here. At EDINA we provide various geospatial services and it’s from that context that we are interested in INSPIRE, and from the point of view of data being pumped back out into the university community. We wanted to think about what would be eligible to fall under INSPIRE and what that would mean – I will talk about that later. The other thing to say is that there has been a whole raft of geospatial projects funded by JISC under their geospatial programme (these are the #jiscgeo projects that we blog about here). My own project in that strand is the Geospatial Engagement and Community Outreach project and one of our strands is to specifically look at the INSPIRE directive and case studies around this. Do ask me about this at any point if you’d like more information. There’ll be opportunity for questions and discussions later.
INSPIRE and the National Data Strategy – Peter Elias, ESRC Strategic Advisor Data Resources
I will mainly be talking about the National Data Strategy and the opportunities provided.
It’s the way of ensuring that the data we need for research is made available. Sustaining, enhancing and supporting data is an important part of this process. This work began in 2009 and we committed to assess the impact of the UK Location Strategy and what needs to be done to improve data resources around that. I will talk today about progress and what needs to be done to improve research access to geospatial resources. The way we approach this is from a research perspective. We took the view that the INSPIRE directive would make available many more sources of data with a geospatial dimension. So what do we do with that as a research community? How do we avail ourselves of that? What are the needs and requirements for researchers around spatially orientated data and resources?
A couple of years ago (2009) Anne Green and David Owen at Warwick undertook research speaking to experts in this field – from the policy perspective or from a research perspective. Senior professors, leaders, people in research centres etc. They asked them about how they saw the INSPIRE directive panning out over the next few years. Then they spoke to users – geographers etc – to ask about how they saw research availability. And then, the most interesting part of their work, they conducted a web survey – we usually call these web enquiries but in this case the population contacted was people who had downloaded data from the UK Data Archive. We created a very short web survey. Questions about their knowledge and use of geospatial information, what they saw as their knowledge and skills, but also lots of space for them to write about their needs and requirements. We took this data and wrote a report that is available from the ESRC website, and that’s a very interesting perspective about the way in which INSPIRE will evolve over the next few years.
Of these 500+ respondents who took part in this survey – all empirical social scientists with experience of geospatial data – around 25% claimed to be experts in working with this data; they were the geographers. The majority said they had some knowledge but little proficiency with geospatial data and resources – including many of the experts. And for this group their needs were:
- Geospatial linking services – a very interesting finding to reflect upon – to link data sets, to put a geospatial link and then do something with it
- Mapping and Visualisation Services
- Advice and guidance on using geospatial information
These are significant needs, and they led the ESRC to put a recommendation to the UK Data Forum – who guide the UK National Data Strategy across sectors. We presented the view that the true potential of geospatial information was not being realised by social science researchers. The amount of data will increase – through routes like data.gov.uk – and these resources are going to provide us with much more information which has the potential for geospatial analysis. Our concern is that we do not have sufficient people with the appropriate skills to keep Britain at the top of contributions to social sciences research.
We proposed a UK spatial data advisory service. We recommended a single centralised location but with a virtual presence. First of all information about access – where to go; not necessarily a data repository, but a repository of metadata information, advice and guidance on geospatial information, geospatial data linking services – tools if you will – and ways to build mash-ups and location based visualisation. The response when we presented this was “you’re teaching your grandmother to suck eggs”. There was less enthusiasm as the attendees saw this as their own expertise being diluted by a group with other skills, one they might see as having less expertise in their academic areas. We didn’t see it this way. Increasingly there is a need for social scientists to engage with other disciplines – particularly in the area of environmental science for instance – and to do this around geospatially enabled data.
We made recommendations about skills. The main aim of training must be to promote the effective and appropriate use of geospatial data to address substantive research questions. Training directed towards specific communities of interest. Case studies etc. The views from the workshop on a Geospatial Resources Advisory Service asked whether the support needed to be close to the need. A lot of concern about the cost of this and how it would have domain distributed relevance. And how could the impact of such a service be measured? And Peter Burnhill of EDINA raised lots of ideas and questions that look at how such an advisory service should link to existing resources and support.
The Geospatial Advisory Service has not been implemented as the resources are limited post Comprehensive Spending Review. Also JISC pulled in their budgets at that time. JISC and the ESRC both said the time is not right. But we have not lost sight of the ball. First of all JISC is looking hard at what its Geospatial Working Group (GWG) does and plans for what role a new GWG will have – you can see this on the web. And that’s important. JISC are coming round to the view that it is among the generalists that there is a real need to bridge the gap and to undertake the kind of recommendations made in our review. Various things are happening within ESRC which are keeping this need addressed to some extent. The third round of the NCRM (National Centre for Research Methods) commissioning stressed the need for geospatial skills training.
Also, and this is very new news, the ESRC will be bringing together the various aspects of ESDS in a more coherent way. See the commissioning document of the Core Service on the ESRC website. Doing this and adding initial value added services is Stage 1. In Stage 2 we will be adding specialist Census components to the core. Stage 3 will add not yet specified value added services in the future – it’s there that we will want to give consideration to a geospatial advisory service and the way in which geospatial services will be added to and built onto the core for the new UK Data Service.
That’s where we now stand. I’ve said very little about INSPIRE. Except that we think INSPIRE will change the landscape for the social sciences research community in terms of the data that we have available and we don’t yet have the resource available to take advantage of the opportunities through this. We will talk more about this later today. I want you to think hard about the recommendations we’ve made already about how to extract better social science research and findings from geospatial data.
Q1 – James, GECO) Have the ESRC looked at the directive and, in practical terms, considered what the detail of INSPIRE means for ESRC as an organisation.
A1) Not personally, but those who have tell me that if it is to be implemented – and it will – then we need more information. And we need to look also at the open data initiatives and the way in which lots of government departments are putting data into the public domain and regulation coming in around this. This is a whole philosophy around data sharing and that giving data to the community will be useful. My argument is that I’m not sure that just making data available will be helpful; we need to have the skills to analyse that data from a proper research perspective. Being able to use and enhance your research with this data is what we are looking at very carefully. It’s those skills we think are in shortage.
Q2) Peter your survey found a need for visualisation tools and services – have you done anything to progress that?
A2) Not directly, but there is work under the Digital Social Sciences Agenda, work from the previous e-Social Science agenda, and work at EDINA around this. But we have nothing approaching what we have in the physical sciences for example and I think we have significant gains to be made here. There is an interesting article from the Geography Society on the huge opportunity here.
Q3) My background is Virtual Reality – what is needed in terms of a large scale 3D Virtual Reality GIS tool?
A3) Well, I’m not sure but if you would like ESRC support for that sort of work we’d welcome suggestions.
INSPIRE – an overview and an introduction to data specifications – James Reid, EDINA
So I am going to give a quick overview of INSPIRE – I’m a bit jealous of Peter not needing to read the directive over in huge detail.
The aim of INSPIRE is to:
create a European Spatial Data Infrastructure (SDI) to improve the sharing of spatial information between public authorities and improve accessibility to the public.
This is designed to allow the EC – and you need to remember that this is a European directive – and Member States to design and deliver better environmental policies that will result in improved environmental outcomes. And to improve the quality and quantity of information allowing it to be combined and interoperable in useful ways. Interoperability is implicit and important – the example here for instance is things like using data in crisis situations.
What are the issues and opportunities for opening up content? In the UK there are two drivers. There is the UK Location Programme and this has been going for a good five years. It is a cross domain public sector effort to establish the UK SDI and to partially realise UK obligations, under INSPIRE, to make materials discoverable and interoperable.
There are some issues – the definition of “public authority” in the UK regulations around FOI (Freedom of Information) includes Universities and Research Councils. This means that they are required to comply with the Directive. But there is a clause about whether it is a university’s “public task” to be INSPIRE compliant. We have sought advice from the Scottish Information Commissioner, who thinks that it is “probably” part of a university’s public task. Park that issue but assume that as public authorities universities need to be INSPIRE compliant. So we have two perspectives to take here:
- Academia as Data Provider/Creator
- Academia as Data User (as Peter touched on)
UK HFE as INSPIRE Data Provider. What does that mean? Well for universities it is unlikely that much of the geospatial data they hold would come under INSPIRE (certainly Annex I and II Themes). But two caveats:
- As the focus shifts to Annex III it is possible that data held in universities might be in scope – e.g. species distribution, habitats, atmospheric conditions
- Studies of environmental change require an understanding of how phenomena change over time. That requires earlier editions and historical data to be made available and that may be held by universities.
In addition the commission indicates “a fundamental right of third parties to enrich the European Spatial Data Infrastructure with data sets currently hidden or difficult to find”. And of course this sits alongside the open data agenda and the Linked Data agenda comes in. These have the same aim even with the delivery mechanism being different.
So, in practice… We need to look at data harmonisation; we need to look at provision of online services on discovery, view (e.g. WMS – Web Map Services), download (implicitly through WFS – Web Feature Services), transform – as a provider you need to indicate compliance with existing schema, the data structure is compliant with INSPIRE – it needs to be accessible and usable to researchers across the EU ; we need to think about licensing arrangements – metadata should be open, no argument there, but it’s not necessarily the case that all services will be free (particularly around downloading for instance). And to meet UK Location programme we need to think about monitoring and co-ordination.
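To make the view and download services mentioned above a little more concrete, here is a minimal sketch of how a client might build requests to a WMS and a WFS. The endpoint URL and layer names are hypothetical and purely illustrative; the query parameters follow the general OGC WMS 1.3.0 and WFS 2.0.0 conventions, and a real INSPIRE service may require more than this.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, used purely for illustration.
BASE_URL = "https://example.org/geoserver/ows"


def wms_getmap_url(layer, bbox, width=600, height=400):
    """Build a WMS 1.3.0 GetMap request (a 'view' service call)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "CRS:84",  # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),  # min lon, min lat, max lon, max lat
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def wfs_getfeature_url(type_name, count=10):
    """Build a WFS 2.0.0 GetFeature request (a 'download' service call)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "COUNT": count,  # cap the number of features returned
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical layer name and a rough bounding box around the UK.
view_url = wms_getmap_url("demo:population", (-8.0, 49.0, 2.0, 61.0))
download_url = wfs_getfeature_url("demo:population")
print(view_url)
print(download_url)
print(parse_qs(urlparse(view_url).query)["REQUEST"])
```

The view request returns a rendered image while the download request returns the features themselves, which is why licensing questions tend to attach to the latter.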
It’s worth remembering that INSPIRE comes from an environmental perspective. Annex I and II both address this fairly specifically but Annex III includes data types that are far more amorphous. Of possible interest to social science are population distribution – demography – this is in scope and there are examples here that indicate that many social science data centres will be brought in. In the Human health and safety section the examples include general statistics on health, causes of poor and good health – such as risk factors, etc.
If you have a weekend to burn I recommend reading D2.5: Generic Conceptual Model and D2.7 Encoding Guidelines. This gives a sense of the themes in INSPIRE and then the detail – and the devil is in the detail – of what this actually means. Look at the data specification.
What is a data specification? Well if you are a researcher and want your data to be INSPIRE compliant you need to know this, or you need someone to do this for you. You need to be able to publish harmonized data. Each specification has a standard script: two executive summaries at the start; then quite a complex breakdown of things you need to do and have answers for – data content and structure is the super techie bit; and how you assert the rules adhered to and indicate that it is harmonized.
The real scary bit is in Chapter 5 of the Data Specification – Application Schema – UML (Unified Modelling Language). This is important for data publishers. As researchers you just need to know that this stuff exists and that your data should be made compliant – we can do that for you.
The key thing is to use identifiers consistently, to make sure that object references are appropriate, that your geometry representation is correct, that temporal representation is correct and appropriate, and to be clear about how that data will be managed, updated and curated. At the heart of all of this are a series of International Standards around these aspects. Interestingly INSPIRE says little about preservation – if you want a whole new research topic that would be interesting to look into.
The Feature Catalogue is the key section to read as it provides a clear outline of what type of data is in scope and what must be done with that.
In terms of timeline the consultation and testing phase is currently taking place (22 June – 21 October 2011). If you want to put views forwards speak to James who can put forward your views to the group.
So… If we look at UK HFE as an INSPIRE Data Consumer what does that mean? Well it means easier access to data. We conducted a survey of UK academics in the geospatial sector and the same argument came up in terms of access. There are still barriers perceived here. INSPIRE addresses some of this but not how to use that data – as Peter talked about. You have to understand technical aspects – what the data means and how it has been manufactured – to interpret it. Huge potential here with harmonized data.
The Draft Implementing Rules for INSPIRE are due 3rd-21st September 2012. Done by JRC data specification team with support from TWGs. In the meantime we all need to be aware and participate!
- More info on INSPIRE at: http://inspire.jrc.ec.europa.eu/
- More on UK LP at: http://locations.defra.gov.uk
- More info on INSPIRE and UKLP for HE/FE at: https://geco.blogs.edina.ac.uk/category/inspire/
Q1) What is the aggregation?
A1) It’s not about each data set. Data could be aggregated to different degrees of detail. The assumption is that some of this will be at Eurostat level. But at HE level it’s about whether it’s Public Task or not. But you may separately want to expose your data – for REF as well as for INSPIRE. It’s still early days for Annex III. I know population and demography has had lots of change. Each area has a thematic lead. Aggregated does mean aggregated, not individualized.
Q2) What has technology vendors’ reaction to INSPIRE been?
A2) Well it is a new business opportunity but many are sitting on the sidelines until finalised. Some vendors are enabling automatic creation of metadata. But we, EDINA, are funded by JISC to help you get your metadata INSPIRE compliant so do come to us. But there is a real opportunity here. The Ordnance Survey has created a software stack to help with this. The requirement for a generic metadata publication can easily be done by a tool we have called GoGeo and a metadata creation tool called GeoDoc – this is harvested at European level. Vendors do see the possibilities but small public authorities will have more challenge in the data publishing requirements. It is all about the data but unlike FOI it implies far more work and obligation from the data publisher – including the need to keep data up to date.
Q3) Looking at the Annex III theme, population, if you look at some of the big data sets in UKDA like the census and the UK household survey… that seems likely to fall under Annex III
A3) You need to look in the Feature Catalogue to specifically see. Perhaps if any attribute is in the Feature Catalogue then the whole data set comes under INSPIRE. Perhaps if only 4 attributes are in the Catalogue out of say 50 attributes then perhaps not justifiable. Still unclear. But there are reasons to publish data anyway – it is an opportunity for good data management and a good opportunity to work with outputs of the rest of Europe. But if you’re in this you have to play with the sometimes tortuous rules.
Q4) Those important aspects of data harmonization and discoverability that can lead to interoperability are great. However in the social sciences, and the environmental sciences, the data we have come from people and organisations. We have to respect their identities and privacy. This is enshrined in EU legislation as well. The problem as I see it is that when we try to share data at the level of Spatial Definition we are potentially at risk of identifying people and even more so of identifying organisations. And that is a real challenge in terms of how you handle spatial definition, but that could mean removing some degrees of accuracy and usefulness of the data. Tell me your view of that conflict between protecting data subjects and sharing data more broadly.
A4) On the INSPIRE front the data is anonymised and aggregated…
Q4) Stop there! It depends on how it has been anonymised and aggregated as to whether that is actually protective. People can still be identifiable.
A4) I think in that theme in particular there is a lot of thinking to be done. There is an opt out for INSPIRE more generally for third party rights or IPR issues with publishing. But there hasn’t been any acid test. Annex III is still in definition. The UK research community should feed views back into the working group for consideration although I think this has already been raised at the 2011 INSPIRE conference/working group meeting. I know that that doesn’t answer your question exactly… it is still under consideration. But you are right. And remember that so far we haven’t created any harmonized data sets yet. But now is the time to influence the agenda.
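As a rough illustration of the questioner’s point that aggregation alone does not guarantee protection, here is a minimal sketch of small-cell suppression, one common statistical disclosure control technique. This is not something INSPIRE or the speakers prescribe; the field names and the threshold of 10 are assumptions chosen only for the example.

```python
from collections import Counter


def aggregate_with_suppression(records, unit_key, threshold=10):
    """Count records per spatial unit and suppress small cells.

    Counts below `threshold` are replaced with None, because a very
    small count in a small area can still point to individuals even
    though the data has been 'aggregated'. The threshold here is an
    arbitrary illustrative value.
    """
    counts = Counter(record[unit_key] for record in records)
    return {
        unit: (count if count >= threshold else None)
        for unit, count in counts.items()
    }


# Hypothetical respondents tagged with a hypothetical area code.
respondents = (
    [{"area": "A01"} for _ in range(25)]
    + [{"area": "B02"} for _ in range(3)]
)
print(aggregate_with_suppression(respondents, "area"))
```

The finer the spatial units, the more cells fall below any given threshold, which is exactly the trade-off between spatial accuracy and protection raised in the question.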
INSPIRE/Data Documentation Initiative (DDI) metadata mapping and the UK Data Archive’s roadmap towards INSPIRE compliance – Tom Ensom and Veerle Van den Eynden, UK Data Archive
Veerle and Tom will bring those first two talks together.
U.Geo has been looking at the geospatial potential of the UKDA – the metadata, the coding and how usable it is. We have looked at the ESRC report and we thought about how UK academics and researchers use our data and how we could improve our data services with geospatial information. Full information is on our blog: http://www.data-archive.ac.uk/about/projects/ugeo
Are UKDA metadata compliant with INSPIRE / GEMINI? We have done metadata mapping and that will be available online (some time next week). This follows on from previous work on data compliance. We had already done some comparison of environmental data schema in the past looking to find compatibility with our own metadata. Our comparison looked at INSPIRE, ESDS, the DDI (Data Documentation Initiative) and we worked with EDINA to get their advice on that mapping. We found that the 13 INSPIRE elements match one-to-one but some need translating or mapping. There are some obvious gaps in the ESDS metadata catalogue – we do not hold bounding box data, spatial resolution / spatial reference system are not the same, and we need to ensure INSPIRE compliance. We also need to bundle various metadata elements into a single element for INSPIRE.
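The kind of mapping described here can be sketched as a simple crosswalk. This is only an illustration and not the UKDA’s actual mapping table: the source field names are invented, and the target names only loosely echo INSPIRE discovery metadata elements. It shows one-to-one matches, several source elements bundled into one target element, and a missing bounding box being flagged as a gap.

```python
# Hypothetical crosswalk: target element -> source catalogue fields.
# All field names are invented for illustration only.
CROSSWALK = {
    "resource_title": ["title"],
    "resource_abstract": ["abstract"],
    # Several source elements bundled into a single target element.
    "temporal_reference": ["collection_start", "collection_end"],
    "geographic_bounding_box": ["west", "east", "south", "north"],
}


def map_record(record):
    """Return (mapped, missing): mapped target elements and the gaps."""
    mapped, missing = {}, []
    for target, sources in CROSSWALK.items():
        values = [record.get(source) for source in sources]
        if any(value is None for value in values):
            missing.append(target)  # a gap to fill before publishing
        elif len(sources) == 1:
            mapped[target] = values[0]  # one-to-one match
        else:
            mapped[target] = dict(zip(sources, values))  # bundled element
    return mapped, missing


# A hypothetical catalogue record with no bounding box captured.
catalogue_record = {
    "title": "Example household survey",
    "abstract": "An illustrative record.",
    "collection_start": "2009-01-01",
    "collection_end": "2009-12-31",
}
mapped, missing = map_record(catalogue_record)
print(mapped)
print(missing)
```

Reporting the gaps separately from the mapped elements makes it easy to see which metadata would need to be captured from depositors before a record could be exported.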
We have used this analysis to create a roadmap towards UKDA INSPIRE compliance.
- Compliant discovery level metadata – we need to capture new metadata, we need to amend the catalogue structure and controlled vocabulary, and then export data to INSPIRE compliance
- Publish compliant discovery level metadata – GEMINI schematron, UK Location Programme and GoGeo Harvesting
- Create view services – we haven’t looked at this in detail for UKDA so we need to do more work in this area – I’d be interested in knowing what people here think about how feasible view services are at the appropriate level
So the archive is moving from DDI2 level to DDI3 level and the U.Geo, UK Location and INSPIRE work is feeding into that ongoing future development work. And we need to look at how we will gather data from depositors.
The immediate implications of the roadmap are recommendations to the DDI3 task team, development/improvement of cataloguing procedure, and improvement of existing metadata.
In terms of a Web Mapping Service we are looking at how to do this. We are looking at a resource discovery tool with faceted searching across the survey. It will let researchers look at which units, boundaries, access etc. There is a study view as well as the unit view.
So here is a mock up of our Geo-Browser. We have a core set of studies to choose from and a series of filtering criteria – year, group of studies, and most usefully of all we have the use conditions which have pretty drastic implications for what researchers can do with data. This provides boundary data on a particular unit, further information of the unit and a link to the ESDS catalogue which has additional information. Going to the other view, the unit view, allows views of different types of spatial units – this will also be useful as a separate website. We are trying to ensure we provide what social scientists actually need and this second view isn’t needed for INSPIRE but is useful. So we capture this information and look at linkage around this. A lot of qualitative information on missing time references etc. The Geo-Browser lets you look at how to link the data we have to other data out there, information on limitations etc.
Something else to note. This is sort of a stand-alone tool at this stage but it is also about feeding into what our catalogue could be and do.
Q1) I think this is really interesting and I want to know more. Unit view means Spatial Units is that correct? The problem with Spatial Units is that they change through time, how deep will you go, how far back will you go.
A1) Yes, Spatial Units. If you look at something like Postcode they change every year. It is truly difficult to present that information in an interface that is comprehensive and it’s near impossible to track that change over time.
Q1) That button to find data is great, really interesting. And you have that button to download boundary data. I think that could be a problem as many people won’t know what to do with that boundary data. And the next thing to do is to put together that data with that boundary data.
A1) In terms of tracking changes we need to ask for that information, some really interesting work we need to do with our depositors. This project has led to a lot of interesting implications in terms of working with our depositors in the future
Q2) So if I am a researcher depositing data will you apply the DDI/INSPIRE standard to it? What’s the application here? Because the problem can be that researchers just don’t do data compatibility, OA materials etc. Are you providing someone or reference to another organisation to encourage that use?
A2) The idea is that when they come to the archive the metadata is captured in the deposit form. So the first step forward is for the catalogue record created to become INSPIRE compliant. We need to capture that data better to mean there is no cost to make the record INSPIRE compliant. That will require discussion at higher levels here. That’s an important first step. Then the next step of publishing and exposing the metadata is something that the Archive will have to do.
Q3) You said the interface is divorced from your pragmatic DDI version – my concern is that INSPIRE is theoretical but the more practical
A3) In a way the Geo-Browser shouldn’t be there, it should be part of the catalogue and that is very much our long view of how this data will be used. Visualisation not just for geo should also be in the catalogue. But at the moment the metadata is not structured in a way that makes this possible. But all this work is feeding into DDI3 to improve the infrastructure of the catalogue. It’s actually ideal timing because of the catalogue development
Q4) When will the Browser be available? Will it be updated?
A4) Hopefully next month. This will probably be a snapshot at first but there is hope to develop this on. But we will publish that final DDI table in the next week or so as a product. And there is continued support for working on geospatial and geospatial potential of our data – it doesn’t end with the project.
Q5) The DDI to INSPIRE/GEMINI has an added benefit as GEMINI is automatically published to data.gov and exposing data to a wider audience
A5) And we are seeing that having impact already.
Q6) Surely there is significant intellectual property in that mapping – should that be free and in the public domain?
A6) Well that is with the organisation at the moment but it needs to be used
And with that we are breaking for lunch…
… and we’re back!
INSPIRE in Sweden – Johan Fihn, Swedish National Data Service
Johan begins by saying that he is a programmer rather than a social scientist. At the Swedish National Data Service we do something quite similar to the UKDA. We were established in 2008 and SND is a service organisation for Swedish research within the Humanities, Social Sciences and Medicine – so a slightly wider remit. And we are the Swedish node in the international network of Data Centres.
As an organisation we support Swedish researchers by facilitating and developing researchers’ access to data inside and outside the country, and offer support for research throughout the entire research process. We have a responsibility for the long term curation and preservation of research data – we are tasked with finding out about all research data being produced.
INSPIRE in Sweden – In 2006 the Swedish Government commissioned Lantmäteriet – the Swedish mapping, cadastral and land registration authority – to coordinate geodata nationally, and they are coordinating INSPIRE work in Sweden. More information here: http://www.geodata.se/en/
The National infrastructure for geodata brings in a broad range of local authorities, county administrative boards, authorities, organisations etc. They are coordinating a single national implementation for INSPIRE. They are responsible for answering any questions on INSPIRE. The Swedish geodata strategy forms the basis for the National Infrastructure for geodata. There is a business model here. Authorities affected by INSPIRE offer each other their geodata for use in authority-related business – they exchange data. Other information owners can participate by contributing their data to this collaborative project. Other end users buy licenses for the right to use information. And there are “refiners” – those who create services based on geodata. And there are special licence terms for use of data by the general public – a bit like the UK Principle of Public Data/Information. This data should be free to use but the authorities can charge you for the work associated with delivering that data.
There are around 20 authorities with information responsibility for INSPIRE. You can see Universities in Sweden are not on this list and are not being seen as needing to comply with INSPIRE, but most researchers in Sweden are getting their data from one of these 20 authorities so it is not so clear cut. There is also something called a Contribution Agreement – you can have the right to publish metadata in something called the Geodata portal, which I will show you later, and you therefore have the opportunity to contribute.
The Geodata Council and the Swedish standards organisation SIS/Stanli say that you should use the ISO 19100 standard for production of metadata and specify a standard national taxonomy to accompany this. The Geodata Portal presents geodata and it is the entrance for Sweden’s participation in INSPIRE.
So why is the Swedish National Data Service interested in INSPIRE? We are not legally required to comply but our researchers have begun depositing GIS data in SND (mainly around Archaeology) and having discussions about SND being a portal for Archaeology. We will be mapping e.g. ISO19139 and DDI etc.
We haven’t really scratched the surface of INSPIRE so we’re keen to get some insight really.
Q1) Looking at Social Sciences data, is that something that government departments produce? Do you supply data directly?
A1) For archaeology, data comes from county council boards. You use the GeoData portal to access metadata – and then you request the data from the appropriate authority. We do provide data but not data in the GeoData portal.
Q2) The contribution agreement – any significant interest from the universities there?
A2) Not that many have signed those agreements yet. If we were interested in signing one and being part of the portal that would be one thing, but I don’t think universities are likely to do this.
Q3) There is a parallel here between the Swedish National Data Service and the Australian National Data Service. As you may know the Australian National Data Service is taking a real lead role in the research data management infrastructure area. Does the Swedish NDS have big aims and how do you plan to achieve those?
A3) The Australian NDS have a system we are looking at. We want to look at combining the National Board of Health and Statistics Sweden where the data are supposed to be in a secure place and held separately but you run queries in a harmonised way across the different authorities. The discussions in Sweden are at the Science Council level. The latest I’ve heard is that the central point at which the data connects would be the Science Council.
Q3) Are you funded by the Science Council?
A3) We are funded by the Science Council and we are supported by the University of Gothenburg where we are based.
Q4) Would the Swedish data portal be the starting point to find any INSPIRE compliant data?
A4) No, the Swedish portal is just for Swedish data. If you want to access other data you go to the International data portal.
The WISERD GeoPortal – a tool to search, view, map and download social science metadata – Scott Orford & colleague, Wales Institute of Social & Economic Research
WISERD is the Wales Institute of Social and Economic Research, Data and Methods, set up as a major investment in research infrastructure in Economic and Social Sciences in Wales (£4.8 million over 3 years). We were set up to look at building research bids and sustainability in Wales. And we also have a major theme around sharing data and we have main themes of: “Knowing” localities, policy analysis and evaluation, building capacity etc. We have set up a WISERD Geoportal (WGP) to enhance the researchers’ ability to discover quantitative and qualitative socio-economic research data for Wales, to enable geographic as well as thematic and temporal searching and to encourage reuse of existing data and more collaborative research.
There are five types of data we are wanting to bring into WGP: Quantitative Survey Data; Qualitative Data; Administrative Data; “grey” data; Public Linked Data. You can search the WGP textually, spatially – with a specific query, spatially – everything in the area and also spatially and textually. We have developed a proof of concept. We have focused on census data, labour force data, we have the data from the Welsh Rural Observatory, we have some third sector data such as Shelter Cymru. We include 63 surveys of which 30 have response maps. There is also a lot of administrative data – the pupil level schools attainment records, qualitative data – in depth interviews from locality studies, and we have looked at Grey MTB but we haven’t yet looked at the Linked data.
So, for the techie bit. The system consists of three pieces of software: the Geoportal itself, and bespoke software to create the metadata (qualitative metadata and survey metadata) – we are manually turning those surveys from PDFs etc. into tables. The qualitative metadata is generated via a web form and uses the Unlock and OpenCalais web services to produce Dublin Core metadata. This goes into our database and is served through the website.
The metadata form pulls in all or most of the data a researcher would want to know. We’ve tried to make this form Dublin Core compliant and relate fields to DDI counterparts. For our surveys we have response data that is geographically tagged and that is handled by our database.
A slightly different form is used for the qualitative metadata – you paste in data and we generate metadata from those web services – we want to spatially tag the data but without compromising the privacy of that survey data.
We wanted our metadata to be standards compliant. A lot of our spatial data is user generated so it’s a bit tricky. But when we designed the database we wanted it built for good database management and standards, and able to cope with INSPIRE compliant data.
We are using free open source technology for the portal. These tools give us the most flexibility for the system we wanted. And we’ll show you that shortly.
At this stage we have an early stage prototype. We want to make it look better and work faster. We have some issues. The aesthetics of the cartography aren’t great. The speed of database searches isn’t great – especially for spatial searches. Eventually we’d like to move the whole thing to the cloud. Further developments are planned through to December 2011 at the moment. We want to expand the metadata database including new metadata types – public data (e.g. ONS data), environmental data, linked data/RDF. We want to improve data analysis and sharing – so that you don’t need your own GIS but can just use the portal. Eventually we’d like to enhance our infrastructure/put it onto the cloud, and we want to integrate metadata entry tools into WGPortal so that end users upload their own metadata. We desperately want to move onto a data service. We want to work with the Welsh Government to serve data via a secure site. Also on the list: downloading metadata in GIS formats, community features, published linked data etc.
At the moment, within the next 2 or 3 weeks, we will release the geoportal for user testing and you can see a video about the site and sign up for an Alpha testing account here: http://wiserd.comp.glam.ac.uk/home/
And now for a live demo…
This is the very early development stage version. It will probably be called something different that has more meaning to social scientists. We have a keyword search right now so if we search for “education” we see a set of results split into different types of metadata and you can view the metadata, relevant notes and prompts etc. You can search survey questions and responses or you can see questions in context. All material has thematic tags and groupings to help you find your way around the data. Where there is data associated with the metadata you can plot, e.g. survey responses, on a map. What we’re trying to do is let researchers avoid the GIS to see activity in an area. At present we can map response rates. We are going to add a print module for this too. The real difference with this software is that we can search for metadata geographically – beyond a simple database search. So we have a few different data searching tools. You draw a shape on the map to search for data in that area. And the system will return materials with an association with that area. The idea is that in the future you might be able to mouse over or right click over the mapped metadata to download the source data.
Q1a) To go from metadata to source data… that’s a huge jump! And for instance in that survey none of the respondents may be in that area, they may just be part of a survey that includes that area.
Q1b) Is that point data on the map or just overlapping areas’ data?
A1) It is taken from overlapping areas. It’s not there yet and it is an ambitious project.
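Editor’s sketch: the “overlapping areas” behaviour described in A1 can be illustrated with a simple bounding-box test. This is only a minimal illustration, not the WISERD implementation: the `BBox` class, the `search_by_area` function, the record titles and the coordinates are all hypothetical, and a real portal would more likely rely on a spatial database and proper geometries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (hypothetical helper)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


def search_by_area(records, drawn):
    """Return titles of records whose coverage box overlaps the drawn box."""
    return [title for title, coverage in records if coverage.intersects(drawn)]


# Illustrative metadata records: (title, geographic coverage). Values are made up.
records = [
    ("Survey A (all Wales)", BBox(-5.4, 51.3, -2.6, 53.5)),
    ("Interviews B (Cardiff area)", BBox(-3.35, 51.4, -3.05, 51.6)),
    ("Survey C (Anglesey)", BBox(-4.7, 53.1, -4.0, 53.45)),
]

# A shape "drawn on the map" around central Cardiff.
drawn = BBox(-3.3, 51.45, -3.1, 51.55)
hits = search_by_area(records, drawn)
print(hits)
```

Note that the all-Wales survey is returned purely because its coverage overlaps the drawn shape, which is exactly the caveat raised in the question: a hit says the survey includes the area, not that any respondent lives there.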
Q2) Is there a reason that this is a Welsh initiative? There’s nothing Welsh about it other than the area covered.
Comment) I can see that funding wise it is important to be Welsh. I know the Welsh Assembly Government has funded 3 or 4 research data fellowships – how do they come into this?
A2) We are talking to various people and do have contact with those kinds of individuals. We are showing them a prototype rather than an abstract. They are really interested in the data and the linkages between them, the interface is the smaller part of this.
We are playing with word clouds of survey data (done per interview but using metadata generated from the data, not a cloud from the full text) but we are experimenting with this sort of tool to give the user a sense of what the interview is about and what it talks about without accessing the survey directly – the words link to the survey questions as well. They can then request the source data. We want to link any interviews and surveys to the research published on that data. Another thing we’ve done is use the geographical place names to get a sense of coverage of data. We are trying to think of something that will give a researcher an idea of the geographical focus of the data included.
We are playing with ways to display and explore metadata and data. We were thinking that seeing a word cloud for a given area could be really useful. We want to pool surveys and interview transcripts.
Q3) What about issues of anonymisation with automatically creating metadata with the data?
A3) We’ve had a real issue with transcripts and that’s representative of a much wider issue around sharing qualitative metadata in general. There are all kinds of issues of disclosure. We use Unlock to find placenames in transcripts – we have hand found the placenames in texts and know what should be picked up but not all are. Unlock isn’t designed for interview transcripts – it picks up on some unexpected things and not others. An interesting research project for someone on how one geocodes transcripts.
Q4) How well does this metadata represent the actual transcript?
A4) We plan to ask the localities team to give us some feedback on this but we haven’t done this yet.
We have a map and heatmap of placenames mentioned – again this could be done for groups or individual surveys.
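Editor’s sketch: the counts behind a placename heatmap could be produced along the following lines. This does not call Unlock or OpenCalais; the tiny `GAZETTEER` set, the `count_placenames` function and the sample transcripts are all made up for illustration, and naive token matching like this would miss multi-word names and ambiguous places.

```python
import re
from collections import Counter

# Hypothetical stand-in for a gazetteer lookup; a real system would use a
# placename service or a full gazetteer rather than a hard-coded set.
GAZETTEER = {"Cardiff", "Swansea", "Bangor"}


def count_placenames(transcripts, gazetteer=GAZETTEER):
    """Count how often each known placename is mentioned across transcripts."""
    counts = Counter()
    for text in transcripts:
        # Split into alphabetic tokens so trailing punctuation is ignored.
        for token in re.findall(r"[A-Za-z]+", text):
            if token in gazetteer:
                counts[token] += 1
    return counts


# Invented example transcripts.
transcripts = [
    "I grew up in Swansea but moved to Cardiff for work.",
    "Most of us travel into Cardiff; a few go to Bangor.",
]

counts = count_placenames(transcripts)
print(counts.most_common())
```

The same counts could be computed for a single interview or pooled across a group of surveys, and then joined to coordinates to weight the heatmap.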
Q5) Could you clarify – are you passing potentially sensitive data to OpenCalais and Unlock?
A5) At the moment we are potentially doing that.
Q5) I don’t ask to criticise – we are interested in doing this as well but there are real issues around transmitting sensitive data to third party sites.
A5) You can use the open source Unlock software to host your own instance. The web service is a web service. But you would want to tune your instance to your data.
Another demo – a search for a placename and the space around that location. So if we search for Cardiff we can look for materials within 5 miles of Cardiff. Looking for GIS data only at the moment. Of course a big problem at the moment is that only half of our surveys are geotagged.
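Editor’s sketch: a “within 5 miles of Cardiff” search can be approximated with a great-circle (haversine) distance filter. This is an illustration under assumptions rather than how the portal actually works: the function names, record titles and coordinates are hypothetical, the coordinates are approximate, and records without a location are simply skipped, mirroring the point that only some surveys are geotagged.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(records, centre, radius_miles):
    """Return titles of geotagged records within radius_miles of centre."""
    lat0, lon0 = centre
    hits = []
    for title, location in records:
        if location is None:
            continue  # not geotagged, so it cannot match a spatial search
        lat, lon = location
        if haversine_miles(lat0, lon0, lat, lon) <= radius_miles:
            hits.append(title)
    return hits


CARDIFF = (51.4816, -3.1791)  # approximate city centre

# Invented records: (title, (lat, lon)) or (title, None) when not geotagged.
records = [
    ("Penarth study", (51.4350, -3.1730)),
    ("Newport study", (51.5842, -2.9977)),
    ("Untagged survey", None),
]

nearby = within_radius(records, CARDIFF, 5)
print(nearby)
```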
Discussion: how should the social science community prepare for INSPIRE
I think we’ve really broadened scope beyond INSPIRE for social science – maybe this is more broadly around geo/geospatial and the social sciences?
James) INSPIRE interests me as I want to know if INSPIRE matters – does it matter to you? How would you go about doing INSPIRE at your institution? What do you need? What’s missing? What happens? Who funds it? How does that happen? How does it rank as a priority?
Commenter) I want access to INSPIRE data but if I generate data I don’t want to INSPIRE it – speaking as a researcher.
James) A classic repository type answer. INSPIRE will let you do that to an extent but the data you want may not be free – you’ll just be able to find out that it exists. Various parts of the UK programme are looking at what does it mean to make data available? What are the implications? How do you do that securely? INSPIRE sort of piggy backs on the idea of open data but some data is valuable and you don’t want to give that away. One of the projects we’re involved with in the JISC geo strand looks at using Shibboleth with web services – so you could use that in the WISERD portal potentially. Combining secure data etc. You can sit on both sides of the fence but it might be nice to inspect our own models.
Commenter again) I think you said that REF may be a way to encourage data sharing. Are there opportunities for you here?
James) Well my perception is that researchers don’t get rewarded for good data management. Kudos lies in peer review. We all know that roles can be pretty fluid. We don’t think about the researcher 5 years down the line who wants to build on what they’ve done. From an upfront position INSPIRE will let you discover what data is out there. Whether you have to pay for it or can use it for free is another matter. But knowing what stuff exists is a big thing to overcome.
Veerle) Getting your data discoverable does help you become INSPIRE compliant doesn’t it?
James) Well there are four elements you are mandated to provide if you are being entirely INSPIRE compliant and publishing your metadata is a small part of that. But it’s still very useful – we’ve been able to do that for a long time and it has real usefulness. That’s where stuff like GeoDoc sits and actually there you have a choice of who sees your metadata.
Commenter) What I hear is that I don’t have to do that myself, the UKDA could do that for my data instead.
James) You have existing tools for the top level requirements (GoGeo, GeoDoc) available right away. If the institution is responsible then implicitly there are other tools that you need to provide but you can ask someone else to deliver those. EDINA are happy to take that on as it stands. You need to know if you’re going for MUST or SHOULD for INSPIRE compliance.
Peter) I want to try to summarise the key messages for ESRC, JISC and the funding councils here:
- INSPIRE is a very useful European initiative creating a common infrastructure for discovery, harmonisation and ultimately sharing. And having an impact on standardisation. In the UK, especially Social Sciences, it’s mainly been a focus on metadata discovery and improvement tools. Also very useful and helping researchers understand the research landscape that they may want to address.
But within all of that what are the problems for the future?
- Skills agenda – the number of people able to use the data they discover. We assume there are needs there. An issue for ESRC and other funding councils. Some work here but not enough.
- Data Sharing agenda – being able to get hold of data and share it. More complex as we have various issues affecting this. Commercial interests and issues – do we destroy value by sharing data? Then the issue of consent – whether or not a person has given consent to use data in a particular way, for a particular purpose. There are times when we obtain data without consent and leave those up. And there is a third issue of security and confidentiality – how do we ensure data used at very detailed level is secure so that no harm comes to those individuals. And finally what are the incentives – sticks or carrots? Depositing data in the UKDA may be a thing people tick the box for but don’t know the process for. And the rewards, if we think about the REF, are really non-existent for data sharing.
If we can tackle the skills agenda and the data sharing agenda we can take a huge step forward on the INSPIRE directive.
James) A further consent issue – you need the consent of the research groups – and by extension the institution and its records management staff. That implies that the consent of the researcher is therefore important. The presumption is that everything is open unless you expressly say why it’s not open. And that’s a cultural mindshift across the piece that is required.
Veerle) When you talk about the limitations around confidentiality and concerns of privacy. There has been enormous work at ESRC and UKDA to make it clear that consent for data sharing has to be sought early on and indeed ways to securely share data for research. Huge steps have been made here. But more work is needed in Research Data Management plans.
Peter) There has been great work done recently but there is a long way to go. I have colleagues who are supposed to be looking at Department for Work and Pensions administrative data – blocked by other parts of the department which will not let them access data despite having a contract WITH the department to work on that data. That’s a shocking waste of public money.
James) Work for the Transparency Board there. The push by JISC is that publicly funded goods should be in the public domain but the devil is in the detail of course. The impulse for researchers is to say “this is special, I’m special, this doesn’t apply”. The default should be open and public.
Christina) In terms of sharing data it’s not just about lack of knowledge of consent but also about raising public awareness.
Dave) I’m interested in the bottom up stuff, the skills issue. If you get people with skills who want to play with this stuff then that rising tide should have a broader impact. You need lots of hours, subject disciplines, etc. to make a real push. WISERD looks great but you’ll need to train people to use that. How do we step in and fund training of people for skills? Peter is recommending ESRC and JISC push that forward but…
Peter) Two ways forward. You can embed the skills in the individual and then there is the sort of deskilling approach where you do things for them. And I think you need a bit of both. Some great researchers in some respects really struggle with visualisation etc. They want to go to a service to deliver or explain how to do this. But you do want to focus on training younger researchers to ensure geospatial skills are embedded by default. They will be so important to understanding so many aspects of life that they are essential skills.
Dave) When was your study? 2009? In data terms that’s decades. Is there a need or a call… has Sweden or Wales looked back out into the sector? At the Postgraduate level how many institutions are providing the core skills? Can we get that data?
Peter) We can get that data. The answer is not big. Probably can count on one hand how many universities do significant training. For instance my university does not have a geography department – if they need spatial training they have to buy it in…
James) There is also an aspect of “you don’t know what you don’t know” – where do naive researchers go to see if their work is actually valid and useful? If you are in a spatially literate discipline that comes in at post grad level at least. But with Google Maps etc. others use these tools without that grounding….
Peter) If you look for those skills and look for where they sit… well if you see newspapers they often generate great graphics using spatial tools as visualisation. For added value side of things it’s the commercial sector. The firm that runs the Tesco Clubcard system does incredible things. That spatial data is hugely important to their company. They have done such work on geodemographics, they know where to build new stores, how to lay them out etc.
Micheal) Where do they get new staff from? I’m wondering whether in terms of Dave’s suggested survey whether you should look to business schools – are they doing this?
Dave) I know developers on Sainsbury’s Nectar card. They are developers who have picked up geo skills playing with tools as needed.
Tom) You do see trainee geospatial roles quite often. You do see people coming in with basic geospatial backgrounds. But the private sector is possibly taking these people’s limited experience and turning them into experts.
James) Depends on how much of a purist you want to be. Local authorities have geospatial departments with staff with non geospatial backgrounds. Looking at GI professionals in the UK vs Europe vs the US there is a big difference in how the profession is regarded. Not just GI, I know physicists who work in the city for similar reasons.
Veerle) Tom and I have earth sciences backgrounds. Perhaps there is not enough of a technical skills base in the social sciences?
James) Dave Unwin did a talk a while ago dissing all geographers outside of GI. What constitutes core spatial literacy? What does teaching that mean? Has it ever been cracked? Or maybe not a problem – perhaps we undervalue the geospatial aspects. What is special about spatial and why do geographers have a monopoly here?
Dave) There’s a real issue there. If you say not using GIS you’re not doing geography… but the special geospatial moments are when you start to do correlation analysis, which lets you find new hypotheses.
James) A real issue there. You need the basic precepts to understand what you are looking at, what it means. How to label maps is a classic that probably needs updating for the modern era. The tools to draw inappropriate maps are absolutely rife.
Dave) But if you really want to engage with psychologists you need to use those visual mapping tools to inspire them. Do we drop researchers right into ArcGIS or do we show them fun stuff that makes them think about new questions?
James) Well what question are you trying to answer? Are you looking for questions or to analyse data? I’m playing devil’s advocate but you do have to cater to all classes. In straitened funding times which is more important: academic purity or popularism?
Commenter) Well you look at those Twitter maps of the London rioting and it was incomplete. We used that example to explain about the cautions of using geospatial in their research – that you need to really
Dave) But we have SOAS work on the Iran protest from social media and that has real value – although in social science you *never* have a complete data set.
James) I think the specialist geospatial expert and the generalist geospatial visualisation approaches do need to go hand in hand. Perhaps we need the AQMeN model – geofolk help non specialists from other disciplines. Maybe Google Knol approach vs Wikipedia. How do you create an appropriate response?
And with that inconclusive but interesting discussion Veerle thanks us all for coming and the workshop finishes. Thanks for following along on the blog!