Census Hub Project


description

The Census Hub Project can currently be considered the most advanced project in which Internet technologies and SDMX solutions for data transmission come together for an ambitious goal: the dissemination of the 2011 Census results. We analyze the Census Hub architecture, in which a central Hub on the Eurostat side manages the user interface and transforms all selections made by the user on screen into an SDMX query. This query is sent to the web service on the NSI side, which parses it and transforms it into an SQL query that can be run against a database containing census data. Depending on which countries are involved in the answer, the Hub queries the web service provided by each of those countries. Finally, the Hub receives all the answers from the NSIs and builds a final table by putting them together. The importance of this implementation is that it is a completely new system that changes the way official data are disseminated and exchanged among organizations.
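The flow described above (selection, one query per country, merged final table) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the NSI web services are replaced by in-memory functions, the queries are plain dictionaries rather than real SDMX messages, and the names `make_fake_nsi`, `build_queries`, `run_hub` and `to_csv` as well as the figures are hypothetical.

```python
import csv
import io


def make_fake_nsi(country, data):
    """Return a stand-in for an NSI web service (hypothetical, in-memory)."""
    def service(query):
        # Keep only the cells that match the codes requested by the hub.
        return [
            {"country": country, "sex": sex, "age": age, "value": value}
            for (sex, age), value in data.items()
            if sex in query["sex"] and age in query["age"]
        ]
    return service


# Invented figures, used only to make the sketch runnable.
NSI_SERVICES = {
    "IT": make_fake_nsi("IT", {("M", "0-14"): 10, ("F", "0-14"): 11, ("M", "15-64"): 30}),
    "PT": make_fake_nsi("PT", {("M", "0-14"): 2, ("F", "0-14"): 3}),
}


def build_queries(selection):
    """Query builder: one query per country selected by the user."""
    return {
        country: {"sex": selection["sex"], "age": selection["age"]}
        for country in selection["countries"]
    }


def run_hub(selection):
    """Send each query to its NSI service and merge the answers."""
    rows = []
    for country, query in build_queries(selection).items():
        rows.extend(NSI_SERVICES[country](query))
    return rows


def to_csv(rows):
    """Render the merged table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["country", "sex", "age", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


selection = {"countries": ["IT", "PT"], "sex": ["F"], "age": ["0-14"]}
table = run_hub(selection)
print(to_csv(table))
```

The point of the sketch is that the hub stores no census data itself: it only fans the user's selection out to the countries involved and assembles whatever comes back.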

Transcript of Census Hub Project

  • 1. The European Census Hub Project. Workshop on Data Transmission, 17-19 June, Becici, Montenegro. Vincenzo PATRUNO

2. Overview: It's the proposal of a new system to achieve the publication of the 2011 Census data on the Eurostat website using SDMX standards.

3. Overview: Census taking is a very cost-intensive exercise, justified by the unparalleled quality of the result. Important aspects of that quality are:

  • The flexibility to cross tabulate different variables

4. An easy access to data

5. Detailed data, methodologically comparable

6. Overview: FLEXIBILITY, HARMONIZATION

7. Harmonization: access to detailed Census data that are methodologically comparable among the Member States and structured in the same way

8. Flexibility: final users should have the possibility to cross tabulate different variables

9. The Goals: The dissemination of the results of the censuses in the EU should reflect those advantages to the highest possible extent.

10. The Traditional Approach: (1) Member States provide microdata to Eurostat. Eurostat aggregates the microdata and stores the resulting data in a central repository. This repository will be used for data dissemination. (2) Member States provide predefined tables to Eurostat. Eurostat publishes those tables on its website.

11. The Traditional Approach: Approach (1) maximises flexibility in offering data to final users. But aggregation functions on the central system could be very difficult to implement, due to the different confidentiality rules to be applied to microdata from different countries and to whether the data come from a "full" census (conventional or register-based) or from a sample survey. Data maintenance could be very cumbersome because every time a revision is issued, an entire set of microdata needs to be updated or replaced.

12. The Traditional Approach: Approach (2) greatly simplifies the exercise. But it doesn't offer enough flexibility to final users, who would have limited possibilities to tailor data to their information needs.

13. The Traditional Approach (diagram labels: NSIs, EUROSTAT)

14. Push and Pull: there are normally two different approaches to exchanging data, PUSH and PULL.

15. Push and Pull: PUSH mode means that the data provider takes action to send the data to the party collecting the data. PULL mode implies that the data provider makes the data available via the Internet. The data consumer then fetches the data on his own initiative.

16. Data Sharing Model: SDMX is primarily focused on the exchange and dissemination of statistical data and metadata. SDMX promotes a data sharing model to facilitate low-cost, high-quality statistical data and metadata exchange. Data Providers publish the availability of data/metadata to Data Consumers, and the latter are responsible for fetching the data/metadata at will.

17. Data-sharing only works if there are standard formats

18. Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges

  • Data consumers discover the data they need, and its location, and then go and get it

19. Data producers don't have to send data

Notes about Data Sharing

20. The Census Hub Idea: The Census Hub is based on the concept of data sharing: a group of partners agree on providing access to their data according to standard processes, formats and technologies. Countries involved: IT, IE, DE, PT, MT, SI, EE, BG. Additional countries involved before the end of the year: GB, ES and GR.

21.

  • SDMX standards support the "pull" mode of data sharing, where the collecting organization retrieves the data from the providers' web servers. The data:
    • may be made available for download in an SDMX-conformant file
  • 22. may be retrieved from a database in response to an SDMX-conformant query
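The two delivery options listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `DataProvider` and `DataConsumer`, their methods and the sample observations are hypothetical, and plain dictionaries stand in for SDMX-conformant files and queries.

```python
class DataProvider:
    """Hypothetical provider: it publishes data but never sends it."""

    def __init__(self, observations):
        self._observations = observations  # list of dicts, one per cell

    def download_all(self):
        # Option 1: the whole dataset, as if downloaded as a file.
        return list(self._observations)

    def query(self, **filters):
        # Option 2: only the observations that match the consumer's query.
        return [
            obs for obs in self._observations
            if all(obs.get(key) == value for key, value in filters.items())
        ]


class DataConsumer:
    """Hypothetical consumer: it fetches data on its own initiative (pull)."""

    def __init__(self, providers):
        self.providers = providers  # provider name -> DataProvider

    def pull(self, **filters):
        return {
            name: provider.query(**filters)
            for name, provider in self.providers.items()
        }


# Invented figures, used only to make the sketch runnable.
provider = DataProvider([
    {"sex": "F", "age": "0-14", "value": 3},
    {"sex": "M", "age": "0-14", "value": 2},
])
consumer = DataConsumer({"IE": provider})
result = consumer.pull(sex="F")
print(result)
```

In both options the provider exposes no "send" operation; every transfer starts with a call made by the consumer.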

This architecture often also includes an SDMX registry that implements the general idea of a metadata registry.

The Census Hub Idea

23. The Census Hub Idea: Each National Statistics Institute (NSI) creates a set of non-disclosure data. This data would be delivered via an information hub that enables data sharing on the Internet. Each NSI would provide web access to its data according to standard formats and technologies. A data user browses the hub to search for a dataset of interest using structural metadata (dimensions, attributes, code lists, etc.). Data is retrieved directly from the NSI system to the Hub.

24. The Pilot Project Architecture

25. Census Hub pilot project architecture

  • The central Hub Eurostat side
  • The web service NSI side
  • The pilot hypercube
    • Sex
  • 26. Age

27. Current Activity Status

28. Geography

29. Data Sharing in Census Hub (diagram labels: Query SDMX, Data SDMX-ML, WS, NSI)

30. The Pilot Project Architecture: The Query builder constructs one or more SDMX queries that will be sent to the related NSIs' web services through the Web service client. When the Web service client receives the responses (in the format of an SDMX cross-sectional data message) from the queried web services, it forwards them to the Result aggregation manager. The Result aggregation manager puts together all the received SDMX data messages and sends the result to the Dissemination transformer, which transforms it from an XML format to HTML or CSV.

31. The Pilot Project Architecture (NSI side): The web service receives an SDMX query and forwards it to the SDMX query parser. The SDMX query parser breaks down the query and sends it to the SQL query builder. The SQL query builder creates one or more SQL queries and sends them to the Database. The result is assembled, by the SDMX-ML assembler, into an SDMX cross-sectional message that will be sent, by the web service, to the central Hub.

32. The Pilot Project Architecture: Statistics Portugal Architecture Model

33. The Pilot Project Architecture: Statistisches Bundesamt Architecture Model

34. The Pilot Project: The Census Task Force (in the April 2007 meeting) agreed to explore the Hub solution and decided to launch a pilot project (DE, IE, IT and PT involved). Eurostat defined some guidelines for this project:

    • Simple hypercube, in order to let NSIs produce it quickly;
  • 35. Data should comprise the following dimensions: Sex, Age, Current Activity Status and Territory;
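With the four pilot dimensions in hand, the NSI-side step of turning a parsed query into SQL can be sketched as follows. This is a minimal illustration, not the pilot's actual implementation: the table name `census`, the column names, the codes (`EMP`, `UNE`) and the figures are assumptions, and a dictionary of codes per dimension stands in for a parsed SDMX query. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

# Column names assumed for the four pilot dimensions.
DIMENSIONS = ("sex", "age", "activity_status", "territory")


def build_sql(query):
    """Turn {dimension: [codes]} into a parameterized SELECT statement."""
    unknown = set(query) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    clauses, params = [], []
    for dim in DIMENSIONS:
        codes = query.get(dim)
        if codes:
            placeholders = ", ".join("?" for _ in codes)
            # dim comes from the fixed DIMENSIONS tuple, never from user input.
            clauses.append(f"{dim} IN ({placeholders})")
            params.extend(codes)
    sql = f"SELECT {', '.join(DIMENSIONS)}, population FROM census"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY " + ", ".join(DIMENSIONS)
    return sql, params


# Invented figures in a hypothetical table, used only to run the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census (sex TEXT, age TEXT, activity_status TEXT, "
    "territory TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?, ?)",
    [
        ("F", "15-64", "EMP", "IT", 120),
        ("M", "15-64", "EMP", "IT", 130),
        ("F", "15-64", "UNE", "IT", 15),
    ],
)

sql, params = build_sql({"sex": ["F"], "activity_status": ["EMP", "UNE"]})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Passing the codes as bound parameters rather than pasting them into the SQL text keeps the builder safe even when the incoming query carries unexpected values.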

36. A Data Structure Definition was also provided

37.

  • January 2008: start of the pilot project. Four countries decided to participate (Germany, Ireland, Italy and Portugal);

38. March 2008: preparation of requirement specification, functional and technical analysis;

39. April 2008: choice of one data hypercube and related breakdowns to use during the pilot; development of the Data Structure Definition (DSD);

40. June - September 2008: building of application modules (both Eurostat and NSI side); tests;

41. October 2008: evaluation report of the pilot; functional and technical analysis for the full 2011 Census Hub.

The Pilot Project Roadmap

42. Results of the pilot project: Eurostat has developed the central Hub and, at the beginning of February 2009, it will be accessible in a test environment. Italy, Portugal, Germany and Ireland have already set up the architecture. Italy, Portugal and Ireland have produced documents (available on CIRCA) regarding their experience during the pilot phase (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/case_studies).

44. Results of the pilot project: Moreover, the Census Hub Web Service Implementation Guidelines were produced, which explain how to build web services, using different IT technologies, capable of communicating correctly with the central hub (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/documents). Finally, it is important to highlight how sharing experience and software among all the actors involved (Eurostat and NSIs) has reduced production costs and development time.

45. The following benefits will be real:

  • Participants will be part of a project that will allow them to share experiences among the different actors, both statisticians and IT personnel, at different levels (planning, production, etc.);

46. Participants will build an IT infrastructure useful not only for the pilot exercise but also for their 2011 census data warehouse, using standards recognized at international level;

47. The same SDMX architecture could be used in other projects with few or no changes.

Benefits of participating in the project

48.

  • Costs for implementing an SDMX infrastructure needed for the Census Hub Pilot Project are limited and can be embedded in the more general project that each NSI will support for the 2011 Census;

49. The use of an XML-based data format will help to reduce costs of implementation as follows:

  • many NSIs are already using, or planning to use XML as the basis for their data management and dissemination systems;

50. a wide selection of commercial IT applications and tools is available to work with XML-based data;

51. expertise for working with XML is readily available and will often be available in-house

  • Knowledge and software developed by the participants in the first phase of the pilot are available and can be used immediately

Costs in participating in the project 52.

  • Involve more Member States in the exercise
  • Develop and test additional functionalities
    • Cache system
  • 53. New GUI
  • Develop all the necessary DSDs related to the more than 100 hypercubes foreseen in the population and housing regulation

What milestones in 2009 54.

  • The Census Hub pilot project has been necessary in order to understand properly how to proceed for the 2011 Census

55. The architecture used represents the most advanced example of the data sharing detailed in the SDMX standards

56. Volunteer NSIs can acquire good experience in managing complex IT projects and a good knowledge of SDMX standards

57. As the Pilot has been planned to be as simple as possible, in order to let all the NSIs participate with minor effort, this project is a good occasion for all those who want to start using SDMX

Conclusion

58. Thank You for Your Attention [email_address]