
DIME/ITDG Plenary February 2018

DIRECTORS OF METHODOLOGY/IT DIRECTORS

PLENARY MEETING

22/23 FEBRUARY 2018

Smart Statistics business case

DIME/ITDG Plenary

22/23 February 2018

EUROSTAT

Smart Statistics & Big Data

Date: 16/01/2018

Version: 3.0 – Draft

PM² Simplified Version Template V.0.6 (October 2016)

Document Control Information

Setting: Value

Document Title: Business Case

Project Title: Smart Statistics and Big Data

Document Authors: Konstantinos Giannakouris, Albrecht Wirthmann

Project ID (from PMR-site):

Project Owner: DDG

Project Manager: A. Wirthmann

Doc. Version: 3.0 (draft)

Sensitivity: Limited

Project Type: Critical

Estimated start date: Nov 2018

Estimated end date: Oct 2020

Approval Date:

Consultation: (Units/Persons consulted on the document before submission – if the case)1.

Document Approver(s) and Reviewer(s):

NOTE: All Approvers are required. Records of each approver must be maintained. All Reviewers in the list are considered required unless explicitly listed as Optional.

Name, Role, Action, Date

Project Management Officer for Eurostat: < Review >

Stakeholders: < Review >

Project Owner: < Approve >

1 For critical projects, the Eurostat Project Management Office (Crista Filip) must be consulted for a quality check of all Project Management documents before their submission to the DM. For unexperienced project managers, an earlier involvement of the PMO is strongly recommended to avoid extensive re-writing.

Smart Statistics & Big Data Date: 16/01/2018 Version: 3.0 2 / 43

Contents

1. Purpose

2. Consultation process

3. Current Situation and Mandate for Change

3.1. Problem statement

3.2. Mandate (legal base)

3.2.1. First Implementation phase

3.2.2. New pilot projects 2018-2020

3.2.3. Extend the work on smart statistics

4. Objectives and Deliverables

4.1. Scope statement

4.1.1. First Implementation phase - Scope

4.1.2. New pilot projects 2018-2020 - Scope

4.1.3. Extend the work on smart statistics and the investigation of innovative applications in the domain of trusted smart statistics - Scope

4.2. Aims and Objectives

4.2.1. First Implementation phase – Aims and objectives

4.2.1.1. First Implementation phase – Work packages

4.2.1.2. Online job vacancies (WP.I.1)

4.2.1.3. Enterprise characteristics (WP.I.2)

4.2.1.4. Measuring electricity consumption, identifying energy consumption patterns (WP.I.3)

4.2.1.5. Maritime and inland waterways statistics, environmental statistics (WP.I.4)

4.2.2. New pilot projects 2018-2020 – Aims and objectives

4.2.2.1. New pilot projects 2018-2020 – Work packages

4.2.2.2. Use of financial transactions data (WP.N.1)

4.2.2.3. Remote sensing (WP.N.2)

4.2.2.4. Online platforms such as social media and sharing economy platforms (WP.N.3)

4.2.2.5. Mobile network operator data (WP.N.4)

4.2.2.6. Innovative sources and methods for tourism statistics (WP.N.5)

4.2.3. Extend the work and the investigation of innovative applications in the domain of trusted smart statistics – Aims and objectives


4.2.3.1. Extend the work and the investigation of innovative applications in the domain of trusted smart statistics – Work packages

4.2.3.2. Use of citizen science data for individuals' well-being (WP.S.1)

4.2.3.3. Citizen science data and smart cities (WP.S.2)

4.2.3.4. Smart cities and connected vehicles (WP.S.3)

4.2.3.5. Smart farming (WP.S.4)

4.3. Deliverables and Key Milestones

4.3.1. First Implementation phase

4.3.2. New pilot projects 2018-2020

4.3.3. Extend the work on smart statistics and the investigation of innovative applications in the domain of trusted smart statistics

4.4. Indicators

4.4.1. First Implementation phase

4.4.2. New pilot projects 2018-2020 and extension of the work in the domain of trusted smart statistics

4.5. What the project does not include

4.5.1. First Implementation phase

4.5.2. New pilot projects 2018-2020 and extension of the work in the domain of trusted smart statistics

5. Impact Assessment

5.1. Stakeholder Analysis

5.2. Project Environment

5.3. Cost-Benefit Analysis

5.4. Risk Analysis

6.1. Methodology

6.2. General Description (including proposed financial instruments)

6.3. Resources and Lead Times

6.4. Project Funding

6.5. Dissemination of results

7.1. Project Manager

7.2. Reporting Structure

7.3. Project Team

7.4. Project Documentation


1. Purpose

The purpose of the business case "Smart Statistics and Big Data" is to outline a wide range of subsequent developments relevant to big data, smart technologies and smart systems, primarily within the short time frame 2018-2020, that resulted from discussions and meetings with the various stakeholders over the last two years. Achievements and future prospects would be limited by the resources available for 2018-2020, as reflected in Figure 1 and Figure 2.

Future developments include:

The first implementation phase, capitalising on the knowledge and experience of past achievements for a limited number of mature areas (e.g. estimating online job vacancies, measuring electricity consumption, maritime statistics, etc.).

The exploration of statistical 'themes' rather than 'sources' by developing new pilot projects in areas that have not yet been investigated, aiming at achieving concrete statistical outputs by deploying multi-source and multi-domain approaches capable of producing 'big data enhanced' statistical output.

The extension of the work and the investigation of innovative applications in the domain of trusted smart statistics in response to the near future challenges of IoT and the deployment of smart systems in everyday life.

This business case aims at deepening the collaboration with subject-matter experts in order to maximise the focus on specific statistical themes and enhance relevant statistical output. A key aspect is data governance, which is becoming more and more important and requires greater attention, discussion and, if necessary, the adoption of different business models in the future. For example, access to and initial exploration of certain 'global' (borderless) data sources may necessitate revisiting the current collaboration model of European statistics.

The current document outlines developments within the wider framework of a statistical information infrastructure that may potentially integrate a wide range of big data sources into the production of official statistics across the ESS, and aims at maximising the benefits of using them. This wider framework is reflected in the ESS Big Data Action Plan and Roadmap 1.0 1 (BDAR) for the period 2014-2020 but can be extended at least until 2022, as schematically reflected in Figure 1 and Figure 2 in terms of estimated resources. Therefore, the business case is consistent with the BDAR and builds upon the first stage of pilots 2016-2018 (Stage I). Moreover, the BDAR operates under the overarching strategy of the ESS Vision 2020. An extension of the scope of the BDAR that would include "smart statistics" is envisaged in the short term.

1 Endorsed by the ESSC of 26 Sep 2014 (ESSC 2014/22/8/EN); available via:

https://ec.europa.eu/eurostat/cros/content/ess-big-data-action-plan-and-roadmap-10_en

https://ec.europa.eu/eurostat/cros/system/files/ESSC%20doc%2022_8_2014_EN_Final%20with%20ESSC%20opinion.pdf


2. Consultation process

The Eurostat Directors' meeting approved the business case on 13 October 2017. The business case was extensively discussed in the ESS Big Data Steering Group on 20 October 2017. Subsequently, the business case was submitted to the Vision Implementation Group (meeting on 26 October 2017) and the ESS Directors of Methodology Steering Group (meeting on 10 November 2017) for opinion.

The updated version 2 of the business case was submitted to the ESS Directors of Methodology (DIME) (written consultation) as well as to the ESS Big Data Steering Group. The former will convene on 22-23 February 2018 and the latter on 21 February 2018.

During the consultation process, suggestions were made concerning the enlargement of the conceptual framework to smart statistics, the scope of the foreseen activities within statistical domains with beneficial output for the NSIs, the engagement with subject-matter experts, the expected statistical output in response to specific questions and the priorities that should be allocated in order that the current business case receives the full support of the ESS stakeholders. These suggestions have been taken on board in the updated business case that will be submitted to the ESSC (meeting on 8 February 2018) and improvements will be reflected in the forthcoming detailed specifications of the actions.

3. Current Situation and Mandate for Change

3.1. Problem statement

As stated in the Scheveningen Memorandum, recent innovations in the information and communication technologies have been leading to an increasing degree of digitization of economies and societies that offer new opportunities for the compilation of official statistics. In this context the use of big data for statistical purposes challenges the European Statistical System to effectively address a variety of issues. A number of these challenges identified in the Memorandum, particularly in identifying new data sources, were addressed by the pilot studies (2016-2018) and the prototypes developed within the 'ESSnet Big Data'1.

The implementation of successful pilots was foreseen after 2020. However, in order not to lose the momentum of the currently successful developments and in order to foster the integration of big data in the statistical production processes, the first implementation phase is foreseen in the current business case.

Several challenges have been identified concerning the future of statistics in a hyper-connected world dominated by an Internet of Things (IoT) data ecosystem and smart systems2. The term "IoT data ecosystem" refers to an integrated system of smart devices and smart sensors that produce massive "machine generated" data in real time or almost real time, and to cross-platform deployments of various embedded technologies. The extended use of smart systems and the IoT is expected to take big data to a whole new level and change the data landscape. Data capturing and processing for statistical and analytical purposes could be embedded in the smart systems themselves, coupled with an intelligent data life-cycle enhanced with cognitive processes. This is what we refer to as smart statistics. In addition, future systems should allow for building trust into smart statistics with reference to an auditable and transparent data life-cycle that guarantees accuracy and privacy by design, hence the term "trusted smart statistics".

1 ESSnet Big Data is a project within the European statistical system (ESS) jointly undertaken by 22 partners. Its objective is the integration of big data in the regular production of official statistics, through pilots exploring the potential of selected big data sources and building concrete applications. ESSnet Big Data started in February 2016 and is to run for 28 months until May 2018; it consists of 10 work packages: eight of these are content-oriented, while the other two, Coordination and Dissemination, support the overall project. https://webgate.ec.europa.eu/fpfis/mwikis/ESSnetbigdata/index.php/ESSnet_Big_Data

2 Smart systems is a generic term used to describe different technological systems that are autonomous or collaborative and embed smart technologies. Smart technologies (including physical and logical applications in all formats) are capable of adapting automatically and modifying their behaviour to fit the environment. They are capable of using smart devices that are connected, interactive and intelligent (smart sensors, actuators) and that provide data to analyse and infer, and draw conclusions from rules. Smart technologies are also capable of learning, that is, using experience to improve performance, anticipating, thinking and reasoning about what to do next, with the ability to self-sustain. In addition, smart technologies are expected to have analytical and statistical capabilities. (adapted from the original source: https://www.igi-global.com/dictionary/smart-technology/38186 accessed on 7/12/2017)

In conclusion, in the same spirit as the previous actions and in order to address the challenges arising from smart statistics, the current business case takes a step further and aims at:

Preparing the implementation of successful prototypes, capitalising on the knowledge and experience of past achievements (first implementation phase).

Developing new pilot projects in areas that have not yet been investigated with regard to deploying multi-source approaches including big data sources, exploring statistical themes, achieving concrete statistical outputs.

Extending the work and further investigating innovative applications in the domain of trusted smart statistics in response to the near future challenges of IoT and the deployment of smart systems in everyday life.

As mentioned above in section 1 ("Purpose") of the document, these actions aim at deepening the collaboration with subject-matter experts in order to maximise the focus on specific statistical themes and enhance relevant statistical output, as well as at raising the issue of data governance.

3.2. Mandate (legal base)

In the Scheveningen Memorandum, the ESSC requested Eurostat and the NSIs to elaborate an ESS action plan and roadmap in order to follow up the implementation of the memorandum. At its meeting in Riga on 26 September 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 1 (BDAR below) and agreed to integrate it into the ESS Vision 2020 portfolio. In addition, the ESSC agreed that the ESS Task Force on Big Data for official statistics would coordinate the work on the implementation of the BDAR.

The BDAR clearly states that a number of action items are to be carried out by means of one or more multiannual ESSnets, without excluding the possibility of using procurement procedures if deemed necessary.

1 Endorsed by the ESSC of 26 Sep 2014 (ESSC 2014/22/8/EN); available via:

https://ec.europa.eu/eurostat/cros/system/files/ESSC%20doc%2022_8_2014_EN_Final%20with%20ESSC%20opinion.pdf

As requested by the ESSC, the implementation of the Big Data Action plan and Roadmap is part of the ESS Vision 2020 implementation portfolio, more specifically through the BIG DATA project1.

3.2.1. First Implementation phase 2

Consistent with the BDAR, the step that follows the first pilots (2016-2018), which aimed at exploring the potential of selected big data sources and building concrete applications, is the implementation and rollout of country-specific adaptations.

It is in this context that Eurostat's mandate is to promote the implementation and to support NSIs in taking the necessary steps to introduce into the relevant statistical production processes part of the work that has been developed as prototypes in mature areas related to online job vacancies, enterprise characteristics, measuring electricity consumption, identifying energy consumption patterns, maritime and inland waterways statistics and environmental statistics.

3.2.2. New pilot projects 2018-2020

It is recalled that the BDAR explicitly foresaw a phased approach to the pilot projects, i.e. to start with a first wave in 2016 and to continue with a second wave in 2018. The reasons for such an approach were the availability of resources within the ESS and the wish to focus better on a smaller number of pilots at the beginning. Therefore, it was proposed to separate the two phases and create two successive ESSnets, one for each wave of the pilots. In this context, Eurostat's mandate is to develop new pilot projects in areas that have not yet been investigated with regard to deploying multi-source and multi-usage approaches including big data sources, exploring statistical themes and achieving concrete statistical outputs.

3.2.3. Extend the work on smart statistics

Smart statistics follow innovative technological developments in the framework of a "Datafication" process. The latter term means "taking all aspects of life and turning them into data" (Cukier & Mayer-Schoenberger, 2013) 3. Most if not all data a decade from now will be "organic", i.e. by-products of the activities of people, systems and things (including billions of low-end and affordable smart devices and smart sensors connected to the internet, i.e. the Internet of Things (IoT)). Moreover, these smart devices will embed the production of statistical information. Therefore, Eurostat's mandate for investigating innovative applications in the domain of trusted smart statistics, in response to the near future challenges of IoT and the deployment of smart systems in everyday life, falls within the overall scope of exploiting (new) big data sources.

4. Objectives and Deliverables

4.1. Scope statement

1 Adopted by the ESSC of 13 November 2014 (ESSC 2014/23/2a/EN).

2 It is recalled that the BDAR originally foresaw the implementation of successful pilots after 2020. However, in order not to lose the momentum of the currently successful developments and in order to foster the integration of big data in the statistical production processes, part of the implementation phase is foreseen in the current business case. Consequently, the new pilots within the period 2018-2020 might eventually be very limited.

3 Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.


4.1.1. First Implementation phase - Scope

The scope of the proposed first implementation phase covers actions to promote the implementation and to support NSIs in taking the necessary steps to introduce into the relevant statistical production processes part of the work that has been successfully developed as prototypes within the framework of the 'ESSnet Big Data' pilots. Work should be carried out in such a way that other NSIs would be enabled, in the next implementation phase, to put in place the necessary system(s) and deploy or adapt the proposed solution(s) in order to produce statistical output. The first implementation phase concerns the following work packages (WP):

a) Online job vacancies (WP.I.1).1

b) Enterprise characteristics (WP.I.2).

c) Measuring electricity consumption, identifying energy consumption patterns (WP.I.3).

d) Maritime and inland waterways statistics, environmental statistics (WP.I.4).

The development of the pilot projects was considered a critical factor in the BDAR and, eventually, in the integration of big data sources and methods into the production of official statistics. The latter was considered the subsequent step (stage two) after the successful development of the aforementioned pilots, and was originally not foreseen before 2020. The first implementation phase is outlined here bearing in mind, firstly, that only mature pilots should be chosen and, secondly, that not all countries in the ESS would be ready for the implementation at such an early stage. In addition, due to the limited resources, the ideal number of countries for the implementation of each work package should be around three, coordinated by an additional leader. Subsequent implementation phases, which would benefit from the experiences of the first implementation, should cover all countries and additional successful pilot projects.

The scope will include a wide range of activities, outlined in section 4.2.1, related to the conceptual and practical level of the implementation of each of the aforementioned work packages.

In this context, the scope covers common access to data sources among NSIs, the development and implementation of common methods and quality standards delivered in the form of reusable statistical services, and rethinking the current collaboration model for European statistics towards a common architecture, possibly shared IT infrastructure and data governance. In addition, for specific work packages such as those related to web scraping, activities related to linking business registers with job vacancy portals or, in general, with web-scraped enterprises fall within the scope.

The use of big data will trigger the need for educating existing staff with new skills and assuring availability of new staff with required competences to the ESS. To a certain degree and for specific purposes this part may eventually be addressed as a prerequisite to the implementation and falls within the scope.

In addition, actions related to communication, such as organising workshops, presentations of specific tools and results, etc., that would be beneficial for all NSIs would fall within the scope. Ideally, actions for the respective implementations should be led by the NSIs that carried out the pilots during the first phase and played a leading role in the whole development process. The implementations should include actions where collaborative resource-intensive work carried out by members of the ESS would be viewed as the most viable way forward.

1 "I" in WP.I. stands for "Implementation"

For the purpose of clarity and consistency with other activities, section 4.5.1 describes certain aspects of the actions which fall outside the scope of this ESSnet.

4.1.2. New pilot projects 2018-2020 - Scope

The scope of the new pilot projects1 2018-2020 mainly covers new data sources that have not previously been identified and properly explored and that serve multiple statistical themes. In particular, new pilot projects will concern the use of financial transactions data, remote sensing, the use of data from online platforms such as social media and sharing economy platforms, and the use of mobile phone data for tourism statistics. It is recalled that the execution of pilot projects at the level of the ESS is considered a key critical factor in the BDAR. The pilot projects should contribute to the overall objective of the BDAR, namely preparing the ESS for the integration of big data sources and methods into the production of official statistics. New pilot projects include actions where collaborative resource-intensive work carried out by members of the ESS is viewed as the most viable way forward. Moreover, besides the development of the pilots, the contribution to relevant horizontal topics (legal aspects, data access conditions, development of methodologies of common interest, etc.) falls within the scope of the action. The investigation of new data sources in terms of multipurpose usage, hence serving multiple statistical themes, also falls within the scope. For the purpose of clarity and consistency with other activities, section 4.5.2 describes certain actions which fall outside the scope of the action.

4.1.3. Extend the work on smart statistics and the investigation of innovative applications in the domain of trusted smart statistics - Scope

The scope of extending the work on smart statistics is within the overall scope of big data. More specifically, the scope is to investigate the possibilities of producing official statistics by combining citizen science data for individuals' well-being statistics, citizen science data and smart cities, smart cities and connected vehicles, and smart farming. The number of vendors, technologies and protocols that smart devices may use makes it challenging for them to communicate. It is expected that progress will be made in involving statisticians in formulating requests for using open standards and compatible devices, hence ensuring interoperability of systems and equipment. Therefore, actions related to the combination of data communication standards of smart devices and statistical information fall within the scope of the current extension of the work on smart statistics. For the purpose of clarity and consistency with other activities, section 4.5.2 describes certain actions which fall outside the scope of the ESSnet.

4.2. Aims and Objectives

1 It is recalled that the BDAR foresaw a phased approach to the pilot projects, i.e. to start with a first wave in 2016 and to continue with a second wave in 2018. The reasons for such an approach are the availability of resources within the ESS and a better focus on a smaller number of pilots. Therefore, it is proposed to administratively separate the two phases and create two successive actions, one for each wave of the pilots. In the current business case we strictly refer to the second wave of pilots supported by an ESSnet.


The long-term goal of the BDAR is full integration of big data sources into the statistical information infrastructure. The overall objectives of the current business case are to support the implementation of the BDAR in the subsequent phases for 2018-2020 within the wider time frame up to 2022. More specifically, these objectives are threefold: firstly, addressing the implementation of four successful pilots; secondly, carrying out pilot exploration of new data sources; and thirdly, further developing smart statistics.

4.2.1. First Implementation phase – Aims and objectives

A satisfactory level of developments is expected to be reached by the end of the ESSnet Big Data in May 2018 for the work packages mentioned below. Therefore, from the operational point of view it is necessary to establish a Multi Beneficiary Grant Agreement (ESSnet) covering a period of 24 months (November 2018 – October 2020), which will have in its mandate several tasks at conceptual, functional and operational (practical) level of the implementation of each of the following work packages in the NSIs that will express an interest:

a) Online job vacancies (WP.I.1).

b) Enterprise characteristics (WP.I.2).

c) Measuring electricity consumption, identifying energy consumption patterns (WP.I.3).

d) Maritime and inland waterways statistics, environmental statistics (WP.I.4).

The output of the work should adequately enable other NSIs in the next implementation phase to put in place the necessary system(s) and deploy or adapt the proposed solution(s) in order to produce statistical output. From the practical point of view, the output should concretely specify the work that needs to be carried out for the second implementation phase, hence producing draft specifications for full-fledged implementation at the ESS level. Clarifications concerning the term "implementation" (first phase) are provided in 4.5 and will be reflected in the forthcoming specifications for the individual prototypes. Overall, the specific ESSnet aims at successfully completing the following intermediate steps (activities) that would lead to the implementation of each of the work packages in the statistical production process in the NSIs:

Identification of statistical production processes that may be affected at national level. The pilots helped in understanding the implications of big data for official statistics through the development of prototypes or proofs of concept that demonstrated the feasibility of producing official statistics from new data sources. However, a key critical success factor is the identification of the statistical production processes that would lead to a successful implementation of the prototypes. The issue goes far beyond transposing a methodology at national level. The implementation is expected to affect, to different degrees, the various sub-processes of the phases of the statistical production processes, and to do so in the context of specific statistical themes that need to be identified and involved. Using the terminology of the GSBPM (Generic Statistical Business Process Model), at least the "Design", "Build", "Process", "Analyse" and "Disseminate" phases are expected to be affected. It is noted that GSBPM is the basis for the Statistical Production Reference Architecture1. The latter element is a subset of the ESS Enterprise Architecture Reference Framework. Statistical production processes should in addition refer to data integration with data from other sources.

Definition of the implementation requirements of prototypes in the relevant statistical production processes at European and national level. Complementing the analysis of the statistical production processes that may be affected, a range of implementation requirements needs to be defined and eventually satisfied. This objective aims at addressing issues at European and national level related to organisation and management, using or sharing the necessary infrastructure among the NSIs, obtaining the necessary skills, complementing the relevant methodologies, producing comparable and harmonised statistical output, etc. In particular, the issue of common infrastructure should be examined in detail in order to achieve important economies of scale for certain work packages. Specific implementation requirements for each work package prototype need to be identified and adequately addressed. For example, for web scraping of job vacancies, it is important to raise the issue of the "content" of web-scraped information. On the one hand, the portals or other websites to be considered should be identified in order to minimise the noise, e.g. in the job descriptions. On the other hand, adequate techniques should be developed and deployed to avoid duplication, which is itself noise in the overall web-scraped data. For the former, it may be necessary to link identified portals with the national business register. For the use of data from the automatic vessel identification system, specific implementation requirements would be the (possibly centralised) treatment, validation and re-organisation of the data, yielding information usable across the ESS. Intermediate data products would aim at serving multiple statistical themes (e.g. transport statistics and environmental statistics). The collaboration and engagement of subject-matter experts at national level is required.

Fulfilling implementation requirements and definition of a quality management framework to guide the integration process, definition of complementary statistics (indicators) and of the required quality, establishing metadata requirements at European and national level. Once the necessary implementation requirements have been identified, they need to be fulfilled. In addition, the integration process would require a quality management framework ensuring that the necessary implementation activities are conducted effectively and efficiently at national level. Ideally, further activities related to the definition of relevant indicators, the measurement and assessment of the quality of the data, and the definition of metadata requirements should be carried out in order to produce harmonised statistics comparable at EU level.

Transfer of knowledge, methodologies, software development, testing and maintenance, at European and national level. This is essentially the core objective and the main step for the implementation of the pilots. It combines both statistical and technical activities (data science and information technology). It involves the documentation of the process and the methodologies at national level, the incremental implementation of software and eventually the production of statistical indicators. Depending on the specific work package, the scope of implementation (explained in 4.5) may be limited, and will be reflected in the forthcoming specifications for the individual prototypes. Efficiency is an important element of this step. Externalisation of relevant activities should not be excluded, but remains to be decided according to national practices.

1 https://ec.europa.eu/eurostat/cros/content/ess-enterprise-architecture-reference-framework_en

Monitor the progress and define milestones at European and national level, adapt if needed. Tracking accomplishments during the implementation phase is important at national and European level. The implementation is not restricted to national implementation due to the fact that the same data sources may be commonly used in a shared environment.

The subsequent paragraphs of this section provide a brief description of the aims and objectives of the individual work packages related to the first implementation phase.

4.2.1.1. First Implementation phase – Work packages

The first strand of work has the ambition of implementing in official statistical production, within a limited scope (see 4.5), certain big data applications which have reached a sufficient level of maturity. It will be based on the experience and lessons learned during the current pilot projects (ESSnet 2016-2018). Such mature areas for implementation relate to online job vacancies, enterprise characteristics, measuring electricity consumption and identifying energy consumption patterns with the use of smart meters, and maritime and inland waterways statistics and environmental statistics with the use of data from the automatic vessel identification system. It is proposed to launch a new ESSnet to carry out this work strand.

The overall aim of this ESSnet (2018-2020) will be to develop functional production prototypes and to promote and support their implementation in a limited number of participating NSIs. The involvement of subject-matter experts is critical and needs to be ensured throughout the first implementation phase. The main achievements and the remaining challenges in introducing the above areas into the regular statistical production process are identified on the basis of an in-depth assessment of the work accomplished by the pilot projects within the framework of the 'ESSnet Big Data'1, for the following "content-oriented" work packages (WP):

4.2.1.2. Online job vacancies (WP.I.1)

The aim of implementing this pilot will be to produce statistical estimates in the statistical theme of online job vacancies. Suitable techniques and concrete methodologies have been developed during the pilot phase of the project. Implementation in the ESS will be based on the work carried out on the conditions under which web scraping techniques can be used, as far as the quality of the scraped data is concerned, as well as on the use of mixed sources including job portals, job adverts on enterprise websites, and job vacancy data from third-party sources. Within the same statistical theme, the combination of existing data from multiple sources should be promoted and embedded in the methodology.

1 ESSnet Big Data is a project within the European statistical system (ESS) jointly undertaken by 22 partners. Its objective is the integration of big data in the regular production of official statistics, through pilots exploring the potential of selected big data sources and building concrete applications. ESSnet Big Data started in February 2016 and is to run for 28 months until May 2018; it consists of 10 work packages: eight of these are content-oriented, while the other two, Coordination and Dissemination, support the overall project. For more information consult https://webgate.ec.europa.eu/fpfis/mwikis/ESSnetbigdata/index.php/ESSnet_Big_Data
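As an illustration of the duplication issue that arises when job adverts are collected from several portals and enterprise websites, the following minimal sketch removes exact duplicates after normalising the text. It is not the method developed in the pilot; the field names (`portal`, `employer`, `title`, `description`) and the sample adverts are hypothetical, and a production approach would most likely also need fuzzy matching.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case the text, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(adverts: list[dict]) -> list[dict]:
    """Keep the first advert seen for each normalised (employer, title, description)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for ad in adverts:
        # Hypothetical fingerprint fields; the scraping portal is deliberately ignored.
        key = "|".join(
            normalise(ad.get(field, ""))
            for field in ("employer", "title", "description")
        )
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(ad)
    return unique


# Invented sample data: the first two adverts differ only in case,
# punctuation and spacing, and come from different portals.
adverts = [
    {"portal": "portal-a", "employer": "ACME Ltd.", "title": "Data Analyst",
     "description": "Analyse survey data."},
    {"portal": "portal-b", "employer": "acme ltd", "title": "Data analyst",
     "description": "Analyse  survey data"},
    {"portal": "portal-a", "employer": "ACME Ltd.", "title": "Statistician",
     "description": "Produce estimates."},
]
unique_adverts = deduplicate(adverts)
```

Hashing a normalised fingerprint only catches adverts that are identical up to case, punctuation and spacing; adverts reworded between portals would need a similarity measure instead.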

4.2.1.3. Enterprise characteristics (WP.I.2)

The aim of implementing this pilot will be to use web scraping, text mining and inference techniques in order to collect, process and eventually improve or update existing information about enterprises in the national business registers, e.g. kind of activity, key financial variables, structure of the company, etc. The implementation involves massive scraping of companies' websites and collecting and analysing unstructured data. The combination of existing data from multiple sources (administrative data, structural business survey or other surveys) should be promoted and embedded in the methodology.

4.2.1.4. Measuring electricity consumption, identifying energy consumption patterns (WP.I.3)

The aim of implementing this pilot will be to use smart meter1 data in producing energy statistics, as a supplement to other statistics, e.g. energy statistics of businesses, census, household costs, tourism seasonality or impact on the environment. The implementation will include linking electricity data with other administrative sources in order eventually to produce statistics on businesses and households and to identify vacant living places or seasonal/temporary occupancy of living places. Work will include the identification of energy consumption patterns in households. Identification of energy consumption patterns in businesses that belong to the tourism industries could contribute to assessing the seasonality of those industries. In addition, noise reduction techniques for data editing and data cleaning will be combined with traditional sampling techniques in order to produce high-quality statistics. The use of standardised or compatible devices and communication protocols should be part of the implementation requirements. Exchange of views and close cooperation with international and national standardisation authorities is critical.
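The identification of possibly vacant or temporarily occupied living places from smart meter data can be sketched, under strong simplifying assumptions, as a threshold on average daily consumption. The meter identifiers, readings and the 1 kWh threshold below are invented for illustration and are not taken from the pilot.

```python
from statistics import mean


def flag_low_use(daily_kwh: dict[str, list[float]],
                 threshold_kwh: float = 1.0) -> dict[str, bool]:
    """Return True for meters whose mean daily consumption is below the threshold.

    The threshold is a hypothetical parameter; a real rule would be
    calibrated against administrative or survey data.
    """
    return {
        meter: mean(values) < threshold_kwh
        for meter, values in daily_kwh.items()
    }


# Invented daily readings in kWh for two hypothetical meters.
readings = {
    "meter-001": [8.2, 7.9, 9.1, 8.4],
    "meter-002": [0.2, 0.1, 0.3, 0.2],
}
flags = flag_low_use(readings)
```

Applying the same rule per month rather than over the whole period would be one way to separate seasonal occupancy from permanent vacancy.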

4.2.1.5. Maritime and inland waterways statistics, environmental statistics (WP.I.4)

The aim of implementing this pilot will be to use real-time measurement data of ship positions (measured by the so-called AIS system) in order, firstly, to improve the quality and internal comparability of existing statistics and, secondly, to produce new statistical products. Using AIS data, a method will be implemented to build a reference frame of maritime ships. Linking AIS data to maritime statistics will be possible using more accurate AIS data for the number of ships in the ports. In addition, the implementation will allow improving data on departing ships regarding the next destination (traffic matrices) and calculating the distance travelled by the ships, which in turn allows the calculation of the transportation volume (TKM) and hence improves the quality of the average distance matrix for all ports. Information about the location, identification and other critical data of maritime vessels, combined with information on installed engine power, may allow modelling and monitoring CO2 shipping emissions. The combination of existing data from multiple sources and for multiple purposes (administrative data, maritime statistics and environmental statistics) should be promoted and embedded in the methodology.
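The calculation of distance travelled and transportation volume from successive ship positions could look roughly as follows. This is a sketch only: it assumes positions are (latitude, longitude) pairs in decimal degrees, approximates each leg by the great-circle (haversine) distance on a spherical Earth, and reads TKM as tonne-kilometres (cargo tonnes multiplied by kilometres travelled). The function names and sample track are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def track_distance_km(positions: list[tuple[float, float]]) -> float:
    """Sum the leg distances between consecutive (lat, lon) positions."""
    return sum(
        haversine_km(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(positions, positions[1:])
    )


def tonne_km(cargo_tonnes: float, positions: list[tuple[float, float]]) -> float:
    """Transport volume as cargo weight times distance travelled."""
    return cargo_tonnes * track_distance_km(positions)


# Invented track: three positions along the prime meridian, one degree apart.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
distance = track_distance_km(track)
volume = tonne_km(1000.0, track)
```

Summing many short legs follows the actual route more closely than a single port-to-port distance, which is the advantage AIS positions offer over a fixed distance matrix.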

1 Electricity consumption meters which can be read from a distance and measure electricity consumption.


4.2.2. New pilot projects 2018-2020 – Aims and objectives

Consistently with the first wave of pilot projects, harnessing new data sources potentially provides scope to increase the quality and the variety of statistical products, enabling the ESS to better respond to fast-growing and increasingly differentiated user needs. Based on the first pilot studies (2016-2018) and the practical experiences of these pilots, the new pilot projects will similarly investigate the possibilities to create proofs of concept and prototypes for new big data sources. It will be important to strengthen the conceptual and functional aspects of multisource statistics and multipurpose sources. For the former, the statistical themes that would profit from the use of multiple data sources should be identified; for the latter, big data sources should be fully exploited to serve a wide range of possible statistical themes. In addition, subject-matter experts from specific statistical domains should be involved from the beginning in order to provide expertise and support the use of multiple data sources.

A precondition for using big data sources is having access to them. Difficulties with access prevent Member States from tapping the full potential of big data. Even where the legal environment is favourable, the laws may set the right of access in quite general terms; their interpretation at the implementation phase can therefore make a difference. Access can be denied or made difficult by private data holders that do not cooperate well. Therefore, the aim of the new pilot projects will be to develop relationships with data holders at both national and EU level in order to promote solutions that are mutually beneficial for NSIs and private data holders.

From the operational point of view it is necessary to establish a Multi Beneficiary Grant Agreement (ESSnet) covering a period of 24 months (November 2018 – October 2020), which should achieve or contribute to the achievement of the following aims of the BDAR:

(New) pilots for generating statistics from big data sources at ESS level. The key critical success factor in the action plan is an agreement at ESS level to embark on a number of pilots. As with the first wave of pilots, a real understanding of the implications of big data for official statistics can only be gained through hands-on experience, ‘learning by doing’. Different actors have already gained experience conducting pilots in their respective organisations at global, European and national level. The overall purpose of the pilots would be to gain experience with big data challenges, to analyse the data and identify potential statistical output, to contribute to discussions and problem solving related to the horizontal topics and to sketch possible ways of implementing statistical processes involving big data sources at ESS level.

Identification and analysis of output portfolio of big data sources (multipurpose data sources). This objective forms part of the pilots mentioned above. Big data sources may have the potential for generating statistics related to various statistical themes. For example, the use of financial transactions data could contribute to the household budget survey, e-commerce, balance of payments and short-term statistics. It is most likely that big data sources would necessitate the use of additional sources, such as other big data, administrative or survey sources, to generate statistical data.

Identification and definition of skills and competences. As already stated in the Scheveningen memorandum, the ESS has to be equipped with the competences and skills for combining big data processing with statistical data production. The access, management, processing and analysis of big data require specific new skills or skill combinations that are currently not adequately present in NSIs. These are closely associated with the term “data scientist”. Additional skills related to legal expertise, relationships with stakeholders or communication of results to users are equally important for successfully integrating big data sources into official statistics. Acquiring these skills within the European Statistical System will be essential for the success of the BDAR. The ESSnet should contribute to identifying the necessary skills and competences while exploring big data sources during the execution of the new pilots.

Exchange of information with stakeholders within the statistical system and the research community. An important element is to share experience on projects, applications, pilots and big data sources within the ESS. Means for achieving this goal include organising and participating in workshops or exchanging information via electronic communication platforms. Specific actions in the pilots should assure information flow and exchange between stakeholders.

Development/review of methodological and quality frameworks for big data sources in official statistics. The use of big data sources requires the application of new methods in data analysis and processing. At the same time the methods depend on the data sources, e.g. whether they contain structured data, textual information or images. Developing new methods will require collaboration with the scientific community. Actions related to methodology should aim at developing a common toolbox of methods that would become available throughout the ESS and be fit for use in different statistical domains. The provision of high-quality information, i.e. information fit for its intended purpose, is one of the cornerstones of official statistics. Quality profiles differ depending on the product type according to the statistical information infrastructure, i.e. indicators, accounting systems, and data. The final aim is to be equipped with a quality framework that is adjusted to big data sources and that allows the quality of related statistics to be described according to their intended use. The ESSnet should, by taking examples of concrete big data sources, contribute to further developing appropriate methodological and quality frameworks for big data processing from input to dissemination.

Identification, definition and implementation of IT infrastructures for big data processing. The inherent characteristics of big data, including their volume, variety and velocity, have implications for IT systems and infrastructures. In order to utilise the potential of big data it will be necessary to analyse requirements related to big data processing, including security and confidentiality issues. The design of future infrastructures would be very much determined by the future business model(s) implemented to produce statistics from big data. For this reason emphasis should be given to the analysis and design of possible business processes under different conditions of collaboration. The ESSnet should contribute to identifying and defining suitable IT infrastructures for big data processing. For the purpose of running the pilots a common IT architecture (shared sandbox environment for all pilots) should be utilised.

Access to big data sources, identification and preparation of non-legal and legal conditions for access and use of big data within the ESS. Access to big data sources is a primary condition for achieving the overall aim of the BDAR. At the same time, existing pilots show that access is one of the most difficult issues. Therefore access to data sources should be assured before embarking on the pilots. The pilots should include activities related to identifying issues and developing solutions for the use and processing of specific big data sources, including issues of confidentiality and privacy.

The approach requires coordination between data-source-related pilots and topic-oriented actions. For example, methods should be developed within the pilots but should then be consolidated at a more general level to produce a toolbox of big data methods. Some actions require general input to provide a general structure, but should be further developed as part of the pilots and finally be consolidated at general level again. This would, for example, be the case for adjusting the existing ESS quality framework to big data. In general, big data sources should be explored for their potential to contribute to different statistical products for a wide range of statistical domains. While these new pilot projects are by their very nature highly experimental, the potential added value for the ESS is also very high. The various criteria for pilots presented in Section 6 of the BDAR will be applied to make sure that the potential added value to the ESS is maximised.

The subsequent paragraphs of this section provide a brief description of the aims and objectives of the individual work packages related to the new pilot projects 2018-2020.

4.2.2.1. New pilot projects 2018-2020 – Work packages

Discussions on the content of the currently proposed second wave of pilot projects (period 2018-2020) have already started (ESS Big Data Steering Group, July 2017). The proposed new pilot projects concern the following work packages:

4.2.2.2. Use of financial transactions data (WP.N.1) 1

The use of the internet has profoundly transformed the way that trade transactions are performed. E-commerce is taking off in Europe and companies provide consumers with the means to order goods electronically and pay online. Electronic payments by debit or credit card leave an electronic trace that can be exploited in order to produce official statistics for companies as well as for consumers, hence complementing statistical information that is currently collected through surveys. The relevant data sources that register electronic payments through debit or credit cards are national entities that operate under national laws. Therefore, the commercial transactions between national e-commerce websites that have a virtual POS and individuals (B2C) and legal entities (B2B) that use bank payment cards (national or other) are registered. Mobile payments through Near Field Communication devices and other innovative payment methods evolve fast and offer convenience and flexibility when it comes to concluding payments. It is hence expected that financial transactions data will be a big data source with an immense exploitation potential.

The aim of this pilot will be, firstly, to identify the stakeholders and investigate the conditions for accessing relevant data at national level. The use of the data will be conditional on the outcome of this investigation. Subsequently, the aim of this pilot would be to examine, for example, whether information on the geographical segmentation of transactions can be produced and whether companies' activity sectors can be identified. In addition, the aim will be to demonstrate the production of information related to household budget and companies' share of turnover through e-commerce (multipurpose aspects). In particular, due to the multipurpose aspects that need to be investigated, the involvement of subject-matter statistical experts will be critical. Financial transactions data may also be approached by developing "citizen-data apps" with the direct involvement of citizens (similar to 4.2.3.2).

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to e-commerce, enterprise turnover through e-commerce, enterprise economic activities, household budget statistics, tourism statistics, etc.

1 "N" in WP.N.1 stands for "New"

4.2.2.3. Remote sensing (WP.N.2)

(Use of satellite images, use of Light Detection and Ranging LIDAR1 remote sensing method)

Several attempts have already been made at national level to exploit satellite imagery for identifying land use characteristics (LUCAS classes). Methodologies were based on combining data from multiple national data sources. However, it has been acknowledged that a harmonised approach should be further developed together with the necessary skills for exploiting the full potential of satellite images combined with other data sources. The aim of this pilot will be to support official areal statistics with remote sensing data, to develop a harmonised methodology at European level, to verify and improve the plausibility of the methodology, and to support agricultural statistics (e.g. crop yields based on remote sensing and growth models). From the technological point of view, the aim of this pilot project will be to use new methods, e.g. artificial neural networks for image segmentation and object recognition, and deep learning for satellite image analysis. Findings and results from work already accomplished using remote sensing data in the framework of "ESSnet Big Data SGA No 2 - WP7 Multi domains" should be taken on board and exploited.

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to land use characteristics, agricultural statistics, etc.

4.2.2.4. Online platforms such as social media and sharing economy platforms (WP.N.3)

Sharing economy business models, typically in the form of transaction-based platforms for the provision of services (e.g. transport, accommodation, etc.), are rapidly emerging and growing not only across Europe but around the globe. Online platforms and sharing economy platforms, such as the US-based Airbnb and Uber, connect providers with users and facilitate transactions between them. Assuming that activities over online platforms will substantially increase, the aim of this pilot will be to investigate how feasible and useful it would be to estimate and produce relevant statistics on the share of these activities within the economy. In addition, the pilot should demonstrate whether official statistics2 may profit from using data from online platforms to produce statistics complementary to existing ones on the labour market, employment, source of income, etc. The issues with data holders (e.g. identification of data holders, data access and data quality) remain and should be further investigated in the context of the new pilot projects.

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to e-commerce, labour market, employment, source of income, etc., as well as in the context of national accounts.

1 LIDAR, which stands for Light Detection and Ranging, is a remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to the Earth.

2 SWD(2016) 184 - European agenda for the collaborative economy - supporting analysis (Point 6.1 Challenges for official statistics) https://ec.europa.eu/docsroom/documents/16881/attachments/3/translations/en/renditions/native For more information: http://ec.europa.eu/growth/single-market/services/collaborative-economy_en

4.2.2.5. Mobile network operator data (WP.N.4)

Mobile network operator (MNO) data was among the pioneering sources explored in the context of the modernisation of official statistics. The behaviour of citizens in using the services of MNOs is constantly changing (e.g. in view of the recent abolition of roaming charges); at the same time the technology and data sources are in constant development (e.g. the shift from Call Detail Records to signalling data).

The aim of this (continued) pilot is to investigate, at European level, the operational model for data holders (MNOs) and data users (NSIs) to cooperate and exchange data, on the one hand for ad hoc exploratory projects and on the other hand in view of the regular production of official statistics. A second aim is to develop a methodology to use MNO data, jointly with other big data sources and/or traditional data sources, for producing statistics.

Use in specific statistical themes: A multi-domain approach (e.g. population statistics, mobility statistics, migration statistics, tourism statistics) will be essential in order to fully exploit the potential synergies. Standards and adequate methodologies could be developed in order to test the validity of statistical definitions (e.g. usual resident population) and assist in the identification of population groups (e.g. usual residents, workers, tourists, commuters, etc.) depending on various forms of their mobility (e.g. daily commuting, week-day/weekend commuting, holiday visits or seasonal moves). In this respect the involvement of subject-matter experts is critical.

4.2.2.6. Innovative sources and methods for tourism statistics (WP.N.5)

Notwithstanding the need for a holistic approach to the pilot projects, smaller-scale and domain-specific pilots can be complementary to the multi-country / multi-domain oriented pilots outlined in the previous paragraphs. Given the nature of the phenomena to be observed, tourism statistics seems to be one of the forerunners in the area of big data research.

The aim of this particular pilot is not to explore one data source. On the contrary, the aim is to explore how different innovative sources (for instance mobile network operator data, stores cashier data, financial transactions, smart energy meters, web activity) and/or innovative methods (mixed-mode surveying exploiting the potential of smartphones as a data collection tool) can interact to build a solid production system for tourism statistics, beyond the currently used traditional business and household surveys. Moreover, a particular challenge will be data integration and the quality framework, which needs to be adapted to accommodate the specificities of big data sources. The initiatives and set-up are likely to be not only domain-driven but also country-driven; however, the objectives include that the insights obtained from these smaller-scale pilots cross-fertilise the work packages listed above, be replicable in other countries or in related domains and be in line with the multi-source/multi-domain idea of big data. The pilot project should build on the efforts, the experiences and the achievements of the "ESSnet Big Data - WP7 Multi domains" 2016-2018. In the framework of the latter ESSnet the following pilots were conducted: flight movement, tourism accommodation places and road sensors.

4.2.3. Extend the work and the investigation of innovative applications in the domain of trusted smart statistics – Aims and objectives

Work at Eurostat on investigating innovative applications in the domain of smart statistics started in 2017. The aim has been the development of proofs of concept within an extended Internet of Things (IoT) ecosystem for the production of smart statistics. Results are expected in the course of 2018. However, during the coming decade, we will see a massive proliferation of electronic devices and sensors that are connected to the internet. They will generate and communicate huge amounts of data via this network. Therefore, involving statisticians at a very early stage and proactively extending our work on smart statistics is crucial. Extending the work on smart statistics has the ambition of widening the scope of big data by considering the entire data ecosystem as an opportunity for intelligent production of relevant data for European Statistics. While the big data project follows a centric approach, accessing data in the places where they are collected, smart statistics analyse the conditions for using the network and its components for producing relevant statistics, largely instantly and in an automated way.

In an IoT environment, smart devices and electronic production of data that embed statistical information will allow the production of a different kind of statistics for smart cities, connected vehicles, farming, etc. In addition, harnessing citizen science where relevant may be a strategic decision that should be deeply explored, potentially having organisational consequences for the NSIs.

The specific project should achieve or contribute to the achievement of the aims of the BDAR (expected to be revised in order to include smart statistics) that were mentioned under 4.2.2, where applicable from the smart statistics perspective.

4.2.3.1. Extend the work and the investigation of innovative applications in the domain of trusted smart statistics – Work packages

The subsequent paragraphs provide a brief description of the third strand, in particular the aims and the objectives of the following work packages related to smart statistics. From the operational point of view, the third strand will be carried out either through public procurement procedures alone or through public procurement combined with a Multi Beneficiary Grant Agreement (ESSnet), covering a period of 24 months (November 2018 – October 2020).

4.2.3.2. Use of citizen science1 data for individuals' well-being (WP.S.1) 2

1 "Amateur science", "crowdsourced science", "volunteers monitoring" and "public participation in scientific research" are also common aliases for citizen science.

2 "S" in WP.S.1 stands for "Smart".

Research collaborations between scientists and volunteers expand the opportunity for data collection in a wide range of diverse fields such as ecology, medicine, psychology, and many more. From the big data point of view, there have been several efforts to develop platforms that support crowdsourced citizen-science data. The scope of the current project covers the part of citizen science data produced by smart devices and smart sensors related to individuals' well-being and analysed in terms of individuals' physical activities. Moreover, wearable sensor technology enables continuous seamless interaction with real-time health information, e.g. pulse, blood pressure, glucose readings, etc.

Therefore, the objective of this pilot will be to demonstrate the usefulness and the feasibility of collecting data directly through wearables (such as fitness trackers) or app-equipped devices (smartphones) that incorporate functions capturing and transmitting, for example, individuals' temperature, pulse and blood pressure, adequately stamped with date, time and geographic coordinates.

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to health statistics (supplementary to self-perceived health reporting), etc. The potential of "citizen-data apps" may also be extended to more traditional but very relevant fields of interest like tourism, time use survey and household budget survey.
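As a purely illustrative sketch of what a reading collected under WP.S.1 might look like, the following Python fragment models one measurement stamped with date, time and geographic coordinates, and derives a simple daily aggregate from it. The record layout, the field names, the function name and the sample values are all assumptions made for illustration; they are not part of the project specification and do not describe any actual device format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass(frozen=True)
class WearableReading:
    """One hypothetical measurement from a wearable or app-equipped device."""
    recorded_at: datetime   # date and time stamp (timezone-aware)
    latitude: float         # geographic coordinates, decimal degrees
    longitude: float
    pulse_bpm: int
    temperature_c: float
    systolic_mmhg: int
    diastolic_mmhg: int


def daily_mean_pulse(readings):
    """Return the average pulse per calendar day (UTC), ordered by day."""
    by_day = defaultdict(list)
    for r in readings:
        day = r.recorded_at.astimezone(timezone.utc).date()
        by_day[day].append(r.pulse_bpm)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Invented sample values, used only to exercise the sketch.
readings = [
    WearableReading(datetime(2018, 11, 5, 8, 0, tzinfo=timezone.utc), 49.61, 6.13, 62, 36.6, 118, 76),
    WearableReading(datetime(2018, 11, 5, 18, 30, tzinfo=timezone.utc), 49.61, 6.13, 78, 36.8, 124, 80),
    WearableReading(datetime(2018, 11, 6, 9, 15, tzinfo=timezone.utc), 49.60, 6.12, 66, 36.5, 120, 78),
]

for day, value in daily_mean_pulse(readings).items():
    print(day, value)
```

The record deliberately carries no person or device identifier; how identification, consent and confidentiality would be handled in practice is one of the issues the pilot itself would have to address.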

4.2.3.3. Citizen science data and smart cities (WP.S.2)

An additional big data source could be found in the context of smart cities1. The purpose of this pilot will be to demonstrate the usefulness and the feasibility of collecting citizen science data in the context of quality of life, environmental protection (measuring air pollution), resource efficiency, smart traffic, etc., focusing on the use of new technologies and sensors in an urban environment. The combination with information from smart sensors is within the scope of the specific work package. Subsequently, a wide range of specific statistical themes can benefit from investigating applications that would use the relevant data sources.

4.2.3.4. Smart cities and connected vehicles (WP.S.3)

In the context of smart cities, vehicles equipped with smart sensors have the potential to reduce traffic jams and increase safety on the road. Detecting abnormal road conditions combined with sudden severe weather outbreaks could actively contribute to safety on the road. Moreover, alerts and directions to available parking spaces may lead to reduced pollution. In fact, connected vehicles are at the heart of smart cities provided that the necessary (Internet-of-Things) infrastructure exists to collect, aggregate and dispatch the relevant information to appropriate decision making centres.

Vehicles can be connected in two ways: embedded and tethered. Embedded connections use hardware built into the vehicle, while tethered connections use hardware that allows drivers to connect to their cars via their smartphones. The aim of this pilot will be to demonstrate the feasibility of collecting and effectively using geolocated data produced by "smart vehicles".

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to urban mobility and optimised transportation resources, autonomous vehicles, etc.

4.2.3.5. Smart farming (WP.S.4)

IoT and other technology is extended to crop and animal farming using [internet-]connected physical devices enabling the collection and exchange of data between them. Combined data from farm machinery sensors, satellites, drones and wireless smart sensors in the field may enhance farm management systems aiming at reducing costs and input, optimising production and improving the overall efficiency of agriculture. Such systems may be exploited in greenhouses as well as in open field agriculture to determine best conditions for crops, monitoring the soil and weather conditions, etc. In addition, smart devices may provide the necessary information for running statistical yield prediction models. Computer-based systems have long been used in livestock farming, providing information on production volumes, food intake and early detection of possible disruptors (e.g. udder infections) in the production system.

1 Citizen Science and Smart Cities, JRC Technical Reports, Report of Summit, Ispra, 5-7th February 2014, Max Craglia and Carlos Granell (Eds.), http://publications.jrc.ec.europa.eu/repository/bitstream/JRC90374/lbna26652enn.pdf

The aim of this pilot will be to demonstrate the feasibility of effectively using smart statistics on agricultural inputs and production, hence complementing statistics collected traditionally. Moreover, the pilot project will aim at creating a more modern, flexible and efficient system for providing relevant and appropriate official EU agricultural statistical data of high quality, particularly in response to a changing environment with an increasing diversification of agricultural activities and greater attention to the environmental impact of farming.

Use in specific statistical themes: Statistical themes that may profit from relevant findings and should be investigated would be those related to crop and animal farming, environment, public health, economic development, etc.

4.3. Deliverables and Key Milestones

4.3.1. First Implementation phase

Work packages are defined for the four respective pilot projects that will be implemented. The activities outlined in section 4.2.1 will be carried out for each work package. However, these activities may need to be adjusted to the specificities of the respective data sources. The following (indicative) deliverables related to the administrative and financial aspects of the first implementation phase should be different from the technical reports that refer to the specific work packages.

Deliverables (Administrative reports for the management of the ESSnet):

Management and quality assurance plan drawn up by the coordinator for the execution of the ESSnet.

Regular progress meetings with the ESSnet members and Eurostat.

There should be at least an annual meeting of all ESSnet members.

Regular reporting on the progress of the ESSnet.

The management of the ESSnet should report quarterly on the progress of the ESSnet.

A final report is produced for the multi-beneficiary grant agreement and the contained actions.

Maintenance of website (CROS Portal)

Deliverables (technical reports for each of the pilot projects to be implemented, if applicable to European and/or national level, including material necessary for the implementation such as statistical procedures and software programs, etc.):


Technical report concerning the identification of the relevant statistical processes that may be affected by the implementation.

Technical report concerning the implementation requirements of prototypes in the relevant statistical processes.

Technical report on the quality management framework of the integration process and on the management of the process (detailed milestones and achievements, quality framework) related to the fulfilment of the technical implementation requirements.

Technical report on the statistical output, required quality and definition of the necessary metadata at European and national level.

Technical report including the necessary material for the integration process (transfer of knowledge, methodologies, IT infrastructure, toolbox of methods, software development, testing, and maintenance)

For all work packages that would be included in the first implementation phase, an important milestone would be the identification of the data holders and the access to the data for the countries that will participate. Any subsequent activities related to the implementation would be strictly conditional on being able to access the relevant data and assess the quality. Therefore, the participating countries to be selected should be evaluated on this point when they respond to the call for proposals.

4.3.2. New pilot projects 2018-2020

The new pilots (described in 4.2.2) will explore the potential of the various data sources for producing statistics for various statistical domains. The findings of the new pilots should be generalised and contribute to the completion of actions related to horizontal topics, such as review and update of quality framework or the compilation of a toolbox of methods. In order to ensure a harmonised input to these horizontal topics, it will be necessary to prepare standard structures within the horizontal work packages to be applied during the execution of the different pilots.

The pilots will focus on the aspect of producing statistics at European level, i.e. plans for implementation of data production processes have to cover the entire ESS.

While the data sources of the pilots will be different, the execution of the pilots should follow similar patterns. The following non-exhaustive list of considerations should be taken into account:

Access to the data (European dimension).

Continuity of data sources and statistical information for longer time period.

Portfolio of statistical products based on data sources and needs of statistical data users.

Definition of business processes and derived metadata (auditable steps including assurance of data security and confidentiality; ensuring data quality and its documentation).

Quality of data (input, processing, output).

Development of methodology for production of statistics (ensuring confidentiality of output).

Definition of IT infrastructures for data processing (applicable at large scale, ensuring quality reporting and security of data).

Communication to the public (as contribution to an overall communication strategy).

Treatment of legal issues (related to data access, processing and output).

Pilot production of statistical data and assessment of quality (including multi-purpose and multi-source aspects).

Provision of a common IT infrastructure for executing the pilots suitable for analysis and processing of data by the ESSnet.

Implementation of process at ESS level.

Deliverables:

The deliverables related to the administrative and financial aspects should be different from the technical reports that refer to the specific work packages. Each pilot will produce a final technical report suggesting solutions for the above mentioned relevant issues by the end of the pilot duration. The report will be partially completed according to the phases of the pilot, e.g. chapters on data access and statistical products could be finalised at an earlier stage.

Each final technical report should contain chapters related to horizontal topics:

Evaluation of big data sources and definition of possible statistical products from examined big data sources.

Business architecture suitable for big data processing.

Methodological framework.

Quality framework.

Metadata framework.

IT infrastructure.

Protection of privacy and confidentiality and other legal issues.

Inventory, requirements, definition, and specification of future big data integration.

For all work packages that would be included in the new pilots, an important milestone would be the identification of the data holders and the access to the data. It is necessary to investigate the issue, bearing in mind that any subsequent activities related to the implementation would be strictly conditional on being able to access the relevant data and assess the quality.

Administrative reports related to the administrative and financial aspects should refer to the regular progress meetings, milestones and overall achievements, delivery of the abovementioned technical reports and should be related to the payment requirements relevant to the project.

4.3.3. Extend the work on smart statistics and the investigation of innovative applications in the domain of trusted smart statistics

Work on smart statistics will demonstrate the usefulness and the feasibility of using data from smart devices and sensors in order to produce statistics. The use of communication standards, the issue of quality framework and the production of statistics applicable throughout the ESS should be at the core of the findings of the work on smart statistics. The following non-exhaustive list of considerations should be taken into account:

Use of standards in smart devices, sensors and actuators in terms of interoperability and communication protocols.

Access to the data (European dimension).


Continuity of data sources and statistical information for longer time period.

Portfolio of statistical products based on data sources and needs of users for statistical data of higher granularity, extended scope, higher frequency and higher automation.

Definition of business processes and derived metadata (auditable steps including assurance of data security and confidentiality; ensuring data quality and its documentation).

Quality of data (input, processing, output).

Development of methodology for production of statistics (ensuring confidentiality of output).

Integration of distributed ledger technologies (e.g. Blockchain) or other similar technologies ensuring security, integrity, authentication, transparent algorithms, audit trails, etc.

Definition of IT infrastructures for data processing (applicable at large scale, ensuring quality reporting and security of data).

Communication to the public (as contribution to an overall communication strategy, in particular on the issue of trusted smart statistics).

Treatment of legal issues (related to data access, processing and output).

Pilot production of statistical data and identification of quality issues in relation to standards such as for algorithmic bias considerations.

Provision of a common IT infrastructure for executing the pilots suitable for analysis and processing of data by the ESSnet.

Implementation of process at ESS level.

Deliverables:

The deliverables related to the administrative and financial aspects should be different from the technical reports that refer to the specific work packages.

Each work package will produce a final technical report suggesting solutions for the above mentioned relevant issues by the end of the action's duration. The report will be partially completed according to the phases of the action.

Indicatively, each final report should contain chapters related to horizontal topics:

Evaluation of big data sources in the context of smart devices and sensors and definition of possible statistical products.

Business architecture suitable for big data processing in the context of smart devices and sensors.

Methodological framework including statistical algorithms (computational statistics).

Quality framework.

Metadata framework.

IT infrastructure.

Protection of privacy and confidentiality and other legal issues, security.

Inventory, requirements, definition, and specification of future big data integration in the context of smart devices and sensors in a multipurpose and multisource statistical environment.

For all work packages that would be included in the smart statistics strand, an important milestone would be the identification of the data holders/data platforms and the access to the data. It is necessary to investigate the issue, bearing in mind that any subsequent activities related to the implementation would be strictly conditional on being able to access the relevant data and assess the quality.

4.4. Indicators

The scope of the business case is rather wide and of diverse nature. It comprises activities for the first implementation phase related to using big data sources for official statistics, as well as the exploration of new data sources and smart systems for their potential use in official statistics. Exploration of new data sources is comparable to the scope of the current Big Data ESSnet, while smart statistics must be seen in the wider context of the IoT. Therefore, the indicators for monitoring the implementation and the success of the project might differ between these parts.

First of all, we would propose indicators related to the implementation and management of the project itself. These are:

Building of a knowledgeable consortium bearing the potential for successful outcome of the project.

This will be a condition for further progress of the project. It will be evaluated by Eurostat on the basis of the proposal(s).

Implementation of the project according to planning.

This can be monitored and evaluated by the project steering group.

Quality of the reports.

The quality of the reports will be evaluated by Eurostat. In addition, we propose to establish an independent review board with members from the European Statistical System. Such a board was introduced for reviewing the reports of the Big Data ESSnet.

Involvement of stakeholders.

The involvement of stakeholders can be assessed against their contribution to the project and their active participation in events such as seminars and workshops, where results of the project are discussed.

4.4.1. First Implementation phase

Indicators will be used to measure the progress of the implementation applicable to the deliverables in section 4.3.1 within the reports that will be accordingly foreseen. Indicators for a successful implementation of this part of the project could be:

Assurance of access to data sources.

The number of different data sources.

The number of statistical products derived as a result of the implementation and the affected statistical domains.

The achievement of pre-defined quality requirements (according to applied quality framework).

The actions, and participation in these actions, for transferring developments and knowledge within the ESS.


4.4.2. New pilot projects 2018-2020 and extension of the work in the domain of trusted smart statistics

Indicators on new big data pilots and smart statistics have to reflect the exploratory nature of these parts of the projects. The following indicators are proposed to be used for measuring success of the project:

Access to data sources.

Identification of use case and possible statistical outputs.

Communication to stakeholders.

Use of common IT infrastructures.

Possibility of reuse of developed applications.

Soundness of methodological approach and completeness of metadata.

Follow-up of implementation activities.

4.5. What the project does not include

4.5.1. First Implementation phase

The project only covers the implementation in selected lead countries but not a complete roll-out of processes within the entire ESS. The reasons for this are the limited availability of funding and the nature of the output of the previous project, the Big Data ESSnet. However, this project will create a solid base for implementation within the ESS, i.e. solutions should be chosen taking into account the conditions for Europe-wide implementation.

The meaning of "implementation" will be sufficiently detailed in the forthcoming specifications in terms of requirements and final output for each of the four successful prototypes. Implementation may vary from case to case with regard to the quality and the use of the final statistical output, hence it may be part of experimental statistics or complementary to already produced official statistics. From that point of view "implementation" does not imply integration in the national production statistical processes related to the provision of data to Eurostat that may be otherwise governed by EU legislation.

In addition, the issue of common and "national implementation" specificities should be clarified in terms of requirements at conceptual, functional and operational level. Though conceptual and functional implementation guidelines can be commonly developed and adopted, specific technical or practical implementations would remain at the discretion of the NSI. However, the possible development of toolkits could remain within the scope in case NSIs would like to use off-the-shelf tools. Toolkits should be understood in the wider context of the IT infrastructure that may need to be installed for the practical implementation of the prototypes. In that respect it may be necessary to organise workshops and presentations of tools and results, and in general to communicate the findings effectively to all NSIs.

4.5.2. New pilot projects 2018-2020 and extension of the work in the domain of trusted smart statistics

These new pilot projects are of an exploratory nature, and do not include integration of these big data sources in the official statistics production across the ESS.


Implementation is not included in this part of the project. However, activities will profit from the experience already gained during the first Big Data ESSnet and from the early implementation activities.

5. Impact Assessment

5.1. Stakeholder Analysis

In the Scheveningen Memorandum1 statistical offices recognised the potential and the need for harnessing new data sources for official statistics. Subsequently, the ESS started activities on exploring these sources as part of the ESS vision "Building the Future of European Statistics" 2 and the VIP Big Data3. The Big Data ESSnet will terminate in May 2018 and preliminary results of the 1st SGA encourage actions to convert the outputs of the pilots into statistical production as well as to continue the exploration of other big data sources. In addition, smart environments, which are now being developed, are challenging the way statistical data would be collected in the future. There is a need to further develop the capabilities of statistical offices to use the potential of these future smart digital systems for official statistics, to ensure the relevance of the statistical system in future.

On 6 May 2015 the European Commission adopted the Digital Single Market Strategy4 with the aim of opening up digital opportunities for citizens and businesses to tap the full potential of a future digital single market for the European economy. In the Communication "A European agenda for the collaborative economy" of 2 June 20165, the EC recognised the need for monitoring the collaborative economy and demanded that "collaborative platforms should cooperate closely with the authorities, including the Commission, to facilitate access to data and statistical information in compliance with data protection law." The Communication on "Building a European Data Economy" adopted in January 20176 stated the intention of the EC to discuss issues of access to privately held data by public authorities, including statistical offices, for public interest. A workshop7 on this topic was held by DG CONNECT on 26 June with a dedicated session on access to privately held data for official statistics, in which five NSIs expressed the need to gain access to this data for producing official statistics in the future. In its mid-term review on the implementation of the Digital Single Market Strategy8 from May 2017, the EC announced its intention to further explore this issue in the context of the review of Directive 2003/98/EC1 on the re-use of public sector information. In this regard a public consultation on the review of the PSI Directive was launched on 19 September 20172.

1 adopted by the ESSC on 27/09/2013: https://ec.europa.eu/eurostat/cros/system/files/SCHEVENINGEN_MEMORANDUM%20Final%20version.pdf

2 http://ec.europa.eu/eurostat/web/ess/about-us/ess-vision-2020

3 Adopted by the ESSC on 13 November 2014 (ESSC 2014/23/2a/EN)

4 Com (2015) 192 final, http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52015DC0192&from=EN

5 Com (2016) 356 final, https://ec.europa.eu/transparency/regdoc/rep/1/2016/EN/1-2016-356-EN-F1-1.PDF

6 Com (2017) 9, http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52017DC0009&from=EN

7 https://ec.europa.eu/digital-single-market/en/news/workshop-access-public-bodies-privately-held-data-public-interest

8 Com (2017) 228 final, http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM%3A2017%3A228%3AFIN

On 18 October 2016 the EC adopted a Communication to the Commission on "Data, Information and Knowledge Management within the European Commission"3 in which the EC recognised them as strategic assets and set out a corporate strategy to manage them accordingly, focussing on maximising the use of data for policy making. The objective is to build an infrastructure for data management and analysis in support of evidence based policy making and use this capability for the Commission's political priorities as well as to better coordinate the current data analytics activities, which are fragmented across different DGs. This corresponds to the activities of the Data4Policy group of Directors-General (ESTAT, CONNECT, JRC, DIGIT, OP and EPSC) aiming at developing pilot projects in close consultation with the relevant policy DGs and focussing on some key policy areas (e.g. skills, better regulation, migration, regional policy, security) to demonstrate the potential of data at all stages of the policy life cycle.

Most of these new data sources and smart systems are developed and maintained by private enterprises. In general, these data sources are exhaust products of the digitalisation of society and economy and they are not re-used for any other purposes. However, an increasing number of applications are developed with the explicit intention to collect data for subsequent data analysis, e.g. to enhance provided services or to better target advertising. An increasing data market and data economy is a result of this development. In general, private enterprises act as data producers as well as producers and users of data analytics, including statistical information. In the context of this data economy, statistical offices have to explore their share in using these data for public interest and providing statistical data back to users of official statistics.

The user community of official statistics expects high quality statistical information on the society, the economy and the environment at the lowest possible burden. High quality means responding adequately to the needs of different users in terms of the various quality aspects, such as accuracy, timeliness, reliability or coherence. Facing a proliferation of all kinds of information in the digital media, statistical offices should further develop their role as trusted third parties for statistical information, ensuring privacy and confidentiality to the data subjects. Evidence based policy making relies on appropriate trusted statistical information derived from all kinds of sources and increasingly from new data sources.

5.2. Project Environment

The ESS is already very active in exploring big data sources for purposes of official statistics. Activities are determined by the Big Data Action Plan and Roadmap4 (BDAR). Other stakeholders, such as the European System of Central Banks and the DGs of the European Commission, are also engaged in exploring the use of new data sources and designing systems for analysing and extracting information from these sources (Data4Policy initiative). The BDAR only foresees deployment of new production systems starting in 2020. However, it would be preferable to start with deployment activities immediately after the availability of the results of the first Big Data ESSnet in 2018. As the goal of the first ESSnet is producing reports on the feasibility and possible statistical outputs, it would be necessary to further develop the use cases into statistical processes and outputs before general implementation at the level of the ESS. The first part of this project would cover this purpose.

1 http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02003L0098-20130717&from=EN

2 https://ec.europa.eu/info/consultations/public-consultation-review-directive-re-use-public-sector-information-psi-directive_en

3 Com (2016) 6626 final, https://ec.europa.eu/transparency/regdoc/rep/3/2016/EN/C-2016-6626-F1-EN-MAIN.PDF

4 https://ec.europa.eu/eurostat/cros/content/ess-big-data-action-plan-and-roadmap-10_en

The preliminary results of the first Big Data ESSnet encourage exploring additional big data sources for their potential of producing official statistics. A number of big data sources have been already identified in the BDAR. The actual choice of the sources proposed in this business case has been confirmed and supported by the Steering Group Big Data. The structure of the pilots would follow a similar approach as for the first ESSnet. However, having already gathered experience, it would be expected to conduct them in a more efficient way. By the end of the first Big Data ESSnet we will have outputs related to quality, metadata and IT infrastructures, which can be used and further refined in the second round of pilot projects.

The development of smart systems is the next step in datafication. Instead of collecting data as a by-product of activities, data are collected to improve the functioning of systems in real time. Examples are optimisation of traffic or maintenance of devices. Machine learning algorithms are taught to extract appropriate information from the data to trigger actions for improving the functioning of these systems or devices. The systems are now being developed and it would be the right time to take part in these developments, i.e. influence the collection of data for producing specific statistical output. Once these systems have been developed, there will be a high risk that official statistics will not have access to them, and it would be impossible, or at least very difficult, to integrate requirements of official statistics as part of the various standardisation processes.

Eurostat started a project on smart statistics in 2017 that covers a proof of concept for selected domains1. At the same time, some NSIs are commencing similar activities, e.g. CBS organised an international event "Big Data Matters – Towards Smart Statistics" on 27 Sep 2017, which featured the use of smart sensors for official statistics. In parallel with the contractual activities, it will be necessary to develop the capabilities of the members of the European Statistical System to act as an important stakeholder in this new environment.

5.3. Cost-Benefit Analysis

Concerning the first implementation phase, it must be acknowledged that the expected benefits will vary depending on the pilots to be implemented. The exact implementation costs are not yet known and may vary depending on the overall readiness of the NSIs. The overall expected benefits for the implementation phase, the new pilot projects and smart statistics can be summarised as follows:

Better response to user needs through availability of various new data sources.

1 see business case "Smart Statistics"

Acquisition of new competences to enlarge portfolio of official statistics and ensure role as centres of competence towards users.

Increased efficiency (if a statistical product can be produced at lower cost using big data sources), complementing existing statistics.

Increased quality (if a statistical product could be improved [timeliness, completeness, relevance, accuracy, ...] using big data sources).

Reduction of burden on respondents.

Faster adaptability (if the phenomenon that official statistics tries to capture “moves”, the big data source may possibly “move with it”, including new, relevant variables as part of an expanding business).

Increased flexibility of the statistical system to respond to changing and new user needs.

Increased portfolio of statistical products when redesigning outputs or creating new statistical outputs.

Provision of big data based official statistics, produced in compliance with sound statistical disclosure control (SDC) principles, may reduce the general public’s use of “non-compliant”, “alternative” statistics produced by other actors.

Pro-active approach that ensures, or at least increases the probability of, integrating statistics into new data ecosystems.

Increasing relevance of official statistics.

Recognition of official statistics as an important stakeholder in the data economy and evidence-based policy making.

Extension of range of data sources for producing official statistics.

5.4. Risk Analysis

Table 1: Risk Analysis

Columns: Nr; Risk Name; Prob. (L-M-H); Impact (L-M-H); Mitigation / Measure (relevant at European and/or national level)

Nr 1: Lack of sufficient resources in the Member States (Prob.: M, Impact: H)
- Clarify expectations in ToRs
- Verify resource allocation in proposal
- Monitor resources during project, issue early warnings and recommendations

Nr 2: Implementation is not as expected on time (Prob.: M, Impact: H)
- Efficient coordination, continuous follow-up, issue early warnings and recommendations

Nr 3: Delayed IT infrastructure (expected to be shared among NSIs) (Prob.: M, Impact: H)
- Efficient coordination, involvement of DIGIT, issue early warnings and recommendations

Nr 4: Important big data sources not accessible (e.g. AIS data) (Prob.: M, Impact: H)
- The issue will be examined on a case by case basis.
- Effective coordination between Commission services (e.g. DG MOVE, ESTAT, CNECT)
- Prior requirement for committing the data holder to make the data accessible to the ESSnet (and an assessment of the sustainability of the data source).
- Consideration of alternative data sources

Nr 5: Data security breaches (Prob.: L, Impact: H)
- Prior privacy impact assessment and implement preventing measures
- Threshold Assessment
- Risk Identification
- Risk Mitigation
- Definition of an action plan in case of breaches
- Application of established security standards
- Monitor data processing steps and data traffic (auditable steps)

Nr 6: Data confidentiality breaches (Prob.: L, Impact: H)
- Prior privacy impact assessment and implement preventing measures
- Application of manuals and standards for protection of confidential data
- Agree on applicable standards for confidentiality protection before starting implementation
- Apply agreed rules and verify application

Nr 7: Unnecessary duplication / repetition of work done by other entities (Prob.: L, Impact: L)
- Close collaboration and communication with stakeholders
- Clarify expectations in ToRs
- Frequent review of progress
- Verify the outcome of the pilot projects (prior requirements); extend and adapt accordingly

Nr 8: Financial resources not allocated as planned (Prob.: L, Impact: M)
- Monitor resource allocation closely
- Alert coordinator and emphasize responsibility

Nr 9: Not enough involvement by Member States (Prob.: L, Impact: H)
- Communication at different levels of the ESS
- Only start project with sufficient support

Nr 10: High number of ESSnet participation (Prob.: M, Impact: M)
- Introduce appropriate management structure with different roles and responsibilities into ESSnet

Nr 11: Lack of availability of experts for ESSnet (Prob.: H, Impact: H)
- Ensure participation of NSI with relevant experience
- Ensure inclusion of scientific community
- Pay attention to availability of experts in proposal
- Inclusion of scientific partners, e.g. universities
- Recruitment of experts
- Training of staff
- ensuring backup staff

Nr 13: Changes in EU data protection legislation (Prob.: L, Impact: H)
- Monitor legislative developments
- Conduct impact analysis

Nr 14: Different national legal environments (Prob.: M, Impact: M)
- Include specific action in ESSnet
- Conduct separate study at appropriate point in time

Nr 15: Different national (technical, economic, societal, …) conditions, impact of languages (Prob.: H, Impact: M)
- Analyse conditions before or during pilot execution and consider results for implementation planning
- include national modifications
- foresee monitoring of national situations in project

6. Approach

6.1. Methodology

This project will follow the PM² Methodology for all project activities.

6.2. General Description (including proposed financial instruments)

The goals of the project could best be achieved by shared cost actions (see Figure 1, Figure 2). Experiences from the first wave of pilot activities do justify the possibility of implementing the actions via multi-beneficiary grant agreements (ESSnet). The possibility to use procurement procedures for specific and well defined purposes cannot be excluded.

The results of Stages II and III (see Figure 1) should be beneficial to the ESS as a whole, taking advantage of joint efforts, the knowledge and the expertise of the leading NSIs. Before roll-out at the level of the ESS, successful pilots (2016-2018) should be implemented in 2018-2020 in some selected NSIs to create the conditions for a more general adoption of the newly designed processes at ESS level in 2020-2022. This will require appropriate communication at ESS level and would rule out solutions which would not be applicable at ESS level. In particular, during the period 2018-2020 the ESSnet should come up with methodological guidelines, an adequate quality framework, metadata requirements and the possibilities for sharing resources among NSIs in order to produce big data enhanced statistics. Part of the output of Stage II should provide the necessary elements for the work to be undertaken under individual grants in Stage IV, provided that resources and budgetary allocation are adequate.

Moreover, some of the specific strands of work (2018-2020) have an exploratory character (basically described as proofs-of-concept and prototypes), examining a wide range of aspects of dealing with big data; multi-beneficiary grants are therefore more flexible and agile in financing the relevant actions. Under the same financial provisions, the "centralised" collaboration with academia, data holders, research institutes and standardisation bodies is more effective and efficient.

In the area of smart statistics, Eurostat has already started a call for tender to develop a prototype application. As future implementation will be done at Member State level and will concern the production of statistics, shared cost actions will also be necessary for smart statistics. In this respect, smart statistics shares commonalities with exploring big data sources for official statistics. In addition, some NSIs have already started activities in this area. Support and coordination of these new activities will ensure application of results at ESS level.

6.3. Resources and Lead Times

The following table shows an estimation of the approximate resources and cost for the consecutive stages. Resources are estimated in full time equivalents. The costs are the value of possible future multi-beneficiary grant agreements divided between Eurostat and the participating Member States or cost for the procurement of statistical services.

Figure 1: Resources and cost estimate

STAGE II (2018-2020)

First Implementation phase 2018-2020
- Eurostat (earmarked): FTE 0.50 (2018), 1.00 (2019), 0.50 (2020); Budget 1 M€
- NSIs (estimated): FTE 7.00 (2018), 14.00 (2019), 7.00 (2020); Budget (approx) 100 K€

Pilots Phase II 2018-2020
- Eurostat (earmarked): FTE 0.50 (2018), 1.00 (2019), 0.50 (2020); Budget 980 K€
- NSIs (estimated): FTE 7.00 (2018), 14.00 (2019), 7.00 (2020); Budget (approx) 100 K€

Smart Statistics I 2018-2020
- Eurostat (earmarked): FTE 0.50 (2018), 1.00 (2019), 0.50 (2020); Budget 485 K€
- NSIs (estimated): FTE -; Budget -

TOTAL
- Eurostat (earmarked): FTE 1.50 (2018), 3.00 (2019), 1.50 (2020); Budget 2.465 M€
- NSIs (estimated): FTE 14.00 (2018), 28.00 (2019), 14.00 (2020)

STAGE III (2019-2021)

Smart Statistics II 2019-2021
- Eurostat (earmarked): FTE 0.50 (2019), 1.00 (2020), 0.50 (2021); Budget 2.6 M€
- NSIs (estimated): FTE Estimate not available; Budget Estimate not available

STAGE IV (2020-2022)

Smart Statistics III 2020-2022
- Eurostat (earmarked): FTE 0.50 (2020), 1.00 (2021), 0.50 (2022); Budget 0.8 M€
- NSIs (estimated): FTE Estimate not available; Budget Estimate not available

Second Implementation phase 2020-2022
- Eurostat (earmarked): FTE 0.50 (2020), 1.00 (2021), 0.50 (2022); Budget 2 M€
- NSIs (estimated): FTE Estimate not available; Budget Estimate not available

Figure 2: Allocation of resources

Timeline by year and quarter, 2016-2022:

STAGE I
- Pilots Phase I, 2016-2018
- Smart statistics, 2017-2018: proofs-of-concept, quality, business process and information models; 450 K€

STAGE II (current business case)
- First Implementation phase, 2018-2020: 1 M€
- Pilots Phase II, 2018-2020: 980 K€
- Smart Statistics I, 2018-2020: 485 K€

STAGE III (updated business case)
- Smart Statistics II, 2019-2021: 2.6 M€

STAGE IV (updated business case)
- Smart Statistics III, 2020-2022: 0.8 M€
- Second Implementation phase, 2020-2022: 2 M€

6.4. Project Funding

As explained above, shared cost actions should be used for implementing the project. The total cost of the project (current business case - STAGE II) is estimated at approximately 2.5 million Euro for Eurostat and approximately 0.25 million Euro for the NSIs. The distribution of the available budget among the three objectives (First implementation phase, Pilots phase II and Smart Statistics I) may be further adjusted according to priorities defined jointly with the NSIs. An extension for the period 2019-2022 is foreseen that would require an additional input of 5.4 million Euro from Eurostat. The extension is part of the extension of the multi-annual work programme of Eurostat.
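The Eurostat totals quoted here can be cross-checked against the per-phase budgets in Figures 1 and 2. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative only and are not part of the project documentation.

```python
# Eurostat budget figures in thousand euro (K EUR), copied from Figures 1 and 2.
# All names below are hypothetical and chosen only for this illustration.
STAGE_II = {
    "First Implementation phase": 1000,
    "Pilots Phase II": 980,
    "Smart Statistics I": 485,
}

EXTENSION = {
    "Smart Statistics II (Stage III)": 2600,
    "Smart Statistics III (Stage IV)": 800,
    "Second Implementation phase (Stage IV)": 2000,
}


def total_meur(budgets):
    """Sum a budget dictionary given in K EUR and return the total in M EUR."""
    return sum(budgets.values()) / 1000


if __name__ == "__main__":
    print(f"Stage II (current business case): {total_meur(STAGE_II):.3f} M EUR")
    print(f"Extension 2019-2022: {total_meur(EXTENSION):.1f} M EUR")
```

The Stage II sum corresponds to the "approximately 2.5 million Euro" stated for Eurostat, and the sum of the Stage III and Stage IV phases corresponds to the additional 5.4 million Euro foreseen for the extension.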

6.5. Dissemination of results

The project members will communicate internally via (mainly virtual) meetings, and use the Commission wikis for sharing documents and internal management.

The CROS portal will be used for final publication of the reports.

Programs will be documented and published using appropriate repositories.

The organisation of workshops and participation in seminars, conferences and meetings of relevant stakeholders will ensure communication and information on developments and results.

The project members should organise dedicated seminars for selected target groups to ensure communication, e.g. in the context of the pilots implementation.

It is essential that the implementation progress and the results of the integration in the NSIs are publicly available. It is expected that these first results will pave the way to extend the implementation to cover the whole ESS.

7. Project Organisation

7.1. Project Manager

The project manager for this project is Albrecht Wirthmann.

7.2. Reporting Structure

The project owner for this project is the DDG.

The reporting will follow the governance structure of the Big Data project.

7.3. Project Team

At Eurostat, the project will be coordinated by the task force big data. The core members of the task force, i.e. those working mainly for the task force, will ensure the management of the project, while the other members can ensure coordination with activities within the subject matter, quality, methodology and IT domains.

As during the first phase, it is expected that the ESSnet consortia could be large. In addition, any other national authorities¹ which could possibly be involved should also be considered. The number of NSIs per implementation/pilot/smart statistics work package should be smaller, with approximately 3 “core members”. The implementation work packages could possibly be supplemented with additional members whose role is to ensure the conditions for implementation across the ESS.

7.4. Project Documentation

The project will be documented according to document management rules of the European Commission.

1 List of National Statistical Institutes and other national authorities responsible for the development, production and dissemination of European statistics as designated by Member States

http://ec.europa.eu/eurostat/web/european-statistical-system/overview?locale=fr

http://ec.europa.eu/eurostat/documents/747709/753176/20170803_List_ONAs_EL/a70d4496-5022-453c-9618-4051a9925832

Annex 1 – Stakeholder Analysis

Stakeholder groups (columns):

External Users: EC Policy DGs; DG DIGIT; DG JRC; DG CONNECT; Academia; Private Businesses; UN; Central Banks

Eurostat Internal Users: Subject Matter Units; Methodology, corporate architecture; IT infrastructure; IT for statistical production; Data & metadata services and standards; Quality Management; Legal Affairs; LISO

Partners/Suppliers: NSIs; UN; Central Banks; Academia; Private data holders; Private analytical service providers; Citizen Science Communities; Data Protection authorities

Stakeholder Needs (rows; an X marks each stakeholder group concerned):

Additional statistical data X X X X X X X X X

More timely data X X X X X X X X X

Higher flexibility X X X X X X X

Design and planning of IT infrastructure X X X X

Supply of IT infrastructure X X X

Development of new analytical methods and skills X X X X X X X X

Provision of analytical skills X X X X X

Quality improvements (related to quality elements: timeliness, relevance, accuracy, coherence, comparability, …) X X X X X X X X

Trusted statistical data X X X X X X X X X X X

Efficiency gains X X X

Expand portfolio of statistics X X X X X X X X X

Integration of big data in statistical production process X X X X X X X X X X

Quality framework for big data X X

Quality framework for smart statistics X X

Use of privately held data for public purposes X X X X X X X

Development of data economy X X X X X

Creation of digital single market X X X X X

Synergies for data, information and knowledge management X X X X

Statistical data for SDGs X X X

Privacy X X X X X X X

Data security X X X X X X X

Confidentiality X X X X X X X X

Access to new data sources X X X X X X
