Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist...

11
Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden

Transcript of Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist...

Page 1: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Two Paradigms for Official Statistics Production

Boris Lorenc, Jakob Engdahl and Klas Blomqvist

Statistics Sweden

Page 2: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Preliminaries

• The talk concerns data and knowledge about external world – not data and knowledge about producing statistics (but might have consequences for the latter)

• Inspired by the different discussions on ongoing developments and initiatives within (official) statistics

• May have certain relevance for editing• Naturally, the views presented herein are those of

the authors, not necessarily reflecting policies of Statistics Sweden

Page 3: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Preliminaries (cont’d)

• Transition from (many) Stovepipes to (few) Integrated System(s)

• Among intended goalsi. better integration of administrative data and survey

data,

ii. better/faster response to new or changing user needs

• How an integrated system should look like so as to satisfy these requirements• answer sought in the field of knowledge

systems/cognitive systems

Page 4: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Agenda

I. Preliminaries

II. On some distinctions and results regarding knowledge/cognitive systems

III. Consequences for representing data in Integrated systems for statistics production

IV. Further considerations for statistics methodology, including some thoughts regarding editing

Page 5: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Knowledge/Cognitive Systems

• Computational• symbolic

• first-order predicate logic• other formal logic• etc

• subsymbolic• artificial neural networks (ANNs)• etc

• Other (noncomputational)• embodied cognition• situated cognition• socially distributed cognition• etc

Good for restricted domains with clear rules (e.g. chess), less good for open-world problems

Page 6: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Database developments

• Relational Model• RDBMS (Relational Database Management System)

• implements first-order predicate logic• database schema: theory in predicate calculus

• NoSQL• schema-less (theory-less)• examples

• Google‘s BigTable• solutions underlying some functions on Amazon, Twitter, and

Facebook

• Perhaps related: Semantic Web• how to structure documents into a “web of data”• “a web of data that can be processed directly and

indirectly by machines”• uses Resource Description Framework (rather than RDBMS)

Page 7: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Consequences

• Paradigm I: Stovepipe + RDBMS• ‘manual’ management of a fairly restricted domain• single-purpose use

likely requires expert assistance to users in search and requirements specification

• Paradigm II: Integrated system + noSQL• automatic building of world knowledge pertaining to

the domain• multi-purpose use

likely empowers users to themselves explore available data and consider merits of requiring new data

likely requires expert assistance to users in search and requirements specification

likely empowers users to themselves explore

available data and consider merits of requiring new data

Page 8: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Sampling theory considerations

• In the context of Paradigm II:• use of weights

• what should they then reflect:• inclusion probabilities (if known)?• nonresponse information (including an assumed model)?• auxiliary information pertaining to specific variables to be

estimated?

• use of models• memorylessness vs. Bayesian statistics

Page 9: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Editing

• Editing for a purpose vs. editing “without a purpose”• adherence to general specifications (‘concept

validity’)• self-learning (unsupervised) tools from computer

science/ANN• model congruence (especially building automatic

models using methods from the KDD (Knowledge Discovery and Data Mining) field

• more?

Page 10: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Conclusions

• The distinction likely not as clear-cut as presented here, however the trend discernible:• transition from “manual” to automatic processing• potential increased need to use models

• In building representations of “world knowledge”, in addition to RDBMS, pay attention to developments in NoSQL, Big Data, and similar

• Perhaps strengthen work on• general-purpose data editing• automated data editing• model use• ...

(as already advanced in several contributions to the workshop)

Page 11: Two Paradigms for Official Statistics Production Boris Lorenc, Jakob Engdahl and Klas Blomqvist Statistics Sweden.

Thank you

[email protected]