Two Paradigms for Official Statistics Production
Boris Lorenc, Jakob Engdahl and Klas Blomqvist
Statistics Sweden
Preliminaries
• The talk concerns data and knowledge about the external world, not data and knowledge about producing statistics (but might have consequences for the latter)
• Inspired by the different discussions on ongoing developments and initiatives within (official) statistics
• May have certain relevance for editing
• Naturally, the views presented herein are those of the authors, not necessarily reflecting policies of Statistics Sweden
Preliminaries (cont’d)
• Transition from (many) Stovepipes to (few) Integrated System(s)
• Among intended goals:
  i. better integration of administrative data and survey data,
  ii. better/faster response to new or changing user needs
• What an integrated system should look like so as to satisfy these requirements
  • answer sought in the field of knowledge systems/cognitive systems
Agenda
I. Preliminaries
II. On some distinctions and results regarding knowledge/cognitive systems
III. Consequences for representing data in Integrated systems for statistics production
IV. Further considerations for statistics methodology, including some thoughts regarding editing
Knowledge/Cognitive Systems
• Computational
  • symbolic
    • first-order predicate logic
    • other formal logic
    • etc.
  • subsymbolic
    • artificial neural networks (ANNs)
    • etc.
• Other (noncomputational)
  • embodied cognition
  • situated cognition
  • socially distributed cognition
  • etc.
Good for restricted domains with clear rules (e.g. chess), less good for open-world problems
Database developments
• Relational Model
  • RDBMS (Relational Database Management System)
    • implements first-order predicate logic
    • database schema: theory in predicate calculus
• NoSQL
  • schema-less (theory-less)
  • examples
    • Google's BigTable
    • solutions underlying some functions on Amazon, Twitter, and
• Perhaps related: Semantic Web
  • how to structure documents into a “web of data”
  • “a web of data that can be processed directly and indirectly by machines”
  • uses Resource Description Framework (rather than RDBMS)
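To make the contrast between a fixed schema, schema-less records, and RDF-style triples concrete, here is a minimal sketch. It is an illustration only: the table name `enterprise`, the attributes `turnover` and `employees`, and the helper functions are hypothetical, and plain Python lists stand in for a real NoSQL or RDF store.

```python
import sqlite3

# Relational flavour: the schema (the "theory") is fixed in advance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enterprise (id INTEGER PRIMARY KEY, turnover REAL NOT NULL)"
)
conn.execute("INSERT INTO enterprise (id, turnover) VALUES (?, ?)", (1, 250.0))


def insert_with_unplanned_column(connection):
    """Try to store an attribute that the schema does not define."""
    try:
        connection.execute(
            "INSERT INTO enterprise (id, turnover, employees) VALUES (2, 90.0, 12)"
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement: the table has no such column.
        return False


schema_accepts_new_attribute = insert_with_unplanned_column(conn)

# Schema-less flavour: each record carries whatever attributes it has.
# (Plain dicts here; a hypothetical stand-in for a document store.)
documents = [
    {"id": 1, "turnover": 250.0},
    {"id": 2, "turnover": 90.0, "employees": 12},
]

# RDF-style flavour: every statement is a (subject, predicate, object) triple.
triples = [
    ("enterprise:1", "hasTurnover", 250.0),
    ("enterprise:2", "hasTurnover", 90.0),
    ("enterprise:2", "hasEmployees", 12),
]


def objects_of(subject, predicate):
    """Return all objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

In the relational case a new attribute requires a schema change before it can be stored; in the other two representations it is simply another key or another triple, at the cost of fewer guarantees about what each record contains.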
Consequences
• Paradigm I: Stovepipe + RDBMS
  • ‘manual’ management of a fairly restricted domain
  • single-purpose use
likely requires expert assistance to users in search and requirements specification
• Paradigm II: Integrated system + NoSQL
  • automatic building of world knowledge pertaining to the domain
  • multi-purpose use
likely empowers users to themselves explore available data and consider merits of requiring new data
Sampling theory considerations
• In the context of Paradigm II:
  • use of weights
    • what should they then reflect:
      • inclusion probabilities (if known)?
      • nonresponse information (including an assumed model)?
      • auxiliary information pertaining to specific variables to be estimated?
  • use of models
    • memorylessness vs. Bayesian statistics
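As one way to picture the first two options for what a weight could reflect, the sketch below computes a weighted total in which each weight is the inverse of the inclusion probability, optionally adjusted by an assumed response probability. The slides do not name an estimator; the Horvitz-Thompson form is used here as a familiar example, and the function name, the data, and the response model are illustrative assumptions.

```python
def horvitz_thompson_total(values, inclusion_probs, response_probs=None):
    """Estimate a population total as the sum of y_k / (pi_k * rho_k).

    values          -- observed y_k for the responding sample units
    inclusion_probs -- design inclusion probabilities pi_k (assumed known)
    response_probs  -- assumed response probabilities rho_k from a
                       nonresponse model; defaults to 1 (full response)
    """
    if response_probs is None:
        response_probs = [1.0] * len(values)
    total = 0.0
    for y, pi, rho in zip(values, inclusion_probs, response_probs):
        weight = 1.0 / (pi * rho)  # design weight times nonresponse adjustment
        total += weight * y
    return total


# Hypothetical sample of three units.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]

# Weights reflect inclusion probabilities only.
design_based = horvitz_thompson_total(y, pi)

# Weights also reflect an assumed nonresponse model (unit 2 responds half the time).
rho = [1.0, 0.5, 1.0]
adjusted = horvitz_thompson_total(y, pi, rho)
```

The third option on the slide, auxiliary information tied to specific variables, would make the weights depend on the estimation target, which sits less easily with a multi-purpose system.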
Editing
• Editing for a purpose vs. editing “without a purpose”
  • adherence to general specifications (‘concept validity’)
  • self-learning (unsupervised) tools from computer science/ANN
  • model congruence (especially building automatic models using methods from the KDD (Knowledge Discovery and Data Mining) field)
• more?
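The difference between editing against an explicit rule and editing “without a purpose” can be sketched as follows. The second function is only a simple stand-in for the self-learning tools mentioned above: it uses a robust modified z-score based on the median absolute deviation, not an ANN, and the record layout, the rule, and the threshold of 3.5 are assumptions made for illustration.

```python
from statistics import median


def rule_based_flags(records):
    """Editing for a purpose: apply an explicit edit rule.

    Hypothetical rule: turnover must be non-negative.
    """
    return [r["id"] for r in records if r["turnover"] < 0]


def mad_flags(records, threshold=3.5):
    """Editing without a stated rule: flag values far from the bulk of the data.

    Uses the modified z-score 0.6745 * |x - median| / MAD.
    """
    values = [r["turnover"] for r in records]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    return [
        r["id"]
        for r in records
        if 0.6745 * abs(r["turnover"] - med) / mad > threshold
    ]


# Hypothetical records.
records = [
    {"id": 1, "turnover": 100.0},
    {"id": 2, "turnover": 110.0},
    {"id": 3, "turnover": 95.0},
    {"id": 4, "turnover": 105.0},
    {"id": 5, "turnover": -20.0},
    {"id": 6, "turnover": 5000.0},
]

rule_hits = rule_based_flags(records)
data_driven_hits = mad_flags(records)
```

The explicit rule catches only the violation it was written for, while the data-driven check also flags the unusually large value without knowing which estimate it might distort.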
Conclusions
• The distinction is likely not as clear-cut as presented here; however, a trend is discernible:
  • transition from “manual” to automatic processing
  • potentially increased need to use models
• In building representations of “world knowledge”, in addition to RDBMS, pay attention to developments in NoSQL, Big Data, and similar
• Perhaps strengthen work on
  • general-purpose data editing
  • automated data editing
  • model use
  • ...
(as already advanced in several contributions to the workshop)
Thank you