Jose Ramon G. Albert, Ph.D. - Homepage | ESCAP... • McKinsey Global Institute...

Expert Group Meeting: Management of Statistical Information Systems (MSIS) 2014, 14-16 April 2014, Manila, PH. Challenges, Opportunities and Issues on Using BIG DATA for Meeting Current and Emerging Demands on Measuring Progress and Development. Jose Ramon G. Albert, Ph.D., Senior Research Fellow, Philippine Institute for Development Studies (PIDS). Email: jalbert@mail.pids.gov.ph; jrgalbert@gmail.com. Former Secretary General, Philippine National Statistical Coordination Board (NSCB). 2014-2015 President, Philippine Statistical Association Inc. (PSAI).

Transcript

Page 1:

Expert Group Meeting: Management of Statistical Information Systems (MSIS) 2014

14‐16 April 2014, Manila, PH

Challenges, Opportunities and Issues on Using BIG DATA for Meeting Current and Emerging Demands on Measuring Progress and Development

Jose Ramon G. Albert, Ph.D., Senior Research Fellow

Philippine Institute for Development Studies (PIDS), Email: jalbert@mail.pids.gov.ph; jrgalbert@gmail.com

Former Secretary General, Philippine National Statistical Coordination Board (NSCB)

2014-2015 President, Philippine Statistical Association Inc. (PSAI)

Page 2:

Outline of the Presentation

I. Public Policy, Statistics Dev’t & the Data Revolution

• Production of official statistics: a complex process

II. Traditional Data Sources in Official Statistics vs BIG DATA

• Characteristics of Official Statistics

• Characteristics of BIG DATA

III. The Rise of BIG DATA

IV. BIG DATA for Dev’t

V. BIG DATA: Big News or Big Deal?

VI. Future of Possibilities

Page 3:

1. Public Policy, Statistics Dev’t & the Data Revolution

• While statistics are important for public policy and for monitoring progress in development goals (e.g., national development plans, MDGs), current statistics need re-engineering for the post-2015 Development Agenda:

– unfinished agenda from the MDGs

• reaching Zero (Poverty)

• “Leaving No One Behind.”

– Data Revolution: High Level Panel Report on the Post-2015 Agenda

– Open Working Group (OWG) on Sustainable Development Goals (SDGs)

Page 4:

1.1. Production of Official Statistics

• (Official) statistics production is a complex process, initiated by demand (data users) but constrained by resources (people and budgets).

– In a National Statistical System (NSS), a number of government offices produce (official) statistics from primary data collection and/or compilation of data (even in a centralized NSS).

– Data sources: censuses, sample surveys, administrative reporting systems

– The quality and timeliness of statistical aggregates depend on methods and on the suppliers of data.

Page 5:

2. Traditional Data Sources in Official Statistics vs BIG DATA

Official Statistics | BIG DATA
1. Structured and planned product | 1. Unstructured, unfiltered “data exhaust”, i.e., by-product of digital products (transactions, web, social media)
2. Methodological and clear concepts | 2. Poor analytics
3. Regulated | 3. Unregulated
4. Macro-level; based on high-volume primary data | 4. Micro-level; huge volume with high velocity (or frequency) and variety
5. High cost | 5. Generally little or no cost
6. Centralized; point in time | 6. Distributed; real-time

Page 6:

2.1. Characteristics of Official Statistics

• Focus on precision and accuracy over timeliness and other quality issues

• Structured design in data collection, vs Big Data’s 3Vs: (high) volume, velocity, variety

• Tried-and-tested methods of data collection for yielding “credible” statistics (based on representativeness of data)

Page 7:

2.2. Characteristics of BIG DATA

• A tsunami of data arising from the increasing capacity to collect, store, retrieve, use and re-use data.

– In 2012, data was reported to have been doubling every 40 months since the 1980s, with about 2.5 quintillion (2.5 × 10^18) bytes of data being created per day.

• Data exhaust from electronic gadgets, internet search/social media and sensors

– Helps increase the public need to “Know in (Real) Time”

• 3Vs: (high) volume, velocity, variety
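The two growth figures above can be restated in annual terms with a little arithmetic. This is a minimal sketch that assumes steady exponential growth (one doubling every 40 months) and decimal units (1 exabyte = 10^18 bytes); it illustrates the conversion and is not a measurement.

```python
# Figures quoted on the slide; steady exponential growth is an assumption
DOUBLING_MONTHS = 40      # data reported to double every 40 months
BYTES_PER_DAY = 2.5e18    # about 2.5 quintillion bytes created per day
EXABYTE = 1e18            # decimal exabyte (assumed reporting unit)

# Annual growth factor implied by one doubling every 40 months: 2 ** (12 / 40)
annual_factor = 2 ** (12 / DOUBLING_MONTHS)
annual_growth_pct = (annual_factor - 1) * 100

# Yearly volume if the daily rate stayed constant for 365 days
exabytes_per_year = BYTES_PER_DAY * 365 / EXABYTE

print(f"Implied annual growth: {annual_growth_pct:.1f}%")
print(f"Data created per year: {exabytes_per_year:.1f} EB")
```

With these inputs the implied growth is roughly 23% a year and the yearly volume about 912.5 exabytes; both numbers inherit every simplification in the assumptions.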

Page 8:

3. The Rise of BIG DATA

• Rise of BIG DATA undoubtedly due to:

– Increasing use of mobile phones and of the internet:

• In the PH, internet penetration reached 36% in 2012, up from 2% in 2000. As of 2012, there were already 102 mobile subscriptions per 100 persons in the PH.

[Figure: two charts, 2000-2012, for Cambodia, Indonesia, Lao P.D.R., Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Viet Nam: Mobile-cellular telephone subscriptions per 100 inhabitants; Percentage of Individuals using the Internet]
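One way to summarize the PH internet-penetration figures (2% in 2000, 36% in 2012) is a compound annual growth rate (CAGR). The sketch assumes smooth geometric growth between the two endpoints, which the actual year-by-year series need not follow; `cagr` is an illustrative helper, not part of any statistical package.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two positive values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Internet penetration in the PH, as quoted on the slide
growth = cagr(2.0, 36.0, 2012 - 2000)
print(f"Average annual growth in penetration: {growth:.1%}")
print(f"Absolute change: {36 - 2} percentage points")
```

Reporting both the relative rate and the percentage-point change is deliberate: an 18-fold rise from a very low base sounds dramatic, while the 34-point change shows how much of the population is still offline.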

Page 9:

3. The Rise of BIG DATA

• McKinsey Global Institute (2011) indicates 3 factors contributing to BIG DATA:

1. sensors and electronic gadgets interconnected to computing resources;

2. availability of data in the public domain (esp. social media);

3. suitable technologies, esp. statistical models and methods for data mining.

– In Q2 2008, 3 out of every 4 Internet surfers used “Social Media” (a significant rise from 56% in 2007).

– In the PH, as of Feb 2013, about 30.2 million Filipinos were on Facebook (giving the PH one of the highest Facebook penetration rates in ASEAN, next only to IND).

– As of July 2012, there were around 9.5 million Twitter users in the PH.

Page 10:

3. The Rise of BIG DATA

• BIG DATA harnessed by businesses:

– Amazon uses its customer database creatively:

• “customers who bought Product A also bought Product B, Product C or Product D …” based on predictive modeling (association rules and collaborative filtering)

– Social media, such as tweets on Twitter, are now examined in terms of the “polarity” (i.e., positive, negative, or neutral) of sentiments expressed on a product, such as a movie.

– Public Wanting to “Know in (Real) Time”
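The “customers who bought A also bought B” idea can be illustrated with the two standard association-rule measures, support and confidence. This is a toy pairwise sketch on hypothetical baskets with hypothetical thresholds; it is not Amazon’s actual system, which (as noted above) also draws on collaborative filtering.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Mine rules 'lhs -> rhs' between pairs of items.

    support    = share of baskets containing both items
    confidence = share of baskets with lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

# Hypothetical purchase histories, one set of products per customer
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "D"},
    {"B", "C"},
    {"A", "B", "D"},
]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"customers who bought {lhs} also bought {rhs}: "
          f"support={sup:.2f}, confidence={conf:.2f}")
```

Real systems work with far larger item sets and use algorithms such as Apriori or FP-growth to avoid counting every possible pair.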

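Sentiment “polarity” of social media posts can be scored in many ways. The sketch below is a minimal lexicon-based version: it counts words from a positive list and a negative list. Both word lists are hypothetical and far too small for real use; practical systems use large curated lexicons or trained classifiers and must cope with negation, sarcasm and local languages.

```python
import re

# Hypothetical mini-lexicons; real ones contain thousands of entries
POSITIVE = {"great", "love", "loved", "amazing", "good", "enjoyed"}
NEGATIVE = {"boring", "bad", "hate", "awful", "waste", "disappointing"}

def polarity(text):
    """Classify text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical tweets about a movie
tweets = [
    "Loved the movie, amazing cast!",
    "What a boring waste of two hours",
    "Watching the movie tonight",
]
for tweet in tweets:
    print(polarity(tweet), "|", tweet)
```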

Page 11:

4. BIG DATA for Development

• Frontiers of BIG DATA beyond business applications:

– Health Surveillance: Google Flu Trends (J. Ginsberg et al., Nature, 200)
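The original Google Flu Trends paper in Nature related the share of physician visits for influenza-like illness (ILI), P, to the share of ILI-related search queries, Q, through a linear model on the logit scale: logit(P) = b0 + b1 × logit(Q). The sketch below fits that form by ordinary least squares and produces a “nowcast”; the weekly shares are hypothetical, and the query-selection step of the real system is omitted. This is not Google’s code.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def fit_logit_linear(query_share, ili_share):
    """OLS fit of logit(P) = b0 + b1 * logit(Q); returns (b0, b1)."""
    xs = [logit(q) for q in query_share]
    ys = [logit(p) for p in ili_share]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def nowcast(b0, b1, q):
    """Estimate this week's ILI visit share from this week's query share."""
    return inv_logit(b0 + b1 * logit(q))

# Hypothetical weekly shares (fractions between 0 and 1, not real data)
query_share = [0.0010, 0.0015, 0.0022, 0.0030, 0.0041]
ili_share = [0.012, 0.016, 0.023, 0.029, 0.038]

b0, b1 = fit_logit_linear(query_share, ili_share)
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"nowcast for Q=0.0035: {nowcast(b0, b1, 0.0035):.4f}")
```

The appeal is speed: search data arrive daily, while surveillance reports lag by one to two weeks. The weakness is that the fitted relationship can drift when search behaviour changes.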

Page 12:

4. BIG DATA for Development

• Other Successes with BIG DATA, Beyond Health:

– Correlation of sales data and Google searches, shown in “Predicting the Present with Google Trends” (Choi & Varian, April 2009)

– UN Global Pulse reports of work by the Pulse Laboratory in Jakarta relating tweets about “rice” on Twitter to the actual price of rice (Letouze, 2012)
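Both studies start from the same question: how closely does a digital signal track an official or market series? The Pearson correlation coefficient is the usual first check, sketched below. The two series are hypothetical, and a high correlation alone does not show that the signal will keep tracking the target; the published work goes further, for example by adding search data to time-series models.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly series: tweets mentioning "rice" and a rice price index
rice_tweets = [120, 135, 160, 158, 190, 230]
rice_price = [100.0, 101.5, 104.0, 103.8, 107.2, 111.0]
print(f"r = {pearson(rice_tweets, rice_price):.3f}")
```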

Page 13:

4.1. BIG DATA Applications in DRMM

• Tracking Population Movements with Digital Traces from Mobile Phone Usage: potential to examine people’s movements in the wake of disasters

Note that in the PH there is concern that natural disasters are the new threat to development. From 1970-2012, there were 497 natural disasters:

– 53% storms

– 25% floods

» Over 34,000 deaths

» Incidence more than tripled
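Tracking population movements with mobile phone traces often starts by comparing where each (pseudonymous) subscriber is usually seen before and after a disaster. The sketch below assumes hypothetical call-detail records of the form (subscriber ID, date, cell-tower region) and uses each subscriber’s most frequent region as a location proxy. The region names and event date are placeholders. Real analyses also have to correct for uneven phone ownership, network outages and phone sharing.

```python
from collections import Counter, defaultdict
from datetime import date

def usual_region(regions):
    """Most frequently observed region (ties broken by first occurrence)."""
    return Counter(regions).most_common(1)[0][0]

def displacement_matrix(records, event_day):
    """Count subscribers by (usual region before event, usual region after)."""
    before, after = defaultdict(list), defaultdict(list)
    for subscriber, day, region in records:
        (before if day < event_day else after)[subscriber].append(region)
    flows = Counter()
    # Only subscribers observed both before and after the event are counted
    for subscriber in before.keys() & after.keys():
        origin = usual_region(before[subscriber])
        destination = usual_region(after[subscriber])
        flows[(origin, destination)] += 1
    return flows

# Hypothetical records: (subscriber ID, date, cell-tower region)
records = [
    ("s1", date(2013, 11, 1), "Region A"),
    ("s1", date(2013, 11, 3), "Region A"),
    ("s1", date(2013, 11, 12), "Region B"),
    ("s2", date(2013, 11, 2), "Region A"),
    ("s2", date(2013, 11, 15), "Region A"),
    ("s3", date(2013, 11, 5), "Region C"),  # not seen after: excluded
]
flows = displacement_matrix(records, date(2013, 11, 8))
for (origin, destination), count in flows.items():
    print(f"{origin} -> {destination}: {count}")
```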

Page 14:

4.1. BIG DATA Applications in DRMM

Why the interest in Climate Disasters?

Tracks of Tropical Cyclones in the Western North Pacific, Period: 1948-2010

The PH is visited by an average of 19 to 20 tropical cyclones EVERY YEAR.

[Figure: Tracks of tropical cyclones that formed in the Western North Pacific (WNP) during the period 1948-2010: of 1,641 TCs, 1,154 (70%) entered or formed in the Philippine Area of Responsibility (PAR). Data used: JMA data set]

Page 15:

4.1. BIG DATA Applications in DRMM: Technologies Used by Project NOAH

• Doppler Radar

• Automated Weather Station (AWS): measures wind speed, wind direction, air temperature, air humidity, air pressure, and rain amount, duration & intensity

• Automated Rain Gauge (ARG): measures amount of rainfall over a period of time

• Stream Gauge
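The rain quantities mentioned above (amount, duration and intensity) can be derived from a sequence of per-interval rainfall readings. This is a minimal sketch that assumes fixed 15-minute reporting intervals and defines intensity as the peak interval rate scaled to mm per hour, which is only one of several possible definitions. The interval length and readings are hypothetical, not Project NOAH’s actual data format.

```python
def rain_summary(readings_mm, interval_minutes=15):
    """Summarize per-interval rainfall readings (mm per interval).

    Returns (total_mm, wet_duration_hours, peak_intensity_mm_per_hour).
    """
    if not readings_mm:
        raise ValueError("no readings")
    total_mm = sum(readings_mm)
    wet_intervals = sum(1 for mm in readings_mm if mm > 0)
    duration_h = wet_intervals * interval_minutes / 60
    peak_mm_per_h = max(readings_mm) * 60 / interval_minutes
    return total_mm, duration_h, peak_mm_per_h

# Hypothetical hour of readings at 15-minute intervals
total, hours, peak = rain_summary([0.0, 2.5, 5.0, 0.5])
print(f"total={total} mm, duration={hours} h, peak intensity={peak} mm/h")
```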

Page 16:

4.1. BIG DATA Applications in DRMM: Technologies Used by Project NOAH: LIDAR

LIDAR (Light Detection and Ranging)

Used to provide high-resolution maps of lands and water bodies
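LIDAR ranging rests on a time-of-flight relation: a laser pulse travels to the surface and back, so the range is half the round-trip time multiplied by the speed of light, d = c·t/2. The sketch below applies that relation to two illustrative return times. Real airborne systems also correct for scan angle, aircraft position (GPS/IMU) and atmospheric effects before producing a map.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre

def lidar_range_m(round_trip_seconds):
    """Distance to the reflecting surface from a pulse's round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2

# Illustrative (hypothetical) return times for two pulses from an aircraft
ground = lidar_range_m(6.671e-6)   # pulse reflected by the ground
canopy = lidar_range_m(6.538e-6)   # pulse reflected by a tree canopy
print(f"ground: {ground:.1f} m, canopy: {canopy:.1f} m, "
      f"height difference: {ground - canopy:.1f} m")
```

Differencing first and last returns in this way is how LIDAR separates vegetation height from bare-earth elevation, which matters for flood modeling.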

Page 17:

4.1. BIG DATA Applications in DRMM: Project NOAH Projections on Cyclone Haiyan 3 Days Before It Hit

Page 18:

4.1. BIG DATA Applications in DRRMM

For “normal” climate disasters, some successes:

• Some local governments, e.g., the province of Albay, have excellent disaster preparedness and have generally met targets for zero casualties.

• Some areas in Mindanao that were unprepared have improved their preparedness:

– 676 deaths in Cagayan de Oro (CDO) due to Sendong in 2011

– 1 death in CDO due to Pablo in 2012

… But for “extreme” events, the toll and costs are rising:

• Super typhoon Yolanda (Haiyan): 8,201 deaths, about 0.9 billion USD in damages, 18 million persons affected, 1.1 million houses affected

Page 19:

5. BIG DATA: Big News or Big Deal?

• Official statisticians are taking note of BIG DATA, but with some degree of caution, as bigger does not always mean better.

“[Big Data] is … [wrongly] seen as a cure-all, …. Chris Anderson … wrote in 2008 that … sheer volume of data would obviate … need for … scientific method…. [T]hese views are badly mistaken. ... If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn't. Most of it is just noise, and the noise is increasing faster than the signal.” – Nate Silver, The Signal and the Noise
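A small simulation makes concrete the point that bigger does not always mean better. It rests on hypothetical assumptions: 35% of a population is online, and some attribute of interest is three times as common among people who are online. Under those assumptions, a very large data set covering only online users misses the population rate badly, while a small simple random sample lands close to it. All numbers are invented for illustration.

```python
import random

def simulate(seed=2014, population_size=100_000, sample_size=1_000):
    rng = random.Random(seed)
    # Hypothetical population: (is_online, has_attribute) for each person
    population = []
    for _ in range(population_size):
        online = rng.random() < 0.35                             # assumed 35% online
        has_attribute = rng.random() < (0.6 if online else 0.2)  # assumed rates
        population.append((online, has_attribute))

    true_rate = sum(has for _, has in population) / population_size

    # "Big data": every online person, but nobody offline (non-random coverage)
    online_records = [has for online, has in population if online]
    big_estimate = sum(online_records) / len(online_records)

    # Small simple random sample drawn from the whole population
    sample = rng.sample(population, sample_size)
    sample_estimate = sum(has for _, has in sample) / sample_size

    return true_rate, len(online_records), big_estimate, sample_estimate

true_rate, n_big, big, small = simulate()
print(f"true rate: {true_rate:.3f}")
print(f"online-only estimate from {n_big:,} records: {big:.3f}")
print(f"random-sample estimate from 1,000 records: {small:.3f}")
```

Adding more online records only makes the biased estimate more precise, not more accurate. Representativeness, not volume, is what the random sample contributes.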

Page 20:

5. BIG DATA: Big News or Big Deal?

• Privacy: Much of the Big Data being generated includes personal information. Precise, geo-location-based information pushes the boundary of confidentiality/privacy.

“Big Brother is watching you!” – George Orwell, 1984

– Amazon, Visa, Mastercard watching our shopping preferences

– Google watching our browsing habits

– Twitter watching what’s on our minds

– Facebook watching various info, including our social relationships

– Mobile providers listening to our conversations

Page 21:

5. BIG DATA: Big News or Big Deal?

• “Notice and Consent” of Users

– Users giving “informed consent” to unknown use?

– When Google Flu Trends was developed, did Google have to contact all its users for approval to use old search queries for this project?

– Should users be asked to agree to any possible future use of their data?

• Other ways to protect privacy, though imperfect:

– Opting out (but this can leave a trace)

– Anonymization (but “re-identification” is still possible)

• Legal issues, incl. sharing info with NSSs
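The measure k-anonymity shows why anonymization is imperfect. Even after names are removed, a combination of “quasi-identifiers” such as age, postal code and sex may single out one person. The sketch below computes k, the size of the smallest group of records sharing the same quasi-identifier values, on hypothetical records, then coarsens age into 10-year bands. k-anonymity is one common criterion, not a complete privacy guarantee; for example, it does not protect against attackers with background knowledge.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize_age(records, band=10):
    """Replace exact ages with the lower bound of a band, e.g. 34 -> 30."""
    return [{**r, "age": r["age"] // band * band} for r in records]

# Hypothetical de-identified records (names already removed)
records = [
    {"age": 34, "postal_code": "1101", "sex": "F", "diagnosis": "flu"},
    {"age": 37, "postal_code": "1101", "sex": "F", "diagnosis": "dengue"},
    {"age": 52, "postal_code": "1200", "sex": "M", "diagnosis": "flu"},
    {"age": 58, "postal_code": "1200", "sex": "M", "diagnosis": "asthma"},
]
qi = ["age", "postal_code", "sex"]
print("k before:", k_anonymity(records, qi))
print("k after age banding:", k_anonymity(generalize_age(records), qi))
```

A k of 1 means at least one person is unique on the quasi-identifiers and can be re-identified by linking to another data set. Generalization raises k at the cost of detail, which is the usual privacy-versus-utility trade-off.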

Page 22:

5. BIG DATA: Big News or Big Deal?

• Gains in velocity (and cost) at the expense of precision and accuracy, i.e., Big Data may not be completely accurate, but is thought of as “good enough.”

• But how good is “good enough”?

– In Jan 2013, Google Flu Trends’ estimate of flu levels in the US (11%) was almost double the CDC’s estimate of about 6%.
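The size of the January 2013 gap can be expressed as a ratio and a relative error, taking the CDC figure as the benchmark. This back-of-the-envelope calculation uses only the two rounded percentages quoted above.

```python
gft_estimate = 11.0   # US flu level (%), Google Flu Trends, as quoted (rounded)
cdc_estimate = 6.0    # CDC estimate (%), used here as the benchmark (rounded)

ratio = gft_estimate / cdc_estimate
relative_error = (gft_estimate - cdc_estimate) / cdc_estimate
print(f"ratio: {ratio:.2f}x, relative error: {relative_error:.0%}")
```

An overestimate of roughly 83% is hard to call “good enough” for allocating vaccines or hospital capacity, even if the timing of the peak was captured.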

Page 23:

5. BIG DATA: Big News or Big Deal?

Predictive Analytics Gone Wild

• Perilously Predicting Future Crime and Punishing Future Criminals (as in the movie “Minority Report”):

– Parole boards in the US using “predictions” from data analysis for parole decisions

– The City of Memphis, Tennessee uses Blue CRUSH (Crime Reduction Utilizing Statistical History) to concentrate police resources in a specific area at a specific time. (Crimes fell by a quarter from CRUSH’s inception in 2006, but was this due to CRUSH???)

– The US Dept of Homeland Security uses FAST (Future Attribute Screening Technology) to identify potential terrorists (reportedly 70% accurate???)
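The question marks after “70% accurate” are warranted because of base rates. When the target group is very rare, even a fairly accurate screen flags mostly innocent people. The sketch below applies Bayes’ rule under hypothetical assumptions: 70% sensitivity, 70% specificity, and 1 real threat per million people screened. These are not published FAST figures.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actual positive | flagged), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

screened = 1_000_000
prevalence = 1 / screened          # hypothetical: one real threat per million
sensitivity = specificity = 0.70   # hypothetical reading of "70% accurate"

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
expected_false_alarms = (1 - specificity) * (screened - 1)
print(f"chance a flagged person is a real threat: {ppv:.6%}")
print(f"expected false alarms per million screened: {expected_false_alarms:,.0f}")
```

Under these assumptions about 300,000 innocent people are flagged for every real threat. This is the core statistical objection to screening rare events, regardless of how much data feeds the model.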

Page 24:

6. Future of Possibilities

• BIG DATA is here to stay 

• … But its use is not the end of official statistics

• Ways forward:

– Legal Protocols and Institutional Arrangements (Private-Public Partnerships) for Access to BIG DATA holdings

– Addressing Privacy Issues with BIG DATA

– Investing in Capacity Building to Harness BIG DATA

• The official statistics community using its experience to identify “signals” within “noise”, to certify quality, and to decipher truth from falsehood

Page 25:

END

Salamat sa inyong pakikinig. (Thank you for your attention.)

Philippine Institute for Development Studies (Surian sa mga Pag-aaral Pangkaunlaran ng Pilipinas)

/PIDS.PH

@PIDS_PH

http://www.pids.gov.ph
