Olli big data_andai


Transcript of Olli big data_andai

Page 1: Olli big data_andai

LITTLE Issues with big Data and AI

Jim Isaak

2015 SSIT Vice President

2010 Computer Society President

Nov. 2017 v2

Page 2: Olli big data_andai

Society on Social Implications of Technology

What’s coming?

A quick history of how big the data has become, and why the 21st century is not the same as the previous millennium

And then some of the “So What?”

– Challenges the public needs to consider

– That technologists need to consider

– That Policy makers need to consider

– But first, a word from our Sponsor:

11/20/2017

Page 3: Olli big data_andai

Society on Social Implications of Technology

Impacts of Technology on Society

www.IEEESSIT.org

Page 4: Olli big data_andai

When I was a boy … (1972)

Computers typically had 32k bytes of RAM

And 2.5 MB disk drives

And took forever to do things we consider common place now

Moore’s Law – density doubles every 2 yrs (more speed, at ½ the price)

Page 5: Olli big data_andai

But now…

Intel’s latest “desktop” chip is 4GHz (1,000,000,000,000 faster than my 1970’s system)

Consider a person walking at 4 mph

now 6x the speed of light

My local storage has gone from two novels

to the Library of Congress
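
The walking analogy can be checked with a few lines of arithmetic. This is a minimal sketch: the speed of light in mph is a rounded constant and the function names are hypothetical. Working backwards, going from 4 mph to about 6x the speed of light corresponds to a speed-up factor of roughly one billion.

```python
# Rounded constant: speed of light in miles per hour (assumption: ~670.6 million mph).
SPEED_OF_LIGHT_MPH = 670_616_629

def multiples_of_light_speed(walking_mph, speedup):
    """Speed after applying the speed-up factor, in multiples of the speed of light."""
    return walking_mph * speedup / SPEED_OF_LIGHT_MPH

def speedup_for(walking_mph, light_multiples):
    """Speed-up factor needed to go from walking pace to the given multiple of light speed."""
    return light_multiples * SPEED_OF_LIGHT_MPH / walking_mph

print(f"{multiples_of_light_speed(4, 1e9):.2f} x light speed")
print(f"{speedup_for(4, 6):.3e} speed-up factor")
```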

Page 6: Olli big data_andai

Bytes (8 bits) per item:

Item                                          Bytes
Short novel                                   1 Megabyte     1,000,000
A pickup truck filled with books              1 Gigabyte     1,000,000,000
The Library of Congress – print collection    10 Terabytes   10,000,000,000,000

Note: storage is measured in Bytes, communications in bits …

“Broadband” network: typically 500 kilobits per second and up, i.e. roughly 50,000 bytes per second, or 20 seconds per book
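
The "20 seconds per book" figure can be reproduced as follows. This is a sketch under one stated assumption: about 10 bits on the wire per byte of data (8 data bits plus overhead), which is what turns 500 kilobits per second into 50,000 bytes per second. The function name and parameter are hypothetical.

```python
def transfer_seconds(size_bytes, link_bits_per_second, bits_per_byte_on_wire=10):
    """Time to move size_bytes over a link, given the bits spent per byte of data.

    bits_per_byte_on_wire=10 is an assumption (8 data bits plus ~2 bits of overhead).
    """
    bytes_per_second = link_bits_per_second / bits_per_byte_on_wire
    return size_bytes / bytes_per_second

SHORT_NOVEL_BYTES = 1_000_000   # from the table: a short novel is about 1 Megabyte
BROADBAND_BPS = 500_000         # 500 kilobits per second

print(transfer_seconds(SHORT_NOVEL_BYTES, BROADBAND_BPS))     # with overhead
print(transfer_seconds(SHORT_NOVEL_BYTES, BROADBAND_BPS, 8))  # ideal, no overhead
```

With no overhead at all (8 bits per byte) the same book would take 16 seconds, so the slide's figure is the more conservative one.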

Page 7: Olli big data_andai


A comparison in human terms

It takes me six seconds to get an ingredient from the fridge (load from RAM)

(Ideally a single cycle for a processor)

For a 4GHz processor, a rotational delay of 2ms => 1.5 years!

Seek time plus rotational delay, 4ms + 2ms = 6ms => 4.5 yrs, and 100ms => 77 yrs

How many items can I get from the fridge while waiting for one from “the store”?

(and don’t even consider net latency!)
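
The scaling behind this comparison can be sketched as below: one processor cycle is stretched to six human seconds, so any delay is converted to cycles and then to years. The 6 ms case is taken here as 4 ms of seek plus 2 ms of rotation, which is an assumption about how the 4.5 year figure arises; all names are hypothetical.

```python
CLOCK_HZ = 4e9                      # 4GHz processor
HUMAN_SECONDS_PER_CYCLE = 6.0       # one trip to the fridge
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def human_years(delay_ms):
    """Convert a hardware delay into 'fridge trip' years at one cycle = six seconds."""
    cycles = delay_ms / 1000 * CLOCK_HZ
    return cycles * HUMAN_SECONDS_PER_CYCLE / SECONDS_PER_YEAR

for label, delay_ms in [("rotational delay", 2),
                        ("seek + rotation (assumed 4 ms + 2 ms)", 6),
                        ("slow access", 100)]:
    print(f"{label}: {delay_ms} ms -> {human_years(delay_ms):.1f} years")
```

The results come out at roughly 1.5, 4.6 and 76 years, in line with the rounded figures on the slide.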

Page 8: Olli big data_andai

The new challenge/limitation

Watts –

– Power requirements

– And BTUs of heat generated by thousands of processors/disk drives

It’s why Google, Amazon, et al. are placing data centers near hydro power & cooling options ... or at least cheap power

And – how are you going to use that much stuff?

Page 9: Olli big data_andai

An example – Bluffdale Utah, NSA

65 MegaWatts (1 MW – 600+ homes) (200 MW Kennecott Utah Copper)

Aug. 2016 water use: 6.6M Gallons for cooling

Est. 3-12 Exabytes (3 Exabytes = 3,000,000,000,000,000,000 bytes)

(i.e. within the address space of one 64-bit processor)
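
A short sketch of the arithmetic behind these figures, using only the numbers on the slide plus the standard definitions of an exabyte (10^18 bytes) and a 64-bit address space (2^64 bytes). The variable names are hypothetical.

```python
HOMES_PER_MW = 600          # slide: 1 MW powers 600+ homes
SITE_MW = 65                # slide: Bluffdale draws 65 MegaWatts
EXABYTE = 10**18            # decimal exabyte, in bytes
ADDRESS_SPACE_64BIT = 2**64 # bytes addressable with a 64-bit address

homes_equivalent = SITE_MW * HOMES_PER_MW
low_estimate = 3 * EXABYTE
high_estimate = 12 * EXABYTE

print(f"Power draw is equivalent to about {homes_equivalent:,} homes")
print(f"Low estimate:  {low_estimate:,} bytes")
print(f"High estimate: {high_estimate:,} bytes")
print(f"High estimate uses {high_estimate / ADDRESS_SPACE_64BIT:.0%} of a 64-bit address space")
```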

Page 10: Olli big data_andai


The 21st century realization (Google, et al.)

1. All data has value – and you don’t know what will be useful in the future (Bluffdale center: storing pocket litter)

2. Critically missing in traditional systems: fault tolerance, massive scalability, malleable schemas, flexible queries

=> Community Development e.g. Hadoop

Page 11: Olli big data_andai

Going Viral (getting Real-Time)

“Google Flu Trends appeared to detect regional outbreaks of influenza 7–10 days before conventional Centers for Disease Control and Prevention surveillance systems” Clinical Infectious Diseases (2009) doi: 10.1086/630200

Simple concept: track search trends related to symptoms, relate them to location, identify potential hot spots. This specific concept has since been picked up with more focused algorithms applied

The Google experience did not work as well as might be desired – big data hubris

An associate of mine was researching social media streams to track potential ‘hot spots’ for civil unrest, terrorism, etc. She now works for NSA
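
The "simple concept" of tracking symptom-related searches by location can be sketched in a few lines. This is an illustration only, not Google's method: the symptom terms, the sample log, the baseline counts and the 2x threshold are all made-up assumptions.

```python
from collections import Counter

# Hypothetical symptom vocabulary; a real system would need a far more careful list.
SYMPTOM_TERMS = {"fever", "cough", "flu", "sore throat"}

def symptom_counts(search_log):
    """Count symptom-related queries per region. search_log holds (region, query) pairs."""
    counts = Counter()
    for region, query in search_log:
        text = query.lower()
        if any(term in text for term in SYMPTOM_TERMS):
            counts[region] += 1
    return counts

def hot_spots(current, baseline, ratio=2.0):
    """Regions whose current count is at least `ratio` times their usual baseline."""
    return sorted(r for r, n in current.items() if n >= ratio * baseline.get(r, 1))

sample_log = [
    ("Ohio", "flu symptoms"), ("Ohio", "fever remedy"), ("Ohio", "cough medicine"),
    ("Ohio", "pizza near me"), ("Utah", "cough drops"), ("Utah", "ski resorts"),
]
usual = {"Ohio": 1, "Utah": 1}   # assumed typical counts for the same time window

print(hot_spots(symptom_counts(sample_log), usual))
```

Naive substring matching like this also fires on unrelated queries (for example "influence" contains "flu"), which is one small illustration of why raw search counts can mislead.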

….. Now Trending….

Page 12: Olli big data_andai

Patients Like Me

“We're unleashing the power of data for good by empowering people to take control of their health because we believe real-world evidence can change the healthcare system”

Can trigger “instant” medical studies based on 400,000+ participants with 2500+ medical conditions

Lithium, Bi-Polar and ALS – 16 patients in journal article – but PLM found 69 in a day.

https://www.ted.com/talks/jamie_heywood_the_big_idea_my_brother_inspired

Page 13: Olli big data_andai

Now let’s see “Applications”

Summer 2016 CEO Cambridge Analytica (11 minutes)

CEO of Cambridge Analytica March 17 (30 minutes)

Composite from these two presentations (25 Min) (not online)

Page 14: Olli big data_andai

Election 2016

Democratic DB – every voter, likelihood of voting, feedback from surveys on candidate preferences … do everything you can to get the expected supporters out.

Trump “Project Alamo” w/ Cambridge Analytica: Facebook “psych” survey + profiles + external data on 220,000,000 Americans w/ 4000+ data points each: “voter registration records, gun ownership records, credit card purchase histories, and internet account identities” => personally targeted ads to either:

– Gain support (funding, voting)

– Suppress turnout of targeted groups

Page 15: Olli big data_andai

What Data Sources?

Facebook profile

OCEAN like personality test

Credit Cards

Credit Record

Browser Searches

Email “terms”

Church attendance

CATV viewing

Car registration

Home ownership

Magazine subscriptions


“and the beat goes on…”

Page 16: Olli big data_andai


OCEAN personality analysis

Openness, which refers to how readily an individual will take on new experiences or acceptance of non-conventional ideas, levels of creativity …

Conscientiousness, which applies to attention to detail, vigilance, organization and a desire to complete a …

Extraversion, which relates to assertiveness, enjoyment of human interactions and risk-taking.

Agreeableness, which tends to be indicative of co-operation, kindness and consideration for others.

Neuroticism, which relays levels of anxiety, ability to deal with stress and maintaining calmness under pressure.

Page 17: Olli big data_andai

“They [the Trump campaign] were using 40–50,000 different variants of ad every day that were continuously measuring responses and then adapting and evolving based on that response,” – Martin Moore, director of King’s College’s Centre for the Study of Media, Communication and Power, told The Guardian in early December.

Page 18: Olli big data_andai

Predictive analysis

Predictive analysis: finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes.

“Amazon customers like you ….”

Think “Minority Report” … without the prescient mediums
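
A toy version of the "customers like you" idea, as a hedged sketch: it scores items by how much the other customers who bought them overlap with your own purchases. This is a generic co-occurrence approach, not Amazon's actual algorithm, and the customers, baskets and function name are invented for illustration.

```python
from collections import Counter

def recommend(target, baskets, top_n=3):
    """Suggest items bought by customers whose baskets overlap with the target's."""
    mine = baskets[target]
    scores = Counter()
    for customer, items in baskets.items():
        if customer == target:
            continue
        overlap = len(mine & items)      # how similar this customer is to the target
        if overlap == 0:
            continue                     # no shared purchases, so no evidence
        for item in items - mine:        # only suggest items the target lacks
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

baskets = {
    "you": {"tent", "stove"},
    "a": {"tent", "stove", "lantern"},
    "b": {"tent", "sleeping bag"},
    "c": {"novel"},
}

print(recommend("you", baskets))
```

Real predictive models are far more elaborate, but the principle is the same: hidden patterns across many people are used to guess what one person will do next.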

Page 19: Olli big data_andai

Page 20: Olli big data_andai


Page 21: Olli big data_andai


Page 22: Olli big data_andai

Page 23: Olli big data_andai

From the man who “Liked” OCEAN

Dr. Michal Kosinski found that just a few Facebook “Likes” could match you to your OCEAN profile with high probability. 3 Million Facebook Profiles (1/1000)

10+ and you know a personality as well as their co-workers

100+ family/friends

250+ you know them better than their spouse

• Michal’s Keynote on Privacy

Page 24: Olli big data_andai

Every friend you “like”

Sexual orientation 88%

Gender, political views, race (95%)

Age, IQ,

Birds of a feather – friends like friends

∑ trivial data points => non-trivial

– Facebook + credit-card + search…

Also language use …
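
Why a sum of trivial data points becomes non-trivial can be illustrated with a small log-odds calculation. This is a sketch of the general statistical idea (a naive-Bayes style combination), not the method used in the research cited above; the likelihood ratio of 1.2 per signal and the independence of the signals are both assumptions made for illustration.

```python
import math

def combined_probability(prior, likelihood_ratios):
    """Combine independent weak signals by adding their log-odds to the prior's."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))

# Each hypothetical "Like" is only slightly more common in one group (ratio 1.2).
one_signal = combined_probability(0.5, [1.2])
thirty_signals = combined_probability(0.5, [1.2] * 30)

print(f"1 weak signal:   {one_signal:.3f}")
print(f"30 weak signals: {thirty_signals:.3f}")
```

One such signal barely moves a coin-flip guess, but thirty of them together push it close to certainty. Real signals are correlated, so the true effect is weaker than this idealized case.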

Page 25: Olli big data_andai

To summarize

Your face may disclose:

Humans can do gender, age, introvert, …

Political views, sexual orientation

Gay, liberal, atheism – capital crimes some places

5 pictures sufficient to get ‘gay’ at 92%

Also captured:

– Location data, continuous

– Sensors – heart rate

Page 26: Olli big data_andai

I fed a sample from my web page into the Cambridge tool

Test 1 – 10yr old text

22 yr old male

89% liberal

69% hard working

19% contemplative

51% team oriented

22% laid back

60% leader potential

INTJ “Jungian style”

Test 2 - Recent text

30 year old male

38% conservative

67% hardworking

27% contemplative

35% competitive

22% laid back

34% leader potential

ISTJ style

Page 27: Olli big data_andai

Save the Rhinos

Noseong Park, Edoardo Serra, and V.S. Subrahmanian document their predictive analytics software to save rhinos

IEEE Intelligent Systems, August 2015

Tracking, and then predicting rhino movement, and poacher movement can help target drone and ranger patrols to save more rhinos

Ends with the caveat that your rhinos may differ

Page 28: Olli big data_andai

What if?

We used all of the Cambridge Analytica and other available data …

And analyzed which persons were most likely to:

– Commit suicide (most common form of gun violence)

– Attack a church congregation

– Initiate a terrorist attack

“Subject 47 has bought 3 assault rifles in the last week and 300 clips of ammo”

What would/should we do?

Page 29: Olli big data_andai

AI – coming of age

Less than “the movies” view, but more than folks expect

Past the tipping point, so it’s hard to see where it can lead


“Alexa …”

Page 30: Olli big data_andai


Recent in AI: Deep Learning

Watson has spoken

– It’s not just a game show any more

– It’s natural language in context

– It’s open ended responses to open ended questions (Siri, Hello Barbie etc.)

And the AI folks are on board

– Deep Learning to go beyond understanding data to modeling “you”

Prof. Pedro Domingos, UW, in his book “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World”

Page 31: Olli big data_andai

Open Source Tools Emerging

For big data management

Analytics

AI methods

And emerging open-data sources

“Data wants to be free”

=> Letting a thousand flowers bloom

Page 32: Olli big data_andai

Bit Rot

Vint Cerf, “father of the Internet”, raises the concern

Consider the media (floppy disc), and the associated reading device(s), and the encoding technique (PC-DOS files in ASCII with data for WordStar) and the required environment (DOS 2.0)

Will we be able to access the data?

Page 33: Olli big data_andai

Provenance

Credibility – or is it just the number of times the lie is re-told?

– This is one rationale for ‘citations’ in academic literature

– And for “reproducibility” in the scientific method … But

For Big Data can there be quality control, authority, chain of evidence, credible source, validation..???

There will be “data jamming” attacks

Page 34: Olli big data_andai

The Right to be Forgotten

In 1998 the Spanish newspaper La Vanguardia published an announcement regarding the forced sale of properties

A property belonged to Mario Costeja González, who was named

In 2009, Costeja contacted the newspaper to complain that when his name was entered in the Google search engine it led to the announcements

In 2010 …

Page 35: Olli big data_andai

Jurisdictions

He took his concerns to the Spanish Agency of Data Protection

From there it went to the EU Advocate General

Then to the EU Court of Justice

2014: Google’s online form lets EU citizens or EFTA nationals request the removal of links if the data linked is “inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes for which they were processed”

Page 36: Olli big data_andai

POP Quiz

Can you name two politicians who would like some of their history “forgotten”

Or more challenging, can you name one who would not like this to happen?

Page 37: Olli big data_andai

The Proxy Did It!

O’Neil, Cathy; Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Random House, 2016 (also Discover Mag, Oct 2016 issue)

“models and algorithms encode human prejudice”

Page 38: Olli big data_andai

E-Scores vs FICO

Fair Isaac credit scores are based on YOUR personal financial history, but cannot be used in sales/marketing (just hiring, promotions, loans, etc.)

eScores are proxies for FICO in some ways, matching you into “buckets” and affecting YOUR job, credit, even your time on hold to get service

“e-scores are arbitrary, unaccountable, unregulated, and often unfair” (O’Neil)

Page 39: Olli big data_andai

Time on hold?????

Managing call center traffic

(“please dial 1 if you are rich, dial 2 to go on hold, dial 3 to talk to someone in India, and 4 if you just would like to dial in more numbers.”)

Ditto for credit card web sites – before you even “look”, your browsing and purchasing patterns are being evaluated.

These may not be your Friend

“People Like You…” (zip, job, search…)

Page 40: Olli big data_andai

eScores to “Score”?

CreditScoreDating.com “at least the customers know what they are getting into and why” (O’Neil)

Job Applicants are “researched” on the web BEFORE any contact from the company – eScores, Facebook, etc.

“The law stipulates employers must alert job seekers when credit issues disqualify them…” (O’Neil) Right….

Page 41: Olli big data_andai

Show of Hands:

Your wait time should be based on e-score proxies --- folks like you….

Your wait time should be based on specific knowledge about you … (“tends to complain a lot, let him wait a bit longer, play the subliminal message tape”)

Page 42: Olli big data_andai

How Big is BIG?

Microsoft and U. Washington have developed a system to store binary data in DNA sequences.

– All of the data on the 2016 Internet could fit into a shoe box

– Much lower energy, less risk of bit rot, but … right now, real slow read/write times

Oct 28, 2016 WSJ insert “Fast Forward Tech”

• Memristors (HP/SanDisk term) / ReRAM: not this high a density, but significantly faster, denser and lower power than SDRAM.

Jan 2017, IEEE Consumer Electronics Magazine

Page 43: Olli big data_andai

Opting Out

https://www.privacyrights.org/

http://www.stopdatamining.me/opt-out-list/

Page 44: Olli big data_andai

From the SSIT Blog

An Asian firm, “Deep Knowledge” has appointed a virtual director to their Board. In this case it is a construct designed to detect trends that the human directors might miss.

One suspects that Apple might want a model of Steve Jobs around for occasional consultation, if not back in control again

Page 45: Olli big data_andai

AI and Ethics

The Partnership on AI Ethics: http://www.partnershiponai.org/

IBM, Google, Microsoft, Amazon, Facebook

IEEE Standards – Autonomous Systems Ethics: http://standards.ieee.org/news/2016/ieee_autonomous_systems.html

Page 46: Olli big data_andai

Resources

http://www.bigbrotherawards.org/ (European – Privacy International)

“Saving Rhinos with Predictive Analytics”, IEEE Computer Society “Edge”

IEEE Computer Magazine, April 2016, Special Issue on Big Data

http://bigdata.ieee.org/

https://sites.google.com/site/io/underneath-the-covers-at-google-current-systems-and-future-directions

https://applymagicsauce.com/ Cambridge Univ. Evaluation tool

Page 47: Olli big data_andai

SSIT: IEEE’s Forum for Academic, Practical and Policy dialog on the Impact of Technology on Society

Engineers and Technologists who care about how their products, discoveries, and services will affect humanity

• Conferences world wide

• Quarterly publication

• Ongoing social media interactions

• Perennial issues to consider as technology happens

Major topics include:

Privacy, Security, Health, Ethics, Equity, Quality of Life

As affected by technology such as:

NanoTech, Genomics, networks, computing, RFID, drones

Page 48: Olli big data_andai

Social Media – Public Dialog

Blog and comments

LinkedIn Group

Facebook Group

YouTube Channel

Twitter

WWW.IEEESSIT.ORG

Page 49: Olli big data_andai

Questions?

Answers???

Thank You

Page 50: Olli big data_andai

Page 51: Olli big data_andai


Alpha: the first step towards Omega

1992: DEC introduces Alpha, the first 64 bit commercial computer chip …

64 bits can directly address 16EB (Exabytes, 16 Billion GB) of “real” memory ... and Alpha was the fastest chip – so could seriously index lots of data

The Alpha App: Altavista – 1995 the first web index

1997- IBM introduces 16GB disk array

Page 52: Olli big data_andai

Donald Knuth, Stanford

Volume 3 (first ed. 1973)

Sorting and Searching, Second Edition (Reading, Massachusetts: Addison-Wesley, 1998), xiv+780pp.+foldout. ISBN 0-201-89685-0

Advisor and mentor to two students: Larry Page and Sergey Brin decided to implement a full version – 1998. They call it “Google”

Page 53: Olli big data_andai

A side note on performance

Computer cycle times from MIPS to GIPS (instructions per second)

Disk rotation latency (half turn average):

4,200 RPM     7.14 ms    7 million instructions
15,000 RPM    2 ms       2 million instructions

Seek time (1/3 of disc surface average):

100 ms    100 million instructions
4 ms      4 million instructions

Solid State Drives change the game again

Add DNA and Intel’s new chip 3Dxxx?
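
The latency figures can be reproduced with a short calculation: average rotational latency is the time for half a turn, and the instructions lost are that delay multiplied by the instruction rate. This is a sketch assuming a rate of 1 GIPS (one billion instructions per second); the function names are hypothetical. The 7.14 ms figure corresponds to a drive spinning at about 4,200 RPM, while exactly 4,000 RPM would give 7.5 ms.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the right sector: half of one full rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm
    return 0.5 * ms_per_rotation

def instructions_lost(delay_ms, instructions_per_second=1e9):
    """Instructions a processor could have executed while waiting (1 GIPS assumed)."""
    return delay_ms / 1000 * instructions_per_second

for rpm in (4_200, 15_000):
    latency = avg_rotational_latency_ms(rpm)
    print(f"{rpm:>6} RPM: {latency:.2f} ms, about {instructions_lost(latency):,.0f} instructions")

for seek_ms in (100, 4):
    print(f"seek {seek_ms} ms: about {instructions_lost(seek_ms):,.0f} instructions")
```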

Page 54: Olli big data_andai


Emerging “Tricks”

For highly compact storage (not fast)

DNA tools are being developed: massive storage – slow access

• Intel 3-D “Optane” memory:

• Pushing “flash ram” capabilities into higher speed, more dense devices

• Data tools are now doing “memory first” operations, expecting terabytes of RAM

Page 55: Olli big data_andai

Data for Good “movement”

DataKind.org

– Harnessing the power of data science in the service of humanity

– DataKind is a unique way to build your skills and network with top data scientists around the world

The Data for Good Exchange is part of a long Bloomberg tradition of advocacy for using data science and human capital to solve problems at the core of society
