Big Data R&D InitiativeR&D Initiative - DELS Microsite...

35
Big Data R&D Initiative Big Data R&D Initiative Suzi Iacono CISE Directorate National Science Foundation National Academies of Science National Academies of Science Integrating Environmental Health Data to Advance Discovery January 11, 2013 Image Credit: Exploratorium.

Transcript of Big Data R&D InitiativeR&D Initiative - DELS Microsite...

Page 1: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Big Data R&D InitiativeBig Data R&D Initiative

Suzi Iacono CISE Directorate

National Science Foundation

National Academies of ScienceNational Academies of ScienceIntegrating Environmental Health Data to Advance Discovery

January 11, 2013

Image Credit: Exploratorium.

Page 2: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Advances in information technologies are transforming the

fabric of our society and data represents a transformative newrepresents a transformative new

currency for science, engineering, education and commerce.

Image Credit: CCC and SIGACT CATCS

Page 3: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Where do the data come from?

Why do we have a national initiative?Why do we have a national initiative?

Page 4: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

The Big Data Landscape I: Big Science

• Science gathers data at an ever-increasing rate across all scales and complexities of natural phenomenaSloan Digital Sky Survey in 2000 collected more data in its 1st• Sloan Digital Sky Survey in 2000 collected more data in its 1st

few weeks than had been amassed in the entire history of astronomy– Within a decade, over 140 terabytes of information

collected• Large Hadron Collider generates scores of petabytes a yearg g p y y• The proposed Large Synoptic Survey Telescope (3.3 gigapixel

digital camera) will generate 40 terabytes of data nightlyB 2015 th ld ill t th i l t f• By 2015, the world will generate the equivalent of approximately 93 million Libraries of Congress

Page 5: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

The Big Data Landscape II: Smart Sensing, Reasoning and Decision-making

Situation Awareness: Humans as

Emergency ResponseEnvironment Sensing

Percepts (sensors)

Agent (Reasoning)

Humans as sensors feed multi-modal data streams Credit: Photo by US Geological Survey 

Actions (controllers)

Pervasive     Computing 

People‐Centric Sensing Smart Health CareSocial           Informatics 

SenseEvaluate

Public Sensing

Social

Identify

Assess

Intervene

Source: Sajal Das, Keith Marzullo

Personal Sensing

Social Sensing

Page 6: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

The Big Data Landscape III: New Paradigms for Communications

1988

Today

1988Remarkable

Pace of Innovation

EMAIL VOIP

BLOGSMOBILE SOCIAL NETWORKS

VIDEO

Page 7: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Communications Volume & Traffic DiversityTraffic Diversity

V IP

663M registered Skype users in 2011. Represents 20% of long distance minutes world-wide. If Skype

VoIP were a carrier, it would be the 3rd largest in the world (behind China Mobile and Vodaphone). Largest provider of cross-border communication.Recent estimates as high as 60% of internet traffic is

VideoRecent estimates as high as 60% of internet traffic is video and music sharing; 35 hours of new videos are uploaded every minute in 2011; 2 billion views per day.

Twitter Currently 175 million registered usersTwitter Currently 175 million registered users.

Broadband 20% of global internet users have residential broadband; 68% in US subscribe to broadband. b oadba d; 68% US subsc be to b oadba d

Mobile5.3 billion mobile phone subscribers; 85% of new handsets will be able to access the mobile web; 1 in 5 ;has access to fast service, 3G or better; IM, MMS, SMS expected to exceed 10 trillion message by 2013.

Page 8: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

The Big Data Landscape IV: The Long Tail f S iof Science

• Hundreds of thousands of scientists and engineers work individually or in small distributed disconnected groups all generating data thatin small, distributed, disconnected groups – all generating data that collectively represent an enormous, largely untapped scientific resource

F i i l ti i t t– From running simulations, experiments, etc.

• Making heterogeneous data across many areas of science more homogeneous could give way to breakthroughs across all areas of science and engineering

• Estimated 40 exabytes of unique new information generated worldwide in 2010

• Only 5% of the information created is “structured,” however, in a standard format of words or numbers; the rest are unstructured text, voice images etcvoice, images, etc.

Page 9: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

How Big is Big?

• “Big Data”: “Datasets whose size gare beyond the ability of typical database software tools to capture, store, manage, and analyze”

McKinsey Global Institute Big data: the-McKinsey Global Institute, Big data: the next frontier for innovation, competition, and productivity, May 2011.

Image Credit: Sigrid Knemeyer

Page 10: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

…Not Just Volumes of Data

• The science of big data is not just aboutThe science of big data is not just about volumes and velocity of data, but also– Heterogeneity and diversityg y y

• Levels of granularity• Media formats

S i ifi di i li• Scientific disciplines– Complexity

• Uncertainty• Uncertainty• Incompleteness• Representation types

Page 11: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Why is Big Data Important?Why is Big Data Important?• Critical to transforming how science is done and to

accelerating the pace of discovery in almost everyaccelerating the pace of discovery in almost every science and engineering discipline

• Transformative implications for commerce and economy• Potential for addressing some of society’s most pressing

challenges

Image Credit: Chi Birmingham

Page 12: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Paradigm Shift: from Hypothesis-driven to Data-driven Discoveryto Data driven Discovery

The Fourth Paradigm: Data‐Intensive Scientific

The Economist, The data deluge and how to Data Intensive Scientific 

Discovery (2009, Microsoft Corporation).

deluge and how to handle it: A 14‐page special report (Feb 25, 

2010). 

http://www.sciencemag.org/site/special/data/ http://www.economist.com/node/15579717 http://research.microsoft.com/en‐us/collaboration/fourthparadigm/

Page 13: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

The Age of Data: From Data to Knowledge to ActionFrom Data to Knowledge to Action

• Data-driven discovery is l ti i i i tifi l tirevolutionizing scientific exploration

and engineering innovations

• Automatic extraction of new kno ledge abo t the ph sicalknowledge about the physical, biological and cyber world continues to accelerate

Multi cores concurrent and parallel• Multi-cores, concurrent and parallel algorithms, virtualization and advanced server architectures will enable data mining and machine l i d di dlearning, and discovery and visualization of Big Data

Page 14: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Potential for Transformational Science & Engineering:Engineering:

From Data to Knowledge to Action

I t ti f di i li ( di• Integration of discipline (or media format…) specific data, examine for relationshipsrelationships– Disaster informatics

• 3D toxic fume images• 3D toxic fume images• Simulations of gas spread• Maps of census concentrations• First responder on-the-ground findingsground findings• Evacuation routing

Page 15: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

From Data to Knowledge to Action

• Researchers seek to fundamentally transform understanding of spinningtransform understanding of spinning giants

– The task involves assembling data from more storm variables-such as updraft, downdraft and vorticity or g p , yregions of spin--than what can be observed from ground tornado chasers or even actually produced in the atmosphere. For example, researchers need to better understand how changes in wind direction with height cause the updrafts in a storm to rotate, preceding the formation of a tornado.

– To solve the quandary, Amy McGovern, an associate professor in OU's School of Computer Science, and her team create tornado models with super computers that can process vast amounts of data. McGovern and her colleagues use the models to analyze how storm variables interact in order to identify tornadic and non-tornadic storms.

Page 16: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Examples of Research Challenges

• More data are being collected than we can store • Analyze the data as it becomes available• Decide what to archive and what to discard• Decide what to archive and what to discard

• Many data sets are too large to download• Analyze the data wherever it resides

M d t t t l i d t b bl• Many data sets are too poorly organized to be usable• Better organize and retrieve data

• Many data sets are heterogeneous in type, structure, semantics, organization granularity accessibilityorganization, granularity, accessibility …• Integrate and customize access to federate data

• Utility of data is limited by our ability to interpret and use itExtract and visualize actionable knowledge• Extract and visualize actionable knowledge

• Evaluate results• Large and linked datasets may be exploited to identify individuals

D i t d l i ith b ilt i i• Design management and analysis with built-in privacy preserving characteristics

Page 17: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

A National Imperative

PCAST ll th F d l t• PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing the g g, y g, gincreasing quantities of data.

• Furthermore, PCAST observed that the potential to gain new insights to movepotential to gain new insights … to move from data to knowledge to action has tremendous potential to transform all areas of national priority.

Source: PCAST (December 2010), “Report to the President and Congress: Designing a Digital Future…”– a periodic congressionally-mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program.

Page 18: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Administration’s Big Data Research and D l t I iti tiDevelopment Initiative

• Big Data Senior Steering Group – chartered in spring 2011 under the Networking and Information Technology R&D (NITRD) Program– Members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA, NASA NIST NOAA NSA OFRNASA, NIST, NOAA, NSA, OFR, USGS, etc.– Co-chaired by NSF (and NIH)Co chaired by NSF (and NIH)– Initial charge was to come upwith a plan, a strategy

18

Image Credit: Fuqing Zhang and Yonghui Weng, Pennsylvania State University; Frank Marks, NOAA; Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin

Page 19: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Big Data MembershipBiven   Laura   DOE Science  [email protected]   Alan   NSFNational Science Foundation   [email protected]  Collica   Leslie   NISTNational Institute of Standards and Technology   [email protected]  Deift   Abby   NSFNational Science Foundation   [email protected]  Downing   Gregory   HHSDepartment of Health and Human Services  [email protected]

Espina   Pedro   OSTPWhite House Office of Science and Technology Policy [email protected]  Policy  

Gerr   Neil   DARPADefense Advanced Research Projects Agency   [email protected]   Linda   USGSU.S. Geological Survey  [email protected]

Hall   Alan   NOAANational Oceanic and Atmospheric Administration   [email protected]  

Iacono   Suzanne   NSFNational Science Foundation  [email protected]   David   OSDOffice of the Secretary of Defense ‐ATL   [email protected]  Kaufman   Daniel   DARPADefense Advanced Research Projects Agency   [email protected]

OSTPWhit H Offi f S i d T h lLarson   Phillip P.   OSTPWhite House Office of Science and Technology Policy   [email protected]  

Lee   Tsengdar  NASANational Aeronautics and Space Administration   [email protected]

Lipman   David   NIHNational Institutes of Health /NLMNIH’s National Library of Medicine /NCBI   [email protected]  

Little   Michael   NASANational Aeronautics and Space Administration   [email protected]

Luker   Mark  NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology Research and Development 

[email protected]  p

Marth   Lisa   NISTNational Institute of Standards and Technology   [email protected]

Muoio   Patricia A.   DNI   [email protected]  

Pantula   Sastry   NSFNational Science Foundation  [email protected]

Preuss   Don   NIHNational Institutes of Health /NLMNIH’s National Library of Medicine /NCBI   [email protected]  

Quade   Brittany   NSFNational Science Foundation   [email protected]  Romine   Charles   NISTNational Institute of Standards and Technology   [email protected] g

Smith   Darren   NOAANational Oceanic and Atmospheric Administration   [email protected]  

Spengler   Sylvia   NSFNational Science Foundation  [email protected]   Tom   NSFNational Science Foundation   [email protected]  

Strawn   George  NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology Research and Development 

[email protected]  

Suskin   Mark   NSFNational Science Foundation  [email protected]   Jennifer   NIHNational Institutes of Health /NIGMS   [email protected]  

Wigen   Wendy  NCONational Coordination Office for NITRD /NITRDNetworking and Information Technology Research and Development  

[email protected]  

Zhao   Fen   NSFNational Science Foundation   [email protected]  Nowell   Lucy   DOE Science  [email protected]

Page 20: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Big Data Membership

B r i s to l     S k y     U S G S U .S .  G e o lo g i c a l  S u r v e y   s b r i s to l@ u s g s .g o vK ie lm a n     J o s e p h     D H S D e p a rt m e n t  o f  H o m e la n d  S e c u r i t y   jo s e p h .k ie lm a n @ d h s . g o vP e t te rs     J o n a th a n   D O E  S c ie n c e   J o n a th a n . P e t t e r s@ s c ie n c e . d o e .g o v

l d d fC a r v e r     D o r i s     N S F N a t io n a l  S c ie n c e  F o u n d a t io n   d c a r ve r @ n s f .g o v  A d o l f ie     L a u r a     O S D O f f i c e  o f   t h e  S e c r e t a r y  o f  D e f e n s e   l a u ra . a d o l f ie@ o s d . m i lC h a d d u c k     R ob e r t     N S F N a t io n a l  S c ie n c e  F o u n d a t io n     r c h a d d u c@ n s f .g o v    C r ow d e r     G ra c e     N S A N a t io n a l  S e c u r i t y  A g e n c y   g a c r o w d @ n s a . g o v  D u n n     M ich e l le     N IH N a t io n a l   In s t i tu t e s  o f  H e a l t h   d u n n m 3@ m a i l .n ih .g o v  F l V l i N IH N t i l I t i t t f H l t h f l @ i l ihF lo r a n ce     V a le r ie     N IH N a t io n a l   In s t i tu t e s  o f  H e a l t h   f lo r a n c e v @ m a i l .n ih .g o v  F r e h i l l     L i s a     O S D O f f i c e  o f   t h e  S e c r e t a r y  o f  D e f e n s e   l i s a .f r e h i l l .C T R @ d a r p a .m i l  H e l la n d     B a rb a ra     D O E D e p a rt m e n t  o f  E n e r g y   ‐ S c ie n c e   b a rb a r a . h e l la n d @ s c ie n c e . d o e . g o v  

H o a n g     T h u c     D O E D e p a rt m e n t  o f  E n e r g y   ‐N N S A   t h u c .h o a n g @ n n s a .d o e .g o v  K a n n a n     N a n d in i     N S F N a t io n a l  S c ie n c e  F o u n d a t io n   n k a n n a n @ n s f . g o v  W T d N S F N t i l S i F d t i t @ fW a r n o w     T a n d y     N S F N a t io n a l  S c ie n c e  F o u n d a t io n   t w a r n o w @ n s f .g o vD e a n     D a v id     D O E  S c ie n c e     d a v id . d e a n @ s c ie n c e .d o e . g o v    L y s te r     P e te r     N IH N a t io n a l   In s t i tu t e s  o f  H e a l t h   l y s t e r p @ m a i l . n ih .g o v  M i l le m a c i     J o h n     O S D O f f i c e  o f   t h e  S e c r e t a r y  o f  D e f e n s e   jo h n . m i l le m a c i .c t r @ o s d .m i lP e a rc e     C la u d ia     N S A N a t io n a l  S e c u r i t y  A g e n c y   c e p e a r c e @ n s a .g o v  A l le n M a r c N A S A N a t io n a l A e ro n a u t ic s a n d S p a c e A d m i n i s t r a t io n m a r c a l le n @ n a s a g o vA l le n     M a r c     N A S A N a t io n a l  A e ro n a u t ic s  a n d  S p a c e  A d m i n i s t r a t io n  m a r c .a l le n @ n a s a .g o v  P e a r l     J e n n i f e r     N S F N a t io n a l  S c ie n c e  F o u n d a t io n   j s l im o w i @ n s f . g o v  B la s z k o w s k y   D a v id     T re a s u r y D e p a rt m e n t  o f   t h e  T r e a s u r y  O F R   d a v id . b la s z k o w s k y @ tr e a s u ry .g o v  

S z y k m a n     J a m e s     E P A E n v i ro n m e n ta l  P ro t e c t io n  A g e n c y   ja m e s . j . s z y k m a n @ n a s a . g o v  T o m p k in s     J e r ry     N S A N a t io n a l  S e c u r i t y  A g e n c y   g s t o m p k @ r a d iu m . n c s c .m i l  F lo o d M a r k T re a s u r y D e p a rt m e n t o f t h e T r e a s u r y O F R m a r k f lo o d @ t r e a s u ry g o vF lo o d     M a r k     T re a s u r y D e p a rt m e n t  o f   t h e  T r e a s u r y  O F R   m a r k .f lo o d @ t r e a s u ry .g o v  H o lm     J e a n n e     D a ta .g o v   je a n n e .m . h o lm .j p l .n a s a .g o v  M is a w a     E d u a r d o     N S F N a t io n a l  S c ie n c e  F o u n d a t io n     e m is a w a @ n s f . g o v     

Page 21: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Big Data Launch

• Federal Big Data R&D Initiative launched by White House OSTP on March 29, 2012 at AAAS

• Federal Announcements:• NSF – Subra Suresh• NIH Francis Collins• NIH – Francis Collins• USGS – Marcia McNutt• DoD – Zach Lemnios• DARPA - Ken Gabriel• DOE – William Brinkman

• Panel Discussion: • Moderator - Steve Lohr, New York Times

Image Credit: National Science Foundation

,• Daphne Koller, Stanford University• James Manyika, McKinsey & Company • Lucila Ohno-Machado, UC San Diego • Alex Szalay Johns Hopkins University• Alex Szalay, Johns Hopkins University

More information available at: http://nsf.gov/news/news_summ.jsp?org=CISE&cntn_id=123607&preview=false

Page 22: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Strategy to Address Big Datagy g

Foundational research to develop new techniques and 

technologies to derive knowledge

New cyberinfrastructure to manage, curate, and serve data to 

h i itechnologies to derive knowledge from data research communities

Policy

New approaches for education and workforce development

New types of inter‐disciplinary collaborations, grand challenges, 

and competitions

Page 23: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIG DATA)

Program Solicitation: NSF 12-499

Foundational research to extract knowledge from data

Foundational research to advance the core techniques

d t h l i fand technologies for managing, analyzing, visualizing, and extracting g, guseful information from large, diverse, distributed and heterogeneous dataand heterogeneous data sets.

C Di t t P NSF Wid

Image Credit: Jurgen Schulze, Calit2, UC‐San Diego

Cross‐Directorate Program: NSF WideMulti‐agency Commitment: NSF and NIH

Page 24: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

BIG DATA Research ThrustsCollection, Storage, and 

Management of “Big Data”

•3 awards

Data Analytics 

• 4 awards

Research in Data Sharing and Collaboration

• 1 award (+1 shared with data collection)

•Foundations of big data management

•Mitigating tradeoffs among speed of data ingestion quicker

• Novel machine learning where multi-dimensional vector data points are replaced by distributions

• Design and test

data collection)

• Open source tools for infrastructure for improving discovery through use of social analytic dataingestion, quicker

answers and the freshness of data through the design of new storage devices with extreme capacities

Design and test mathematical and statistical techniques for large-scale heterogeneous data in DNA repositories

analytic data

• Databridge– linking data, human interactions, and usage practices for the long-tail of science

•Databridge – linking data, human interactions and usage practices for the long-tail of science

• Data analytics problems in next generation sequencing

• Theory and algorithms fro couples tensors and

Credit: Fermilab Photo

fro couples tensors and associated software toolkits to make analysis possible

Eight mid‐scale (up to $1M a year) awards out of over 136 projects announced on Oct. 3.

Page 25: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Award CitationsAward Citations

• DCM:– Dan Suciu – University of WA

• A formal foundation for big data management– Michael Bender – SUNY at Stony Brook &Michael Bender SUNY at Stony Brook &

Martin Farach-Colton – Rutgers University• Eliminating the data ingestion bottleneck in big data

applicationspp– Arcot Rajasekar – University of North Carolina,

Chapel Hill & Gary King – Harvard University & Justin Zhan – North Carolina Agriculture & Technical State U i itUniversity

• Databridge – A sociometric system for long-tail science data collections

25

Page 26: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Award CitationsAward Citations• Data Analytics

Eli Upfal Brown University– Eli Upfal – Brown University• Analytic approaches to massive data computation with

applications to genomicsAarti Singh Carnegie Mellon University– Aarti Singh – Carnegie-Mellon University

• Distribution-based machine learning for high dimensional datasets

Srinvas Aluru Iowa State University & Wuchun– Srinvas Aluru – Iowa State University & Wuchun Feng – Virginia Polytechnic Institute & State University & Oyekunie Olukotun – Stanford UniversityUniversity

• Genomes Galore – Core techniques, libraries, and domain specific languages for high throughput DNA sequencing

Page 27: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Award CitationsAward Citations

• Data Analytics (continued)Data Analytics (continued)– Christos Faloutsos – Carnegie Mellon

University & Nikolaos Sidiropoulos –University & Nikolaos Sidiropoulos University of Minnesota – Twin Cities

• Big Tensor Mining: Theory Scalable Algorithms g g y gand Applications

27

Page 28: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Award CitationsAward Citations

• E-Science Collaboration EnvironmentsE Science Collaboration Environments– Thorsten Joachim – Cornell University & Paul

Kantor – Rutgers UniversityKantor Rutgers University• Discovery and social analysis for large-scale

scientific literature

28

Page 29: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Ideation Contest LaunchIdeation Contest Launch

• Opportunity to expand the innovation ecosystempp y p y• Joint among NASA, NSF and DOE Office of

Science• A contest focused on “How to make• A contest focused on How to make

heterogeneous data seem more homogeneous?”• 5 judges• 5 criteria• Launched on Challenge.gov and the Top Coder

platform on Oct 3 with a two week windowplatform on Oct. 3 with a two week window– http://challenge.gov/NASA/425-big-data-challenge-

serieshttp://community topcoder com/coeci/nitrd/– http://community.topcoder.com/coeci/nitrd/

Page 30: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Ongoing Big Data Programs at NSF

• Dear Colleague Letters:– Encourage CIF21 IGERTs to educate and support a new– Encourage CIF21 IGERTs to educate and support a new

generation of researchers able to address fundamental Big Data challenges: http://www.nsf.gov/pubs/2012/nsf12555/nsf12555.htmD t I t i Ed ti R l t d R h F di– Data-Intensive Education-Related Research Funding Opportunities announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments: http://www.nsf.gov/pubs/2012/nsf12060/nsf12060.jsp

– Data Citation to the Geosciences Community to encourage transparency and increased opportunities for the use and analysis of data sets:use and analysis of data sets: http://www.nsf.gov/pubs/2012/nsf12058/nsf12058.jsp

Page 31: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Earthcube: GEO Science InfrastructureG f• EAGER awards announced as part of White House Big Data Launch

• Integrates geosciences data and high-performance computing technologies in an open, adaptable and sustainable framework to

bl f i h d d i i E h Senable transformative research and education in Earth System Science

• Innovative Model: Community designed, community owned, it dcommunity governed

• Interdisciplinary research: – Building and sustaining “new” communities– Workshops to bring together (GEO, SBE, CISE) communities– EAGER awards to seed new research

Page 32: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

A Complex Policy SettingA Complex Policy Setting

Researchers ant data• Researchers want data.• Public policy requires access to data.• Public policy also requires protection of privacy andPublic policy also requires protection of privacy and

intellectual property and other sensitive information.• Much more to be done: Policy on data management

and data access.

Page 33: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Data PrivacyData Privacy

• Never more important than today• However, not all data contain people’s identities (as in data

landscape III)– Not a Big Brother scenario

Go ernment (NSF) in ests in pri ac research– Government (NSF) invests in privacy research• Values in design research community:

– Identity cloaking, anonymization– Do-not-track cookie management

Obfuscation blurring– Obfuscation, blurring– Privacy preserving data mining, search, payment– Just-in-time crypto– Secure data distribution– ….

Privacy in technology; privacy inspired technology

Page 34: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Opportunities for the Future

• Our investments in research and education have already returned exceptional dividends to the Nationreturned exceptional dividends to the Nation.

• Many of tomorrow’s breakthroughs will occur as a result of t h i d t h l i f d i Bi D tnew techniques and technologies for advancing Big Data

science and engineering.

• In turn, Big Data scientific discovery and technological innovation are at the core of our response to national and societal challenges – from environment, energy,societal challenges from environment, energy, transportation, sustainability, and healthcare to cyber security and national defense.

Page 35: Big Data R&D InitiativeR&D Initiative - DELS Microsite Networknas-sites.org/emergingscience/files/2013/01/IACONO-BigDataNAS-env... · Gerr Neil DARPADefense Advanced Research Projects

Thanks!

[email protected]