Download - The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Transcript
Page 1: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Philip E. Bourne Ph.D.Associate Director for Data Science

National Institutes of Health

Page 2: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

A View from the Funding Agencies

Page 3: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

“It was the best of times, it was the worst of times, it was the age of

wisdom, it was the age of foolishness, it was the epoch of belief, it was the

epoch of incredulity, it was the season of Light, it was the season of

Darkness, it was the spring of hope, it was the winter of despair …”

Page 4: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Roughly translated…

A time of great (unprecedented?) scientific development but limited

funding

A time of upheaval in the way we do science

Page 5: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

From a funders perspective…

A time to squeeze every cent/penny to maximize the amount of research that

can be done

A time for when top down approaches meet bottom up approaches

Page 6: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Top Down vs Bottom Up

Top Down– Regulations e.g. US:

Common Rule, FISMA, HIPPA

– Data sharing policies• GWAS• Genome data• Clinical trials

– Digital enablement– Moves towards

reproducibility

Bottom Up– Communities emerge

and crowdsource• Collaboration• Data shared• Open source

software • Common principles• Standards

Page 7: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

A Time for New Models

Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 8: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

And This May Just be the Beginning

Evidence:– Google car– 3D printers– Waze– Robotics

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 9: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Consider This an Opportunity

Look at the value of data

Derive new business models

Look for new efficiencies

Foster best practices Foster collaboration ….

Page 10: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

It is the age when functional annotation is in the greatest demand

for science..

It is the age when the rewards outside academia are greater than the rewards

inside

Page 11: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Associate Director for Data Science

CommonsTrainingCenter BD2K Modified

Review

Sustainability* Education* Innovation* Process

• Cloud – Data & Compute

• Search• Security • Reproducibility

Standards• App Store

• Coordinate• Hands-on• Syllabus• MOOCs

• Community• Centers• Training Grants• Catalogs• Standards• Analysis

• Data Resource Support

• Metrics• Best

Practices• Evaluation• Portfolio

Analysis

The Biomedical Research Digital Enterprise

Communication

Collaboration

Programmatic Theme

Deliverable

Example Features • IC’s• Researchers• Federal

Agencies• International

Partners• Computer

Scientists

Scientific Data Council External Advisory Board

* Hires made

Page 12: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Innovation – Big Data to Knowledge BD2K

Centers of excellence Software catalog Data catalog Software initiatives Standards Training

bd2k.nih.gov

Page 13: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Sustainability and Sharing: The Commons

Data

The Long Tail

Core Facilities/HS Centers

Clinical /Patient

The Why:Data Sharing Plans

TheCommons

Government

The How:

DataDiscoveryIndex

SustainableStorage

Quality

Scientific Discovery

Usability

Security/Privacy

Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment

The End Game:

KnowledgeNIHAwardees

PrivateSector

Metrics/Standards

Rest ofAcademia

Software StandardsIndex

BD2KCenters

Cloud, Research Objects,Business Models

Page 14: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

What The Commons Is and Is Not

Is Not:– A database– Confined to one physical

location– A new large

infrastructure– Owned by any one group

Is:– A conceptual framework– Analogous to the Internet– A collaboratory– A few shared rules

• All research objects have unique identifiers

• All research objects have limited provenance

Page 15: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

What Does the Commons Enable?

Dropbox like storage

The opportunity to apply quality metrics

Bring compute to the data

A place to collaborate

A place to discover

http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png

Page 16: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

[Adapted from George Komatsoulis]

One Possible Commons Business Model

HPC, Institution …

Page 17: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

What Are the Benefits to Those Doing Functional Annotation?

Open environment in which to test new ideas – better for crowdsourcing

Opportunity to gain resources to run annotation pipelines

Opportunity to collaborate through provision of open APIs

Better characterization and accessibility to annotation methods

Page 18: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Commons Pilots

Define a set of use cases emphasizing:– Openness of the system– Support for basic statistical analysis– Embedding of existing applications– API support into existing resources

Evaluate against the use cases Review results & business model with NIH leadership Design a pilot phase with various groups Conduct pilot for 6-12 months Evaluate outcomes and determine whether a wider

deployment makes sense Report to NIH leadership summer 2015

Page 19: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

Some Acknowledgements

Eric Green & Mark Guyer (NHGRI) Jennie Larkin (NHLBI) Leigh Finnegan (NHGRI) Vivien Bonazzi (NHGRI) Michelle Dunn (NCI) Mike Huerta (NLM) David Lipman (NLM) Jim Ostell (NLM) Andrea Norris (CIT) Peter Lyster (NIGMS) All the over 100 folks on the BD2K team

Page 20: The Role of Automated Function Prediction in the Era of Big Data and Small Budgets

NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health