EPA 2013 Air Sensors Meeting Big Data Talk

BIG DATA (IN BIOLOGY): INTEGRATING LARGE, FAST MOVING, HETEROGENEOUS DATASETS

Adina Howe

Argonne National Laboratory

Michigan State University

EPA Air Sensors 2013: Data Quality and Applications

March 19, 2013

Introduction – My perspective

Experiment Design

Data Generation

Workflow / Tools

Data analysis

Applied Solutions

Engineering

Microbial Ecology

Bioinformatics

THE DATA DELUGE

An exponential landscape

Next-generation sequencing growth outpacing computational resources

Stein, Genome Biology, 2010

Effects of low cost sequencing…

• 1995: First free-living bacterium sequenced for billions of dollars and years of analysis

• A personal genome can be mapped in a few days for hundreds to a few thousand dollars

Effects of low cost sequencing on research

Sboner et al., Genome Biology, 2011

Technology

competency

Value added

RETHINKING

What it takes to deliver

Technical obstacles in the big data deluge

• Access to the data and its value

• Access to the resources

Democratization of both data and resource access

“80% of awards and 50% of $$ are for grants < $350,000”

Root causes:

• Data volume and velocity “clog”

• Data is very heterogeneous

• Previous efforts are difficult to integrate

• Innovation is necessary but hard

Experiment Design

Data Generation

Workflow / Tools

Data analysis

Applied Solutions

Social obstacles are the most difficult.

• A shift in costs does not mean a shift in expectations

• “Give me the answer so I can get back to work.”

• A culture of sharing (data, time, and tools)

• Evolution of necessary training

• Creating teams that can communicate across domains

• Incentives are not strong enough

• Patterns for success (useful data sharing and collaboration) are not apparent or well understood.

POSSIBLE SOLUTIONS

Common solutions: been there, done that

http://xkcd.com/927/

What would an ideal solution look like?

• Flexible access to data, tools, and resources

• Cost effective, consistent, reusable (scalable)

• Rapid exploration

• Incentives to participate, share, communicate

• Community sandbox (vs lab-specific)

• Painless

Platform which supports an “ecology” of databases, interfaces, and analysis software.

The success of organization: Amazon

• > 50 million users, > 1 million product partners, billions of reviews, dozens of compute services.

• Continually changing/updating data sets.

• Explicitly adopted a service-oriented architecture that enables both internal and external use of this data.

• For example, the Amazon.com website is itself built from over 150 independent services…

• Amazon routinely deploys new services and functionality.

http://highscalability.com/amazon-architecture

https://plus.google.com/112678702228711889851/posts/eVeouesvaVX

Amazon development guideline:

Colloquially said, “You should eat your own dogfood.”

Design and implement the database and database functionality to meet your own needs; only use the functionality you’ve explicitly made available to everyone.

To adapt to research: database functionality should be designed in tight integration with the researchers who are using it, both at a user interface level and programmatically.
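As a rough illustration of the "eat your own dogfood" guideline, here is a minimal Python sketch. The `SampleService` class, its methods, and the sample IDs are all hypothetical names invented for this example; they are not part of Amazon's or KBase's actual interfaces. The point is only that the internal analysis (`total_reads`) goes through the same public methods an outside user would call, rather than reaching into private storage.

```python
from dataclasses import dataclass, field


@dataclass
class SampleService:
    """Hypothetical data service; the public methods are the only supported interface."""

    _records: dict = field(default_factory=dict)  # private storage, not for callers

    def put(self, sample_id: str, reads: int) -> None:
        """Public API: store the read count for a sample."""
        if reads < 0:
            raise ValueError("reads must be non-negative")
        self._records[sample_id] = reads

    def get(self, sample_id: str) -> int:
        """Public API: fetch the read count for a sample."""
        return self._records[sample_id]

    def list_ids(self) -> list:
        """Public API: list stored sample IDs in sorted order."""
        return sorted(self._records)


def total_reads(service: SampleService) -> int:
    """Internal analysis written against the public API only (never touches _records)."""
    return sum(service.get(sample_id) for sample_id in service.list_ids())


# Illustrative data only; the sample names and counts are made up.
service = SampleService()
service.put("soil_A", 1200)
service.put("soil_B", 800)
print(total_reads(service))
```

Because the developers' own analysis depends on `put`, `get`, and `list_ids`, any gap or awkwardness in that interface is felt by the developers first, which is the feedback loop the guideline is after.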

If the “customers” aren’t integrated into the development loop:

http://blog.thingsdesigner.com/uploads/id/tree_swing_development_requirements.jpg

DOE Knowledgebase (KBase)

• Emerging software and data environment to enable researchers

• Service-oriented architecture in which biological data are integrated into a single data model, with KBase services loosely coupled to achieve various functions

• Open development environments for community contribution (public data, services, software)

• Provides robust and scalable infrastructure (with some level of support)

https://kbase.us

KBase uses a service-oriented architecture

http://kbase.us/files/6913/4990/5274/Infrastructure.pptx.pdf

DOE KBase Investment

“…may also apply for additional supplemental funding of up to $300,000 per year for development of systems biology and –omics data driven applications in collaboration with the DOE Systems Biology Knowledgebase.”

Free tutorials / workshops for the community provided.

Advice for the next round…

Data generator:

• Managing expectations and value

Developer:

• “Eat your own dogfood”

Data analyzer:

• Analyze with reproducibility in mind

Access

Training

Communication

Platform / Teams

Big data is a community problem and solution
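One way to read "analyze with reproducibility in mind" is sketched below in Python. This is an assumption-laden toy, not a method from the talk: the `run_analysis` and `provenance` functions, the input values, and the choice of recorded fields are all hypothetical. The sketch records a checksum of the input plus the parameters and random seed, so that a rerun with the same record gives the same result.

```python
import hashlib
import json
import random


def run_analysis(values, seed, n_draws):
    """Toy analysis: mean of a seeded random subsample drawn with replacement."""
    rng = random.Random(seed)  # local RNG so the seed fully determines the draws
    draws = [rng.choice(values) for _ in range(n_draws)]
    return sum(draws) / n_draws


def provenance(values, seed, n_draws):
    """Record what is needed to rerun the analysis: input checksum and parameters."""
    payload = json.dumps(values).encode("utf-8")
    return {
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "seed": seed,
        "n_draws": n_draws,
    }


# Illustrative input only; these numbers are made up.
values = [3.1, 4.7, 2.9, 5.2, 4.0]
record = provenance(values, seed=42, n_draws=100)

# Running twice from the same record should give identical results.
result = run_analysis(values, record["seed"], record["n_draws"])
rerun = run_analysis(values, record["seed"], record["n_draws"])
print(json.dumps(record, indent=2))
print(result == rerun)
```

Saving the `record` dictionary alongside the result lets a collaborator confirm they have the same input and settings before comparing numbers.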

Resources

• Amazon interviews

http://highscalability.com/amazon-architecture

• Titus Brown’s blog post on heterogeneous data integration

http://ivory.idyll.org/blog/software-architecture-for-heterogeneous-data-integration.html

• KBase website

http://www.kbase.us

• Software carpentry – “helping scientists build better software”

http://software-carpentry.org

Thanks!

Please feel free to contact me:

http://adina.github.com

adina@anl.gov

http://cheezburger.com/6983817216
