A View From the Trenches: Open-Source, Data-Intensive Software
Chris A. Mattmann
Senior Computer Scientist, NASA Jet Propulsion Laboratory
Adjunct Assistant Professor, Univ. of Southern California
Member, Apache Software Foundation
Roadmap
• 1st part of the talk
  – The importance and concerns of open-source software
• 2nd part of the talk
  – NASA Earth Science Data Systems
  – Real-world open-source software and its use in various projects
9-Mar-11 ARR-MATTMANN
And you are?
• Apache Member involved in
  – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor), Gora (Champion), and MRUnit (Mentor)
• Architect/Developer at NASA JPL in Pasadena, CA
• Software Architecture/Engineering Prof at USC
Open Source Software Development
• It’s here
• Chances are, if you build software nowadays, you are using some type of open source software
• Let me give you some context…
The NASA ESDS Context
Where is open source most useful?
Which area should produce open source software?
Concerns in the Open Source World
• Licensing
  – GPL (v2, v3?), LGPL (v?), BSD, MIT, ASLv2
  – Your own custom license approved by the OSI
    • NASA OSS license?
    • Caltech license?
  – Copyleft versus copyright
• Redistribution
  – Can you take open source product X and use it in your commercially interested software Y?
    • If so, do you have to pay for it?
  – Should others pay for your open source product if they use it in their commercial application?
• Open Source “Help Desk” Syndrome versus Community
  – Are you trying to simply make your open source software (releases) available for distribution (aka help desk)?
  – Are you trying to get others to “buy in” to your open source software?
Concerns in the Open Source World
• Intellectual Property
  – Who owns it?
  – How does the open source software affect your IP?
• Open Source Ecosystems
  – Where can you find the “killer app” you need?
  – Which communities are conducive to longevity?
  – How relevant are “generic” open source software communities to NASA Earth Science Data Systems?
• Contributing
  – Are you even allowed to contribute to an OSS community?
  – Can you do it on “company” time?
  – What’s required?
  – What’s the governance?
• Responsiveness
  – How responsive is the OSS community to your project’s needs?
Concerns in the Open Source World
The NASA ESDS Context
The aforementioned OSS concerns cut across the whole ESDS enterprise!
What is industry doing?
• Big Data?
  – They’ve got it
  – Bigger than NASA, even!
  – How about processing 1 PB/day?
• How are they doing it?
  – Open source
  – “Big Data” technologies
• What is JPL doing to be part of this?
  – That’s my goal!
  – Actively representing the science mission needs
Licensing
• Relates to: redistribution, intellectual property, contributing, legal strategies
• There are tons of OSS-approved licenses
  – What’s the difference between them?
• The difference mostly has to do with
  – Commercialization
  – Redistribution
  – Attribution
• Let’s take a few examples
Some OSS licenses
• Apache License v2
  – Allows (unrestricted):
    • Redistribution (or not)
    • Commercialization
  – Must keep ASL headers and a NOTICE file bearing ASL attribution
  – Contributors to the ASF sign Apache CLAs or CCLAs
• BSD License
  – Allows:
    • Redistribution (or not)
    • Commercialization
  – Must include the license header in code or in NOTICE
  – No contribution restrictions
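The header and NOTICE obligations above are mechanical enough to audit automatically. Below is a minimal Python sketch, assuming that a file counts as "headed" if the phrase `Apache License, Version 2.0` appears in its first 30 lines, and that only the listed file extensions matter; both the marker and the extension set are illustrative assumptions, and ASF projects typically use the Apache Rat release-audit tool for this job in practice.

```python
from pathlib import Path

# Phrase shared by the ASF source header and the generic ALv2 boilerplate.
# Assumption: its presence near the top of a file means the header is there.
ASL_MARKER = "Apache License, Version 2.0"

# Hypothetical choice of which source files to audit.
SOURCE_SUFFIXES = {".java", ".py", ".xml", ".sh"}


def files_missing_asl_header(root: str, head_lines: int = 30) -> list[str]:
    """List source files under `root` whose first `head_lines` lines lack ASL_MARKER."""
    base = Path(root)
    missing = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            # Read only the first few lines; license headers live at the top.
            head = "".join(line for _, line in zip(range(head_lines), handle))
        if ASL_MARKER not in head:
            missing.append(path.relative_to(base).as_posix())
    return missing


def has_notice_file(root: str) -> bool:
    """Check for the top-level NOTICE file that carries attribution notices."""
    return (Path(root) / "NOTICE").is_file()
```

Pointing `files_missing_asl_header` at a checkout before cutting a release lists the files that still need a header; `has_notice_file` covers the attribution side.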
Some OSS licenses that aren’t so friendly
• GPL (v2, v3), LGPL
  – Allows (restricted):
    • Redistribution
      – Based on implied commerciality and copyleft
    • Commercialization
      – May need to pay license fees
    • Standard attribution requirements (NOTICE or in source code headers) based on copyleft
  – “Copyleft” syndrome
    • Software must be redistributed under the same original terms as those of any upstream author.
    • The author is not free to decide redistribution terms, or even source code inclusion terms (except under fair use, which supersedes)
Why licenses are important
• “War Story”
  – Amazon EC2, S3
• Johnson and Johnson Pharm. R&D
  – At a recent conference I met the director of R&D for J&J. He presented a story in which J&J needed large bursting processing and limited data storage for some drug tests they were conducting. They decided to use Amazon EC2. After reviewing Amazon’s licensing policy for EC2, J&J’s lawyers determined that Amazon claimed IP for any data or computational results produced on its cloud. Since the need for Amazon’s processing and cloud was limited to a few trials, and since standing up its own cluster for these experiments would have been outrageously expensive, J&J decided to forge ahead with Amazon, with the understanding that, should the need arise, its lawyers would “duke it out” with Amazon’s lawyers over the field-of-use (FOU) restrictions and IP claims induced by Amazon EC2.
Why licenses are important
• “War Story 2”
  – Oracle versus Google
• Is there really free Java with your free lunch?
  – Although there is no definitive answer besides papers filed in court, Oracle’s claims in its lawsuit are based on perceived patents for the Java Virtual Machine and its associated IP. Sun originally filed patents on the Java Virtual Machine and its translation of programming language code into runtime executables. For a JVM to be a “certified” (read: trademarked) JVM, it must pass a “Technology Compatibility Kit” (TCK), which Oracle/Sun licenses to JVM vendors at a cost. By purchasing the TCK, a JVM builder is given IP knowledge of the JVM patents. If a company builds a JVM but does not purchase the TCK, Oracle/Sun loses licensing dollars.
What does OSS licensing mean to Software Development?
• Awareness
  – Know your distribution requirements
  – Know your contributor expectations
  – Know your field of use expectations
  – Know your commercialization expectations
• Leverage licenses that give the most leeway in all of the important categories above
  – Allows the decision on some of the above aspects to be delayed without fear of penalty or prejudice
• Friendly licenses: ASLv2, BSD, MIT
• Know what you are signing *before* you sign it (or use it)
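"Know what you are signing before you use it" can start with a simple inventory of what your dependencies are licensed under. The following is a minimal sketch, assuming each dependency's license is already known as an SPDX identifier; the two category sets are a hypothetical, non-exhaustive grouping that mirrors the slide's "friendly" versus copyleft split, and the output is a prompt for human review, not legal advice.

```python
# Hypothetical grouping of a few SPDX license identifiers into the slide's
# "friendly" (permissive) and copyleft categories. Not exhaustive, not legal advice.
PERMISSIVE = {"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "MIT"}
COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "LGPL-2.1-only", "LGPL-2.1-or-later",
    "LGPL-3.0-only", "LGPL-3.0-or-later",
}


def review_dependencies(deps: dict[str, str]) -> dict[str, list[str]]:
    """Bucket each dependency (name -> SPDX id) by license category."""
    report: dict[str, list[str]] = {"permissive": [], "copyleft": [], "needs_review": []}
    for name, spdx_id in sorted(deps.items()):
        if spdx_id in PERMISSIVE:
            report["permissive"].append(name)
        elif spdx_id in COPYLEFT:
            # Redistribution terms are inherited from upstream: check before shipping.
            report["copyleft"].append(name)
        else:
            # Custom or unrecognized licenses: read them before you "sign".
            report["needs_review"].append(name)
    return report
```

For a hypothetical map such as `{"lib-a": "Apache-2.0", "lib-b": "GPL-3.0-only"}`, `lib-a` lands under `permissive` and `lib-b` under `copyleft`; anything with a custom license, such as a center-specific one, falls into `needs_review`.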
Redistribution
• So you’ve built some awesome piece of software
• And you’re wondering
  – What are my options for distributing it?
  – Question: what license are you going to choose?
  – Hopefully one that supports redistribution under your own terms!
• How to redistribute the software?
  – Requires infrastructure
  – Who’s going to set it up?
OSS Redistribution Infrastructure
• Issue tracking
• Portals
• Repository browsing
• Mailing lists
• Acceptance testing
• Source code repository
OSS Ecosystems
• Where should you go for your open source project?
• Should NASA have its own?
• Should your project (SIPS, DAAC, proposal) have its own?
OSS Ecosystems
• FTR, there are tons of concerns here
  – Should the ecosystem impose license restrictions?
    • Only support license X or Y?
  – What are the redistribution policies?
  – What’s the community?
  – What’s the IP?
  – What’s the infrastructure support?
  – Is it a foundation with rules, or just a free-for-all?
• Elephant in the room: What is the exposure? How many people are going to community C, are going to see your software, and perhaps want to use it, improve it, file bugs against it, file patches, etc.?
• Where/how does this matter to you?
  – Standing up our own OSS ecosystem may make sense for large, coarse-grained federations
  – OTOH, it is a LOT more difficult to justify for fine-grained components and module reuse
Community versus “Help Desk”
• How do you want your open source software project to run in the open?
  – Do you expect folks to come to you and only file bugs?
    • Then you and your team are the only ones who can fix them?
    • Then you and your team are the only ones who can release updates?
    • “Help Desk” open source project
    • Examples: Sourceforge.net, Google Code, etc.
  – Do you expect to grow a community of interest where volunteers actively engage in software development?
    • Are volunteers empowered to pick up a shovel and help dig the hole?
    • Can volunteers (including your own paid employees) file issues?
    • Do you want to give the community a stake/vote in the overall process?
    • “Community Building” open source project
    • Examples: Apache Software Foundation, Eclipse Foundation, etc.
Communities for your organization
• Working on common software
  – Measured not in terms of which center a contributor works for, but in terms of
    • Number of (high-quality) patches contributed
    • Mailing list questions answered
    • Number of releases made, or helped with
    • Tests written
    • Documentation added
  – Take the politics out of it and just work on “core” common code of mutual interest
• Deciding on the right redistribution mechanism and license
  – The Apache Software Foundation and ASLv2 provide openness and the ability to make center-local decisions on redistribution, commerciality, and more (for internal distributions and beyond)
Our experience: Apache OODT
• Front page article on NASA.gov
  – Free and available to download
• First true NASA FOSS technology
  – Distributed at Apache
• Featured on Slashdot
Free as in beer
• “Something like the Apache foundation is the best place for released government software.
A previous attempt at release and public distribution via a private company was a truly dismal failure. OpenPBS (portable batch system) is supposed to be available to anyone that asks. However when you do ask a sales rep strings you along for more than a month trying to sell you something that they can't actually assure you will fit your requirements (and is no longer under development) even when the free one is documented as doing so. It was a truly stupid waste of the salesperson's time and mine that would have exceeded the price of providing the file for download or sending by email by several orders of magnitude and generated a lot of ill will. I'll go as far as saying it was blatant false advertising using a government funded open source product to do a bait and switch to try to sell me an unmaintained product they picked up in a corporate take over. My experience appears to have been identical to that of many that attempted to obtain this government funded open source software that NASA had declared was available for anyone. Eventually due to this open source project becoming closed the project just had to fork and the compatible Torque batch system was developed by people that had actually get hold of the original OpenPBS.”
– Slashdot user comment
So, again, why do we care?
• Software Engineering Research Challenges/Questions
  – Students are using open source
  – They will inevitably see it in their projects, will use it to build out next-generation infrastructure, etc.
• Practitioner challenges
  – I still hear the “I just picked one” syndrome
• We need to better train software engineering students and practitioners to understand the dimensions of open source software
2nd part of the Talk
• Some context and real examples
NASA Ground Data Systems
Credit: D. Woollard
Context
• NASA develops science data processing systems for multiple Earth science missions
• These systems convert the instrument telemetry delivered to Earth from space into useful data for scientific research
• Typical characteristics
  – Remote sensing instruments that orbit the Earth multiple times daily
  – Data are acquired constantly
  – Complex algorithms convert instrument measurements to geophysical quantities
Some Key Open Source Technologies in Use to Enable Science Data
• Apache Tomcat
  – Core app server for many of our data web services and web apps
• Apache HTTPD
  – Powers front-facing portals, redirects to app servers (like Tomcat), virtual hosting, as well as fronting for Plone
• Apache Tika
  – Metadata extraction, content identification
• Apache Hadoop
  – Cloud storage
Also using Apache OODT to link all of these specific technologies together!
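Content identification of the kind Tika provides usually starts with "magic bytes": known signatures at the start of a file, with the file name as a fallback hint. The following is a minimal Python sketch of that idea, not Tika's API (in Java, Tika exposes detection through its `org.apache.tika.Tika` facade's `detect` methods). The signature table is a tiny hypothetical subset, the media-type strings are illustrative, and the sketch simply tries magic first and then the extension, whereas Tika's actual logic for combining the two is more nuanced.

```python
from pathlib import Path

# Tiny, hypothetical signature table: (leading bytes, media type).
# HDF5 and netCDF are common Earth science formats; media-type strings are illustrative.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\x89HDF\r\n\x1a\n", "application/x-hdf5"),
    (b"CDF\x01", "application/x-netcdf"),  # netCDF classic format
    (b"PK\x03\x04", "application/zip"),
]

# Fallback hints from the file name when no signature matches (also hypothetical).
EXTENSION_TYPES = {".txt": "text/plain", ".xml": "application/xml"}


def detect(data: bytes, filename: str = "") -> str:
    """Guess a media type from leading bytes, then from the file extension."""
    for magic, media_type in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return media_type
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TYPES.get(suffix, "application/octet-stream")
```

Passing the first few hundred bytes of a file is enough for a check like this, which is why detection can run cheaply before any heavier metadata extraction.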
Application to planetary science
• Often unique, one-of-a-kind missions
  – Can drive technological changes
• Instruments are competed and developed by academia and industry
  – Highly distributed acquisition and processing across partner organizations
  – Highly diverse data sets, given the heterogeneity of the instruments and the targets (i.e., the solar system)
• Missions are required to share science data results:
  – Common domain information model used to drive system implementations
  – Expert scientific help to the user community on using the data
  – Peer review of data results to ensure quality
• All planetary science data results from NASA (and some international) missions are deposited into the Planetary Data System, a federation of nodes in the USA and other systems internationally
Planetary Data System: Distributed Planetary Science Archive
• Central Node: Jet Propulsion Laboratory, Pasadena, CA
• Small Bodies Node: University of Maryland, College Park, MD
• Planetary Plasma Interactions Node: University of California Los Angeles, Los Angeles, CA
• Geosciences Node: Washington University, St. Louis, MO
• Imaging Node: JPL and USGS, Pasadena, CA and Flagstaff, AZ
• THEMIS Data Node: Arizona State University, Tempe, AZ
• Navigation Ancillary Information Node: Jet Propulsion Laboratory, Pasadena, CA
• Rings Node: Ames Research Center, Moffett Field, CA
• Atmospheres Node: New Mexico State University, Las Cruces, NM
Apache OODT, Solr
Application for Earth Missions
• Leveraged the OODT software framework for constructing ground data systems for Earth science missions
  – Used OODT Catalog and Archive Service software
  – Focus is on workflow management
• Constructed “workflows”
  – Execution of “processors” based on a set of rules
  – Explicit separation of workflow management from management of computational resources
• Provided “lights out” operations
• Multiple missions
  – SeaWinds
  – QuikSCAT
  – Orbiting Carbon Observatory (OCO), OCO-2…
  – NP Sounder PEATE
  – SMAP
[Figure: SeaWinds on ADEOS II (launched Dec 2002) ground data system. Components: Spacecraft & Ancillary Files; Pre-Processors (PP); Science Level Processors (LP); Science Analysis and Quality Reporting (SA); Engineering Analysis (EA); Instrument Commands File Transfer (FX); User Interface (Process Monitoring & Control, Instrument Commanding, Data Verification); Data Management and Automatic Process Control (PM) using OODT; Product Delivery (PM); Science Products Released to PO.DAAC]
Credit: D. Freeborn, C. Mattmann, D. Woollard
Apache OODT, Lucene, Tika, Tomcat, HTTPD
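The "explicit separation of workflow management from management of computational resources" can be illustrated with two small components: an engine that only decides which processor runs next based on rules, and a resource manager that only decides how a job executes. This is a minimal Python sketch of that design idea, not the Apache OODT Workflow Manager or Resource Manager APIs; the precondition rule (a task runs once the metadata keys it requires exist) and all class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol

Metadata = dict[str, object]
Processor = Callable[[Metadata], Metadata]


class ResourceManager(Protocol):
    """Decides where and how a job executes; knows nothing about workflow rules."""

    def submit(self, job: Processor, metadata: Metadata) -> Metadata: ...


class LocalResourceManager:
    """Hypothetical stand-in that runs every job in-process.

    A cluster- or cloud-backed manager could be swapped in without
    touching the workflow engine.
    """

    def submit(self, job: Processor, metadata: Metadata) -> Metadata:
        return job(metadata)


@dataclass
class Task:
    name: str
    processor: Processor  # e.g., a pre-processor (PP) or level processor (LP)
    requires: set[str] = field(default_factory=set)  # rule: metadata keys that must exist


@dataclass
class WorkflowEngine:
    """Decides what runs and when, using simple precondition rules."""

    resources: ResourceManager

    def run(self, tasks: list[Task], metadata: Metadata) -> list[str]:
        pending, executed = list(tasks), []
        while pending:
            ready = [t for t in pending if t.requires.issubset(metadata)]
            if not ready:
                break  # the remaining tasks' rules can never be satisfied
            for task in ready:
                # Execution is delegated; the engine only merges the outputs.
                metadata.update(self.resources.submit(task.processor, metadata))
                executed.append(task.name)
                pending.remove(task)
        return executed
```

Because the engine never executes a processor directly, "lights out" operation on a different compute back end means supplying a different `ResourceManager`, while the rules stay unchanged.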
Application to Airborne Science Missions
[Figure: airborne data, ground sensors, and spacecraft & other data sources feeding modeling & visualization for users & the science community]
A full service stack is deployed for each mission, utilizing any mission-provided resources as well as a cloud computing infrastructure.
Mission proprietary data is presented in a mission-specific secure portal while publicly available data is aggregated in the public portal.
Credit: D. Freeborn, D. Woollard, E. Law, D. Crichton, L. Kay-Im,
Apache OODT, Solr, Tika
Earth System Grid Federation
• DOE-funded federation to distribute climate model output to the climate modeling community
• Common services for access to repositories and portals/gateways
• Highly decoupled
• Open source framework (software packaged and distributed) mandated by the DOE SciDAC program
• A recent question: how do you link observations and climate model output?
Apache Solr
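Solr's role in a decoupled federation like this is a shared search service that any portal or gateway can query over HTTP. As a minimal sketch, assuming a Solr instance at a hypothetical base URL with a hypothetical core named `datasets` and hypothetical field names (`variable`, `project`), a portal might build a request to Solr's standard `/select` handler like this; `q`, `fq`, `rows`, and `wt` are Solr's own query parameters, but this is not ESGF's actual search interface.

```python
from typing import Optional
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str,
                    filters: Optional[list[str]] = None, rows: int = 10) -> str:
    """Build a Solr /select URL: q = main query, fq = filter queries, wt = response format."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    for fq in filters or []:
        # Each constraint becomes its own fq parameter.
        params.append(("fq", fq))
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"
```

Fetching the resulting URL returns JSON whose `response.docs` array holds the matching records. Keeping each constraint in its own `fq` lets Solr cache those filters independently of the main query.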
Application to Climate Research (…Climate Data Exchange)
Specific Tools (H2O, CO2, …)
Credit: A. Braverman, C. Mattmann, D. Crichton, L. Cinquini, M. Cayanan
Apache OODT, Apache Solr
Application to Cancer Research: Early Detection Research Network
[Figure: data and computers from distributed research databases interconnected to form a virtual database of integrated cancer resources: specimens, images, assays, biomarkers, etc.]
• EDRN has pioneered the use of informatics technologies to support biomarker research
• EDRN has developed a comprehensive infrastructure to support biomarker data management across EDRN’s distributed cancer centers
• It supports capture of and access to a diverse collection of distributed sets of information and results based on a core ontology for biomarker research
Credit: D. Crichton, C. Mattmann, S. Kelly, A. Hart, H. Kincaid, S. Hughes
Apache OODT, Apache Solr
Application to Health Informatics: Virtual Pediatric Intensive Care Unit
• Principal collaboration with Children’s Hospital Los Angeles (CHLA) to develop a research network to capture and share data sets from pediatric intensive care units
• CHLA has established a collaboration across 85 hospitals
• Grant funded by the National Library of Medicine to research an informatics infrastructure and validate methods to support data analysis
  – Ultimate goal is to improve decision support through an integrated network of information and researchers
Apache OODT, Apache Solr
The Square Kilometer Array
• 1 sq. km of antennas
• Never-before-seen resolution looking into the sky
• 700 TB
  – Per second!
Apache OODT, Apache Solr
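For scale, here is a back-of-the-envelope conversion of the SKA figure into a daily volume, assuming the 700 TB/s rate were sustained around the clock and using decimal units (1 PB = 10^3 TB, 1 EB = 10^6 TB):

```latex
\begin{aligned}
700\ \tfrac{\text{TB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}}
  &= 60\,480\,000\ \tfrac{\text{TB}}{\text{day}} \approx 60.5\ \tfrac{\text{EB}}{\text{day}} \\
\frac{60\,480\,000\ \text{TB/day}}{1\ \text{PB/day}}
  = \frac{60\,480\,000\ \text{TB/day}}{1\,000\ \text{TB/day}}
  &= 60\,480
\end{aligned}
```

That is roughly 60,000 times the 1 PB/day industry figure cited earlier in the talk.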
Alright, I’ll shut up now
• Any questions?
• THANK YOU!
  – [email protected]
  – @chrismattmann on Twitter
Acknowledgements
• Some Tika material inspired by Jukka Zitting’s talks
  – http://www.slideshare.net/jukka/text-and-metadata-extraction-with-apache-tika
  – http://www.slideshare.net/jukka/text-and-metadata-extraction-with-apache-tika-4427630
• NASA Jet Propulsion Laboratory
  – OODT Team
• Dan Crichton’s keynote from ApacheCon NA 2011
2011 Workshop on Software Engineering for Cloud Computing (SECLOUD)
• This is the workshop at ICSE that Craig was talking about
• 10 accepted research papers and 5 “demos”
• Come join us in Hawaii!
https://sites.google.com/site/icsecloud2011/
Book
• Jukka Zitting and I are writing a book on Tika
  – Working on Chapters 14 and 15 of 15
• Early access available through the MEAP program
• http://manning.com/mattmann/