semanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file

Section | Topic | Specific | Organization | Type
Story | Story | Narrative | Semantic | MindTouch
Slides | Slides | Graphics | Semantic | PowerPoint
Slide 1 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 2 Introduction | Slides | Graphics | Semantic | PowerPoint
Slide 3 Federal Big Data Working Group | Slides | Graphics | Semantic | PowerPoint
Slide 4 NIST Requests Comments on NIST | Slides | Graphics | Semantic | PowerPoint
Slide 5 NIST Big Data interoperability | Slides | Graphics | Semantic | PowerPoint
Slide 6 NIST Big Data interoperability | Slides | Graphics | Semantic | PowerPoint
Slide 7 Purpose | Slides | Graphics | Semantic | PowerPoint
Slide 8 Data Mining Standard Process | Slides | Graphics | Semantic | PowerPoint
Slide 9 Method and Results | Slides | Graphics | Semantic | PowerPoint
Slide 10 Data Mining Standard Results | Slides | Graphics | Semantic | PowerPoint
Slide 11 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 12 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 13 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 14 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 15 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 16 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 17 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 18 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 19 Data Science for NIST Big Data | Slides | Graphics | Semantic | PowerPoint
Slide 20 Conclusions and Recommendations | Slides | Graphics | Semantic | PowerPoint
Spotfire Dashboard | Spotfire Dashboard | Interactive | TIBCO | Spotfire
Research Notes | Research Notes | Notes | Semantic | MindTouch
Comment Template for SP1500-x (replace x with volume number) | Research Notes | Template | NIST | MindTouch
NIST Requests Comments on NIST Big Data interoperability Framework | Research Notes | Request | NIST | MindTouch
Definitions | Definitions | Text | NIST | Word
Cover Page | Definitions | Text | NIST | Word
Inside Cover Page | Definitions | Text | NIST | Word
National Institute of Standards and Technology NIST Special Publication 1500-1 | Definitions | Text | NIST | Word
Reports on Computer Systems Technology | Definitions | Text | NIST | Word
Abstract | Definitions | Text | NIST | Word
Acknowledgements | Definitions | Text | NIST | Word
Notice to Readers | Definitions | Text | NIST | Word
Table of Contents | Definitions | Text | NIST | Word
Executive Summary | Definitions | Text | NIST | Word
1 Introduction | Definitions | Text | NIST | Word
1.1 Background | Definitions | Text | NIST | Word
1.2 Scope and Objectives of the Definitions and Taxonomies Subgroup | Definitions | Text | NIST | Word
1.3 Report Production | Definitions | Text | NIST | Word
1.4 Report Structure | Definitions | Text | NIST | Word
1.5 Future Work on this Volume | Definitions | Text | NIST | Word
2 Big Data and Data Science Definitions | Definitions | Text | NIST | Word
2.1 Big Data Definitions | Definitions | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Story
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Acknowledgements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Slides
http://semanticommunity.info/@api/deki/files/33792/BrandNiemann05212015.pptx

http://semanticommunity.info/
http://semanticommunity.info/@api/deki/files/33794/BrandNiemann05212015Slide2.PNG
http://www.meetup.com/Federal-Big-Data-Working-Group/events/222458479/
http://bigdatawg.nist.gov/V1_output_docs.php
http://semanticommunity.info/@api/deki/files/33796/BrandNiemann05212015Slide5.PNG
http://semanticommunity.info/@api/deki/files/33799/BrandNiemann05212015Slide6.PNG
http://semanticommunity.info/@api/deki/files/33800/BrandNiemann05212015Slide7.PNG
http://semanticommunity.info/Data_Science/Data_Science_for_Data_Mining
http://semanticommunity.info/@api/deki/files/33803/BrandNiemann05212015Slide9.PNG
http://semanticommunity.info/@api/deki/files/33802/BrandNiemann05212015Slide10.PNG
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework
http://semanticommunity.info/@api/deki/files/33791/NISTBigData.xlsx
http://semanticommunity.info/@api/deki/files/33791/NISTBigData.xlsx
http://semanticommunity.info/@api/deki/files/33808/BrandNiemann05212015Slide15.PNG
http://semanticommunity.info/@api/deki/files/33810/BrandNiemann05212015Slide16.PNG
http://semanticommunity.info/@api/deki/files/33809/BrandNiemann05212015Slide17.PNG
http://semanticommunity.info/@api/deki/files/33812/BrandNiemann05212015Slide18.PNG
http://semanticommunity.info/@api/deki/files/33811/BrandNiemann05212015Slide19.PNG
http://semanticommunity.info/@api/deki/files/33795/BrandNiemann05212015Slide20.PNG

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Spotfire_Dashboard
https://spotfire.cloud.tibco.com/spotfire/wp/render/20388084797/analysis?file=/users/bniemann/Public/NISTBigData-Spotfire&waid=V7aaLnbIJEakQRIBWbdpR-1905521da9uNvc
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Research_Notes
http://www.meetup.com/Federal-Big-Data-Working-Group/events/222458479/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Comment_Template_for_SP1500-x_(replace_x_with_volume_number)
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#NIST_Requests_Comments_on_NIST_Big_Data_interoperability_Framework
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Definitions
http://bigdatawg.nist.gov/_uploadfiles/M0392_v1_3022325181.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Cover_Page
http://dx.doi.org/10.6028/NIST.SP.1500-1
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Inside_Cover_Page
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_NIST_Special_Publication_1500-1
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Abstract
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Acknowledgements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Notice_to_Readers
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_of_Contents
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Executive_Summary
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1_Introduction
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.1_Background
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Definitions_and_Taxonomies_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.3_Report_Production
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.4_Report_Structure
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2_Big_Data_and_Data_Science_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.1_Big_Data_Definitions

2.2 Data Science Definitions | Definitions | Text | NIST | Word
Figure 1: Skills Needed in Data Science | Definitions | Figure | NIST | Word
2.3 Other Big Data Definitions | Definitions | Text | NIST | Word
Table 1: Sampling of Concepts Attributed to Big Data | Definitions | Table | NIST | Word
3 Big Data Features | Definitions | Text | NIST | Word
3.1 Data Elements and Metadata | Definitions | Text | NIST | Word
3.2 Data Records and Non-Relational Models | Definitions | Text | NIST | Word
3.3 Dataset Characteristics and Storage | Definitions | Text | NIST | Word
3.4 Data in Motion | Definitions | Text | NIST | Word
3.5 Data Science Lifecycle Model for Big Data | Definitions | Text | NIST | Word
3.6 Big Data Analytics | Definitions | Text | NIST | Word
3.7 Big Data Metrics and Benchmarks | Definitions | Text | NIST | Word
3.8 Big Data Security and Privacy | Definitions | Text | NIST | Word
3.9 Data Governance | Definitions | Text | NIST | Word
4 Big Data Engineering Patterns (Fundamental Concepts) | Definitions | Text | NIST | Word
Appendix A: Index of Terms | Definitions | Appendix | NIST | Word
A | Definitions | Appendix | NIST | Word
B | Definitions | Appendix | NIST | Word
C | Definitions | Appendix | NIST | Word
D | Definitions | Appendix | NIST | Word
F | Definitions | Appendix | NIST | Word
M | Definitions | Appendix | NIST | Word
N | Definitions | Appendix | NIST | Word
O | Definitions | Appendix | NIST | Word
P | Definitions | Appendix | NIST | Word
R | Definitions | Appendix | NIST | Word
S | Definitions | Appendix | NIST | Word
T | Definitions | Appendix | NIST | Word
U | Definitions | Appendix | NIST | Word
V | Definitions | Appendix | NIST | Word
Appendix B: Terms and Definitions | Definitions | Appendix | NIST | Word
Appendix C: Acronyms | Definitions | Appendix | NIST | Word
Appendix D: References | Definitions | Appendix | NIST | Word
[1] | Definitions | References | NIST | Word
[2] | Definitions | References | NIST | Word
[3] | Definitions | References | NIST | Word
[4] | Definitions | References | NIST | Word
[5] | Definitions | References | NIST | Word
[6] | Definitions | References | NIST | Word
[7] | Definitions | References | NIST | Word
[8] | Definitions | References | NIST | Word
[9] | Definitions | References | NIST | Word
[10] | Definitions | References | NIST | Word
[11] | Definitions | References | NIST | Word
[12] | Definitions | References | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2_Data_Science_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_1:_Skills_Needed_in_Data_Science
http://semanticommunity.info/@api/deki/files/33566/Volume1Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3_Other_Big_Data_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_1:_Sampling_of_Concepts_Attributed_to_Big_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B7.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3_Big_Data_Features
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.1_Data_Elements_and_Metadata
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B14.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.2_Data_Records_and_Non-Relational_Models
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.3_Dataset_Characteristics_and_Storage
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B20.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.4_Data_in_Motion
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B21.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.5_Data_Science_Lifecycle_Model_for_Big_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.6_Big_Data_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.7_Big_Data_Metrics_and_Benchmarks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.8_Big_Data_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.9_Data_Governance
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#4_Big_Data_Engineering_Patterns_(Fundamental_Concepts)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_A:Index_of_Terms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#A
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#B
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#C
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#F
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#M
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#N
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#O
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#P
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#R
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#S
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#T
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#U
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#V
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_B:_Terms_and_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_C:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_D:_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B4.5D
http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
http://jtc1bigdatasg.nist.gov/_uploadfiles/N0095_Final_SGBD_Report_to_JTC1.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B6.5D
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B7.5D
http://jtc1bigdatasg.nist.gov/_uploadfiles/N0095_Final_SGBD_Report_to_JTC1.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B8.5D
http://www.gartner.com/it-glossary/big-data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B9.5D
http://datascience.berkeley.edu/what-is-big-data/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B10.5D
http://www.oed.com/view/Entry/18833#eid301162178
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B11.5D
http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B12.5D

[13] | Definitions | References | NIST | Word
[14] | Definitions | References | NIST | Word
[15] | Definitions | References | NIST | Word
[16] | Definitions | References | NIST | Word
[17] | Definitions | References | NIST | Word
[18] | Definitions | References | NIST | Word
[19] | Definitions | References | NIST | Word
[20] | Definitions | References | NIST | Word
[21] | Definitions | References | NIST | Word
[22] | Definitions | References | NIST | Word
Taxonomies | Taxonomies | Text | NIST | Word
Cover Page | Taxonomies | Text | NIST | Word
Inside Cover Page | Taxonomies | Text | NIST | Word
National Institute of Standards and Technology Special Publication 1500-2 | Taxonomies | Text | NIST | Word
Reports on Computer Systems Technology | Taxonomies | Text | NIST | Word
Abstract | Taxonomies | Text | NIST | Word
Acknowledgements | Taxonomies | Text | NIST | Word
Notice to Readers | Taxonomies | Text | NIST | Word
Table of Contents | Taxonomies | Text | NIST | Word
Executive Summary | Taxonomies | Text | NIST | Word
1 Introduction | Taxonomies | Text | NIST | Word
1.1 Background | Taxonomies | Text | NIST | Word
1.2 Scope and Objectives of the Definitions and Taxonomies Subgroup | Taxonomies | Text | NIST | Word
1.3 Report Production | Taxonomies | Text | NIST | Word
1.4 Report Structure | Taxonomies | Text | NIST | Word
1.5 Future Work on this Volume | Taxonomies | Text | NIST | Word
2 Reference Architecture Taxonomy | Taxonomies | Text | NIST | Word
2.1 Actors and Roles | Taxonomies | Text | NIST | Word
Figure 1: NIST Big Data Reference Architecture | Taxonomies | Figure | NIST | Word
Figure 2: Roles and a Sampling of Actors in the NBDRA Taxonomy | Taxonomies | Figure | NIST | Word
2.2 System Orchestrator | Taxonomies | Text | NIST | Word
Figure 3: System Orchestrator Actors and Activities | Taxonomies | Figure | NIST | Word
2.3 Data Provider | Taxonomies | Text | NIST | Word
Figure 4: Data Provider Actors and Activities | Taxonomies | Figure | NIST | Word
2.4 Big Data Application Provider | Taxonomies | Text | NIST | Word
Figure 5: Big Data Application Provider Actors and Activities | Taxonomies | Figure | NIST | Word
2.5 Big Data Framework Provider | Taxonomies | Text | NIST | Word
Figure 6: Big Data Framework Provider Actors and Activities | Taxonomies | Figure | NIST | Word
2.6 Data Consumer | Taxonomies | Text | NIST | Word
Figure 7: Data Consumer Actors and Activities | Taxonomies | Figure | NIST | Word
2.7 Management Fabric | Taxonomies | Text | NIST | Word
Figure 8: Big Data Management Actors and Activities | Taxonomies | Figure | NIST | Word
2.8 Security and Privacy Fabric | Taxonomies | Text | NIST | Word
Figure 9: Big Data Security and Privacy Actors and Activities | Taxonomies | Figure | NIST | Word
3 Data Characteristic Hierarchy | Taxonomies | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B13.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B14.5D
http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=39479
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B15.5D
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=35646
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B16.5D
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=35343
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B17.5D
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B18.5D
http://www.w3.org/2013/data/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B19.5D
http://www.w3.org/2001/sw/interest/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B20.5D
http://www.emc.com/leadership/programs/digital-universe.htm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B21.5D
http://dx.doi.org/10.6028/NIST.SP.500-293
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B22.5D
http://csrc.nist.gov/publications/nistpubs/800-146/sp800-146.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Taxonomies
http://bigdatawg.nist.gov/_uploadfiles/M0393_v1_3613775223.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Cover_Page_2
http://dx.doi.org/10.6028/NIST.SP.1500-2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Inside_Cover_Page_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_Special_Publication_1500-2
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Abstract_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Acknowledgements_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Notice_to_Readers_2
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_of_Contents_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Executive_Summary_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1_Introduction_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.1_Background_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Definitions_and_Taxonomies_Subgroup_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.3_Report_Production_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.4_Report_Structure_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2_Reference_Architecture_Taxonomy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.1_Actors_and_Roles
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_1:_NIST_Big_Data_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33575/Volume2Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_2:_Roles_and_a_Sampling_of_Actors_in_the_NBDRA_Taxonomy
http://semanticommunity.info/@api/deki/files/33579/Volume2Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2_System_Orchestrator
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_3:_System_Orchestrator_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33577/Volume2Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3_Data_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_4:_Data_Provider_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33580/Volume2Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.4_Big_Data_Application_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_5:_Big_Data_Application_Provider_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33578/Volume2Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5_Big_Data_Framework_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_6:_Big_Data_Framework_Provider_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33581/Volume2Figure6.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6_Data_Consumer
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_7:_Data_Consumer_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33582/Volume2Figure7.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7_Management_Fabric
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_8:_Big_Data_Management_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33584/Volume2Figure8.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8_Security_and_Privacy_Fabric
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_9:_Big_Data_Security_and_Privacy_Actors_and_Activities
http://semanticommunity.info/@api/deki/files/33583/Volume2Figure9.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3_Data_Characteristic_Hierarchy

Figure 10: Data Characteristic Hierarchy | Taxonomies | Figure | NIST | Word
3.1 Data Elements | Taxonomies | Text | NIST | Word
3.2 Records | Taxonomies | Text | NIST | Word
3.3 Datasets | Taxonomies | Text | NIST | Word
3.4 Multiple Datasets | Taxonomies | Text | NIST | Word
4 Summary | Taxonomies | Text | NIST | Word
Appendix A: Acronyms | Taxonomies | Appendix | NIST | Word
Appendix B: References | Taxonomies | Appendix | NIST | Word
[1] | Taxonomies | References | NIST | Word
[2] | Taxonomies | References | NIST | Word
[3] | Taxonomies | References | NIST | Word
Use Case & Requirements | Use Case & Requirements | Text | NIST | Word
Cover Page | Use Case & Requirements | Text | NIST | Word
Inside Cover Page | Use Case & Requirements | Text | NIST | Word
National Institute of Standards and Technology Special Publication 1500-3 | Use Case & Requirements | Text | NIST | Word
Reports on Computer Systems Technology | Use Case & Requirements | Text | NIST | Word
Abstract | Use Case & Requirements | Text | NIST | Word
Acknowledgements | Use Case & Requirements | Text | NIST | Word
Notice to Readers | Use Case & Requirements | Text | NIST | Word
Table of Contents | Use Case & Requirements | Text | NIST | Word
Executive Summary | Use Case & Requirements | Text | NIST | Word
1 Introduction | Use Case & Requirements | Text | NIST | Word
1.1 Background | Use Case & Requirements | Text | NIST | Word
1.2 Scope and Objectives of the Use Cases and Requirements Subgroup | Use Case & Requirements | Text | NIST | Word
1.3 Report Production | Use Case & Requirements | Text | NIST | Word
1.4 Report Structure | Use Case & Requirements | Text | NIST | Word
1.5 Future Work on this Volume | Use Case & Requirements | Text | NIST | Word
2 Use Case Summaries | Use Case & Requirements | Text | NIST | Word
2.1 Use Case Development Process | Use Case & Requirements | Text | NIST | Word
2.2 Government Operation | Use Case & Requirements | Text | NIST | Word
2.2.1 Use Case 1: Census 2010 and 2000 | Use Case & Requirements | Text | NIST | Word
2.2.2 Use Case 2: NARA Accession, Searc… | Use Case & Requirements | Text | NIST | Word
2.2.3 Use Case 3: Statistical Survey Re… | Use Case & Requirements | Text | NIST | Word
2.2.4 Use Case 4: Non-Traditional Data | Use Case & Requirements | Text | NIST | Word
2.3 Commercial | Use Case & Requirements | Text | NIST | Word
2.3.1 Use Case 5: Cloud Eco-System for F… | Use Case & Requirements | Text | NIST | Word
2.3.2 Use Case 6: Mendeley – An Intern… | Use Case & Requirements | Text | NIST | Word
2.3.3 Use Case 7: Netflix Movie Service | Use Case & Requirements | Text | NIST | Word
2.3.4 Use Case 8: Web Search | Use Case & Requirements | Text | NIST | Word
2.3.5 Use Case 9: Big Data Business Con… | Use Case & Requirements | Text | NIST | Word
2.3.6 Use Case 10: Cargo Shipping | Use Case & Requirements | Text | NIST | Word
Figure 1: Cargo Shipping Scenario | Use Case & Requirements | Figure | NIST | Word
2.3.7 Use Case 11: Materials Data for M… | Use Case & Requirements | Text | NIST | Word
2.3.8 Use Case 12: Simulation-Driven M… | Use Case & Requirements | Text | NIST | Word
2.4 Defense | Use Case & Requirements | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10:_Data_Characteristic_Hierarchy
http://semanticommunity.info/@api/deki/files/33576/Volume2Figure10.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.1_Data_Elements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.2_Records
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.3_Datasets
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.4_Multiple_Datasets
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#4_Summary
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_A:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_B:_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_2
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_2
http://www.data.gov/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D_2
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40874
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_.26_Requirements
http://bigdatawg.nist.gov/_uploadfiles/M0394_v1_4746659136.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Cover_Page_3
http://dx.doi.org/10.6028/NIST.
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Inside_Cover_Page_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_Special_Publication_1500-3
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Abstract_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Acknowledgements_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Notice_to_Readers_3
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_of_Contents_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Executive_Summary_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1_Introduction_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.1_Background_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Use_Cases_and_Requirements_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.3_Report_Production_3
http://bigdatawg.nist.gov/usecases.php
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.4_Report_Structure_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2_Use_Case_Summaries
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.1_Use_Case_Development_Process
http://bigdatawg.nist.gov/usecases.php
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2_Government_Operation
http://semanticommunity.info/Dat
a_Science/Data_Science_for_NIST_Big_Data_Framework#2.2.1_Use_Case_1:_Census_2010_and_2000.E2.80.94Title_13_Big_Datahttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2.2_Use_Case_2:_NARA_Accession.2C_Search.2C_Retrieve.2C_Preservationhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2.3_Use_Case_3:_Statistical_Survey_Response_Improvementhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2.4_Use_Case_4:_Non-Traditional_Data_in_Statistical_Survey_Response_Improvement_(Adaptive_Design)http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3_Commercialhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.1_Use_Case_5:_Cloud_Eco-System_for_Financial_Industrieshttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.2_Use_Case_6:_Mendeley_.E2.80.93_An_International_Network_of_Researchhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.3_Use_Case_7:_Netflix_Movie_Servicehttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.4_Use_Case_8:_Web_Searchhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.5_Use_Case_9:_Big_Data_Business_Continuity_and_Disaster_Recovery_Within_a_Cloud_Eco-Systemhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.6_Use_Case_10:_Cargo_Shippinghttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_1:_Cargo_Shipping_Scenariohttp://semanticommunity.info/@api/deki/files/33586/Volume3Figure1.pnghttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.7_Use_Case_11:_Materials_Data_for_Manufacturinghttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3.8_Use_Case_12:_Simulation-Driven_Materials_Genomicshttp://semanticommunity.info/Dat
a_Science/Data_Science_for_NIST_Big_Data_Framework#2.4_Defense

2.4.1 Use Case 13: Cloud Large-Scale Geo Use Case &Text NIST Word
2.4.2 Use Case 14: Object Identificatio Use Case &Text NIST Word
2.4.3 Use Case 15: Intelligence Data Pro Use Case &Text NIST Word
2.5 Health Care and Life Sciences Use Case &Text NIST Word
2.5.1 Use Case 16: Electronic Medical R Use Case &Text NIST Word
2.5.2 Use Case 17: Pathology Imaging/Di Use Case &Text NIST Word
Figure 2: Pathology Imaging/Digital Pa Use Case &Figure NIST Word
Figure 3: Pathology Imaging/Digital Pat Use Case &Figure NIST Word
2.5.3 Use Case 18: Computational Bioim Use Case &Text NIST Word
2.5.4 Use Case 19: Genomic Measureme Use Case &Text NIST Word
2.5.5 Use Case 20: Comparative Analys Use Case &Text NIST Word
2.5.6 Use Case 21: Individualized Diab Use Case &Text NIST Word
2.5.7 Use Case 22: Statistical Relational A Use Case &Text NIST Word
2.5.8 Use Case 23: World Population-Sca Use Case &Text NIST Word
2.5.9 Use Case 24: Social Contagion Mod Use Case &Text NIST Word
2.5.10 Use Case 25: Biodiversity and Li Use Case &Text NIST Word
2.6 Deep Learning and Social Media Use Case &Text NIST Word
2.6.1 Use Case 26: Large-Scale Deep Lea Use Case &Text NIST Word
2.6.2 Use Case 27: Organizing Large-Sca Use Case &Text NIST Word
2.6.3 Use Case 28: Truthy—Information Use Case &Text NIST Word
2.6.4 Use Case 29: Crowd Sourcing in th Use Case &Text NIST Word
2.6.5 Use Case 30: CINET—Cyberinfrastr Use Case &Text NIST Word
2.6.6 Use Case 31: NIST Information Ac Use Case &Text NIST Word
2.7 The Ecosystem for Research Use Case &Text NIST Word
2.7.1 Use Case 32: DataNet Federation Use Case &Text NIST Word
Figure 4: DataNet Federation Consortiu Use Case &Figure NIST Word
2.7.2 Use Case 33: The Discinnet Proces Use Case &Text NIST Word
2.7.3 Use Case 34: Semantic Graph Searc Use Case &Text NIST Word
2.7.4 Use Case 35: Light Source Beamlin Use Case &Text NIST Word
2.8 Astronomy and Physics Use Case &Text NIST Word
2.8.1 Use Case 36: Catalina Real-Time Tr Use Case &Text NIST Word
Figure 5: Catalina CRTS: A Digital, Pano Use Case &Figure NIST Word
2.8.2 Use Case 37: DOE Extreme Data fr Use Case &Text NIST Word
2.8.3 Use Case 38: Large Survey Data f Use Case &Text NIST Word
2.8.4 Use Case 39: Particle Physics—Anal Use Case &Text NIST Word
Figure 6: Particle Physics: Analysis of L Use Case &Figure NIST Word
Figure 7: Particle Physics: Analysis of L Use Case &Figure NIST Word
2.8.5 Use Case 40: Belle II High Energy Use Case &Text NIST Word
2.9 Earth, Environmental, and Polar Sci Use Case &Text NIST Word
2.9.1 Use Case 41: EISCAT 3D Incoheren Use Case &Text NIST Word
Figure 8: EISCAT 3D Incoherent Scatter Use Case &Text NIST Word
2.9.2 Use Case 42: ENVRI, Common Opera Use Case &Text NIST Word
Figure 9: ENVRI Common Architecture Use Case &Figure NIST Word
Figure 10(a): ICOS Architecture Use Case &Figure NIST Word
Figure 10(b): LifeWatch Architecture Use Case &Figure NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.4.1_Use_Case_13:_Cloud_Large-Scale_Geospatial_Analysis_and_Visualization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.4.2_Use_Case_14:_Object_Identification_and_Tracking_from_Wide-Area_Large_Format_Imagery_or_Full_Motion_Video.E2.80.94Persistent_Surveillance
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.4.3_Use_Case_15:_Intelligence_Data_Processing_and_Analysis
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5_Health_Care_and_Life_Sciences
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.1_Use_Case_16:_Electronic_Medical_Record_(EMR)_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.2_Use_Case_17:_Pathology_Imaging.2FDigital_Pathology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_2:_Pathology_Imaging.2FDigital_Pathology.E2.80.94Examples_of_2-D_and_3-D_Pathology_Images
http://semanticommunity.info/@api/deki/files/33589/Volume3Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_3:_Pathology_Imaging.2FDigital_Pathology
http://semanticommunity.info/@api/deki/files/33585/Volume3Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.3_Use_Case_18:_Computational_Bioimaging
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.4_Use_Case_19:_Genomic_Measurements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.5_Use_Case_20:_Comparative_Analysis_for_Metagenomes_and_Genomes
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.6_Use_Case_21:_Individualized_Diabetes_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.7_Use_Case_22:_Statistical_Relational_Artificial_Intelligence_for_Health_Care
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.8_Use_Case_23:_World_Population-Scale_Epidemiological_Study
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.9_Use_Case_24:_Social_Contagion_Modeling_for_Planning.2C_Public_Health.2C_and_Disaster_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5.10_Use_Case_25:_Biodiversity_and_LifeWatch
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6_Deep_Learning_and_Social_Media
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.1_Use_Case_26:_Large-Scale_Deep_Learning
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.2_Use_Case_27:_Organizing_Large-Scale.2C_Unstructured_Collections_of_Consumer_Photos
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.3_Use_Case_28:_Truthy.E2.80.94Information_Diffusion_Research_from_Twitter_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.4_Use_Case_29:_Crowd_Sourcing_in_the_Humanities_as_Source_for_Big_and_Dynamic_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.5_Use_Case_30:_CINET.E2.80.94Cyberinfrastructure_for_Network_(Graph)_Science_and_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6.6_Use_Case_31:_NIST_Information_Access_Division.E2.80.94Analytic_Technology_Performance_Measurements.2C_Evaluations.2C_and_Standards
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7_The_Ecosystem_for_Research
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7.1_Use_Case_32:_DataNet_Federation_Consortium_(DFC)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_4:_DataNet_Federation_Consortium_DFC_.E2.80.93_iRODS_Architecture
http://semanticommunity.info/@api/deki/files/33587/Volume3Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7.2_Use_Case_33:_The_Discinnet_Process
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7.3_Use_Case_34:_Semantic_Graph_Search_on_Scientific_Chemical_and_Text-Based_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B8.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7.4_Use_Case_35:_Light_Source_Beamlines
http://vsg3d.com/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8_Astronomy_and_Physics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8.1_Use_Case_36:_Catalina_Real-Time_Transient_Survey:_A_Digital.2C_Panoramic.2C_Synoptic_Sky_Survey
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_5:_Catalina_CRTS:_A_Digital.2C_Panoramic.2C_Synoptic_Sky_Survey
http://semanticommunity.info/@api/deki/files/33590/Volume3Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8.2_Use_Case_37:_DOE_Extreme_Data_from_Cosmological_Sky_Survey_and_Simulations
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8.3_Use_Case_38:_Large_Survey_Data_for_Cosmology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8.4_Use_Case_39:_Particle_Physics.E2.80.94Analysis_of_Large_Hadron_Collider_Data:_Discovery_of_Higgs_Particle
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_6:_Particle_Physics:_Analysis_of_LHC_Data:_Discovery_of_Higgs_Particle_.E2.80.93_CERN_LHC_Location
http://semanticommunity.info/@api/deki/files/33591/Volume3Figure6.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_7:_Particle_Physics:_Analysis_of_LHC_Data:_Discovery_of_Higgs_Particle_.E2.80.93_The_Multi-tier_LHC_Computing_Infrastructure
http://semanticommunity.info/@api/deki/files/33593/Volume3Figure7.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8.5_Use_Case_40:_Belle_II_High_Energy_Physics_Experiment
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9_Earth.2C_Environmental.2C_and_Polar_Science
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.1_Use_Case_41:_EISCAT_3D_Incoherent_Scatter_Radar_System
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_8:_EISCAT_3D_Incoherent_Scatter_Radar_System_.E2.80.93_System_Architecture
http://semanticommunity.info/@api/deki/files/33592/Volume3Figure8.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.2_Use_Case_42:_ENVRI.2C_Common_Operations_of_Environmental_Research_Infrastructure
http://www.envri.eu/rm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_9:_ENVRI_Common_Architecture
http://semanticommunity.info/@api/deki/files/33594/Volume3Figure9.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10(a):_ICOS_Architecture
http://semanticommunity.info/@api/deki/files/33595/Volume3Figure10a.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10(b):_LifeWatch_Architecture
http://semanticommunity.info/@api/deki/files/33598/Volume3Figure10b.png

Figure 10(c): EMSO Architecture Use Case &Figure NIST Word
Figure 10(d): EURO-Argo Architecture Use Case &Figure NIST Word
Figure 10(e): EISCAT 3D Architecture Use Case &Figure NIST Word
2.9.3 Use Case 43: Radar Data Analysis f Use Case &Text NIST Word
Figure 11: Typical CReSIS Radar Data Aft Use Case &Figure NIST Word
Figure 12: Radar Data Analysis for CReS Use Case &Figure NIST Word
Figure 13: Typical echogram with detec Use Case &Figure NIST Word
2.9.4 Use Case 44: Unmanned Air Vehicl Use Case &Text NIST Word
Figure 14: Combined Unwrapped Coseis Use Case &Figure NIST Word
2.9.5 Use Case 45: NASA Langley Resear Use Case &Text NIST Word
2.9.6 Use Case 46: MERRA Analytic Ser Use Case &Text NIST Word
Figure 15: Typical MERRA/AS Output Use Case &Figure NIST Word
2.9.7 Use Case 47: Atmospheric Turbulen Use Case &Text NIST Word
Figure 16: Typical NASA Image of Turbu Use Case &Figure NIST Word
2.9.8 Use Case 48: Climate Studies Usi Use Case &Text NIST Word
2.9.9 Use Case 49: DOE Biological and E Use Case &Text NIST Word
2.9.10 Use Case 50: DOE BER AmeriFlu Use Case &Text NIST Word
2.10 Energy Use Case &Text NIST Word
2.10.1 Use Case 51: Consumption Foreca Use Case &Text NIST Word
3 Use Case Requirements Use Case &Text NIST Word
3.1 Use Case Specific Requirements Use Case &Text NIST Word
3.2 General Requirements Use Case &Text NIST Word
Appendix A: Use Case Study Source Mate Use Case &Appendix NIST Word
NBD-PWG Use Case Studies Template Use Case &Appendix NIST Word
Comments on fields Use Case &Appendix NIST Word
Submitted Use Case Studies Use Case &Appendix NIST Word
Government Operation> Use Case 1: Big Use Case &Appendix NIST Word
Government Operation> Use Case 2: NARA Use Case &Appendix NIST Word
Government Operation> Use Case 3: Sta Use Case &Appendix NIST Word
Government Operation> Use Case 4: Non T Use Case &Appendix NIST Word
Commercial> Use Case 5: Cloud Computin Use Case &Appendix NIST Word
Commercial> Use Case 6: Mendeley—An Use Case &Appendix NIST Word
Commercial> Use Case 7: Netflix Movie Use Case &Appendix NIST Word
Commercial> Use Case 8: Web Search Use Case &Appendix NIST Word
Commercial> Use Case 9: Cloud-based Co Use Case &Appendix NIST Word
Commercial> Use Case 10: Cargo Shippi Use Case &Appendix NIST Word
Commercial> Use Case 11: Materials Da Use Case &Appendix NIST Word
Commercial> Use Case 12: Simulation D Use Case &Appendix NIST Word
Defense> Use Case 13: Large Scale Geosp Use Case &Appendix NIST Word
Defense> Use Case 14: Object Identificat Use Case &Appendix NIST Word
Defense> Use Case 15: Intelligence Data Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Cas Use Case &Appendix NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10(c):_EMSO_Architecture
http://semanticommunity.info/@api/deki/files/33596/Volume3Figure10c.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10(d):_EURO-Argo_Architecture
http://semanticommunity.info/@api/deki/files/33597/Volume3Figure10d.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_10(e):_EISCAT_3D_Architecture
http://semanticommunity.info/@api/deki/files/33599/Volume3Figure10e.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.3_Use_Case_43:_Radar_Data_Analysis_for_the_Center_for_Remote_Sensing_of_Ice_Sheets
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_11:_Typical_CReSIS_Radar_Data_After_Analysis
http://semanticommunity.info/@api/deki/files/33600/Volume3Figure11.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_12:_Radar_Data_Analysis_for_CReSIS_Remote_Sensing_of_Ice_Sheets
http://semanticommunity.info/@api/deki/files/33602/Volume3Figure12.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_13:_Typical_echogram_with_detected_boundaries
http://semanticommunity.info/@api/deki/files/33601/Volume3Figure13.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.4_Use_Case_44:_Unmanned_Air_Vehicle_Synthetic_Aperture_Radar_(UAVSAR)_Data_Processing.2C_Data_Product_Delivery.2C_and_Data_Services
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_14:_Combined_Unwrapped_Coseismic_Interferograms
http://semanticommunity.info/@api/deki/files/33604/Volume3Figure14.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.5_Use_Case_45:_NASA_Langley_Research_Center.2F_Goddard_Space_Flight_Center_iRODS_Federation_Test_Bed
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.6_Use_Case_46:_MERRA_Analytic_Services_(MERRA.2FAS)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_15:_Typical_MERRA.2FAS_Output
http://semanticommunity.info/@api/deki/files/33603/Volume3Figure15.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.7_Use_Case_47:_Atmospheric_Turbulence_.E2.80.93_Event_Discovery_and_Predictive_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Figure_16:_Typical_NASA_Image_of_Turbulent_Waves
http://semanticommunity.info/@api/deki/files/33588/Volume3Figure16.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.8_Use_Case_48:_Climate_Studies_Using_the_Community_Earth_System_Model_at_the_U.S._Department_of_Energy_(DOE)_NERSC_Center
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.9_Use_Case_49:_DOE_Biological_and_Environmental_Research_(BER)_Subsurface_Biogeochemistry_Scientific_Focus_Area
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9.10_Use_Case_50:_DOE_BER_AmeriFlux_and_FLUXNET_Networks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.10_Energy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.10.1_Use_Case_51:_Consumption_Forecasting_in_Smart_Grids
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3_Use_Case_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.1_Use_Case_Specific_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#3.2_General_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_A:_Use_Case_Study_Source_Materials
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#NBD-PWG_Use_Case_Studies_Template
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Comments_on_fields
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Submitted_Use_Case_Studies
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Government_Operation.3E_Use_Case_1:_Big_Data_Archival:_Census_2010_and_2000
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Government_Operation.3E_Use_Case_2:_NARA_Accession.2C_Search.2C_Retrieve.2C_Preservation
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Government_Operation.3E_Use_Case_3:_Statistical_Survey_Response_Improvement cavan.paul.capps@census.gov
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Government_Operation.3E_Use_Case_4:_Non_Traditional_Data_in_Statistical_Survey cavan.paul.capps@census.gov
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_5:_Cloud_Computing_in_Financial_Industries pwc.pwcarey@email.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_6:_Mendeley.E2.80.94An_International_Network_of_Research william.gunn@mendeley.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_7:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_8:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_9:_Cloud-based_Continuity_and_Disaster_Recovery pwc.pwcarey@email.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_10:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_11:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Commercial.3E_Use_Case_12:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Defense.3E_Use_Case_13:_Large_Scale_Geospatial_Analysis_and_Visualization dboyd@data-tactics.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Defense.3E_Use_Case_14:_Object_Identification_and_Tracking_.E2.80.93_Persistent_Surveillance dboyd@data-tactics.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Defense.3E_Use_Case_15:_Intelligence_Data_Processing_and_Analysis dboyd@data-tactics.com
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_16:_Electronic_Medical_Record_(EMR)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_17:
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_18:_Computational_Bioimaging
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_19:_Genomic_Measurements

Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case 22 Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Healthcare and Life Sciences> Use Case Use Case &Appendix NIST Word
Deep Learning and Social Media> Use Ca Use Case &Appendix NIST Word
Deep Learning and Social Media> Use Ca Use Case &Appendix NIST Word
Deep Learning and Social Media> Use Cas Use Case &Appendix NIST Word
Deep Learning and Social Media> Use Ca Use Case &Appendix NIST Word
Deep Learning and Social Media> Use Ca Use Case &Appendix NIST Word
Deep Learning and Social Media> Use C Use Case &Appendix NIST Word
The Ecosystem for Research> Use Case Use Case &Appendix NIST Word
The Ecosystem for Research> Use Case 33 Use Case &Appendix NIST Word
The Ecosystem for Research> Use Case 34 Use Case &Appendix NIST Word
The Ecosystem for Research> Use Case 3 Use Case &Appendix NIST Word
Astronomy and Physics> Use Case 36: Cat Use Case &Appendix NIST Word
Astronomy and Physics> Use Case 37: Co Use Case &Appendix NIST Word
Astronomy and Physics> Use Case 38: La Use Case &Appendix NIST Word
Astronomy and Physics> Use Case 39: Ana Use Case &Appendix NIST Word
Note: See Table Below Use Case &Appendix NIST Word
Astronomy and Physics> Use Case 40: Be Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Science> Use Case &Appendix NIST Word
Note: See Table Below Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Scienc Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Scienc Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Earth, Environmental and Polar Science Use Case &Appendix NIST Word
Energy> Use Case 51: Consumption Forec Use Case &Appendix NIST Word
Appendix B: Summary of Key Properties Use Case &Appendix NIST Word
Table B-1: Use Case Specific Information Use Case &Table NIST Word
Appendix C: Use Case Requirements S Use Case &Appendix NIST Word
Table C-1: Use Case Specific Requiremen Use Case &Table NIST Word
Appendix D: Use Case Detail Requireme Use Case &Appendix NIST Word
Table D-1: Data Sources Requirements Use Case &Table NIST Word
General Requirements Use Case &Appendix NIST Word
Use Case Specific Requirements for Dat Use Case &Appendix NIST Word
Table D-2: Data Transformation Use Case &Table NIST Word
General Requirements Use Case &Appendix NIST Word
Use Case Specific Requirements for Dat Use Case &Appendix NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_20:_Comparative_Analysis_for_(meta)_Genomes
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_21:_Individualized_Diabetes_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_22:_Statistical_Relational_AI_for_Health_Care
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_23:_World_Population_Scale_Epidemiology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_24:_Social_Contagion_Modeling
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Healthcare_and_Life_Sciences.3E_Use_Case_25:_LifeWatch_Biodiversity
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_26:_Large-scale_Deep_Learning
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_27:_Large_Scale_Consumer_Photos_Organization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_28:_Truthy_Twitter_Data_Analysis
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_29:_Crowd_Sourcing_in_the_Humanities
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_30:_CINET_Network_Science_Cyberinfrastructure
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Deep_Learning_and_Social_Media.3E_Use_Case_31:_NIST_Analytic_Technology_Measurement_and_Evaluations
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#The_Ecosystem_for_Research.3E_Use_Case_32:_DataNet_Federation_Consortium_(DFC)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#The_Ecosystem_for_Research.3E_Use_Case_33:_The_.E2.80.98Discinnet_Process.E2.80.99
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#The_Ecosystem_for_Research.3E_Use_Case_34:_Graph_Search_on_Scientific_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#The_Ecosystem_for_Research.3E_Use_Case_35:_Light_Source_Beamlines
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Astronomy_and_Physics.3E_Use_Case_36:_Catalina_Digital_Sky_Survey_for_Transients
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Astronomy_and_Physics.3E_Use_Case_37:_Cosmological_Sky_Survey_and_Simulations
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Astronomy_and_Physics.3E_Use_Case_38:_Large_Survey_Data_for_Cosmology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Astronomy_and_Physics.3E_Use_Case_39:_Analysis_of_LHC_(Large_Hadron_Collider)_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Note:_See_Table_Below
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Astronomy_and_Physics.3E_Use_Case_40:_Belle_II_Experiment
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_41:_EISCAT_3D_Incoherent_Scatter_Radar_System
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_42:_ENVRI.2C_Common_Environmental_Research_Infrastructure
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_43:_Radar_Data_Analysis_for_CReSIS
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Note:_See_Table_Below_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_44:_UAVSAR_Data_Processing
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_45:_NASA_LARC.2FGSFC_iRODS_Federation_Testbed
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_46:_MERRA_Analytic_Services
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_47:_Atmospheric_Turbulence.E2.80.94Event_Discovery
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_48:_Climate_Studies_using_the_Community_Earth_System_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_49:_Subsurface_Biogeochemistry
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Earth.2C_Environmental_and_Polar_Science.3E_Use_Case_50:_AmeriFlux_and_FLUXNET
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Energy.3E_Use_Case_51:_Consumption_Forecasting_in_Smart_Grids
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_B:_Summary_of_Key_Properties
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_B-1:_Use_Case_Specific_Information_by_Key_Properties
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_C:_Use_Case_Requirements_Summary
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_C-1:_Use_Case_Specific_Requirements
http://
semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_D:_Use_Case_Detail_Requirementshttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-1:_Data_Sources_Requirementshttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirementshttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Data_Sourceshttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-2:_Data_Transformationhttp://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_2http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Data_Transformation

Table D-3: Capabilities Use Case & Table NIST Word
General Requirements Use Case & Appendix NIST Word
Use Case Specific Requirements for Capab Use Case & Appendix NIST Word
Table D-4: Data Consumer Use Case & Table NIST Word
General Requirements Use Case & Appendix NIST Word
Use Case Specific Requirements for Da Use Case & Appendix NIST Word
Table D-5: Security and Privacy Use Case & Table NIST Word
General Requirements Use Case & Appendix NIST Word
Use Case Specific Requirements for Secu Use Case & Appendix NIST Word
Table D-6: Lifecycle Management Use Case & Table NIST Word
General Requirements Use Case & Appendix NIST Word
Use Case Specific Requirements for Lif Use Case & Appendix NIST Word
Table D-7: Others Use Case & Table NIST Word
General Requirements Use Case & Appendix NIST Word
Use Case Specific Requirements for Oth Use Case & Appendix NIST Word
Appendix E: Acronyms Use Case & Appendix NIST Word
Appendix F: References Use Case & Appendix NIST Word
Document References Use Case & References NIST Word
[1] Use Case & References NIST Word
[2] Use Case & References NIST Word
[3] Use Case & References NIST Word
[4] Use Case & References NIST Word
[5] Use Case & References NIST Word
[6] Use Case & References NIST Word
[7] Use Case & References NIST Word
[8] Use Case & References NIST Word
[9] Use Case & References NIST Word
[10] Use Case & References NIST Word
[11] Use Case & References NIST Word
Security and Privacy Security an Text NIST Word
Cover Page Security an Text NIST Word
Inside Cover Page Security an Text NIST Word
National Institute of Standards and Tec Security an Text NIST Word
Reports on Computer Systems Technolo Security an Text NIST Word
Abstract Security an Text NIST Word
Acknowledgements Security an Text NIST Word
Notice to Readers Security an Text NIST Word
Table of Contents Security an Text NIST Word
Executive Summary Security an Text NIST Word
1 Introduction Security an Text NIST Word
1.1 Background Security an Text NIST Word
1.2 Scope and Objectives of the Securit Security an Text NIST Word
1.3 Report Production Security an Text NIST Word
1.4 Report Structure Security an Text NIST Word
1.5 Future Work on this Volume Security an Text NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-3:_Capabilities
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Capabilities
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-4:_Data_Consumer
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Data_Consumers
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-5:_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_5
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-6:_Lifecycle_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_6
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Lifecycle_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Table_D-7:_Others
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#General_Requirements_7
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_Specific_Requirements_for_Others
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_E:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Appendix_F:_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Document_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_3
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_3
http://arxiv.org/abs/1403.1528
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D_3
http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B4.5D_2
http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/fox.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D_2
http://grids.ucs.indiana.edu/ptliupages/publications/OgrePaperv9.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B6.5D_2
http://grids.ucs.indiana.edu/ptliupages/publications/NISTUseCase.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B7.5D_2
http://bigdataopensourceprojects.soic.indiana.edu/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B8.5D_2
http://www.whitehouse.gov/mgi
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B9.5D_2
http://www.whitehouse.gov/open
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B10.5D_2
http://xpdb.nist.gov/nike/term.pl
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B11.5D_2
https://rd-alliance.org/group/metadata-standards-directory-working-group.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Security_and_Privacy
http://bigdatawg.nist.gov/_uploadfiles/M0395_v1_4717582962.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cover_Page
http://dx.doi.org/10.6028/NIST.SP.1500-4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Inside_Cover_Page
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_Special_Publication_1500-4
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Abstract
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Acknowledgements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Notice_to_Readers
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_of_Contents
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Executive_Summary
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1_Introduction
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.1_Background
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Security_and_Privacy_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.3_Report_Production
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4_Report_Structure
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume
http://1.usa.gov/1wQuti1

2 Big Data Security and Privacy Security an Text NIST Word
2.1 Overview Security an Text NIST Word
2.2 Effects of Big Data Characteristics o Security an Text NIST Word
2.2.1 Variety Security an Text NIST Word
2.2.2 Volume Security an Text NIST Word
2.2.3 Velocity Security an Text NIST Word
2.2.4 Veracity Security an Text NIST Word
2.2.5 Volatility Security an Text NIST Word
2.3 Relation to Cloud Security an Text NIST Word
3 Example Use Cases for Security and Pr Security an Text NIST Word
3.1 Retail/Marketing Security an Text NIST Word
3.1.1 Consumer Digital Media Usage Security an Text NIST Word
3.1.2 Nielsen Homescan: Project Apollo Security an Text NIST Word
3.1.3 Web Traffic Analytics Security an Text NIST Word
3.2 Healthcare Security an Text NIST Word
3.2.1 Health Information Exchange Security an Text NIST Word
3.2.2 Genetic Privacy Security an Text NIST Word
3.2.3 Pharma Clinical Trial Data Sharing[ Security an Text NIST Word
3.3 Cybersecurity Security an Text NIST Word
3.3.1 Network Protection Security an Text NIST Word
3.4 Government Security an Text NIST Word
3.4.1 Military: Unmanned Vehicle Senso Security an Text NIST Word
3.4.2 Education: Common Core Student Security an Text NIST Word
3.5 Industrial: Aviation Security an Text NIST Word
3.5.1 Sensor Data Storage and Analytics Security an Text NIST Word
3.6 Transportation Security an Text NIST Word
3.6.1 Cargo Shipping Security an Text NIST Word
Figure 1: Cargo Shipping Scenario Security an Figure NIST Word
4 Taxonomy of Security and Privacy Topi Security an Text NIST Word
4.1 Conceptual Taxonomy of Security an Security an Text NIST Word
Figure 2: Security and Privacy Concept Security an Figure NIST Word
4.1.1 Data Confidentiality Security an Text NIST Word
4.1.2 Provenance Security an Text NIST Word
4.1.3 System Health Security an Text NIST Word
4.1.4 Public Policy, Social and Cross-Org Security an Text NIST Word
4.2 Operational Taxonomy of Security an Security an Text NIST Word
Figure 3: Security and Privacy Operatio Security an Figure NIST Word
4.2.1 Device and Application Registratio Security an Text NIST Word
4.2.2 Identity and Access Management Security an Text NIST Word
4.2.3 Data Governance Security an Text NIST Word
4.2.4 Infrastructure Management Security an Text NIST Word
4.2.5 Risk and Accountability Security an Text NIST Word
4.3 Roles Related to Security and Privac Security an Text NIST Word
4.3.1 Infrastructure Management Security an Text NIST Word
4.3.2 Governance, Risk Management, a Security an Text NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2_Big_Data_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2_Effects_of_Big_Data_Characteristics_on_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2_Effects_of_Big_Data_Characteristics_on_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.1_Variety
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.2_Volume
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B11.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.3_Velocity
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B12.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.4_Veracity
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B13.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.5_Volatility
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B16.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.3_Relation_to_Cloud
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3_Example_Use_Cases_for_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1_Retail.2FMarketing
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1.1_Consumer_Digital_Media_Usage
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B17.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1.2_Nielsen_Homescan:_Project_Apollo
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1.3_Web_Traffic_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2_Healthcare
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2.1_Health_Information_Exchange
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2.2_Genetic_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2.3_Pharma_Clinical_Trial_Data_Sharing.5B18.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B18.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.3_Cybersecurity
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.3.1_Network_Protection
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.4_Government
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.4.1_Military:_Unmanned_Vehicle_Sensor_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B19.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.4.2_Education:_Common_Core_Student_Performance_Reporting
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B20.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.5_Industrial:_Aviation
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.5.1_Sensor_Data_Storage_and_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.6_Transportation
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.6.1_Cargo_Shipping
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_1:_Cargo_Shipping_Scenario
http://semanticommunity.info/@api/deki/files/33784/Volume4Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4_Taxonomy_of_Security_and_Privacy_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B24.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1_Conceptual_Taxonomy_of_Security_and_Privacy_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_2:_Security_and_Privacy_Conceptual_Taxonomy
http://semanticommunity.info/@api/deki/files/33781/Volume4Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1.1_Data_Confidentiality
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B25.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1.2_Provenance
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1.3_System_Health
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1.4_Public_Policy.2C_Social_and_Cross-Organizational_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B27.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2_Operational_Taxonomy_of_Security_and_Privacy_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B28.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_3:_Security_and_Privacy_Operational_Taxonomy
http://semanticommunity.info/@api/deki/files/33782/Volume4Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2.1_Device_and_Application_Registration
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2.2_Identity_and_Access_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B33.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2.3_Data_Governance
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B34.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2.4_Infrastructure_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B36.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2.5_Risk_and_Accountability
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B37.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3_Roles_Related_to_Security_and_Privacy_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.1_Infrastructure_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.2_Governance.2C_Risk_Management.2C_and_Compliance

4.3.3 Information Worker Security an Text NIST Word
4.4 Relation of Roles to the Security a Security an Text NIST Word
4.4.1 Data Confidentiality Security an Text NIST Word
4.4.2 Provenance Security an Text NIST Word
4.4.3 System Health management Security an Text NIST Word
4.4.4 Public Policy, Social, and Cross-Or Security an Text NIST Word
4.5 Additional Taxonomy Topics Security an Text NIST Word
4.5.1 Provisioning, Metering, and Billing Security an Text NIST Word
4.5.2 Data Syndication Security an Text NIST Word
5 Security and Privacy Fabric Security an Text NIST Word
Figure 4: NIST Big Data Reference Archi Security an Figure NIST Word
5.1 Security and Privacy Fabric in the Security an Text NIST Word
Figure 5: Notional Security and Privacy Security an Figure NIST Word
5.2 Privacy Engineering Principles Security an Text NIST Word
5.3 Relation of the Big Data Security Security an Text NIST Word
Table 1: Draft Security Operational T Security an Table NIST Word
6 Mapping Use Cases to NBDRA Security an Text NIST Word
6.1 Consumer Digital Media Use Security an Text NIST Word
Table 2: Mapping Consumer Digital Medi Security an Table NIST Word
6.2 Nielsen Homescan: Project Apollo Security an Text NIST Word
Table 3: Mapping Nielsen Homescan to t Security an Table NIST Word
6.3 Web Traffic Analytics Security an Text NIST Word
Table 4: Mapping Web Traffic Analytics Security an Table NIST Word
6.4 Health Information Exchange Security an Text NIST Word
Table 5: Mapping HIE to the Reference A Security an Table NIST Word
6.5 Genetic Privacy Security an Text NIST Word
6.6 Pharmaceutical Clinical Trial Data Sh Security an Text NIST Word
Table 6: Mapping Pharmaceutical Clinical Security an Table NIST Word
6.7 Network Protection Security an Text NIST Word
Table 7: Mapping Network Protection to Security an Table NIST Word
6.8 Military: Unmanned Vehicle Sensor Security an Text NIST Word
Table 8: Mapping Military Unmanned Veh Security an Table NIST Word
6.9 Education: Common Core Student P Security an Text NIST Word
Table 9: Mapping Common Core K–12 Stu Security an Table NIST Word
6.10 Sensor Data Storage and Analytics Security an Text NIST Word
6.11 Cargo Shipping Security an Text NIST Word
Table 10: Mapping Cargo Shipping to th Security an Table NIST Word
Appendix A: Candidate Security and Priv Security an Appendix NIST Word
Appendix B: Internal Security Considera Security an Appendix NIST Word
Figure B-1: Composite Cloud Ecosystem S Security an Figure NIST Word
Appendix C: Big Data Actors and Roles: Security an Appendix NIST Word
Appendix D: Acronyms Security an Appendix NIST Word
Appendix E: References Security an Appendix NIST Word
Document References Security an Appendix NIST Word
[1] Security an References NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.3_Information_Worker
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4_Relation_of_Roles_to_the_Security_and_Privacy_Conceptual_taxonomy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1_Data_Confidentiality
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.2_Provenance
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B38.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3_System_Health_management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.4_Public_Policy.2C_Social.2C_and_Cross-Organizational_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.5_Additional_Taxonomy_Topics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.5.1_Provisioning.2C_Metering.2C_and_Billing
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.5.2_Data_Syndication
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5_Security_and_Privacy_Fabric
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_4:_NIST_Big_Data_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33783/Volume4Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5.1_Security_and_Privacy_Fabric_in_the_NBDRA
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_5:_Notional_Security_and_Privacy_Fabric_Overlay_to_the_NBDRA
http://semanticommunity.info/@api/deki/files/33785/Volume4Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5.2_Privacy_Engineering_Principles
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B39.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5.3_Relation_of_the_Big_Data_Security_Operational_Taxonomy_to_the_NBDRA
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_1:_Draft_Security_Operational_Taxonomy_Mapping_to_the_NBDRA_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6_Mapping_Use_Cases_to_NBDRA
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.1_Consumer_Digital_Media_Use
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_2:_Mapping_Consumer_Digital_Media_Usage_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B49.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.2_Nielsen_Homescan:_Project_Apollo
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B50.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_3:_Mapping_Nielsen_Homescan_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.3_Web_Traffic_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_4:_Mapping_Web_Traffic_Analytics_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.4_Health_Information_Exchange
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_5:_Mapping_HIE_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B51.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.5_Genetic_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.6_Pharmaceutical_Clinical_Trial_Data_Sharing
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_6:_Mapping_Pharmaceutical_Clinical_Trial_Data_Sharing_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.7_Network_Protection
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_7:_Mapping_Network_Protection_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B52.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.8_Military:_Unmanned_Vehicle_Sensor_Data
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B53.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_8:_Mapping_Military_Unmanned_Vehicle_Sensor_Data_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.9_Education:_Common_Core_Student_Performance_Reporting
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_9:_Mapping_Common_Core_K.E2.80.9312_Student_Reporting_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B55.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.10_Sensor_Data_Storage_and_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.11Cargo_Shipping
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_10:_Mapping_Cargo_Shipping_to_the_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_A:_Candidate_Security_and_Privacy_Topics_for_Big_Data_Adaptation
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B56.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_B:_Internal_Security_Considerations_within_Cloud_Ecosystems
https://www.isc2.org/cissp/default.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_B-1:_Composite_Cloud_Ecosystem_Security_Architecture.5B57.5D
http://semanticommunity.info/@api/deki/files/33780/Volume4FigureB1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_C:_Big_Data_Actors_and_Roles:_Adaptation_to_Big_Data_Scenarios
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B59.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_D:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_E:_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Document_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal

[2] | Security an | References | NIST | Word
[3] | Security an | References | NIST | Word
[4] | Security an | References | NIST | Word
[5] | Security an | References | NIST | Word
[6] | Security an | References | NIST | Word
[7] | Security an | References | NIST | Word
[8] | Security an | References | NIST | Word
[9] | Security an | References | NIST | Word
[10] | Security an | References | NIST | Word
[11] | Security an | References | NIST | Word
[12] | Security an | References | NIST | Word
[13] | Security an | References | NIST | Word
[14] | Security an | References | NIST | Word
[15] | Security an | References | NIST | Word
[16] | Security an | References | NIST | Word
[17] | Security an | References | NIST | Word
[18] | Security an | References | NIST | Word
[19] | Security an | References | NIST | Word
[20] | Security an | References | NIST | Word
[21] | Security an | References | NIST | Word
[22] | Security an | References | NIST | Word
[23] | Security an | References | NIST | Word
[24] | Security an | References | NIST | Word
[25] | Security an | References | NIST | Word
[26] | Security an | References | NIST | Word
[27] | Security an | References | NIST | Word
[28] | Security an | References | NIST | Word
[29] | Security an | References | NIST | Word
[30] | Security an | References | NIST | Word
[31] | Security an | References | NIST | Word
[32] | Security an | References | NIST | Word
[33] | Security an | References | NIST | Word
[34] | Security an | References | NIST | Word
[35] | Security an | References | NIST | Word
[36] | Security an | References | NIST | Word
[37] | Security an | References | NIST | Word
[38] | Security an | References | NIST | Word
[39] | Security an | References | NIST | Word
[40] | Security an | References | NIST | Word
[41] | Security an | References | NIST | Word
[42] | Security an | References | NIST | Word
[43] | Security an | References | NIST | Word
[44] | Security an | References | NIST | Word
[45] | Security an | References | NIST | Word
[46] | Security an | References | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D
http://www.emc.com/leadership/programs/digital-universe.htm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D
http://www.emc.com/leadership/programs/digital-universe.htm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B9.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B10.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B11.5D
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B12.5D
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B13.5D
http://dx.doi.org/10.1109/MIC.2008.86
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B14.5D
https://www.exchangewire.com/blog/2014/10/29/appnexus-cto-on-the-fight-against-ad-fraud/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B15.5D
http://dx.doi.org/10.1126/science.1248506
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B16.5D
http://dx.doi.org/10.1016/j.future.2013.09.032
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B17.5D
http://bit.ly/1y3Y1P1
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B18.5D
http://phrma.org/sites/default/files/pdf/PhRMAPrinciplesForResponsibleClinicalTrialDataSharing.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B19.5D
http://www.apd.army.mil/jw2/xmldemo/r25_2/main.asp
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B20.5D
http://lohud.us/1mV9U2U
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B21.5D
http://blogs.wsj.com/metropolis/2013/04/15/before-tougher-state-tests-officials-prepare-parents/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B22.5D
http://www.informationweek.com/big-data/news/common-core-meets-aging-education-techno/240158684
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B23.5D
http://www.civitaslearning.com/about/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B24.5D
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B25.5D
http://dx.doi.org/10.6028/NIST.IR.7956
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B26.5D
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B27.5D
http://www.acm.org/about/class/ccs98-html#K.4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B28.5D
http://csrc.nist.gov/publications/nistpubs/800-37-rev1/sp800-37-rev1-final.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B29.5D
http://www.isaca.org/Knowledge-Center/Research/ResearchDeliverables/Pages/The-Risk-IT-Framework.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B30.5D
http://1.usa.gov/1wQuti1
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B31.5D
http://bit.ly/1wQByit
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B32.5D
http://resources.sei.cmu.edu/asset_files/TechnicalNote/2010_004_001_15200.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B33.5D
http://bit.ly/1wQByit
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B34.5D
http://bit.ly/1x2HSUe
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B35.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B36.5D
http://bit.ly/1x2HSUe
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B37.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B38.5D
http://www.hhs.gov/news/press/2013pres/01/20130117b.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B39.5D
http://docs.oasis-open.org/pmrm/PMRM/v1.0/csd01/PMRM-v1.0-csd01.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B40.5D
http://www.nist.gov/nstic/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B41.5D
http://csrc.nist.gov/publications/nistpubs/800-144/SP800-144.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B42.5D
http://csrc.nist.gov/publications/nistpubs/800-144/SP800-144.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B43.5D
http://doi.acm.org/10.1145/1073001.1073005
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B44.5D
http://dl.acm.org/citation.cfm?id=2028026.2028029
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B45.5D
http://doi.acm.org/10.1145/2683467.2683475
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B46.5D
http://doi.acm.org/10.1145/1968613.1968645

[47] | Security an | References | NIST | Word
[48] | Security an | References | NIST | Word
[49] | Security an | References | NIST | Word
[50] | Security an | References | NIST | Word
[51] | Security an | References | NIST | Word
[52] | Security an | References | NIST | Word
[53] | Security an | References | NIST | Word
[54] | Security an | References | NIST | Word
[55] | Security an | References | NIST | Word
[56] | Security an | References | NIST | Word
[57] | Security an | References | NIST | Word
[58] | Security an | References | NIST | Word
[59] | Security an | References | NIST | Word
[60] | Security an | References | NIST | Word
[61] | Security an | References | NIST | Word
Architecture White Paper Survey | Architectu | Text | NIST | Word
Cover Page | Architectu | Text | NIST | Word
Inside Cover Page | Architectu | Text | NIST | Word
National Institute of Standards and Tec | Architectu | Text | NIST | Word
Reports on Computer Systems Technolo | Architectu | Text | NIST | Word
Abstract | Architectu | Text | NIST | Word
Acknowledgements | Architectu | Text | NIST | Word
Notice to Readers | Architectu | Text | NIST | Word
Table of Contents | Architectu | Text | NIST | Word
Executive Summary | Architectu | Text | NIST | Word
1 Introduction | Architectu | Text | NIST | Word
1.1 Background | Architectu | Text | NIST | Word
1.2 Scope and Objectives of the Refere | Architectu | Text | NIST | Word
1.3 Report Production | Architectu | Text | NIST | Word
1.4 Report Structure | Architectu | Text | NIST | Word
1.5 Future Work on this Volume | Architectu | Text | NIST | Word
2 Big Data Architecture Proposals Recei | Architectu | Text | NIST | Word
2.1 Bob Marcus | Architectu | Text | NIST | Word
2.1.1 General Architecture Description | Architectu | Text | NIST | Word
2.1.2 Architecture Model | Architectu | Text | NIST | Word
Figure 1: Components of the High Level | Architectu | Figure | NIST | Word
2.1.3 Key Components | Architectu | Text | NIST | Word
Figure 2: Description of the Component | Architectu | Figure | NIST | Word
2.2 Microsoft | Architectu | Text | NIST | Word
2.2.1 General Architecture Description | Architectu | Text | NIST | Word
2.2.2 Architecture Model | Architectu | Text | NIST | Word
Figure 3: Big Data Ecosystem Reference | Architectu | Figure | NIST | Word
2.2.3 Key Components | Architectu | Text | NIST | Word
2.3 University of Amsterdam | Architectu | Text | NIST | Word
2.3.1 General Architecture Description | Architectu | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B47.5D
http://www.nist.gov/nstic/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B48.5D
http://collaborate.nist.gov/twiki-cloud-computing/pub/CloudComputing/CloudSecurity/NIST_Security_Reference_Architecture_2013.05.15_v1.0.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B49.5D
http://technet.microsoft.com/en-us/library/dd277323.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B50.5D
http://www.nielsen.com/us/en/nielsen-solutions/nielsen-measurement/nielsen-retail-measurement.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B51.5D
http://www.safe-biopharma.org/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B52.5D
http://support.microsoft.com/kb/323076
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B53.5D
http://gcn.com/articles/2013/04/12/disa-plans-exabytes-large-data-objects.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B54.5D
http://defensesystems.com/articles/2012/10/31/agg-drone-video-encryption-lags.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B55.5D
http://www.nature.com/ncomms/2014/140121/ncomms4074/full/ncomms4074.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B56.5D
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B57.5D
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B58.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B59.5D
http://dx.doi.org/10.1007/978-3-540-78999-4_8
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B60.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B61.5D
http://dx.doi.org/10.1109/ms.2011.28
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Architecture_White_Paper_Survey
http://bigdatawg.nist.gov/_uploadfiles/M0396_v1_7656223932.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cover_Page_2
http://dx.doi.org/10.6028/NIST.
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Inside_Cover_Page_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_Special_Publication_1500-5
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Abstract_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Acknowledgements_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Notice_to_Readers_2
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_of_Contents_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Executive_Summary_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1_Introduction_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.1_Background_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Reference_Architecture_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.3_Report_Production_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4_Report_Structure_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2_Big_Data_Architecture_Proposals_Received
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1_Bob_Marcus
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1.2_Architecture_Model
http://semanticommunity.info/@api/deki/files/33760/Volume5Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_1:_Components_of_the_High_Level_Reference_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1.3_Key_Components
http://semanticommunity.info/@api/deki/files/33761/Volume5Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_2:_Description_of_the_Components_of_the_Low-Level_Reference_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2_Microsoft
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_3:_Big_Data_Ecosystem_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33762/Volume5Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.3_University_of_Amsterdam
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D_2

2.3.2 Architecture Model | Architectu | Text | NIST | Word
Figure 4: Big Data Architecture Framew | Architectu | Figure | NIST | Word
2.3.3 Key Components | Architectu | Text | NIST | Word
2.4 IBM | Architectu | Text | NIST | Word
2.4.1 General Architecture Description | Architectu | Text | NIST | Word
Figure 5: IBM Big Data Platform | Architectu | Figure | NIST | Word
2.4.2 Architecture Model | Architectu | Text | NIST | Word
2.4.3 Key Components | Architectu | Text | NIST | Word
2.5 Oracle | Architectu | Text | NIST | Word
2.5.1 General Architecture Description | Architectu | Text | NIST | Word
2.5.2 Architecture Model | Architectu | Text | NIST | Word
Figure 6: High level, Conceptual View | Architectu | Figure | NIST | Word
2.5.3 Key Components | Architectu | Text | NIST | Word
Figure 7: Oracle Big Data Reference Arch | Architectu | Figure | NIST | Word
2.6 Pivotal | Architectu | Text | NIST | Word
2.6.1 General Architecture Description | Architectu | Text | NIST | Word
2.6.2 Architecture Model | Architectu | Text | NIST | Word
Figure 8: Pivotal Architecture Model | Architectu | Figure | NIST | Word
2.6.3 Key Components | Architectu | Text | NIST | Word
Figure 9: Pivotal Data Fabric and Analyti | Architectu | Figure | NIST | Word
2.7 SAP | Architectu | Text | NIST | Word
2.7.1 General Architecture Description | Architectu | Text | NIST | Word
2.7.2 Architecture Model | Architectu | Text | NIST | Word
Figure 10: SAP Big Data Reference Archi | Architectu | Figure | NIST | Word
2.7.3 Key Components | Architectu | Text | NIST | Word
2.8 9Sight | Architectu | Text | NIST | Word
2.8.1 General Architecture Description | Architectu | Text | NIST | Word
Figure 11: 9Sight General Architecture | Architectu | Figure | NIST | Word
2.8.2 Architecture Model | Architectu | Text | NIST | Word
2.8.3 Key Components | Architectu | Text | NIST | Word
Figure 12: 9Sight Architecture Model | Architectu | Figure | NIST | Word
2.9 LexisNexis | Architectu | Text | NIST | Word
2.9.1 General Architecture Description | Architectu | Text | NIST | Word
Figure 13: Lexis Nexis General Architect | Architectu | Figure | NIST | Word
2.9.2 Architecture Model | Architectu | Text | NIST | Word
2.9.3 Key Components | Architectu | Text | NIST | Word
Figure 14: Lexis Nexis High Performanc | Architectu | Figure | NIST | Word
3 Survey of Big Data Architectures | Architectu | Text | NIST | Word
3.1 Bob Marcus | Architectu | Text | NIST | Word
Table 1: Databases and Interfaces in th | Architectu | Table | NIST | Word
Figure 15: Big Data Layered Architecture | Architectu | Figure | NIST | Word
3.2 Microsoft | Architectu | Text | NIST | Word
Table 2: Microsoft Data Transformation | Architectu | Text | NIST | Word
3.3 University of Amsterdam | Architectu | Text | NIST | Word
3.4 IBM | Architectu | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.3.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_4:_Big_Data_Architecture_Framework
http://semanticommunity.info/@api/deki/files/33764/Volume5Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.4.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.4_IBM
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.4.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_5:_IBM_Big_Data_Platform
http://semanticommunity.info/@api/deki/files/33763/Volume5Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.4.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.4.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.5_Oracle
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.5.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.5.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_6:_High_level.2C_Conceptual_View_of_the_Information_Management_Ecosystem
http://semanticommunity.info/@api/deki/files/33765/Volume5Figure6.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.5.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_7:_Oracle_Big_Data_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33766/Volume5Figure7.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.6_Pivotal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.6.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.6.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_8:_Pivotal_Architecture_Model
http://semanticommunity.info/@api/deki/files/33767/Volume5Figure8.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.6.3__Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_9:_Pivotal_Data_Fabric_and_Analytics
http://semanticommunity.info/@api/deki/files/33768/Volume5Figure9.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.7_SAP
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.7.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.7.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_10:_SAP_Big_Data_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33769/Volume5Figure10.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.7.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.8_9Sight
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.8.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_11:_9Sight_General_Architecture
http://semanticommunity.info/@api/deki/files/33770/Volume5Figure11.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.8.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.8.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_12:_9Sight_Architecture_Model
http://semanticommunity.info/@api/deki/files/33771/Volume5Figure12.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.9_LexisNexis
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.9.1_General_Architecture_Description
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_13:_Lexis_Nexis_General_Architecture
http://semanticommunity.info/@api/deki/files/33774/Volume5Figure13.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.9.2_Architecture_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.9.3_Key_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_14:_Lexis_Nexis_High_Performance_Computing_Cluster
http://semanticommunity.info/@api/deki/files/33772/Volume5Figure14.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3_Survey_of_Big_Data_Architectures
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1_Bob_Marcus
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_1:_Databases_and_Interfaces_in_the_Layered_Architecture_from_Bob_Marcus
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_15:_Big_Data_Layered_Architecture
http://semanticommunity.info/@api/deki/files/33773/Volume5Figure15.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2_Microsoft
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_2:_Microsoft_Data_Transformation_Steps
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.3_University_of_Amsterdam
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.4_IBM

Figure 16: Data Discovery and Explorati | Architectu | Figure | MISNIST | Word
3.5 Oracle | Architectu | Text | NIST | Word
3.6 Pivotal | Architectu | Text | NIST | Word
3.7 SAP | Architectu | Text | NIST | Word
3.8 9Sight | Architectu | Text | NIST | Word
3.9 LexisNexis | Architectu | Text | NIST | Word
3.10 Comparative view of surveyed arch | Architectu | Text | NIST | Word
Figure 17(a): Stacked View of Surveyed | http://sem | Architectu | Figure | NIST | Word
Figure 17(b): Stacked View of Surveyed | Architectu | Figure | NIST | Word
Figure 17(c): Stacked View of Surveyed | Architectu | Figure | NIST | Word
4 Conclusions | Architectu | Text | NIST | Word
Figure 18: Big Data Reference Architect | Architectu | Figure | NIST | Word
Appendix A: Acronyms | Architectu | Appendice | NIST | Word
Appendix B: References | Architectu | Appendice | NIST | Word
Document References | Architectu | References | NIST | Word
[1] | Architectu | References | NIST | Word
[2] | Architectu | References | NIST | Word
[3] | Architectu | References | NIST | Word
[4] | Architectu | References | NIST | Word
[5] | Architectu | References | NIST | Word
[6] | Architectu | References | NIST | Word
[7] | Architectu | References | NIST | Word
[8] | Architectu | References | NIST | Word
[9] | Architectu | References | NIST | Word
Reference Architecture | Reference | Text | NIST | Word
Cover Page | Reference | Text | NIST | Word
Inside Cover Page | Reference | Text | NIST | Word
National Institute of Standards and Tec | Reference | Text | NIST | Word
Reports on Computer Systems Technolo | Reference | Text | NIST | Word
Abstract | Reference | Text | NIST | Word
Acknowledgements | Reference | Text | NIST | Word
Notice to Readers | Reference | Text | NIST | Word
Table of Contents | Reference | Text | NIST | Word
Executive Summary | Reference | Text | NIST | Word
1 Introduction | Reference | Text | NIST | Word
1.1 Background | Reference | Text | NIST | Word
1.2 Scope and Objectives of the Refere | Reference | Text | NIST | Word
1.3 Report Production | Reference | Text | NIST | Word
1.4 Report Structure | Reference | Text | NIST | Word
1.5 Future Work on this Volume | Reference | Text | NIST | Word
2 High Level Reference Architecture Re | Reference | Text | NIST | Word
2.1 Use Cases and Requirements | Reference | Text | NIST | Word
Table 1: Mapping Use Case Characteriza | Reference | Table | NIST | Word
2.2 Reference Architecture Survey | Reference | Text | NIST | Word
2.3 Taxonomy | Reference | Text | NIST | Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_16:_Data_Discovery_and_Exploration
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.5_Oracle
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.6_Pivotal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.7_SAP
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.8_9Sight
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.9_LexisNexis
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.10_Comparative_view_of_surveyed_architectures

http://semanticommunity.info/@api/deki/files/33775/Volume5Figure17a.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_17(b):_Stacked_View_of_Surveyed_Architecture_(continued)
http://semanticommunity.info/@api/deki/files/33776/Volume5Figure17b.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_17(c):_Stacked_View_of_Surveyed_Architecture_(continued)
http://semanticommunity.info/@api/deki/files/33777/Volume5Figure17c.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4_Conclusions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_18:_Big_Data_Reference_Architecture
http://semanticommunity.info/@api/deki/files/33778/Volume5Figure18a.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_A:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_B:_References
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Document_References_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_2
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_2
http://dodcio.defense.gov/Portals/0/Documents/DIEA/Ref_Archi_Description_Final_v1_18Jun10.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_2
http://www.gartner.com/it/page.jsp?id=1731916
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D_2
http://www.dbta.com/Articles/Editorial/Trends-and-Applications/What-is-Data-Analysis-and-Data-Mining-73503.aspx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D_2
http://www.uazone.org/demch/worksinprogress/sne-2013-02-techreport-bdaf-draft02.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_2
http://www.ietf.org/id/draft-khasnabish-cloud-reference-framework-05.txt
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B9.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reference_Architecture
http://bigdatawg.nist.gov/_uploadfiles/M0397_v1_2395481670.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cover_Page_3
http://dx.doi.org/10.6028/NIST.SP.1500-6
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Inside_Cover_Page_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_Special_Publication_1500-6
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Abstract_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Acknowledgements_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Notice_to_Readers_3
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_of_Contents_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Executive_Summary_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1_Introduction_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.1_Background_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.2_Scope_and_Objectives_of_the_Reference_Architectures_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.3_Report_Production_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4_Report_Structure_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2_High_Level_Reference_Architecture_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1_Use_Cases_and_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_1:_Mapping_Use_Case_Characterization_Categories_to_Reference_Architecture_Components_and_Fabrics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2_Reference_Architecture_Survey
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.3_Taxonomy

Figure 1: NBDRA Taxonomy Reference Figure NIST Word
3 NBDRA Conceptual Model Reference Text NIST Word
Figure 2: NIST Big Data Reference Archi Reference Figure NIST Word
4 Functional Components of the NBDRA Reference Text NIST Word
4.1 System Orchestrator Reference Text NIST Word
4.2 Data Provider Reference Text NIST Word
4.3 Big Data Application Provider Reference Text NIST Word
4.3.1 Collection Reference Text NIST Word
4.3.2 Preparation Reference Text NIST Word
4.3.3 Analytics Reference Text NIST Word
4.3.4 Visualization Reference Text NIST Word
4.3.5 Access Reference Text NIST Word
4.4 Big Data Framework Provider Reference Text NIST Word
4.4.1 Infrastructure Frameworks Reference Text NIST Word
4.4.1.1 Networking Reference Text NIST Word
4.4.1.1.1 Software Defined Networks Reference Text NIST Word
4.4.1.1.2 Network Function Virtualizati Reference Text NIST Word
4.4.1.2 Computing Reference Text NIST Word
4.4.1.3 Storage Reference Text NIST Word
4.4.1.4 Environmental Resources Reference Text NIST Word
4.4.2 Data Platform Frameworks Reference Text NIST Word
Figure 3: Data Organization Approaches Reference Figure NIST Word
1.4.2.1 In-memory Reference Text NIST Word
1.4.2.2 File Systems Reference Text NIST Word
1.4.2.2.1 File System Organization Reference Text NIST Word
1.4.2.2.2 In File Data Organization Reference Text NIST Word
1.4.2.3 Indexed Storage Organization Reference Text NIST Word
Figure 4: Data Storage Technologies Reference Figure NIST Word
4.4.3 Processing Frameworks Reference Text NIST Word
Figure 5: Information Flow Reference Figure NIST Word
4.4.3.1 Batch Frameworks Reference Text NIST Word
Table 2: 13 Dwarfs—Algorithms for Simul Reference Table NIST Word
4.4.3.1.1 Map/Reduce Reference Text NIST Word
4.4.3.1.2 Bulk Synchronous Parallel Reference Text NIST Word
4.4.3.2 Streaming Frameworks Reference Text NIST Word
4.4.3.2.1 Event Ordering and Processin Reference Text NIST Word
4.4.3.2.2 State Management Reference Text NIST Word
4.4.3.2.3 Partitioning and Parallelism Reference Text NIST Word
4.4.4 Messaging/Communications Fram Reference Text NIST Word
4.4.5 Resource Management Framewor Reference Text NIST Word
4.5 Data Consumer Reference Text NIST Word
5 Management Fabric of the NBDRA Reference Text NIST Word
5.1 System Management Reference Text NIST Word
5.2 Big Data Lifecycle Management Reference Text NIST Word
6 Security and Privacy Fabric of the NB Reference Text NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_1:_NBDRA_Taxonomy
http://semanticommunity.info/@api/deki/files/33755/Volume6Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3_NBDRA_Conceptual_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_2:_NIST_Big_Data_Reference_Architecture_(NBDRA)
http://semanticommunity.info/@api/deki/files/33754/Volume6Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4_Functional_Components_of_the_NBDRA
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1_System_Orchestrator
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2_Data_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3_Big_Data_Application_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.1_Collection
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.2_Preparation
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.3_Analytics
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.4_Visualization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.3.5_Access
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4_Big_Data_Framework_Provider
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1_Infrastructure_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.1_Networking
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.1.1_Software_Defined_Networks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.1.2__Network_Function_Virtualization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.2_Computing
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.3_Storage
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.1.4_Environmental_Resources
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.2_Data_Platform_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_3:_Data_Organization_Approaches
http://semanticommunity.info/@api/deki/files/33751/Volume6Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4.2.1_In-memory
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4.2.2_File_Systems
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4.2.2.1_File_System_Organization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4.2.2.2_In_File_Data_Organization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4.2.3_Indexed_Storage_Organization
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_4:_Data_Storage_Technologies
http://semanticommunity.info/@api/deki/files/33753/Volume6Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3_Processing_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_5:_Information_Flow
http://semanticommunity.info/@api/deki/files/33752/Volume6Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.1__Batch_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_2:_13_Dwarfs.E2.80.94Algorithms_for_Simulation_in_the_Physical_Sciences
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.1.1_Map.2FReduce
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.1.2_Bulk_Synchronous_Parallel
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.2_Streaming_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.2.1_Event_Ordering_and_Processing_Guarantees
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.2.2_State_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.3.2.3_Partitioning_and_Parallelism
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.4_Messaging.2FCommunications_Frameworks
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.4.5_Resource_Management_Framework
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.5_Data_Consumer
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5_Management_Fabric_of_the_NBDRA
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5.1_System_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5.2_Big_Data_Lifecycle_Management
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6_Security_and_Privacy_Fabric_of_the_NBDRA

7 Conclusion Reference Text NIST Word
Appendix A: Deployment Considerations Reference Appendix NIST Word
Introduction Reference Appendix NIST Word
Figure A-1: Big Data Framework Deploy Reference Figure NIST Word
Cloud Service Providers Reference Appendix NIST Word
Cloud Service Component Reference Appendix NIST Word
Resource Abstraction and Control Com Reference Appendix NIST Word
Security and Privacy and Management F Reference Appendix NIST Word
Physical Resource Deployments Reference Appendix NIST Word
Appendix B: Terms and Definitions Reference Appendix NIST Word
Appendix C: Examples of Big Data Organ Reference Appendix NIST Word
Relational Storage Models Reference Appendix NIST Word
Key-Value Storage Models Reference Appendix NIST Word
Columnar Storage Models Reference Appendix NIST Word
Figure B-1: Differences Between Row Or Reference Figure NIST Word
Figure B-2: Column Family Segmentatio Reference Figure NIST Word
Document Reference Appendix NIST Word
Graph Reference Appendix NIST Word
Figure B-3: Object Nodes and Relations Reference Figure NIST Word
Appendix D: Acronyms Reference Appendix NIST Word
Appendix E: Resources and References Reference Appendix NIST Word
Document References Reference References NIST Word
[1] Reference References NIST Word
[2] Reference References NIST Word
[3] Reference References NIST Word
[4] Reference References NIST Word
[5] Reference References NIST Word
[6] Reference References NIST Word
[7] Reference References NIST Word
[8] Reference References NIST Word
Standards Roadmap Standards Text NIST Word
Cover Page Standards Text NIST Word
Inside Cover Page Standards Text NIST Word
National Institute of Standards and Tech Standards Text NIST Word
Reports on Computer Systems Technolo Standards Text NIST Word
Abstract Standards Text NIST Word
Acknowledgements Standards Text NIST Word
Notice to Readers Standards Text NIST Word
Table of Contents Standards Table NIST Word
Executive Summary Standards Text NIST Word
1 Introduction Standards Text NIST Word
1.1 Background Standards Text NIST Word
1.2 NIST Big Data Public Working Group Standards Text NIST Word
1.3 Scope and Objectives of the Techn Standards Text NIST Word
1.4 Report Production Standards Text NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#7_Conclusion
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_A:_Deployment_Considerations
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Introduction
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_A-1:_Big_Data_Framework_Deployment_Options
http://semanticommunity.info/@api/deki/files/33756/Volume6FigureA1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cloud_Service_Providers
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cloud_Service_Component
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Resource_Abstraction_and_Control_Component
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Security_and_Privacy_and_Management_Functions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Physical_Resource_Deployments
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_B:_Terms_and_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_C:_Examples_of_Big_Data_Organization_Approaches
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Relational_Storage_Models
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Key-Value_Storage_Models
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Columnar_Storage_Models
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_B-1:_Differences_Between_Row_Oriented_and_Column_Oriented_Stores
http://semanticommunity.info/@api/deki/files/33758/Volume6FigureB1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_B-2:_Column_Family_Segmentation_of_the_Columnar_Stores_Model
http://semanticommunity.info/@api/deki/files/33757/Volume6FigureB2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Document
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Graph
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_B-3:_Object_Nodes_and_Relationships_of_Graph_Databases
http://semanticommunity.info/@api/deki/files/33750/Volume6FigureB3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_D:_Acronyms_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_E:_Resources_and_References
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Document_References_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_3
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_3
http://dodcio.defense.gov/Portals/0/Documents/DIEA/Ref_Archi_Description_Final_v1_18Jun10.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_3
http://www.lanl.gov/orgs/hpc/salishan/salishan2005/davidpatterson.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D_3
http://view.eecs.berkeley.edu/wiki/Dwarf_Mine
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_3
https://www.census.gov/history/www/genealogy/decennial_census_records/the_72_year_rule_1.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_3
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_3
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Standards_Roadmap
http://bigdatawg.nist.gov/_uploadfiles/M0398_v1_1449826642.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Cover_Page_4
http://dx.doi.org/10.6028/NIST.SP.1500-7
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Inside_Cover_Page_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#National_Institute_of_Standards_and_Technology_(NIST)_Special_Publication_1500-7
http://www.nist.gov/publication-portal.cfm
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reports_on_Computer_Systems_Technology_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Abstract_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Acknowledgements_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Notice_to_Readers_4
http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_of_Contents_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Executive_Summary_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1_Introduction_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.1_Background_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.2_NIST_Big_Data_Public_Working_Group
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.3_Scope_and_Objectives_of_the_Technology_Roadmap_Subgroup
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.4_Report_Production

1.5 Future Work on this Volume Standards Text NIST Word
2 Big Data Definition Standards Text NIST Word
2.1 Big Data Definitions Standards Text NIST Word
2.2 Data Science Definitions Standards Text NIST Word
3 Investigating the Big Data Ecosystem Standards Text NIST Word
3.1 Use Cases Standards Text NIST Word
3.2 Reference Architecture Survey Standards Text NIST Word
3.3 Taxonomy Standards Text NIST Word
Figure 1: NIST Big Data Reference Arch Standards Figure NIST Word
4 Big Data Reference Architecture Standards Text NIST Word
4.1 Overview Standards Text NIST Word
Table 1: Mapping of Use Case Categori Standards Table NIST Word
4.2 NBDRA Conceptual Model Standards Text NIST Word
Figure 2: NBDRA Conceptual Model Standards Figure NIST Word
5 Big Data Security and Privacy Standards Text NIST Word
6 Big Data Standards Standards Text NIST Word
6.1 Existing Standards Standards Text NIST Word
Table 2: Existing Big Data Standards Standards Table NIST Word
6.2 Gap in Standards Standards Text NIST Word
6.3 Pathway to Address Standards Gaps Standards Text NIST Word
Acronyms A: Acronyms Standards Acronyms NIST Word
Appendix B: References Standards Appendix NIST Word
Document References Standards References NIST Word
[1] Standards References NIST Word
[2] Standards References NIST Word
[3] Standards References NIST Word

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#1.5_Future_Work_on_this_Volume_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2_Big_Data_Definition
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.1_Big_Data_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#2.2_Data_Science_Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3_Investigating_the_Big_Data_Ecosystem
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.1_Use_Cases
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.2_Reference_Architecture_Survey
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#3.3_Taxonomy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_1:_NIST_Big_Data_Reference_Architecture_Taxonomy
http://semanticommunity.info/@api/deki/files/33749/Volume7Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4_Big_Data_Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.1_Overview
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_1:_Mapping_of_Use_Case_Categories_to_the_NBDRA_Components
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#4.2_NBDRA_Conceptual_Model
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Figure_2:_NBDRA_Conceptual_Model
http://semanticommunity.info/@api/deki/files/33748/Volume7Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#5_Big_Data_Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6_Big_Data_Standards
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.1_Existing_Standards
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Table_2:_Existing_Big_Data_Standards
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.2_Gap_in_Standards
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#6.3_Pathway_to_Address_Standards_Gaps
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Acronyms_A:_Acronyms
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Appendix_B:_References_2
https://www.ieee.org/index.html
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Document_References_4
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_4
http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_4
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_4
http://www.iso.org/iso/big_data_report-jtc1.pdf

http://www.meetup.com/Federal-Big-Data-Working-Group/events/222458479/
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Comment_Template_for_SP1500-x_(replace_x_with_volume_number)
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Definitions
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Taxonomies
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Use_Case_.26_Requirements
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Security_and_Privacy
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Architecture_White_Paper_Survey
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Reference_Architecture
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#Standards_Roadmap
http://semanticommunity.info/Data_Science/Data_Science_for_Data_Mining

http://semanticommunity.info/@api/deki/files/33792/BrandNiemann05212015.pptx
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://www.meetup.com/Virginia-Big-Data-Meetup/
http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
http://semanticommunity.info/@api/deki/files/33793/BrandNiemann05212015Slide1.PNG

http://semanticommunity.info/@api/deki/files/33794/BrandNiemann05212015Slide2.PNG
http://semanticommunity.info/@api/deki/files/33798/BrandNiemann05212015Slide3.PNG
http://semanticommunity.info/@api/deki/files/33797/BrandNiemann05212015Slide4.PNG

http://semanticommunity.info/@api/deki/files/33796/BrandNiemann05212015Slide5.PNG
http://semanticommunity.info/@api/deki/files/33799/BrandNiemann05212015Slide6.PNG
http://semanticommunity.info/@api/deki/files/33800/BrandNiemann05212015Slide7.PNG

http://semanticommunity.info/@api/deki/files/33801/BrandNiemann05212015Slide8.PNG
http://semanticommunity.info/@api/deki/files/33803/BrandNiemann05212015Slide9.PNG
http://semanticommunity.info/@api/deki/files/33802/BrandNiemann05212015Slide10.PNG

http://semanticommunity.info/@api/deki/files/33804/BrandNiemann05212015Slide11.PNG
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework
http://semanticommunity.info/@api/deki/files/33806/BrandNiemann05212015Slide13.PNG
http://semanticommunity.info/@api/deki/files/33807/BrandNiemann05212015Slide14.PNG
https://spotfire.cloud.tibco.com/spotfire/wp/render/20388084797/analysis?file=/users/bniemann/Public/NISTBigData-Spotfire&waid=V7aaLnbIJEakQRIBWbdpR-1905521da9uNvc

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Research_Notes
http://semanticommunity.info/@api/deki/files/33570/SP1500-1-to-7_comment_template.docx
bniemann@[email protected]@nist.gov
http://bigdatawg.nist.gov/newuser.php
http://semanticommunity.info/@api/deki/files/33567/M0392_v1_3022325181.docx

http://dx.doi.org/10.6028/NIST.SP.1500-1

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D

http://semanticommunity.info/@api/deki/files/33566/Volume1Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B4.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B6.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B8.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B9.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B10.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B11.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B12.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B13.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B16.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B15.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B16.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B17.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B18.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B19.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B20.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B22.5D

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal

http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
http://jtc1bigdatasg.nist.gov/_uploadfiles/N0095_Final_SGBD_Report_to_JTC1.docx
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/
http://jtc1bigdatasg.nist.gov/_uploadfiles/N0095_Final_SGBD_Report_to_JTC1.docx
http://www.gartner.com/it-glossary/big-data
http://datascience.berkeley.edu/what-is-big-data/
http://www.oed.com/view/Entry/18833#eid301162178
http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf

http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=39479
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=35646
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=35343
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798
http://www.w3.org/2013/data/
http://www.w3.org/2001/sw/interest/
http://www.emc.com/leadership/programs/digital-universe.htm
http://dx.doi.org/10.6028/NIST.SP.500-293
http://csrc.nist.gov/publications/nistpubs/800-146/sp800-146.pdf

http://semanticommunity.info/@api/deki/files/33568/M0393_v1_3613775223.docx
http://dx.doi.org/10.6028/NIST.SP.1500-2

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_2

http://semanticommunity.info/@api/deki/files/33575/Volume2Figure1.png
http://semanticommunity.info/@api/deki/files/33579/Volume2Figure2.png

http://semanticommunity.info/@api/deki/files/33577/Volume2Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_2
http://semanticommunity.info/@api/deki/files/33580/Volume2Figure4.png

http://semanticommunity.info/@api/deki/files/33578/Volume2Figure5.png

http://semanticommunity.info/@api/deki/files/33581/Volume2Figure6.png

http://semanticommunity.info/@api/deki/files/33582/Volume2Figure7.png

http://semanticommunity.info/@api/deki/files/33584/Volume2Figure8.png

http://semanticommunity.info/@api/deki/files/33583/Volume2Figure9.png

http://semanticommunity.info/@api/deki/files/33576/Volume2Figure10.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B3.5D_2

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://www.data.gov/
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40874

http://semanticommunity.info/@api/deki/files/33574/M0394_v1_4746659136.docx
http://dx.doi.org/10.6028/NIST.

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B1.5D_3

http://bigdatawg.nist.gov/uc_reqs_summary.php
http://bigdatawg.nist.gov/uc_reqs_gen.php
http://bigdatawg.nist.gov/uc_reqs_gen_ref.php
http://bigdatawg.nist.gov/uc_reqs_gen_detail.php
http://bigdatawg.nist.gov/uc_reqs_gen.php

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B2.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B4.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B6.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B7.5D_2

http://bigdatawg.nist.gov/usecases.php

http://semanticommunity.info/@api/deki/files/33586/Volume3Figure1.png

http://semanticommunity.info/@api/deki/files/33589/Volume3Figure2.png
http://semanticommunity.info/@api/deki/files/33585/Volume3Figure3.png

http://semanticommunity.info/@api/deki/files/33587/Volume3Figure4.png

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B9.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B10.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B11.5D_2
http://vsg3d.com/

http://semanticommunity.info/@api/deki/files/33590/Volume3Figure5.png

http://semanticommunity.info/@api/deki/files/33591/Volume3Figure6.png
http://semanticommunity.info/@api/deki/files/33593/Volume3Figure7.png

http://semanticommunity.info/@api/deki/files/33592/Volume3Figure8.png
http://www.envri.eu/rm
http://semanticommunity.info/@api/deki/files/33594/Volume3Figure9.png
http://semanticommunity.info/@api/deki/files/33595/Volume3Figure10a.png
http://semanticommunity.info/@api/deki/files/33598/Volume3Figure10b.png

http://semanticommunity.info/@api/deki/files/33596/Volume3Figure10c.png
http://semanticommunity.info/@api/deki/files/33597/Volume3Figure10d.png
http://semanticommunity.info/@api/deki/files/33599/Volume3Figure10e.png

http://semanticommunity.info/@api/deki/files/33600/Volume3Figure11.png
http://semanticommunity.info/@api/deki/files/33602/Volume3Figure12.png
http://semanticommunity.info/@api/deki/files/33601/Volume3Figure13.png

http://semanticommunity.info/@api/deki/files/33604/Volume3Figure14.png

http://semanticommunity.info/@api/deki/files/33603/Volume3Figure15.png

http://semanticommunity.info/@api/deki/files/33588/Volume3Figure16.png

[email protected]@[email protected]

http://mendeley.com/
http://dev.mendeley.com/
http://www.slideshare.net/xamat/building-largescale-realworld-recommender-systems-recsys2012-tutorial
http://techblog.netflix.com/
http://www.slideshare.net/kleinerperkins/kpcb-internet-trends-2013
http://webcourse.cs.technion.ac.il/236621/Winter2011-2012/en/ho_Lectures.html
http://www.ifis.cs.tu-bs.de/teaching/ss-11/irws
http://www.slideshare.net/beechung/recommender-systems-tutorialpart1intro
http://www.worldwidewebsize.com/
http://www.disasterrecovery.org/
http://www.dincloud.com/
http://www.coso.org/
http://www.itil-officialsite.com/
http://www.isaca.org/
http://www.opengroup.org/
http://www.standards.iso.org/
http://www.pcaobus.org/

[email protected]@earthlink.net

http://www.materialsproject.org/
http://www.opengeospatial.org/standards
http://geojson.org/
http://earth-info.nga.mil/publications/specs/printed/CADRG/cadrg.html
http://www.gwg.nga.mil/misb/
http://www.dabi.temple.edu/~hbling/publication/SPIE12_Dismount_Formatted_v2_BW.pdf
http://csce.uark.edu/~jgauch/library/Tracking/Orten.2005.pdf
http://www.militaryaerospace.com/topics/m/video/79088650/persistent-surveillance-relies-on-extracting-relevant-data-points-and-connecting-the-dots.htm
http://www.defencetalk.com/wide-area-persistent-surveillance-revolutionizes-tactical-isr-45745/

[email protected]://www.regenstrief.org/
http://www.loinc.org/
http://www.ihie.org/
http://www.iom.edu/Activities/Quality/LearningHealthcare.aspx
https://web.cci.emory.edu/confluence/display/PAIS
https://web.cci.emory.edu/confluence/display/HadoopGIS

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://arxiv.org/abs/1403.1528
http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf
http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/fox.pdf
http://grids.ucs.indiana.edu/ptliupages/publications/OgrePaperv9.pdf
http://grids.ucs.indiana.edu/ptliupages/publications/NISTUseCase.pdf
http://bigdataopensourceprojects.soic.indiana.edu/
http://www.whitehouse.gov/mgi
http://www.whitehouse.gov/open
http://xpdb.nist.gov/nike/term.pl
https://rd-alliance.org/group/metadata-standards-directory-working-group.html

http://semanticommunity.info/@api/deki/files/33569/M0395_v1_4717582962.docx
http://dx.doi.org/10.6028/NIST.SP.1500-4

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D

http://1.usa.gov/1wQuti1

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B9.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B10.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B11.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B12.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B14.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B15.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B16.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B17.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B18.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B19.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B21.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B22.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B23.5D

http://semanticommunity.info/@api/deki/files/33784/Volume4Figure1.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B24.5D

http://semanticommunity.info/@api/deki/files/33781/Volume4Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B26.5D

http://www.aiim.org/
http://mike2.openmethodology.org/wiki/MIKE2.0_Governance_Association
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B29.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B30.5D

http://semanticommunity.info/@api/deki/files/33782/Volume4Figure3.png

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B33.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B35.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B36.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B37.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B38.5D

http://semanticommunity.info/@api/deki/files/33783/Volume4Figure4.png

http://semanticommunity.info/@api/deki/files/33785/Volume4Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B40.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B41.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B42.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B43.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B44.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B45.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B46.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B47.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B48.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B49.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B50.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B51.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B52.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B53.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B55.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B56.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B58.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B57.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B60.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B61.5D

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal

http://www.emc.com/leadership/programs/digital-universe.htm
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf

http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf

http://dx.doi.org/10.1109/MIC.2008.86
https://www.exchangewire.com/blog/2014/10/29/appnexus-cto-on-the-fight-against-ad-fraud/
http://dx.doi.org/10.1126/science.1248506
http://dx.doi.org/10.1016/j.future.2013.09.032
http://bit.ly/1y3Y1P1
http://phrma.org/sites/default/files/pdf/PhRMAPrinciplesForResponsibleClinicalTrialDataSharing.pdf
http://www.apd.army.mil/jw2/xmldemo/r25_2/main.asp
http://lohud.us/1mV9U2U
http://blogs.wsj.com/metropolis/2013/04/15/before-tougher-state-tests-officials-prepare-parents/
http://www.informationweek.com/big-data/news/common-core-meets-aging-education-techno/240158684
http://www.civitaslearning.com/about/
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://dx.doi.org/10.6028/NIST.IR.7956
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
http://www.acm.org/about/class/ccs98-html#K.4
http://csrc.nist.gov/publications/nistpubs/800-37-rev1/sp800-37-rev1-final.pdf
http://www.isaca.org/Knowledge-Center/Research/ResearchDeliverables/Pages/The-Risk-IT-Framework.aspx
http://1.usa.gov/1wQuti1
http://bit.ly/1wQByit
http://resources.sei.cmu.edu/asset_files/TechnicalNote/2010_004_001_15200.pdf
http://bit.ly/1wQByit
http://bit.ly/1x2HSUe

http://www.hhs.gov/news/press/2013pres/01/20130117b.html
http://docs.oasis-open.org/pmrm/PMRM/v1.0/csd01/PMRM-v1.0-csd01.pdf
http://www.nist.gov/nstic/
http://csrc.nist.gov/publications/nistpubs/800-144/SP800-144.pdf
http://doi.acm.org/10.1145/1073001.1073005
http://dl.acm.org/citation.cfm?id=2028026.2028029
http://doi.acm.org/10.1145/2683467.2683475
http://doi.acm.org/10.1145/1968613.1968645

http://www.nist.gov/nstic/
http://collaborate.nist.gov/twiki-cloud-computing/pub/CloudComputing/CloudSecurity/NIST_Security_Reference_Architecture_2013.05.15_v1.0.pdf
http://technet.microsoft.com/en-us/library/dd277323.aspx
http://www.nielsen.com/us/en/nielsen-solutions/nielsen-measurement/nielsen-retail-measurement.html
http://www.safe-biopharma.org/
http://support.microsoft.com/kb/323076
http://gcn.com/articles/2013/04/12/disa-plans-exabytes-large-data-objects.aspx
http://defensesystems.com/articles/2012/10/31/agg-drone-video-encryption-lags.aspx
http://www.nature.com/ncomms/2014/140121/ncomms4074/full/ncomms4074.html
http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505

http://dx.doi.org/10.1007/978-3-540-78999-4_8

http://dx.doi.org/10.1109/ms.2011.28
http://semanticommunity.info/@api/deki/files/33571/M0396_v1_7656223932.docx

http://dx.doi.org/10.6028/NIST.

[email protected]

[email protected]://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_2

http://semanticommunity.info/@api/deki/files/33760/Volume5Figure1.png

http://semanticommunity.info/@api/deki/files/33761/Volume5Figure2.png

http://semanticommunity.info/@api/deki/files/33762/Volume5Figure3.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D_2

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_2

http://semanticommunity.info/@api/deki/files/33764/Volume5Figure4.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B9.5D_2

http://semanticommunity.info/@api/deki/files/33763/Volume5Figure5.png

http://semanticommunity.info/@api/deki/files/33765/Volume5Figure6.png

http://semanticommunity.info/@api/deki/files/33766/Volume5Figure7.png

http://semanticommunity.info/@api/deki/files/33767/Volume5Figure8.png

http://semanticommunity.info/@api/deki/files/33768/Volume5Figure9.png

http://semanticommunity.info/@api/deki/files/33769/Volume5Figure10.png

http://semanticommunity.info/@api/deki/files/33770/Volume5Figure11.png

http://semanticommunity.info/@api/deki/files/33771/Volume5Figure12.png

http://semanticommunity.info/@api/deki/files/33774/Volume5Figure13.png

http://semanticommunity.info/@api/deki/files/33772/Volume5Figure14.png

http://semanticommunity.info/@api/deki/files/33773/Volume5Figure15.png

http://semanticommunity.info/@api/deki/files/33775/Volume5Figure17a.png
http://semanticommunity.info/@api/deki/files/33776/Volume5Figure17b.png
http://semanticommunity.info/@api/deki/files/33777/Volume5Figure17c.png

http://semanticommunity.info/@api/deki/files/33779/Volume5Figure18b.png
http://semanticommunity.info/@api/deki/files/33759/Volume5Figure18c.png

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://dodcio.defense.gov/Portals/0/Documents/DIEA/Ref_Archi_Description_Final_v1_18Jun10.pdf
http://www.gartner.com/it/page.jsp?id=1731916
http://www.dbta.com/Articles/Editorial/Trends-and-Applications/What-is-Data-Analysis-and-Data-Mining-73503.aspx
http://www.uazone.org/demch/worksinprogress/sne-2013-02-techreport-bdaf-draft02.pdf

http://www.ietf.org/id/draft-khasnabish-cloud-reference-framework-05.txt

http://semanticommunity.info/@api/deki/files/33573/M0397_v1_2395481670.docx
http://dx.doi.org/10.6028/NIST.SP.1500-6

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_3

http://semanticommunity.info/@api/deki/files/33755/Volume6Figure1.png

http://semanticommunity.info/@api/deki/files/33754/Volume6Figure2.png

http://semanticommunity.info/@api/deki/files/33751/Volume6Figure3.png

http://semanticommunity.info/@api/deki/files/33753/Volume6Figure4.png

http://semanticommunity.info/@api/deki/files/33752/Volume6Figure5.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_3

http://semanticommunity.info/@api/deki/files/33756/Volume6FigureA1.png

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_3

http://semanticommunity.info/@api/deki/files/33758/Volume6FigureB1.png
http://semanticommunity.info/@api/deki/files/33757/Volume6FigureB2.png

http://semanticommunity.info/@api/deki/files/33750/Volume6FigureB3.png

http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf
http://www.nist.gov/itl/ssd/is/big-data.cfm
http://bigdatawg.nist.gov/
http://www.nist.gov/itl/ssd/is/upload/NIST-BD-Platforms-05-Big-Data-Wactlar-slides.pdf
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
http://www.gartner.com/DisplayDocument?id=2057415&ref=clientFriendlyUrl
http://dodcio.defense.gov/Portals/0/Documents/DIEA/Ref_Archi_Description_Final_v1_18Jun10.pdf
http://www.iso.org/iso/catalogue_detail.htm?csnumber=50508

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
http://dodcio.defense.gov/Portals/0/Documents/DIEA/Ref_Archi_Description_Final_v1_18Jun10.pdf
http://www.lanl.gov/orgs/hpc/salishan/salishan2005/davidpatterson.pdf
http://view.eecs.berkeley.edu/wiki/Dwarf_Mine

https://www.census.gov/history/www/genealogy/decennial_census_records/the_72_year_rule_1.html
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

http://semanticommunity.info/@api/deki/files/33572/M0398_v1_1449826642.docx
http://dx.doi.org/10.6028/NIST.SP.1500-7

[email protected]

[email protected]@nist.gov
http://bigdatawg.nist.gov/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_4

http://semanticommunity.info/@api/deki/files/33749/Volume7Figure1.png

http://semanticommunity.info/@api/deki/files/33748/Volume7Figure2.png
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_4

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_4

http://www.incits.org/
http://www.iec.ch/
http://www.iso.org/iso/home.html
http://www.opengeospatial.org/
https://www.ogf.org/ogf/doku.php
https://www.oasis-open.org/
http://www.w3.org/

http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
http://www.iso.org/iso/big_data_report-jtc1.pdf

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Slides
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Spotfire_Dashboard
http://semanticommunity.info/@api/deki/files/33791/NISTBigData.xlsx

http://semanticommunity.info/@api/deki/files/33793/BrandNiemann05212015Slide1.PNG

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework

https://spotfire.cloud.tibco.com/spotfire/wp/render/20388084797/analysis?file=/users/bniemann/Public/NISTBigData-Spotfire&waid=V7aaLnbIJEakQRIBWbdpR-1905521da9uNvc

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#Research_Notes

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B6.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B16.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B19.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B7.5D_2

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B11.5D_2

http://www.defencetalk.com/wide-area-persistent-surveillance-revolutionizes-tactical-isr-45745/

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B10.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B11.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B12.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B15.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B16.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B17.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B18.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B19.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B23.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B24.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B26.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B30.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B33.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B35.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B36.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B37.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B38.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B48.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B49.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B50.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B51.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B52.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B53.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B55.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B56.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B58.5D
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B57.5D

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B61.5D

https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf

http://collaborate.nist.gov/twiki-cloud-computing/pub/CloudComputing/CloudSecurity/NIST_Security_Reference_Architecture_2013.05.15_v1.0.pdf

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_2
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_2

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B4.5D_2

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_2

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B9.5D_2

http://www.dbta.com/Articles/Editorial/Trends-and-Applications/What-is-Data-Analysis-and-Data-Mining-73503.aspx

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_3
http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B5.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B6.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B7.5D_3

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B8.5D_3

http://www.iso.org/iso/catalogue_detail.htm?csnumber=50508

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B1.5D_4

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B2.5D_4

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B3.5D_4

https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf

http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework/NIST_Big_Data_Framework#.5B48.5D

Category | Title | Application | Current Approach | Futures | URL

Governmen… | Census 201… | Application… | Current Approach: 380 terabytes of scanned documents.

Governmen… | National A… | Application… | Current App… | Futures: There are distributed data sources from federal agencies where the current solution requires transferring those data to centralized storage. In the future, those data sources may reside in multiple Cloud environments. In this case, physical custody should avoid transferring big data from Cloud to Cloud or from Cloud to Data Center.

Governmen… | Statistica… | Application… | Current App… | Futures: Need to improve recommendation systems similar to those used in e-commerce (see Netflix use case) that reduce costs and improve quality while providing confidentiality safeguards that are reliable and publicly auditable. Data visualization is useful for data review, operational activity and general analysis. It continues to evolve; mobile access is important.

Governmen… | Non-Tradit… | Application… | Current App… | Futures: Analytics need to be developed that give statistical estimations with more detail, on a more near-real-time basis, for less cost. The reliability of estimated statistics from such “mashed up” sources still must be evaluated.

Commercia… | Cloud Eco-S… | Application… | Current App… | Futures: One must address GRC (Governance, Risk & Compliance) as well as CIA (Confidentiality, Integrity & Availability) issues, which can even be impacted by the SEC’s mandated use of XBRL (eXtensible Business Reporting Language). At the same time, one must address the influence these same issues can and will have within a Cloud Eco-system/Big Data environment, and their global impact across all financial sectors.

Commercia… | Mendeley –… | Application… | Current App… | Futures: Currently Hadoop batch jobs are scheduled daily, but work has begun on real-time recommendation. The database contains ~400M documents, roughly 80M unique documents, and receives 500-700k new uploads on a weekday. Thus a major challenge is clustering matching documents together in a computationally efficient way (scalable and parallelized) when they are uploaded from different sources and have been slightly modified via third-party annotation tools or publisher watermarks and cover pages.

Commercia… | Netflix Mov… | Application… | Current App… | Futures: A very competitive business. Need to be aware of other companies and trends in both content (which movies are hot) and technology. Need to investigate new business initiatives such as Netflix-sponsored content.

Commercia… | Web Search;… | Application… | Current App… | Futures: A very competitive field where continuous innovation is needed. Two important areas are addressing mobile clients, which are a growing fraction of users, and increasing the sophistication of responses and layout to maximize the total benefit to clients, advertisers and the search company. The “deep web” (content behind user interfaces to databases, etc.) and multimedia search are of increasing importance: 500M photos are uploaded each day and 100 hours of video are uploaded to YouTube each minute.

Commercia… | IaaS (Infra… | Application… | Current App… | Futures: Migrating from a Primary Site to either a Replication Site or a Backup Site is not yet fully automated. The goal is to enable the user to automatically initiate the Failover sequence. Both organizations must know which servers have to be restored and what the dependencies and inter-dependencies are between the Primary Site servers and the Replication and/or Backup Site servers. This requires continuous monitoring of both.

Commercia… | Cargo Ship… | Application… | Current App… | Futures: This Internet of Things application needs to track items in real time. A new aspect will be the status condition of the items, which will include sensor information, GPS coordinates, and a unique identification schema based upon the new ISO 29161 standard under development within ISO JTC1 SC31 WG2.
CommerciaMaterials ApplicationCurrent AppFutures: Materials informatics is an area in which the new tools of data science can have major impact by predicting the performance of real materials (gram to ton quantities) starting at the atomistic, nanometer, and/or micrometer level of description. One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally-accepted data recording standards that can be used by a very diverse materials community, including developers materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations wishing to deposit proprietary materials in data repositories to mask proprietary information, yet to maintain the usability of data; one needs multi-variable materials data visualization tools, in which the number of variables can be quite highCommerciaSimulationApplicationCurrent ApFutures: Need large scale computing at scale for simulation science. Flexible data methods at scale for messy data. Machine learning and knowledge systems that integrate data from publications, experiments, and simulations to advance goal-driven thinking in materials design. The current 100TB of data will become 500TB in 5 years.Defense Large ScaleApplicationCurrent AppFutures: Today’s intelligence systems often contain trillions of geospatial objects and need to be able to visualize and interact with millions of objects. 
Critical issues are Indexing, retrieval and distributed analysis; Visualization generation and transmission; Visualization of data at the end of low bandwidth wireless connections; Data is sensitive and must be completely secure in transit and at rest (particularly on handhelds); Geospatial data requires unique approaches to indexing and distributed analysis.Defense Object idenApplicationCurrent AppFutures: Typical problem is integration of this processing into a large (GPU) cluster capable of processing data from several sensors in parallel and in near real time. Transmission of data from sensor to system is also a major challenge.Defense Intelligenc ApplicationCurrent AppFutures: Data currently exists in disparate silos which must be accessible through a semantically integrated data space. Wide variety of data types, sources, structures, and quality which will span domains and requires integrated search and reasoning. Most critical data is either unstructured or imagery/video which requires significant processing to extract entities and information. Network quality, Provenance and security essential.Healthcare Electronic ApplicationCurrent AppFutures: Teradata, PostgreSQL, MongoDB running on Indiana University supercomputer supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural Language Processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancerHealthcare Pathology ApplicationCurrent AppFutures: Recently, 3D pathology imaging is made possible through 3D laser technologies or serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. 
Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next generation diagnosis. 1TB raw image data + 1TB analytical results per 3D image and 1PB data per moderated hospital per year.Healthcare ComputatioApplicationCurrent ApFutures: Our goal is to solve that bottleneck with extreme scale computing with community-focused science gateways to support the application of massive data analysis toward massive imaging data sets. Workflow components include data acquisition, storage, enhancement, minimizing noise, segmentation of regions of interest, crowd-based selection and extraction of features, and object classification, and organization, and search. Use ImageJ, OMERO, VolRover, advanced segmentation and feature detection software. Healthcare Genomic MeApplicatio Current AppFutures: DNA sequencers can generate ~300GB compressed data/day which volume has increased much faster than Moore’s Law. Future data could include other ‘omics’ measurements, which will be even larger than DNA sequencing. Clouds have been exploredHealthcare ComparativApplicationCurrent ApFutures: Management of heterogeneity of biological data is currently performed by relational database management system (Oracle). Unfortunately, it does not scale for even the current volume 50TB of data. NoSQL solutions aim at providing an alternative but unfortunately they do not always lend themselves to real time interactive use, rapid and parallel bulk loading, and sometimes have issues regarding robustness. Healthcare Individuali ApplicationCurrent ApFutures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate most appropriate solution suited for a given patient with diabetes. 
Use efficient parallel retrieval algorithms, suitable for cloud or HPC, using open source Hbase with both indexed and custom search to identify patients of possible interest. Use Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, called the Enterprise Data Trust (EDT), into RDF triples that enables one to find similar patients through linking of both vocabulary-based and continuous values. The time dependent properties need to be processed before query to allow matching based on derivatives and other derived properties.Healthcare Statistical ApplicationCurrent AppFutures: A cohort of millions of patient can involve petabyte datasets. Issues include availability of too much data (as images, genetic sequences etc) that can make the analysis complicated. A major challenge lies in aligning the data and merging from multiple sources in a form that can be made useful for a combined analysis. Another issue is that sometimes, large amount of data is available about a single subject but the number of subjects themselves is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.Healthcare World PopuApplicationCurrent AppFutures: Use large social contagion models to study complex global scale issuesHealthcare Social Con ApplicationCurrent AppFutures: Data fusion a big issue; how should one combine data from different sources and how to deal with missing or incomplete data? take into account heterogeneous features of 100s of millions or billions of individuals, models of cultural variations across countries that are assigned to individual agents? How to validate these large models? Healthcare Biodiversi Application: Research Futures: LifeWatch initiative will provide integrated access to a variety of data, analytical and modeling tools as served by a variety of collaborating initiatives. 
Another service is offered with data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs', also allowing one to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information facility and Biodiversity Catalogue that is Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass, population density etc.), ecosystem data (such as CO2 fluxes. Algal blooming, water and soil characteristics)Deep LearnLarge-scaleApplicationCurrent AppFutures: Large datasets of 100TB or more may be necessary in order to exploit the representational power of the larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration. One needs integration of high performance libraries with high level (python) prototyping environments.Deep LearnOrganizing ApplicationCurrent AppFutures: Need many analytics including feature extraction, feature matching, and large-scale probabilistic inference, which appear in many or most computer vision and image processing problems, including recognition, stereo resolution, and image denoising. Need to visualize large-scale 3-d reconstructions, and navigate large-scale collections of images that have been aligned to maps.Deep LearnTruthy: Inf ApplicationCurrent AppFutures: Truthy plans to expand incorporating Google+ and Facebook. Need to move towards Hadoop/IndexedHBase & HDFS distributed storage. Use Redis as a in-memory database as a buffer for real-time analysis. 
Need streaming clustering, anomaly detection and online learning.Deep LearnCrowd SourcApplicationCurrent AppFutures: Crowd sourcing has been barely started to be used on a larger scale but with the availability of mobile devices, now there is a huge potential for collecting much data from many individuals, also making use of sensors in mobile devices. This has not been explored on a large scale so far; existing projects of crowd sourcing are usually of a limited scale and web-based. Privacy issues may be involved (A/V from individuals), anonymization may be necessary but not always possible. Data management and curation critical. Size could be hundreds of terabytes with multimedia.Deep LearnCINET: CybeApplicationCurrent AppFutures: As the repository grows, we expect a rapid growth to lead to over 1000-5000 networks and methods in about a year. As more fields use graphs of increasing size, parallel algorithms will be important. Data manipulation and bookkeeping of the derived data for users is a challenge there are no well-defined and effective models and tools for management of various graph data in a unified fashionDeep LearnNIST Infor ApplicationCurrent AppFutures: Even larger data collections are being planned for future evaluations of analytics involving multiple data streams and very heterogeneous data. As well as larger datasets, future includes testing of streaming algorithms with multiple heterogeneous data. Use of clouds being explored.The EcosysDataNet FeApplicationCurrent Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule Oriented Data System) policy-based data management system including major NSF projects such as Ocean Observatories Initiative (sensor archiving); Temporal Dynamics of Learning Center (Cognitive science data grid); the iPlant Collaborative (plant genomics); Drexel engineering digital library; Odum Institute for social science research (data grid federation with Dataverse). 
iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), cloud and more traditional storage models and different transport protocols. The full description has a diagram of the iRODS architecture.The EcosysThe ‘DiscinApplicationCurrent AppFutures: Discinnet itself would not be Bigdata but rather will generate metadata when applied to a cluster that involves Bigdata. In interdisciplinary integration of several fields, the process would reconcile metadata from many complexity levels.The EcosysSemantic GApplicationCurrent appFutures: Create a cloud infrastructure for social media of scientific information where many scientists from various parts of the world can participate and deposit results of their experiment. Some of the issues that one has to resolve prior to establishing a scientific social media are: a) How to minimize challenges related to establishing re-usable, inter-disciplinary, scalable, on-demand, use-case and user-friendly vocabulary? b) How to adopt a existing or create new on-demand ‘data-graph’ to place an information in an intuitive way such that it would easily integrate with existing ‘data-graphs’ in a federated environment without knowing too much about the data management? c) How to find relevant scientific data without spending too much time on the internet? Start with resources like the Open Government movement, Material genome Initiative and Protein Databank. This effort includes many local and networked resources. Developing an infrastructure to automatically integrate information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.The EcosysLight sourcApplicationCurrent AppFutures: Camera resolution is continually increasing. 
Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to conduct the analysis on time scales useful to the experiment. Large number of beamlines (e.g. 39 at LBNL ALS) means that aggregate data load is likely to increase significantly over the coming years and need for a generalized infrastructure for analyzing gigabytes per second of data from many beamline detectors at multiple facilities. Astronomy Catalina ReApplicationCurrent AppFutures: CRTS is a scientific and methodological testbed and precursor of larger surveys to come, notably the Large Synoptic Survey Telescope (LSST), expected to operate in 2020’s and selected as the highest-priority ground-based instrument in the 2010 Astronomy and Astrophysics Decadal Survey. LSST will gather about 30 TB per night. The schematic architecture for a cyber-infrastructure for time domain astronomy illustrated by a figure in full description.Astronomy DOE ExtremApplication: A cosmoloFutures: Data sizes are Dark Energy Survey (DES) 4 PB in 2015; Zwicky Transient Factory (ZTF) 1 PB/year in 2015; Large Synoptic Sky Survey (LSST see CRTS description) 7 PB/year in 2019; Simulations > 10 PB in 2017. Huge amounts of supercomputer time (over 200M hours) will be used.Astronomy Large Surv ApplicationCurrent AppFutures: Techniques for handling Cholesky decompostion for thousands of simulations with matrices of order 1M on a side and parallel image storage would be important. LSST will generate 60PB of imaging data and 15PB of catalog data and a correspondingly large (or larger) amount of simulation data. Over 20TB of data per night.Astronomy Particle PhApplicationCurrent AppFutures: In the past the particle physics community has been able to rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. 
However, the available performance will be much more difficult to exploit in the future, since technology limitations, in particular regarding power consumption, have led to profound changes in the architecture of modern CPU chips. In the past, software could run unchanged on successive processor generations and achieve performance gains that followed Moore's Law, thanks to the regular increase in clock rate that continued until 2006. The era of scaling HEP sequential applications is now over. Changes in CPU architectures imply significantly more software parallelism as well as exploitation of specialized floating-point capabilities. The structure and performance of HEP data processing software need to be changed so that it can continue to be adapted and further developed to run efficiently on new hardware. This represents a major paradigm shift in HEP software design and implies large-scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels at the same time: the event level, the algorithm level, and the sub-algorithm level. Components at all levels of the software stack need to interoperate, so the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help to ensure efficient and balanced use of resources.

Futures (Belle II): An upgraded experiment, Belle II, and accelerator, SuperKEKB, will start operation in 2015 with a factor of 50 more data: ~120 PB of total integrated RAW data, ~15 PB of physics data, and ~100 PB of MC samples. The move to a distributed computing model will require continuous RAW data transfer of ~20 Gbps between Japan and the US at design luminosity. It will need Open Science Grid, Geant4, DIRAC, FTS, and the Belle II framework software.

Futures (EISCAT 3D): The next-generation radar, EISCAT_3D, is designed to consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core. The fully operational 5-site system will generate several thousand times the data of the current EISCAT system, 40 PB/year in 2022, and is expected to operate for 30 years. The EISCAT 3D data e-Infrastructure plans to use high-performance computers for central-site data processing and high-throughput computers for mirror-site data processing. Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center, and a real-time link from the operation center to the sites to set the radar operating mode with immediate effect. See the figure in the full description.

Futures (ENVRI): ENVRI’s common environment will empower the users of the collaborating environmental research infrastructures and enable multidisciplinary scientists to access, study, and correlate data from multiple domains for "system level" research. It captures big data requirements arising from interdisciplinary research. As shown in Figure 1 of the full description, analysis of the computational characteristics of the 6 ESFRI Environmental Research Infrastructures identified 5 common subsystems, defined in the ENVRI Reference Model, www.envri.eu/rm: Data acquisition: collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system. Data curation: facilitates quality control and preservation of scientific data; it is typically operated at a data centre. Data access: enables discovery and retrieval of data housed in data resources managed by a data curation subsystem. Data processing: aggregates the data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments. Community support: manages, controls, and tracks users' activities and supports users in conducting their roles in communities. As shown in Figure 2 of the full description, the 5 subsystems map well to the architectures of the ESFRI Environmental Research Infrastructures.

Futures (Radar Data): An order of magnitude more data (a petabyte per mission) is projected with improved instrumentation. The demand to process growing volumes of field data under a still-constrained power budget suggests low power/performance architectures such as GPU systems. The full description gives workflows for different parts of the problem and pictures of operation.

Futures (UAVSAR Data): The data size would increase dramatically if the Earth Radar Mission were launched. Clouds are suitable hosts but are not used in production today.

Futures (NASA LARC): Improved access will be enabled through iRODS, which enables parallel downloads of datasets from selected replica servers that can be geographically dispersed but still accessible to users worldwide. iRODS operation will be enhanced with semantically organized metadata, managed via a highly precise Earth Science ontology. Cloud solutions will also be explored.

Futures (MERRA): Clouds are being investigated. The data is growing by 1 TB a month.

Futures (Atmospheric turbulence): The dataset will reach 500 TB in 5 years. The initial turbulence case can be extended to other ocean/atmosphere phenomena, but the analytics would be different in each case.
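To put the Belle II and EISCAT 3D figures in perspective, the sketch below converts between a sustained link rate and accumulated volume. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a 365-day year of uninterrupted operation; those assumptions, and the helper names, are illustrative and do not come from the use cases.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: continuous operation, no downtime


def tb_per_day(rate_gbps: float) -> float:
    """Terabytes (10**12 bytes) moved per day at a sustained rate in gigabits/s."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def average_gbps(pb_per_year: float) -> float:
    """Average rate in gigabits/s needed to move a yearly volume in petabytes."""
    bits_per_year = pb_per_year * 1e15 * 8
    return bits_per_year / (DAYS_PER_YEAR * SECONDS_PER_DAY) / 1e9


belle2_daily_tb = tb_per_day(20)                           # Belle II RAW link, ~20 Gbps
belle2_yearly_pb = belle2_daily_tb * DAYS_PER_YEAR / 1000  # TB -> PB
eiscat_gbps = average_gbps(40)                             # EISCAT 3D, 40 PB/year

print(f"Belle II: {belle2_daily_tb:.0f} TB/day, {belle2_yearly_pb:.1f} PB/year")  # Belle II: 216 TB/day, 78.8 PB/year
print(f"EISCAT 3D: {eiscat_gbps:.1f} Gbps average")  # EISCAT 3D: 10.1 Gbps average
```

Under these assumptions, a year of continuous ~20 Gbps transfer totals roughly 79 PB, the same order of magnitude as the ~120 PB integrated RAW figure. EISCAT 3D's 40 PB/year corresponds to about 10 Gbps sustained.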

Page 51: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures (Climate): Rapid growth of data, with 30 PB produced at NERSC (assuming 15 end-to-end climate change experiments) in 2017 and many times more than this worldwide.

Futures (DOE-BER, GEWaSC): Little effort to date has been devoted to developing a framework for systematically connecting scales, as is needed to identify key controls and to simulate important feedbacks. GEWaSC will develop a simulation framework that formally scales from genomes to watersheds and will synthesize diverse and disparate field, laboratory, and simulation datasets across different semantic, spatial, and temporal scales.

Futures (DOE-BER): Field experiment data taking would be improved by access to existing data and automated entry of new data via mobile devices. Interdisciplinary studies integrating diverse data sources need to be supported.

Futures (Energy Consumption): Widespread deployment of Smart Grids with new analytics integrating diverse data and supporting curtailment requests, plus mobile applications for client interactions.

Futures: There are distributed data sources from federal agencies where the current solution requires transferring those data to centralized storage. In the future, those data sources may reside in multiple Cloud environments. In this case, physical custody should avoid transferring big data from Cloud to Cloud or from Cloud to Data Center.

Futures: Recommendation systems similar to those used in e-commerce (see the Netflix use case) need to be improved so that they reduce costs and improve quality while providing confidentiality safeguards that are reliable and publicly auditable. Data visualization is useful for data review, operational activity, and general analysis. It continues to evolve; mobile access is important.

Futures: Analytics need to be developed that give statistical estimates with more detail, on a more nearly real-time basis, at lower cost. The reliability of estimated statistics from such “mashed up” sources still must be evaluated.

Futures: One must address GRC (Governance, Risk & Compliance) as well as CIA (Confidentiality, Integrity & Availability) issues, which can even be impacted by the SEC’s mandated use of XBRL (eXtensible Business Reporting Language). At the same time, one must address the influence these same issues can and will have within a Cloud ecosystem/Big Data environment, and their global impact across all financial sectors.

Futures: Currently Hadoop batch jobs are scheduled daily, but work has begun on real-time recommendation. The database contains ~400M documents, roughly 80M of them unique, and receives 500-700k new uploads on a weekday. Thus a major challenge is clustering matching documents together in a computationally efficient way (scalable and parallelized) when they are uploaded from different sources and have been slightly modified via third-party annotation tools or publisher watermarks and cover pages.

Futures: A very competitive business. One needs to be aware of other companies and of trends in both content (which movies are hot) and technology, and to investigate new business initiatives such as Netflix-sponsored content.

Futures: A very competitive field where continuous innovation is needed. Two important areas are addressing mobile clients, which are a growing fraction of users, and increasing the sophistication of responses and layout to maximize the total benefit to clients, advertisers, and the search company. The “deep web” (content behind user interfaces to databases, etc.) and multimedia search are of increasing importance. Some 500M photos are uploaded each day, and 100 hours of video are uploaded to YouTube each minute.

Futures: Migrating from a Primary Site to either a Replication Site or a Backup Site is complex and not yet fully automated. The goal is to enable the user to initiate the Failover sequence automatically. Both organizations must know which servers have to be restored and what the dependencies and inter-dependencies are between the Primary Site servers and the Replication and/or Backup Site servers. This requires continuous monitoring of both.

Futures: This Internet of Things application needs to track items in real time. A new aspect will be the status condition of the items, which will include sensor information, GPS coordinates, and a unique identification scheme based on the new ISO 29161 standard under development within ISO JTC1 SC31 WG2.

Futures: Materials informatics is an area in which the new tools of data science can have a major impact by predicting the performance of real materials (gram to ton quantities) starting at the atomistic, nanometer, and/or micrometer level of description. One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally accepted data recording standards that can be used by a very diverse materials community, including developers of materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations wishing to deposit proprietary materials in data repositories to mask proprietary information while maintaining the usability of the data; and one needs multi-variable materials data visualization tools, in which the number of variables can be quite high.

Futures: Need large-scale computing for simulation science, flexible data methods at scale for messy data, and machine learning and knowledge systems that integrate data from publications, experiments, and simulations to advance goal-driven thinking in materials design. The current 100 TB of data will become 500 TB in 5 years.

Futures: Today’s intelligence systems often contain trillions of geospatial objects and need to be able to visualize and interact with millions of objects. Critical issues are indexing, retrieval, and distributed analysis; visualization generation and transmission; visualization of data at the end of low-bandwidth wireless connections; data is sensitive and must be completely secure in transit and at rest (particularly on handhelds); geospatial data requires unique approaches to indexing and distributed analysis.

Futures: A typical problem is integrating this processing into a large (GPU) cluster capable of processing data from several sensors in parallel and in near real time. Transmission of data from sensor to system is also a major challenge.

Futures: Data currently exists in disparate silos, which must be made accessible through a semantically integrated data space. A wide variety of data types, sources, structures, and quality will span domains and require integrated search and reasoning. Most critical data is either unstructured or imagery/video, which requires significant processing to extract entities and information. Network quality, provenance, and security are essential.

Futures: Teradata, PostgreSQL, and MongoDB running on an Indiana University supercomputer will support information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural language processing techniques will extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancer.

Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next-generation diagnosis. Data volumes are 1 TB of raw image data plus 1 TB of analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.

Futures: Our goal is to solve that bottleneck with extreme-scale computing and community-focused science gateways that support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Tools include ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software.

Futures: DNA sequencers can generate ~300 GB of compressed data per day, a volume that has increased much faster than Moore’s Law. Future data could include other ‘omics’ measurements, which will be even larger than DNA sequencing. Clouds have been explored.

Futures: Management of the heterogeneity of biological data is currently performed by a relational database management system (Oracle), which unfortunately does not scale even to the current volume of 50 TB. NoSQL solutions aim to provide an alternative, but they do not always lend themselves to real-time interactive use or rapid, parallel bulk loading, and they sometimes have robustness issues.

Futures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate the most appropriate solution for a given patient with diabetes. Use efficient parallel retrieval algorithms, suitable for cloud or HPC, using open source HBase with both indexed and custom search to identify patients of possible interest. Use the Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, called the Enterprise Data Trust (EDT), into RDF triples that enable one to find similar patients through linking of both vocabulary-based and continuous values. The time-dependent properties need to be processed before querying to allow matching based on derivatives and other derived properties.

Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (such as images, genetic sequences, etc.), which can complicate the analysis. A major challenge lies in aligning the data and merging it from multiple sources in a form that is useful for a combined analysis. Another issue is that sometimes a large amount of data is available about a single subject but the number of subjects is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.

Futures: Use large social contagion models to study complex global-scale issues.

Futures: Data fusion is a big issue: how should one combine data from different sources, and how should one deal with missing or incomplete data? How can models take into account heterogeneous features of hundreds of millions or billions of individuals, and models of cultural variations across countries that are assigned to individual agents? How can these large models be validated?

Futures: The LifeWatch initiative will provide integrated access to a variety of data, analytical and modeling tools as served by a variety of collaborating initiatives. Another service offers data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs’, also allowing one to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information Facility and the Biodiversity Catalogue, which is the Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass, population density, etc.), and ecosystem data (such as CO2 fluxes, algal blooming, and water and soil characteristics).

Futures: Large datasets of 100 TB or more may be necessary to exploit the representational power of the larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration. One needs integration of high-performance libraries with high-level (Python) prototyping environments.

Futures: Many analytics are needed, including feature extraction, feature matching, and large-scale probabilistic inference, which appear in many or most computer vision and image processing problems, including recognition, stereo resolution, and image denoising. One needs to visualize large-scale 3D reconstructions and navigate large-scale collections of images that have been aligned to maps.

Futures: Truthy plans to expand to incorporate Google+ and Facebook. It needs to move towards Hadoop/IndexedHBase and HDFS distributed storage, use Redis as an in-memory database acting as a buffer for real-time analysis, and adopt streaming clustering, anomaly detection, and online learning.

Futures: Crowdsourcing has barely begun to be used on a large scale, but with the availability of mobile devices there is now huge potential for collecting much data from many individuals, also making use of sensors in mobile devices. This has not been explored on a large scale so far; existing crowdsourcing projects are usually of limited scale and web-based. Privacy issues may be involved (audio/video from individuals); anonymization may be necessary but is not always possible. Data management and curation are critical. Size could be hundreds of terabytes with multimedia.

Futures: As the repository grows, we expect rapid growth to 1,000-5,000 networks and methods in about a year. As more fields use graphs of increasing size, parallel algorithms will be important. Data manipulation and bookkeeping of the derived data for users is a challenge; there are no well-defined and effective models and tools for managing various graph data in a unified fashion.

Futures: Even larger data collections are being planned for future evaluations of analytics involving multiple data streams and very heterogeneous data. As well as larger datasets, future work includes testing streaming algorithms with multiple heterogeneous data sources. Use of clouds is being explored.
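The document-clustering challenge in the Futures entry on ~400M uploaded documents (matching copies that differ only by annotations, watermarks, or cover pages) is a near-duplicate detection problem. One common technique for it is MinHash over word shingles. The sketch below shows that technique. The shingle size, the 128 hash functions, the blake2b-based seeded hashing, and the sample texts are illustrative choices, and nothing in the use case says this is the method it uses.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of word k-grams ("shingles") of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of item; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


original = ("Deep learning shares many characteristics with the broader field of "
            "machine learning and requires high computational throughput for "
            "dense linear algebra operations")
watermarked = original + " Downloaded from publisher site"  # hypothetical watermark
unrelated = "Truthy plans to expand incorporating Google plus and Facebook streams"

sig_orig = minhash_signature(shingles(original))
for label, text in [("watermarked", watermarked), ("unrelated", unrelated)]:
    est = estimated_jaccard(sig_orig, minhash_signature(shingles(text)))
    exact = exact_jaccard(shingles(original), shingles(text))
    print(f"{label}: estimated {est:.2f}, exact {exact:.2f}")
```

Here the watermarked copy shares 20 of 24 shingles with the original, so its exact Jaccard similarity is about 0.83. The unrelated text shares none. At the scale quoted, signatures would typically be bucketed with locality-sensitive hashing (banding) so that only candidate pairs are compared. That step is omitted from this sketch.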

Current Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule-Oriented Data System) policy-based data management system, including major NSF projects such as the Ocean Observatories Initiative (sensor archiving); the Temporal Dynamics of Learning Center (cognitive science data grid); the iPlant Collaborative (plant genomics); the Drexel engineering digital library; and the Odum Institute for social science research (data grid federation with Dataverse). iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), cloud and more traditional storage models, and different transport protocols. The full description has a diagram of the iRODS architecture.

Futures: Discinnet itself would not be big data but rather will generate metadata when applied to a cluster that involves big data. In interdisciplinary integration of several fields, the process would reconcile metadata from many complexity levels.

Futures: Create a cloud infrastructure for social media of scientific information where many scientists from various parts of the world can participate and deposit the results of their experiments. Some of the issues that must be resolved before establishing a scientific social media are: a) How to minimize challenges related to establishing a re-usable, interdisciplinary, scalable, on-demand, use-case and user-friendly vocabulary? b) How to adopt an existing or create a new on-demand ‘data-graph’ to place information in an intuitive way such that it would easily integrate with existing ‘data-graphs’ in a federated environment without knowing too much about the data management? c) How to find relevant scientific data without spending too much time on the internet? Start with resources like the Open Government movement, the Materials Genome Initiative, and the Protein Data Bank. This effort includes many local and networked resources. Developing an infrastructure to automatically integrate information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.

Futures: Camera resolution is continually increasing. Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to conduct the analysis on time scales useful to the experiment. The large number of beamlines (e.g. 39 at LBNL ALS) means that the aggregate data load is likely to increase significantly over the coming years, and there is a need for a generalized infrastructure for analyzing gigabytes per second of data from many beamline detectors at multiple facilities.

Futures: CRTS is a scientific and methodological testbed and precursor of larger surveys to come, notably the Large Synoptic Survey Telescope (LSST), expected to operate in the 2020s and selected as the highest-priority ground-based instrument in the 2010 Astronomy and Astrophysics Decadal Survey. LSST will gather about 30 TB per night. The schematic architecture for a cyber-infrastructure for time domain astronomy is illustrated by a figure in the full description.

Futures: Data sizes are: Dark Energy Survey (DES) 4 PB in 2015; Zwicky Transient Facility (ZTF) 1 PB/year in 2015; Large Synoptic Survey Telescope (LSST; see CRTS description) 7 PB/year in 2019; simulations > 10 PB in 2017. Huge amounts of supercomputer time (over 200M hours) will be used.

Futures: Techniques for handling Cholesky decomposition for thousands of simulations with matrices of order 1M on a side, and parallel image storage, would be important. LSST will generate 60 PB of imaging data and 15 PB of catalog data and a correspondingly large (or larger) amount of simulation data.
Over 20TB of data per night.
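The LSST entry mentions Cholesky decomposition for thousands of simulations with matrices of order 1M. The pure-Python routine below is a minimal sketch of the factorization itself (Cholesky-Banachiewicz ordering), not a scalable implementation. It assumes symmetric positive-definite input, which Cholesky requires. `factor_batch` is a hypothetical helper that shows per-simulation batching, and the 3×3 matrix is a textbook example, not survey data. At order 1M, the O(n³) cost and the memory footprint rule this approach out. A distributed library routine such as ScaLAPACK's PDPOTRF would be the realistic choice.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Cholesky-Banachiewicz: return lower-triangular L with L @ L^T == a.

    Assumes `a` is symmetric positive definite; only its lower triangle is read.
    Raises ValueError if a non-positive pivot shows `a` is not positive definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def factor_batch(matrices: list[list[list[float]]]) -> list[list[list[float]]]:
    """Hypothetical helper: factor many independent matrices, one per simulation."""
    return [cholesky(m) for m in matrices]


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
print(cholesky(A))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

Because each simulation's matrix is factored independently, the batch is trivially parallel across simulations. The difficulty in the LSST case lies in the size of each individual matrix.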

Clouds have been exploredFutures: Management of heterogeneity of biological data is currently performed by relational database management system (Oracle). Unfortunately, it does not scale for even the current volume 50TB of data. NoSQL solutions aim at providing an alternative but unfortunately they do not always lend themselves to real time interactive use, rapid and parallel bulk loading, and sometimes have issues regarding robustness. Futures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate most appropriate solution suited for a given patient with diabetes. Use efficient parallel retrieval algorithms, suitable for cloud or HPC, using open source Hbase with both indexed and custom search to identify patients of possible interest. Use Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, called the Enterprise Data Trust (EDT), into RDF triples that enables one to find similar patients through linking of both vocabulary-based and continuous values. The time dependent properties need to be processed before query to allow matching based on derivatives and other derived properties.Futures: A cohort of millions of patient can involve petabyte datasets. Issues include availability of too much data (as images, genetic sequences etc) that can make the analysis complicated. A major challenge lies in aligning the data and merging from multiple sources in a form that can be made useful for a combined analysis. Another issue is that sometimes, large amount of data is available about a single subject but the number of subjects themselves is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.
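The data-imbalance risk described above can be illustrated with a small simulation. This sketch is not part of the use case; it assumes purely random (Gaussian) features and a random outcome, so every correlation it finds is noise by construction. With few subjects and many features per subject, the strongest spurious correlation tends to be large; with many subjects and few features it stays small.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_subjects, n_features, seed=0):
    """Largest |r| between a random outcome and purely random features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Few subjects, many features per subject: noise can look like signal.
wide = max_spurious_correlation(n_subjects=20, n_features=2000)
# Many subjects, few features: spurious correlations stay small.
tall = max_spurious_correlation(n_subjects=2000, n_features=20)
print(f"20 subjects x 2000 features: max |r| = {wide:.2f}")
print(f"2000 subjects x 20 features: max |r| = {tall:.2f}")
```

A feature-selection step that simply keeps the most correlated features would, in the first setting, select pure noise; this is why the number of subjects, not only the volume of data per subject, matters.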

Futures: Data fusion is a big issue. How should one combine data from different sources, and how should one deal with missing or incomplete data? How can models take into account heterogeneous features of hundreds of millions or billions of individuals, and models of cultural variations across countries that are assigned to individual agents? How can these large models be validated?

Futures: The LifeWatch initiative will provide integrated access to a variety of data, analytical and modeling tools as served by a variety of collaborating initiatives. Another service offers data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs’, also allowing one to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information Facility and the Biodiversity Catalogue, that is, the Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass, population density, etc.), and ecosystem data (such as CO2 fluxes, algal blooming, and water and soil characteristics).

Futures: Large datasets of 100 TB or more may be necessary to exploit the representational power of larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations and extremely high productivity for researcher exploration. One needs integration of high-performance libraries with high-level (Python) prototyping environments.

Futures: Need many analytics, including feature extraction, feature matching, and large-scale probabilistic inference, which appear in many or most computer vision and image processing problems, including recognition, stereo resolution, and image denoising. Need to visualize large-scale 3D reconstructions and navigate large-scale collections of images that have been aligned to maps.

Futures: Truthy plans to expand to incorporate Google+ and Facebook. Need to move towards Hadoop/IndexedHBase and HDFS distributed storage, and to use Redis as an in-memory database serving as a buffer for real-time analysis. Need streaming clustering, anomaly detection and online learning.

Futures: Crowd sourcing has barely begun to be used on a large scale, but with the availability of mobile devices there is now huge potential for collecting data from many individuals, also making use of the sensors in mobile devices. This has not been explored on a large scale so far; existing crowd sourcing projects are usually of limited scale and web-based. Privacy issues may be involved (audio/video from individuals); anonymization may be necessary but is not always possible. Data management and curation are critical. Size could be hundreds of terabytes with multimedia.

Futures: As the repository grows, we expect rapid growth to 1,000-5,000 networks and methods within about a year. As more fields use graphs of increasing size, parallel algorithms will be important. Data manipulation and bookkeeping of derived data for users is a challenge; there are no well-defined and effective models and tools for managing various graph data in a unified fashion.

Futures: Even larger data collections are being planned for future evaluations of analytics involving multiple data streams and very heterogeneous data. As well as larger datasets, the future includes testing streaming algorithms with multiple heterogeneous data. Use of clouds is being explored.
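Truthy's need for anomaly detection, and the testing of streaming algorithms mentioned above, can be illustrated with a single-pass detector. This is a minimal sketch, not Truthy's implementation: it assumes a univariate numeric stream (for example, hypothetical per-minute message counts) and flags values more than a chosen number of standard deviations from the running mean, which is maintained in constant memory with Welford's online algorithm. The `threshold` and `warmup` values are illustrative choices.

```python
import math


class StreamingAnomalyDetector:
    """Flags values far from the running mean, using Welford's online algorithm.

    Only O(1) state is kept (count, mean, sum of squared deviations), so an
    unbounded stream can be processed one value at a time.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a value is anomalous
        self.warmup = warmup        # values to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold it in."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford update of the running mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


detector = StreamingAnomalyDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 50, 10, 11]
flags = [i for i, x in enumerate(stream) if detector.update(x)]
print(flags)  # [11]
```

Note that the flagged value is still folded into the running statistics, which inflates the variance afterwards; a production detector might exclude flagged values or use a windowed or exponentially weighted estimate instead.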

Current Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule Oriented Data System) policy-based data management system, including major NSF projects such as the Ocean Observatories Initiative (sensor archiving); the Temporal Dynamics of Learning Center (cognitive science data grid); the iPlant Collaborative (plant genomics); the Drexel engineering digital library; and the Odum Institute for social science research (data grid federation with Dataverse). iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), with cloud and more traditional storage models, and with different transport protocols. The full description has a diagram of the iRODS architecture.

Futures: Discinnet itself would not be Bigdata but rather will generate metadata when applied to a cluster that involves Bigdata. In interdisciplinary integration of several fields, the process would reconcile metadata from many complexity levels.

Futures: Create a cloud infrastructure for social media of scientific information where many scientists from various parts of the world can participate and deposit the results of their experiments. Some of the issues that must be resolved before establishing scientific social media are: a) How to minimize the challenges of establishing a re-usable, inter-disciplinary, scalable, on-demand, use-case and user-friendly vocabulary? b) How to adopt an existing, or create a new, on-demand ‘data-graph’ that places information in an intuitive way, such that it easily integrates with existing ‘data-graphs’ in a federated environment without requiring much knowledge of data management? c) How to find relevant scientific data without spending too much time on the internet? Start with resources like the Open Government movement, the Materials Genome Initiative and the Protein Data Bank. This effort includes many local and networked resources. Developing an infrastructure to automatically integrate information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.

Futures: Camera resolution is continually increasing. Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to conduct the analysis on time scales useful to the experiment. The large number of beamlines (e.g. 39 at LBNL ALS) means that the aggregate data load is likely to increase significantly over the coming years, as will the need for a generalized infrastructure for analyzing gigabytes per second of data from many beamline detectors at multiple facilities.

Futures: CRTS is a scientific and methodological testbed and precursor of larger surveys to come, notably the Large Synoptic Survey Telescope (LSST), expected to operate in the 2020s and selected as the highest-priority ground-based instrument in the 2010 Astronomy and Astrophysics Decadal Survey. LSST will gather about 30 TB per night. The schematic architecture of a cyber-infrastructure for time domain astronomy is illustrated by a figure in the full description.

Futures: Data sizes are: Dark Energy Survey (DES) 4 PB in 2015; Zwicky Transient Facility (ZTF) 1 PB/year in 2015; Large Synoptic Survey Telescope (LSST; see CRTS description) 7 PB/year in 2019; simulations > 10 PB in 2017. Huge amounts of supercomputer time (over 200M hours) will be used.

Futures: Techniques for handling Cholesky decomposition for thousands of simulations with matrices of order 1M on a side, and parallel image storage, would be important. LSST will generate 60 PB of imaging data and 15 PB of catalog data and a correspondingly large (or larger) amount of simulation data. Over 20 TB of data per night.

Futures: In the past the particle physics community has been able to rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. However, the available performance will be much more difficult to exploit in the future, since technology limitations, in particular regarding power consumption, have led to profound changes in the architecture of modern CPU chips. In the past, software could run unchanged on successive processor generations and achieve performance gains that follow Moore's Law thanks to the regular increase in clock rate that continued until 2006. The era of scaling HEP sequential applications is now over. Changes in CPU architectures imply significantly more software parallelism as well as exploitation of specialized floating point capabilities. The structure and performance of HEP data processing software need to be changed so that it can continue to be adapted and further developed to run efficiently on new hardware. This represents a major paradigm shift in HEP software design and implies large-scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels at the same time: the event level, the algorithm level, and the sub-algorithm level. Components at all levels in the software stack need to interoperate, and therefore the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help to ensure efficient and balanced use of resources.

Futures: The upgraded Belle II experiment and SuperKEKB accelerator will start operation in 2015 with a factor of 50 more data, with total integrated RAW data of ~120 PB, physics data of ~15 PB, and ~100 PB of MC samples. The move to a distributed computing model requires continuous RAW data transfer of ~20 Gbps at design luminosity between Japan and the US. Will need Open Science Grid, Geant4, DIRAC, FTS, and Belle II framework software.

Futures: The next generation radar, EISCAT_3D, is designed to consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core. The fully operational 5-site system will generate several thousand times the data of the current EISCAT system, with 40 PB/year in 2022, and is expected to operate for 30 years. The EISCAT_3D data e-Infrastructure plans to use high performance computers for central site data processing and high throughput computers for mirror site data processing. Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center, and a real-time link from the operation center to the sites to set the radar operating mode with immediate effect. See figure in full description.

Futures: ENVRI’s common environment will empower the users of the collaborating environmental research infrastructures and enable multidisciplinary scientists to access, study and correlate data from multiple domains for "system level" research. It provides Bigdata requirements coming from interdisciplinary research. As shown in Figure 1 in the full description, analysis of the computational characteristics of the 6 ESFRI Environmental Research Infrastructures identified 5 common subsystems. Their definitions are given in the ENVRI Reference Model, www.envri.eu/rm. Data acquisition: collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system. Data curation: facilitates quality control and preservation of scientific data; it is typically operated at a data centre. Data access: enables discovery and retrieval of data housed in data resources managed by a data curation subsystem. Data processing: aggregates data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments. Community support: manages, controls and tracks users' activities and supports users in conducting their roles in communities. As shown in Figure 2 of the full description, the 5 subsystems map well to the architectures of the ESFRI Environmental Research Infrastructures.

Futures: An order of magnitude more data (a petabyte per mission) is projected with improved instrumentation. The demands of processing growing volumes of field data under a still-constrained power budget suggest architectures with low power per unit of performance, such as GPU systems. The full description gives workflows for different parts of the problem and pictures of the operation.
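The data rates quoted above for Belle II (~20 Gbps continuous RAW transfer) and EISCAT_3D (40 PB/year) can be cross-checked with simple unit arithmetic. This sketch assumes decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits/s) and a link running continuously for a 365-day year; real duty cycles and protocol overheads will differ.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming continuous operation
PB = 10**15                         # decimal petabyte, in bytes


def gbps_to_pb_per_year(gbps):
    """Volume in PB moved in one year by a link sustained at `gbps` gigabits/s."""
    bytes_per_second = gbps * 10**9 / 8
    return bytes_per_second * SECONDS_PER_YEAR / PB


def pb_per_year_to_gbps(pb_per_year):
    """Average sustained rate in Gbps needed to move `pb_per_year` PB per year."""
    return pb_per_year * PB * 8 / SECONDS_PER_YEAR / 10**9


# Belle II: ~20 Gbps continuous RAW transfer between Japan and the US.
print(f"Belle II, 20 Gbps for a year: {gbps_to_pb_per_year(20):.1f} PB")
# EISCAT_3D: 40 PB/year once the 5-site system is fully operational.
print(f"EISCAT_3D, 40 PB/year: {pb_per_year_to_gbps(40):.1f} Gbps average")
```

Under these assumptions a sustained 20 Gbps moves about 79 PB per year, so the quoted ~120 PB of total RAW data would correspond to roughly a year and a half of transfer at the design rate; 40 PB/year corresponds to an average of roughly 10 Gbps before any replication or protocol overhead.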

Futures: The improved access will be enabled through the use of iRODS that enables parallel downloads of datasets from selected replica servers that can be geographically dispersed, but still accessible by users worldwide. iRODS operation will be enhanced with semantically organized metadata, and managed via a highly precise Earth Science ontology. Cloud solutions will also be explored.
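The parallel download of datasets from geographically dispersed replica servers can be illustrated with a small planning function. This is a hypothetical sketch that does not use the iRODS API: it assumes each replica can serve arbitrary byte ranges of the same file, splits the file into fixed-size ranges, and assigns them round-robin with the lowest-latency replica first. The hostnames and latencies are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    replica: str  # server assigned to serve this byte range
    start: int    # first byte offset (inclusive)
    end: int      # last byte offset (exclusive)


def plan_parallel_download(size, replicas, chunk_size):
    """Split bytes [0, size) into chunk_size ranges spread round-robin over replicas.

    `replicas` is a list of (hostname, latency_ms) pairs; replicas are ordered
    by latency, so the fastest ones receive any extra chunks when the split
    is uneven.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not replicas:
        raise ValueError("at least one replica is required")
    ordered = [host for host, _ in sorted(replicas, key=lambda r: r[1])]
    return [
        Chunk(ordered[i % len(ordered)], start, min(start + chunk_size, size))
        for i, start in enumerate(range(0, size, chunk_size))
    ]


# Hypothetical replica servers and round-trip latencies, for illustration only.
replicas = [("replica-eu.example.org", 120.0), ("replica-us.example.org", 35.0)]
for chunk in plan_parallel_download(size=10, replicas=replicas, chunk_size=4):
    print(chunk)
```

A real client would then fetch the ranges concurrently and verify a checksum of the reassembled file; weighting assignments by measured throughput rather than plain round-robin is a natural refinement.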

Futures: The dataset will reach 500TB in 5 years. The initial turbulence case can be extended to other ocean/atmosphere phenomena but the analytics would be different in each case.


Futures: Little effort to date has been devoted to developing a framework for systematically connecting scales, as is needed to identify key controls and to simulate important feedbacks. GEWaSC will develop a simulation framework that formally scales from genomes to watersheds and will synthesize diverse and disparate field, laboratory, and simulation datasets across different semantic, spatial, and temporal scales.

Futures: Field experiment data taking would be improved by access to existing data and automated entry of new data via mobile devices. Need to support interdisciplinary studies integrating diverse data sources.

Futures: Widespread deployment of Smart Grids with new analytics integrating diverse data and supporting curtailment requests. Mobile applications for client interactions.


Futures: There are distributed data sources from federal agencies, and the current solution requires transferring those data to centralized storage. In the future, those data sources may reside in multiple Cloud environments. In this case, physical custody should avoid transferring big data from Cloud to Cloud or from Cloud to Data Center.

Futures: Need to improve recommendation systems, similar to those used in e-commerce (see Netflix use case), that reduce costs and improve quality while providing confidentiality safeguards that are reliable and publicly auditable. Data visualization is useful for data review, operational activity and general analysis. It continues to evolve; mobile access is important.
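The e-commerce style recommendation systems mentioned above are commonly built on item-to-item similarity. The sketch below is a generic item-based collaborative filter using cosine similarity, not the method of this or any other specific use case; the toy ratings are hypothetical, and it does not address the confidentiality safeguards the text calls for.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity for every ordered pair of items, from {user: {item: rating}}."""
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating
    norms = {i: math.sqrt(sum(r * r for r in u.values())) for i, u in by_item.items()}
    sims = {}
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = by_item[a].keys() & by_item[b].keys()
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims


def recommend(user, ratings, sims, k=1):
    """Rank items the user has not rated by similarity-weighted ratings of rated items."""
    seen = ratings[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s * seen[a]
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy ratings on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 5},
    "bob": {"A": 4, "B": 4},
    "carol": {"C": 5},
    "dave": {"A": 5},
}
sims = item_similarities(ratings)
print(recommend("dave", ratings, sims))  # ['B']
```

Item-to-item similarities can be precomputed offline and reused for every user, which is one reason this design is popular at large scale.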

Futures: One must address GRC (Governance, Risk & Compliance) as well as CIA (Confidentiality, Integrity & Availability) issues, which can even be impacted by the SEC’s mandated use of XBRL (eXtensible Business Reporting Language). At the same time, one must address the influence these same issues can and will have within a Cloud Eco-system/Big Data environment, and their global impact across all Financial sectors.

Futures: Currently Hadoop batch jobs are scheduled daily, but work has begun on real-time recommendation. The database contains ~400M documents, roughly 80M of them unique, and receives 500k-700k new uploads on a weekday. Thus a major challenge is clustering matching documents together in a computationally efficient (scalable and parallelized) way when they are uploaded from different sources and have been slightly modified via third-party annotation tools or publisher watermarks and cover pages.
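Clustering near-duplicate uploads, as described above, is often approached with shingling and MinHash, which estimate the Jaccard similarity of two documents from short signatures. This is a minimal sketch of that general technique, not the use case's actual pipeline; it assumes word 3-gram shingles, 64 hash functions derived from BLAKE2b with a seed prefix, and toy example texts.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping k-word shingles from lower-cased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingle_set
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = ("the reference architecture describes data providers data consumers and the "
         "big data application provider which executes analytics on shared infrastructure")
# Same document with a hypothetical publisher watermark appended.
doc_b = doc_a + " copy downloaded from publisher"
doc_c = ("sensor networks stream measurements from remote field sites to a central "
         "archive every few minutes during the observing season")

sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print(f"a vs b: exact {jaccard(shingles(doc_a), shingles(doc_b)):.2f}, "
      f"estimated {estimated_jaccard(sig_a, sig_b):.2f}")
print(f"a vs c: estimated {estimated_jaccard(sig_a, sig_c):.2f}")
```

Comparing every pair of signatures is still quadratic in the number of documents; at the scale of hundreds of millions of documents, signatures are usually split into bands and hashed into buckets (locality-sensitive hashing) so that only documents sharing a bucket are compared.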

Futures: A very competitive field where continuous innovation needed. Two important areas are addressing mobile clients which are a growing fraction of users and increasing sophistication of responses and layout to maximize total benefit of clients, advertisers and Search Company. The “deep web” (that behind user interfaces to databases etc.) and multimedia search of increasing importance. 500M photos uploaded each day and 100 hours of video uploaded to YouTube each minuteFutures: The complexities associated with migrating from a Primary Site to either a Replication Site or a Backup Site is not fully automated at this point in time. The goal is to enable the user to automatically initiate the Failover sequence. Both organizations must know which servers have to be restored and what are the dependencies and inter-dependencies between the Primary Site servers and Replication and/or Backup Site servers. This requires a continuous monitoring of both.Futures: This Internet of Things application needs to track items in real time. A new aspect will be status condition of the items which will include sensor information, GPS coordinates, and a unique identification schema based upon a new ISO 29161 standards under development within ISO JTC1 SC31 WG2. Futures: Materials informatics is an area in which the new tools of data science can have major impact by predicting the performance of real materials (gram to ton quantities) starting at the atomistic, nanometer, and/or micrometer level of description. 
One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally-accepted data recording standards that can be used by a very diverse materials community, including developers materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations wishing to deposit proprietary materials in data repositories to mask proprietary information, yet to maintain the usability of data; one needs multi-variable materials data visualization tools, in which the number of variables can be quite highFutures: Need large scale computing at scale for simulation science. Flexible data methods at scale for messy data. Machine learning and knowledge systems that integrate data from publications, experiments, and simulations to advance goal-driven thinking in materials design. The current 100TB of data will become 500TB in 5 years.Futures: Today’s intelligence systems often contain trillions of geospatial objects and need to be able to visualize and interact with millions of objects. Critical issues are Indexing, retrieval and distributed analysis; Visualization generation and transmission; Visualization of data at the end of low bandwidth wireless connections; Data is sensitive and must be completely secure in transit and at rest (particularly on handhelds); Geospatial data requires unique approaches to indexing and distributed analysis.

Futures: Data currently exists in disparate silos which must be accessible through a semantically integrated data space. Wide variety of data types, sources, structures, and quality which will span domains and requires integrated search and reasoning. Most critical data is either unstructured or imagery/video which requires significant processing to extract entities and information. Network quality, Provenance and security essential.Futures: Teradata, PostgreSQL, MongoDB running on Indiana University supercomputer supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural Language Processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancerFutures: Recently, 3D pathology imaging is made possible through 3D laser technologies or serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next generation diagnosis. 1TB raw image data + 1TB analytical results per 3D image and 1PB data per moderated hospital per year.Futures: Our goal is to solve that bottleneck with extreme scale computing with community-focused science gateways to support the application of massive data analysis toward massive imaging data sets. Workflow components include data acquisition, storage, enhancement, minimizing noise, segmentation of regions of interest, crowd-based selection and extraction of features, and object classification, and organization, and search. 
Use ImageJ, OMERO, VolRover, advanced segmentation and feature detection software.

Futures: Management of heterogeneity of biological data is currently performed by relational database management system (Oracle). Unfortunately, it does not scale for even the current volume 50TB of data. NoSQL solutions aim at providing an alternative but unfortunately they do not always lend themselves to real time interactive use, rapid and parallel bulk loading, and sometimes have issues regarding robustness. Futures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate most appropriate solution suited for a given patient with diabetes. Use efficient parallel retrieval algorithms, suitable for cloud or HPC, using open source Hbase with both indexed and custom search to identify patients of possible interest. Use Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, called the Enterprise Data Trust (EDT), into RDF triples that enables one to find similar patients through linking of both vocabulary-based and continuous values. The time dependent properties need to be processed before query to allow matching based on derivatives and other derived properties.Futures: A cohort of millions of patient can involve petabyte datasets. Issues include availability of too much data (as images, genetic sequences etc) that can make the analysis complicated. A major challenge lies in aligning the data and merging from multiple sources in a form that can be made useful for a combined analysis. Another issue is that sometimes, large amount of data is available about a single subject but the number of subjects themselves is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.

Futures: Data fusion a big issue; how should one combine data from different sources and how to deal with missing or incomplete data? take into account heterogeneous features of 100s of millions or billions of individuals, models of cultural variations across countries that are assigned to individual agents? How to validate these large models? Futures: LifeWatch initiative will provide integrated access to a variety of data, analytical and modeling tools as served by a variety of collaborating initiatives. Another service is offered with data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs', also allowing one to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information facility and Biodiversity Catalogue that is Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass, population density etc.), ecosystem data (such as CO2 fluxes. Algal blooming, water and soil characteristics)Futures: Large datasets of 100TB or more may be necessary in order to exploit the representational power of the larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration. 
One needs integration of high performance libraries with high level (python) prototyping environments.Futures: Need many analytics including feature extraction, feature matching, and large-scale probabilistic inference, which appear in many or most computer vision and image processing problems, including recognition, stereo resolution, and image denoising. Need to visualize large-scale 3-d reconstructions, and navigate large-scale collections of images that have been aligned to maps.Futures: Truthy plans to expand incorporating Google+ and Facebook. Need to move towards Hadoop/IndexedHBase & HDFS distributed storage. Use Redis as a in-memory database as a buffer for real-time analysis. Need streaming clustering, anomaly detection and online learning.Futures: Crowd sourcing has been barely started to be used on a larger scale but with the availability of mobile devices, now there is a huge potential for collecting much data from many individuals, also making use of sensors in mobile devices. This has not been explored on a large scale so far; existing projects of crowd sourcing are usually of a limited scale and web-based. Privacy issues may be involved (A/V from individuals), anonymization may be necessary but not always possible. Data management and curation critical. Size could be hundreds of terabytes with multimedia.Futures: As the repository grows, we expect a rapid growth to lead to over 1000-5000 networks and methods in about a year. As more fields use graphs of increasing size, parallel algorithms will be important. Data manipulation and bookkeeping of the derived data for users is a challenge there are no well-defined and effective models and tools for management of various graph data in a unified fashionFutures: Even larger data collections are being planned for future evaluations of analytics involving multiple data streams and very heterogeneous data. 
As well as larger datasets, future includes testing of streaming algorithms with multiple heterogeneous data. Use of clouds being explored.

Current Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule Oriented Data System) policy-based data management system including major NSF projects such as Ocean Observatories Initiative (sensor archiving); Temporal Dynamics of Learning Center (Cognitive science data grid); the iPlant Collaborative (plant genomics); Drexel engineering digital library; Odum Institute for social science research (data grid federation with Dataverse). iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), cloud and more traditional storage models and different transport protocols. The full description has a diagram of the iRODS architecture.

Futures: Create a cloud infrastructure for scientific social media, where scientists from around the world can participate and deposit the results of their experiments. Issues to resolve before establishing such a platform include: a) How can one minimize the challenges of establishing a reusable, interdisciplinary, scalable, on-demand, use-case-driven, and user-friendly vocabulary? b) How can one adopt an existing, or create a new, on-demand ‘data-graph’ that places information intuitively, so that it integrates easily with existing ‘data-graphs’ in a federated environment without requiring detailed knowledge of the data management? c) How can one find relevant scientific data without spending too much time searching the internet? Start with resources such as the Open Government movement, the Materials Genome Initiative, and the Protein Data Bank. This effort involves many local and networked resources. Developing an infrastructure that automatically integrates information from all these resources using data-graphs is the challenge we are trying to solve. Good database tools and servers for data-graph manipulation are needed.

Futures: Camera resolution is continually increasing. Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to complete the analysis on time scales useful to the experiment. The large number of beamlines (e.g., 39 at the LBNL ALS) means that the aggregate data load is likely to increase significantly over the coming years, creating a need for a generalized infrastructure to analyze gigabytes per second of data from many beamline detectors at multiple facilities.

Futures: CRTS is a scientific and methodological testbed and precursor of larger surveys to come, notably the Large Synoptic Survey Telescope (LSST), expected to operate in the 2020s and selected as the highest-priority ground-based instrument in the 2010 Astronomy and Astrophysics Decadal Survey. LSST will gather about 30 TB per night. A figure in the full description illustrates the schematic architecture of a cyberinfrastructure for time-domain astronomy.

Futures: Projected data sizes are: Dark Energy Survey (DES), 4 PB in 2015; Zwicky Transient Facility (ZTF), 1 PB/year in 2015; Large Synoptic Survey Telescope (LSST; see the CRTS description), 7 PB/year in 2019; simulations, more than 10 PB in 2017. Huge amounts of supercomputer time (over 200M hours) will be used.

Futures: Techniques for handling Cholesky decomposition for thousands of simulations with matrices of order 1M on a side, as well as parallel image storage, would be important. LSST will generate 60 PB of imaging data, 15 PB of catalog data, and a correspondingly large (or larger) amount of simulation data, with over 20 TB of data per night.

Futures: In the past, the particle physics community could rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. However, the available performance will be much more difficult to exploit in the future, because technology limitations, in particular power consumption, have led to profound changes in the architecture of modern CPU chips. In the past, software could run unchanged on successive processor generations and achieve performance gains that followed Moore's Law, thanks to the regular increase in clock rate that continued until 2006. The era of scaling sequential HEP applications is now over. Changes in CPU architectures imply significantly more software parallelism as well as exploitation of specialized floating-point capabilities. The structure and performance of HEP data processing software must change so that it can continue to be adapted and further developed to run efficiently on new hardware. This represents a major paradigm shift in HEP software design and implies large-scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels at the same time: the event level, the algorithm level, and the sub-algorithm level. Components at all levels of the software stack need to interoperate, so the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help ensure efficient and balanced use of resources.

Futures: The upgraded Belle II experiment and SuperKEKB accelerator will start operation in 2015 with a factor of 50 more data: total integrated RAW data of ~120 PB, physics data of ~15 PB, and ~100 PB of MC samples. This requires a move to a distributed computing model with continuous RAW data transfer of ~20 Gbps between Japan and the US at design luminosity. Open Science Grid, Geant4, DIRAC, FTS, and the Belle II framework software will be needed.

Futures: The next-generation radar, EISCAT_3D, will consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core. The fully operational five-site system will generate several thousand times the data of the current EISCAT system, reaching 40 PB/year in 2022, and is expected to operate for 30 years. The EISCAT 3D data e-infrastructure plans to use high-performance computers for central-site data processing and high-throughput computers for mirror-site data processing. Downloading the full data is not time-critical, but operations require real-time information about certain predefined events to be sent from the sites to the operations center, and a real-time link from the operations center to the sites to set the radar's operating mode with immediate effect. See the figure in the full description.

Futures: ENVRI’s common environment will empower the users of the collaborating environmental research infrastructures and enable multidisciplinary scientists to access, study, and correlate data from multiple domains for "system level" research. It provides big data requirements arising from interdisciplinary research. As shown in Figure 1 of the full description, analysis of the computational characteristics of the six ESFRI environmental research infrastructures identified five common subsystems. Their definitions are given in the ENVRI Reference Model, www.envri.eu/rm. Data acquisition collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system. Data curation facilitates quality control and preservation of scientific data; it is typically operated at a data centre. Data access enables discovery and retrieval of data housed in data resources managed by a data curation subsystem. Data processing aggregates the data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments. Community support manages, controls, and tracks users' activities and supports users in carrying out their roles in communities. As shown in Figure 2 of the full description, the five subsystems map well onto the architectures of the ESFRI environmental research infrastructures.

Futures: An order of magnitude more data (a petabyte per mission) is projected with improved instrumentation. Processing growing volumes of field data under a still-constrained power budget suggests architectures with a low power-to-performance ratio, such as GPU systems. The full description gives workflows for different parts of the problem and pictures of operations.
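The cosmology entry above calls for handling Cholesky decompositions across thousands of simulations. One reason the factorization matters is reuse: once a symmetric positive-definite matrix is factored as A = L Lᵀ (cost O(n³)), each new right-hand side b needs only two triangular solves (O(n²)). The pure-Python sketch below shows that reuse on a tiny, made-up 3×3 matrix; at order 10⁶ one would instead rely on distributed dense linear algebra libraries such as ScaLAPACK.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L @ L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_with_factor(L, b):
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):                       # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 2.0, 0.6],
     [2.0, 5.0, 1.0],
     [0.6, 1.0, 3.0]]
L = cholesky(A)  # factor once ...
# ... then reuse the factor for many right-hand sides (e.g. one per simulation).
rhs = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 2.0, 3.0])
solutions = [solve_with_factor(L, b) for b in rhs]
```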

Futures: Improved access will be enabled through iRODS, which supports parallel downloads of datasets from selected replica servers that can be geographically dispersed yet accessible to users worldwide. iRODS operation will be enhanced with semantically organized metadata, managed via a highly precise Earth Science ontology. Cloud solutions will also be explored.
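The parallel-download idea above can be sketched as splitting a dataset into byte ranges and fetching the ranges concurrently from different replicas. This is a generic illustration, not the iRODS client API; the replica names, the in-memory dataset, and `fetch_range` are hypothetical stand-ins for real ranged network reads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical replicas holding identical copies of one dataset.
DATASET = bytes(range(256)) * 40  # 10,240 bytes of sample content
REPLICAS = {"replica-us": DATASET, "replica-eu": DATASET, "replica-asia": DATASET}


def fetch_range(replica: str, start: int, end: int) -> bytes:
    """Stand-in for a ranged network read of bytes [start, end) from one replica."""
    return REPLICAS[replica][start:end]


def parallel_download(size: int, chunk: int, replicas: list[str]) -> bytes:
    ranges = [(off, min(off + chunk, size)) for off in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # Assign chunks round-robin so every replica serves part of the file.
        futures = [pool.submit(fetch_range, replicas[i % len(replicas)], s, e)
                   for i, (s, e) in enumerate(ranges)]
        # Join in submission order so the chunks reassemble into the original bytes.
        return b"".join(f.result() for f in futures)


data = parallel_download(len(DATASET), chunk=1000, replicas=list(REPLICAS))
```

A real client would also choose replicas by measured latency or bandwidth and retry a failed range on another replica; round-robin is used here only to keep the sketch short.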

Page 57: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures: Little effort to date has been devoted to developing a framework for systematically connecting scales, as is needed to identify key controls and to simulate important feedbacks. GEWaSC will develop a simulation framework that formally scales from genomes to watersheds and will synthesize diverse and disparate field, laboratory, and simulation datasets across different semantic, spatial, and temporal scales.

Futures: One must address GRC (Governance, Risk & Compliance) as well as CIA (Confidentiality, Integrity & Availability) issues, which can even be affected by the SEC’s mandated use of XBRL (eXtensible Business Reporting Language), while at the same time addressing the influence these same issues can and will have within a cloud ecosystem/big data environment, and their global impact across all financial sectors.

Futures: Hadoop batch jobs are currently scheduled daily, but work has begun on real-time recommendation. The database contains ~400M documents, roughly 80M of them unique, and receives 500k to 700k new uploads on a weekday. A major challenge is therefore clustering matching documents together in a computationally efficient (scalable and parallelized) way when they are uploaded from different sources and have been slightly modified by third-party annotation tools or by publisher watermarks and cover pages.
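For the near-duplicate clustering challenge above, one widely used family of techniques compares documents by the overlap (Jaccard similarity) of their word shingles and approximates that overlap with MinHash signatures, which are cheap to compute in parallel. The standard-library sketch below illustrates that technique; the shingle size, 128 hash functions, 0.5 threshold, and sample texts are assumptions, and it is not the production system described in the entry. At scale one would add locality-sensitive hashing (banding) rather than comparing every pair.

```python
import hashlib
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[str]:
    """Set of k-word shingles; lowercasing ignores case differences."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(sh: set[str], num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum 64-bit hash over all shingles."""
    return [min(int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
                for s in sh)
            for seed in range(num_hashes)]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    # The fraction of hash functions whose minima agree estimates |A & B| / |A | B|.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(docs: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    sigs = {name: minhash(shingles(text)) for name, text in docs.items()}
    parent = {name: name for name in docs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):      # all pairs: fine for a demo, O(n^2) at scale
        if estimated_jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


docs = {
    "upload1": "deep learning for big data analytics in the cloud with hadoop and spark clusters",
    "upload2": "Downloaded deep learning for big data analytics in the cloud with hadoop and spark clusters",
    "upload3": "protein structure prediction from multiple sequence alignments using statistical potentials",
}
clusters = cluster(docs)
```

Here `upload2` mimics a copy with an added publisher stamp; its true shingle Jaccard similarity to `upload1` is 12/13, so the two land in one cluster while `upload3` stays alone.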

Futures: This is a very competitive field in which continuous innovation is needed. Two important areas are addressing mobile clients, which are a growing fraction of users, and increasing the sophistication of responses and layout to maximize the total benefit to clients, advertisers, and the search company. The “deep web” (content behind user interfaces to databases, etc.) and multimedia search are of increasing importance: 500M photos are uploaded each day, and 100 hours of video are uploaded to YouTube each minute.

Futures: Migrating from a Primary Site to either a Replication Site or a Backup Site is complex and not yet fully automated. The goal is to enable the user to initiate the failover sequence automatically. Both organizations must know which servers have to be restored and what the dependencies and interdependencies are between the Primary Site servers and the Replication and/or Backup Site servers. This requires continuous monitoring of both.
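Knowing which servers have to be restored and how they depend on each other amounts to a dependency graph, and an automated failover sequence can restore servers in a topological order of that graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the server names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical map: server -> servers it depends on (which must be restored first).
dependencies = {
    "dns": set(),
    "storage": set(),
    "database": {"storage", "dns"},
    "auth": {"database"},
    "app": {"database", "auth"},
    "web": {"app", "dns"},
}


def restore_plan(deps):
    """Group servers into waves; servers in the same wave can be restored in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)   # mark the wave restored, unlocking its dependents
    return waves


plan = restore_plan(dependencies)
```

Grouping into waves rather than a single linear order lets independent servers (here `dns` and `storage`) be restored concurrently, which shortens the failover window.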

Futures: Materials informatics is an area in which the new tools of data science can have a major impact by predicting the performance of real materials (gram to ton quantities) starting from the atomistic, nanometer, and/or micrometer level of description. One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally accepted data recording standards that can be used by a very diverse materials community, including developers of materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations that wish to deposit proprietary materials data in repositories to mask proprietary information while maintaining the usability of the data; and one needs multivariable materials data visualization tools, in which the number of variables can be quite high.
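One common way to mask proprietary information while keeping data usable is keyed pseudonymization: proprietary identifiers are replaced by HMAC tokens that stay consistent across records (so data can still be joined and grouped) but cannot be reversed without the depositor's secret key. The field names, key, and records below are hypothetical, and this is one possible approach, not a procedure prescribed by the source or by ASTM/ISO.

```python
import hashlib
import hmac

PROPRIETARY_FIELDS = {"alloy_trade_name", "supplier"}  # hypothetical field names


def mask_record(record: dict, key: bytes) -> dict:
    """Replace proprietary values with stable keyed tokens; keep measured properties."""
    masked = {}
    for field, value in record.items():
        if field in PROPRIETARY_FIELDS:
            digest = hmac.new(key, f"{field}={value}".encode(), hashlib.sha256).hexdigest()
            masked[field] = "tok_" + digest[:16]
        else:
            masked[field] = value
    return masked


key = b"depositor-secret-key"  # hypothetical; held only by the depositing organization
r1 = {"alloy_trade_name": "SuperAlloy-X", "supplier": "Acme Metals",
      "yield_strength_MPa": 910, "test_standard": "ASTM E8"}
r2 = {"alloy_trade_name": "SuperAlloy-X", "supplier": "Acme Metals",
      "yield_strength_MPa": 905, "test_standard": "ASTM E8"}
m1, m2 = mask_record(r1, key), mask_record(r2, key)
```

Because both records carry the same token, a repository user can still group the two yield-strength measurements as belonging to one material without learning its trade name.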

Futures: Today’s intelligence systems often contain trillions of geospatial objects and need to be able to visualize and interact with millions of objects. Critical issues are: indexing, retrieval, and distributed analysis; visualization generation and transmission; visualization of data at the end of low-bandwidth wireless connections; data sensitivity, since data must be completely secure in transit and at rest (particularly on handhelds); and the unique approaches to indexing and distributed analysis that geospatial data requires.
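Indexing is the first critical issue listed above. A simple illustration is a uniform-grid (spatial hash) index, which buckets points by cell so that a bounding-box query inspects only the overlapping cells; production systems typically use R-trees, geohashes, or similar structures and distribute cells across nodes. The cell size and sample points below are assumptions, and longitude/latitude are treated as planar coordinates for simplicity.

```python
import math
from collections import defaultdict


class GridIndex:
    def __init__(self, cell_deg: float) -> None:
        self.cell = cell_deg
        self.cells: dict[tuple[int, int], list[tuple[str, float, float]]] = defaultdict(list)

    def _key(self, lon: float, lat: float) -> tuple[int, int]:
        # floor() keeps cell boundaries consistent for negative coordinates.
        return (math.floor(lon / self.cell), math.floor(lat / self.cell))

    def insert(self, obj_id: str, lon: float, lat: float) -> None:
        self.cells[self._key(lon, lat)].append((obj_id, lon, lat))

    def query(self, min_lon, min_lat, max_lon, max_lat) -> list[str]:
        """Return ids of points inside the box, scanning only overlapping cells."""
        kx0, ky0 = self._key(min_lon, min_lat)
        kx1, ky1 = self._key(max_lon, max_lat)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for obj_id, lon, lat in self.cells.get((kx, ky), []):
                    if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                        hits.append(obj_id)
        return sorted(hits)


index = GridIndex(cell_deg=1.0)
for obj_id, lon, lat in [("a", -77.04, 38.90), ("b", -77.50, 39.20), ("c", 2.35, 48.86)]:
    index.insert(obj_id, lon, lat)
```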

Futures: Data currently exists in disparate silos that must be made accessible through a semantically integrated data space. The wide variety of data types, sources, structures, and quality spans domains and requires integrated search and reasoning. Most critical data is either unstructured or imagery/video, which requires significant processing to extract entities and information. Network quality, provenance, and security are essential.

Futures: Teradata, PostgreSQL, and MongoDB running on an Indiana University supercomputer support information retrieval methods (tf-idf, latent semantic analysis, mutual information) to identify relevant clinical features, and Natural Language Processing techniques extract further relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. The decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancer.

Futures: 3D pathology imaging has recently become possible through 3D laser technologies, or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next-generation diagnosis. Data volumes are 1 TB of raw image data plus 1 TB of analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.

Futures: Our goal is to solve that bottleneck with extreme-scale computing and community-focused science gateways that support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Tools include ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software.
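The tf-idf weighting named in the clinical-phenotyping entry above scores a term highly in a note when it is frequent there but rare across the collection. The sketch below uses one common variant (term count divided by note length, times idf = ln(N/df)); the three notes are invented, and real pipelines would add clinical tokenization, negation handling, and vocabulary mapping.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of notes containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


notes = [
    "patient reports polyuria and elevated glucose",
    "patient reports dyspnea and edema",
    "patient denies chest pain",
]
w = tf_idf(notes)
```

A term present in every note, such as "patient", gets weight 0, while "glucose", which appears only in the first note, gets the highest weight there, which is why tf-idf surfaces candidate phenotype features.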

Futures: Heterogeneous biological data is currently managed by a relational database management system (Oracle). Unfortunately, it does not scale even to the current volume of 50 TB of data. NoSQL solutions aim to provide an alternative, but they do not always lend themselves to real-time interactive use or rapid, parallel bulk loading, and they sometimes have robustness issues.

Futures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate the most appropriate solution for a given patient with diabetes. Use efficient parallel retrieval algorithms, suitable for cloud or HPC, with open source HBase and both indexed and custom search to identify patients of possible interest. Use the Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, the Enterprise Data Trust (EDT), into RDF triples, which makes it possible to find similar patients by linking both vocabulary-based and continuous values. Time-dependent properties need to be processed before querying, to allow matching based on derivatives and other derived properties.

Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (images, genetic sequences, etc.), which can complicate the analysis. A major challenge lies in aligning and merging data from multiple sources into a form that is useful for combined analysis. Another issue is that sometimes a large amount of data is available about each subject, but the number of subjects is not very high (i.e., data imbalance). This can cause learning algorithms to pick up random correlations between the multiple data types as important features in the analysis.
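The similar-patient entry above notes that time-dependent properties must be processed before querying so that matching can use derivatives. A minimal sketch: summarize each patient's HbA1c series by its latest value and least-squares slope, standardize the features, and rank patients by Euclidean distance to the query patient. The patient data, feature choice, and distance metric are assumptions for illustration, not the Mayo Clinic method; at scale the ranking would run in parallel over HBase or a similar store.

```python
import math
from collections.abc import Sequence
from statistics import mean, pstdev


def slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values over time (the derived 'derivative' feature)."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def features(series):
    times, values = zip(*series)
    return [values[-1], slope(times, values)]   # latest value, trend


# Hypothetical HbA1c (%) measurements: (month, value).
patients = {
    "p1": [(0, 7.0), (6, 7.6), (12, 8.2)],   # rising
    "p2": [(0, 8.4), (6, 7.8), (12, 7.2)],   # falling
    "p3": [(0, 6.9), (6, 7.4), (12, 8.0)],   # rising
}
query = [(0, 7.1), (6, 7.7), (12, 8.3)]


def most_similar(query, patients, k=2):
    """Rank patients by distance to the query in standardized feature space."""
    raw = {pid: features(s) for pid, s in patients.items()}
    q = features(query)
    cols = list(zip(*raw.values(), q))          # one tuple per feature, across all patients
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]      # guard against zero variance

    def standardize(f):
        return [(x - m) / s for x, m, s in zip(f, mus, sds)]

    dist = {pid: math.dist(standardize(f), standardize(q)) for pid, f in raw.items()}
    return sorted(dist, key=dist.get)[:k]
```

Standardizing first keeps the latest-value feature (around 8) from swamping the slope feature (around 0.1) in the distance.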

Futures: The LifeWatch initiative will provide integrated access to a variety of data and analytical and modeling tools served by a variety of collaborating initiatives. Another service offers data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs’, also allowing users to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information Facility and the Biodiversity Catalogue, the Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass and population density), and ecosystem data (such as CO2 fluxes, algal blooming, and water and soil characteristics).

Futures: Large datasets of 100 TB or more may be needed to exploit the representational power of larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations and extremely high productivity for researcher exploration. This calls for integrating high-performance libraries with high-level (Python) prototyping environments.

Futures: Many analytics are needed, including feature extraction, feature matching, and large-scale probabilistic inference, which appear in most computer vision and image processing problems, including recognition, stereo resolution, and image denoising. There is also a need to visualize large-scale 3D reconstructions and to navigate large-scale collections of images that have been aligned to maps.
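Feature matching, listed above among the core computer-vision analytics, is often done by nearest-neighbor search over descriptor vectors combined with Lowe's ratio test, which keeps a match only when the best candidate is clearly closer than the second best. The brute-force sketch below uses tiny made-up 3-D descriptors and a 0.8 ratio (a commonly used value, assumed here); real systems use high-dimensional descriptors such as SIFT and approximate nearest-neighbor indexes.

```python
import math


def match_features(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] best matches desc_b[j] and passes the ratio test.

    desc_b must contain at least two descriptors so a second-best distance exists.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Distances from descriptor i to every descriptor in image B, nearest first.
        ranked = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:   # reject ambiguous matches
            matches.append((i, j))
    return matches


# Hypothetical descriptors from two overlapping images.
image_a = [(0.1, 0.9, 0.0), (0.8, 0.1, 0.1), (0.5, 0.5, 0.5)]
image_b = [(0.8, 0.15, 0.1), (0.12, 0.88, 0.02), (0.45, 0.5, 0.55), (0.5, 0.55, 0.45)]
pairs = match_features(image_a, image_b)
```

The third descriptor in `image_a` is equally close to two candidates in `image_b`, so the ratio test discards it rather than guessing, which is the behavior that keeps downstream 3D reconstruction from being corrupted by false matches.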

Futures: Crowdsourcing has barely begun to be used on a large scale, but with the availability of mobile devices there is now huge potential for collecting data from many individuals, including through the sensors in mobile devices. This has not yet been explored on a large scale; existing crowdsourcing projects are usually limited in scale and web-based. Privacy issues may arise (audio/video from individuals); anonymization may be necessary but is not always possible. Data management and curation are critical. Size could reach hundreds of terabytes with multimedia.

Futures: As the repository grows, we expect it to grow rapidly to 1,000 to 5,000 networks and methods in about a year. As more fields use graphs of increasing size, parallel algorithms will be important. Data manipulation and bookkeeping of derived data for users is a challenge: there are no well-defined and effective models and tools for managing various graph data in a unified fashion.
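The bookkeeping problem above (tracking data derived from graphs for users) can be illustrated with a small registry that stores each graph as an edge list and records, for every derived result, which graph and which method produced it, plus a content hash for reproducibility checks. The class, method, and graph names are hypothetical; this is not an existing tool.

```python
import hashlib
import json
from collections import Counter


class GraphRepository:
    def __init__(self) -> None:
        self.graphs: dict[str, list[tuple[str, str]]] = {}
        self.derived: dict[str, dict] = {}

    def add_graph(self, name: str, edges: list[tuple[str, str]]) -> None:
        self.graphs[name] = edges

    def derive(self, name: str, method, source: str):
        """Run `method` on a stored graph and record provenance for the result."""
        result = method(self.graphs[source])
        payload = json.dumps(result, sort_keys=True).encode()
        self.derived[name] = {
            "source": source,                   # which graph it came from
            "method": method.__name__,          # which method produced it
            "sha256": hashlib.sha256(payload).hexdigest(),
            "result": result,
        }
        return result


def degree_counts(edges):
    """Undirected degree of every node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


repo = GraphRepository()
repo.add_graph("toy_graph", [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
degrees = repo.derive("toy_graph.degrees", degree_counts, source="toy_graph")
```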

Current Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule Oriented Data System) policy-based data management system including major NSF projects such as Ocean Observatories Initiative (sensor archiving); Temporal Dynamics of Learning Center (Cognitive science data grid); the iPlant Collaborative (plant genomics); Drexel engineering digital library; Odum Institute for social science research (data grid federation with Dataverse). iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), cloud and more traditional storage models and different transport protocols. The full description has a diagram of the iRODS architecture.

Futures: Create a cloud infrastructure for social media of scientific information where many scientists from various parts of the world can participate and deposit results of their experiment. Some of the issues that one has to resolve prior to establishing a scientific social media are: a) How to minimize challenges related to establishing re-usable, inter-disciplinary, scalable, on-demand, use-case and user-friendly vocabulary? b) How to adopt a existing or create new on-demand ‘data-graph’ to place an information in an intuitive way such that it would easily integrate with existing ‘data-graphs’ in a federated environment without knowing too much about the data management? c) How to find relevant scientific data without spending too much time on the internet? Start with resources like the Open Government movement, Material genome Initiative and Protein Databank. This effort includes many local and networked resources. Developing an infrastructure to automatically integrate information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.Futures: Camera resolution is continually increasing. Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to conduct the analysis on time scales useful to the experiment. Large number of beamlines (e.g. 39 at LBNL ALS) means that aggregate data load is likely to increase significantly over the coming years and need for a generalized infrastructure for analyzing gigabytes per second of data from many beamline detectors at multiple facilities. Futures: CRTS is a scientific and methodological testbed and precursor of larger surveys to come, notably the Large Synoptic Survey Telescope (LSST), expected to operate in 2020’s and selected as the highest-priority ground-based instrument in the 2010 Astronomy and Astrophysics Decadal Survey. 
LSST will gather about 30 TB per night. The schematic architecture for a cyber-infrastructure for time domain astronomy illustrated by a figure in full description.

Futures: In the past the particle physics community has been able to rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. However the available performance will be much more difficult to exploit in the future since technology limitations, in particular regarding power consumption, have led to profound changes in the architecture of modern CPU chips. In the past software could run unchanged on successive processor generations and achieve performance gains that follow Moore's Law thanks to the regular increase in clock rate that continued until 2006. The era of scaling HEP sequential applications is now over. Changes in CPU architectures imply significantly more software parallelism as well as exploitation of specialized floating point capabilities. The structure and performance of HEP data processing software needs to be changed such that it can continue to be adapted and further developed in order to run efficiently on new hardware. This represents a major paradigm-shift in HEP software design and implies large scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels at the same time, the event level, the algorithm level, and the sub-algorithm level. Components at all levels in the software stack need to interoperate and therefore the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help to ensure efficient and balanced use of resources.Futures: An upgraded experiment Belle II and accelerator SuperKEKB will start operation in 2015 with a factor of 50 increased data with total integrated RAW data ~120PB and physics data ~15PB and ~100PB MC samples. Move to a distributed computing model requiring continuous RAW data transfer of ~20Gbps at designed luminosity between Japan and US. 
Will need Open Science Grid, Geant4, DIRAC, FTS, Belle II framework software.Futures: The design of the next generation radar, EISCAT_3D, will consist of a core site with a transmitting and receiving radar arrays and four sites with receiving antenna arrays at some 100 km from the core. The fully operational 5-site system will generate several thousand times data of current EISCAT system with 40 PB/year in 2022 and is expected to operate for 30 years. EISCAT 3D data e-Infrastructure plans to use the high performance computers for central site data processing and high throughput computers for mirror sites data processing. Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center and a real-time link from the operation center to the sites to set the mode of radar operation on with immediate action. See figure in full description.Futures: ENVRI’s common environment will empower the users of the collaborating environmental research infrastructures and enable multidisciplinary scientists to access, study and correlate data from multiple domains for "system level" research. It provides Bigdata requirements coming from interdisciplinary research. As shown in Figure 1 in full description, analysis of the computational characteristics of the 6 ESFRI Environmental Research infrastructure, 5 common subsystems has been identified. The definition of them are given in the ENVRI Reference Model, www.envri.eu/rm: Data acquisition: collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system. Data curation: facilitates quality control and preservation of scientific data. It is typically operated at a data centre. Data access: enables discovery and retrieval of data housed in data resources managed by a data curation subsystem. 
Data processing: aggregates the data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments. Community support: manages, controls and tracks users' activities and supports users to conduct their roles in communities. As shown in Figure 2 of full description, the 5 sub-system map well to the architectures of the ESFRI Environmental Research Infrastructures.Futures: An order of magnitude more data (petabyte per mission) is projected with improved instrumentation. Demands of processing increasing field data in an environment with more data but still constrained power budget, suggests low power/performance architectures such as GPU systems. The full descriptions gives workflows for different parts of problem and pictures of operation.

Futures: The improved access will be enabled through the use of iRODS that enables parallel downloads of datasets from selected replica servers that can be geographically dispersed, but still accessible by users worldwide. iRODS operation will be enhanced with semantically organized metadata, and managed via a highly precise Earth Science ontology. Cloud solutions will also be explored.

Page 59: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures: Little effort to date has been devoted to developing a framework for systematically connecting scales, as is needed to identify key controls and to simulate important feedbacks. GEWaSC will develop a simulation framework that formally scales from genomes to watersheds and will synthesize diverse and disparate field, laboratory, and simulation datasets across different semantic, spatial, and temporal scales.

Page 60: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures: Currently Hadoop batch jobs are scheduled daily, but work has begun on real-time recommendation. The database contains ~400M documents, roughly 80M unique documents, and receives 5-700k new uploads on a weekday. Thus a major challenge is clustering matching documents together in a computationally efficient way (scalable and parallelized) when they’re uploaded from different sources and have been slightly modified via third-part annotation tools or publisher watermarks and cover pages.

Futures: Materials informatics is an area in which the new tools of data science can have major impact by predicting the performance of real materials (gram to ton quantities) starting at the atomistic, nanometer, and/or micrometer level of description. One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally-accepted data recording standards that can be used by a very diverse materials community, including developers materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations wishing to deposit proprietary materials in data repositories to mask proprietary information, yet to maintain the usability of data; one needs multi-variable materials data visualization tools, in which the number of variables can be quite high

Futures: Today’s intelligence systems often contain trillions of geospatial objects and need to be able to visualize and interact with millions of objects. Critical issues are Indexing, retrieval and distributed analysis; Visualization generation and transmission; Visualization of data at the end of low bandwidth wireless connections; Data is sensitive and must be completely secure in transit and at rest (particularly on handhelds); Geospatial data requires unique approaches to indexing and distributed analysis.

Futures: Teradata, PostgreSQL, MongoDB running on Indiana University supercomputer supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural Language Processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancerFutures: Recently, 3D pathology imaging is made possible through 3D laser technologies or serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next generation diagnosis. 1TB raw image data + 1TB analytical results per 3D image and 1PB data per moderated hospital per year.Futures: Our goal is to solve that bottleneck with extreme scale computing with community-focused science gateways to support the application of massive data analysis toward massive imaging data sets. Workflow components include data acquisition, storage, enhancement, minimizing noise, segmentation of regions of interest, crowd-based selection and extraction of features, and object classification, and organization, and search. Use ImageJ, OMERO, VolRover, advanced segmentation and feature detection software.

Futures: Identify similar patients from a large Electronic Health Record (EHR) database, i.e. an individualized cohort, and evaluate their respective management outcomes to formulate most appropriate solution suited for a given patient with diabetes. Use efficient parallel retrieval algorithms, suitable for cloud or HPC, using open source Hbase with both indexed and custom search to identify patients of possible interest. Use Semantic Linking for Property Values method to convert an existing data warehouse at Mayo Clinic, called the Enterprise Data Trust (EDT), into RDF triples that enables one to find similar patients through linking of both vocabulary-based and continuous values. The time dependent properties need to be processed before query to allow matching based on derivatives and other derived properties.Futures: A cohort of millions of patient can involve petabyte datasets. Issues include availability of too much data (as images, genetic sequences etc) that can make the analysis complicated. A major challenge lies in aligning the data and merging from multiple sources in a form that can be made useful for a combined analysis. Another issue is that sometimes, large amount of data is available about a single subject but the number of subjects themselves is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.

Futures: LifeWatch initiative will provide integrated access to a variety of data, analytical and modeling tools as served by a variety of collaborating initiatives. Another service is offered with data and tools in selected workflows for specific scientific communities. In addition, LifeWatch will provide opportunities to construct personalized ‘virtual labs', also allowing one to enter new data and analytical tools. New data will be shared with the data facilities cooperating with LifeWatch. LifeWatch operates the Global Biodiversity Information facility and Biodiversity Catalogue that is Biodiversity Science Web Services Catalogue. Data includes ‘omics, species information, ecological information (such as biomass, population density etc.), ecosystem data (such as CO2 fluxes. Algal blooming, water and soil characteristics)Futures: Large datasets of 100TB or more may be necessary in order to exploit the representational power of the larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration. One needs integration of high performance libraries with high level (python) prototyping environments.

Futures: Crowd sourcing has been barely started to be used on a larger scale but with the availability of mobile devices, now there is a huge potential for collecting much data from many individuals, also making use of sensors in mobile devices. This has not been explored on a large scale so far; existing projects of crowd sourcing are usually of a limited scale and web-based. Privacy issues may be involved (A/V from individuals), anonymization may be necessary but not always possible. Data management and curation critical. Size could be hundreds of terabytes with multimedia.

Current Approach: Currently 25 science and engineering domains have projects that rely on the iRODS (Integrated Rule-Oriented Data System) policy-based data management system, including major NSF projects such as the Ocean Observatories Initiative (sensor archiving), the Temporal Dynamics of Learning Center (cognitive science data grid), the iPlant Collaborative (plant genomics), the Drexel engineering digital library, and the Odum Institute for social science research (data grid federation with Dataverse). iRODS currently manages petabytes of data, hundreds of millions of files, hundreds of millions of metadata attributes, tens of thousands of users, and a thousand storage resources. It interoperates with workflow systems (NCSA Cyberintegrator, Kepler, Taverna), with cloud and more traditional storage models, and with different transport protocols. The full description has a diagram of the iRODS architecture.
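"Policy-based" data management means that actions such as replication or metadata tagging are triggered automatically by events rather than performed by hand. iRODS itself attaches rules to policy enforcement points (for example, after a file is stored); the Python toy below only mimics that idea and is not the iRODS rule language or any iRODS API. The event name `post_put`, the resource names and the `/ooi/sensors/` path are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DataObject:
    path: str
    size: int
    metadata: Dict[str, str] = field(default_factory=dict)
    replicas: List[str] = field(default_factory=list)

Policy = Callable[[DataObject], None]

class PolicyEngine:
    """Toy event-driven engine: policies registered for an event run in order."""

    def __init__(self) -> None:
        self._policies: Dict[str, List[Policy]] = {}

    def on(self, event: str, policy: Policy) -> None:
        self._policies.setdefault(event, []).append(policy)

    def fire(self, event: str, obj: DataObject) -> None:
        for policy in self._policies.get(event, []):
            policy(obj)

def replicate_twice(obj: DataObject) -> None:
    """Ensure copies exist on two (hypothetical) archive resources; idempotent."""
    for resource in ("archiveResc1", "archiveResc2"):
        if resource not in obj.replicas:
            obj.replicas.append(resource)

def tag_collection(obj: DataObject) -> None:
    """Attach project metadata based on where the object was stored."""
    if obj.path.startswith("/ooi/sensors/"):
        obj.metadata["project"] = "sensor-archive"

engine = PolicyEngine()
engine.on("post_put", replicate_twice)
engine.on("post_put", tag_collection)

obj = DataObject("/ooi/sensors/ctd/2013-06-01.nc", size=1024)
engine.fire("post_put", obj)
print(obj.replicas, obj.metadata)
```

Keeping policies idempotent (re-firing does not add duplicate replicas) is a design choice that makes automatic enforcement safe to retry.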

Futures: Create a cloud infrastructure for social media of scientific information, where scientists from various parts of the world can participate and deposit the results of their experiments. Some of the issues that must be resolved before establishing a scientific social media platform are: a) How to minimize the challenges of establishing a re-usable, inter-disciplinary, scalable, on-demand, use-case-oriented and user-friendly vocabulary? b) How to adopt an existing or create a new on-demand ‘data-graph’ to place information in an intuitive way, such that it easily integrates with existing ‘data-graphs’ in a federated environment without requiring much knowledge of data management? c) How to find relevant scientific data without spending too much time on the internet? Start with resources like the Open Government movement, the Materials Genome Initiative and the Protein Data Bank. This effort includes many local and networked resources. Developing an infrastructure to automatically integrate information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.

Futures: Camera resolution is continually increasing. Data transfer to large-scale computing facilities is becoming necessary because of the computational power required to conduct the analysis on time scales useful to the experiment. The large number of beamlines (e.g. 39 at the LBNL ALS) means that the aggregate data load is likely to increase significantly over the coming years, creating a need for a generalized infrastructure for analyzing gigabytes per second of data from many beamline detectors at multiple facilities.
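Question b) of the scientific social media use case asks how a new ‘data-graph’ can integrate with existing ones in a federated setting. A minimal sketch of one ingredient, assuming each source publishes (subject, predicate, object) triples and a shared vocabulary mapping is agreed on: predicates are rewritten into the shared vocabulary, then graphs are merged by set union so identical facts collapse. The prefixes (`mat:`, `a:`, `b:`, `shared:`) are hypothetical; a production system would more likely use an RDF store and SPARQL.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def normalize(graph: Iterable[Triple], vocab_map: Dict[str, str]) -> Set[Triple]:
    """Rewrite source-specific predicates into the shared vocabulary."""
    return {(s, vocab_map.get(p, p), o) for s, p, o in graph}

def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Union of graphs; identical triples from different sources collapse."""
    merged: Set[Triple] = set()
    for g in graphs:
        merged |= g
    return merged

def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All object values for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

lab_a = {("mat:GaAs", "a:bandGap_eV", "1.42")}
lab_b = {("mat:GaAs", "b:gap", "1.42"), ("mat:Si", "b:gap", "1.12")}
shared = {"a:bandGap_eV": "shared:bandGap", "b:gap": "shared:bandGap"}

graph = merge(normalize(lab_a, shared), normalize(lab_b, shared))
print(len(graph))                                    # 2
print(objects(graph, "mat:GaAs", "shared:bandGap"))  # {'1.42'}
```

The hard part the use case points to is agreeing on `shared` itself; once a mapping exists, merging is mechanical.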

Futures: In the past the particle physics community has been able to rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. However, the available performance will be much more difficult to exploit in the future, since technology limitations, in particular regarding power consumption, have led to profound changes in the architecture of modern CPU chips. In the past, software could run unchanged on successive processor generations and achieve performance gains that followed Moore's Law, thanks to the regular increase in clock rate that continued until 2006. The era of scaling sequential HEP applications is now over. Changes in CPU architectures imply significantly more software parallelism as well as exploitation of specialized floating-point capabilities. The structure and performance of HEP data processing software need to change so that it can continue to be adapted and further developed to run efficiently on new hardware. This represents a major paradigm shift in HEP software design and implies large-scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels at the same time: the event level, the algorithm level, and the sub-algorithm level. Components at all levels of the software stack need to interoperate, so the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help to ensure efficient and balanced use of resources.
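Event-level parallelism, the outermost of the three levels named in the HEP use case, works because collision events are independent. The sketch below shows the pattern with Python's standard `concurrent.futures`; the hit format, the energy-sum "reconstruction" and the noise threshold are hypothetical stand-ins for real algorithms. `ThreadPoolExecutor` keeps the example self-contained, but in CPython threads do not speed up pure-Python CPU-bound work, so `ProcessPoolExecutor` (same `map` interface) or a compiled framework would be the realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Hit = Tuple[float, float, float]  # hypothetical (x, y, energy) detector hit
Event = List[Hit]

def reconstruct(event: Event, threshold: float = 0.1) -> float:
    """Toy per-event algorithm: sum the energy of hits above a noise threshold."""
    return sum(e for _, _, e in event if e > threshold)

def process_events(events: List[Event], workers: int = 4) -> List[float]:
    """Event-level parallelism: each independent event becomes its own task.

    pool.map returns results in input order, so output i belongs to event i.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))

events = [
    [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (2.0, 0.5, 0.05)],  # last hit is below threshold
    [(0.3, 0.3, 0.5)],
    [],
]
print(process_events(events))  # [3.0, 0.5, 0]
```

Algorithm-level and sub-algorithm-level parallelism would nest inside `reconstruct`, which is where the use case's call for a single, standardized concurrency model matters.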

Futures: The next-generation radar, EISCAT_3D, is designed to consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core. The fully operational 5-site system will generate several thousand times the data of the current EISCAT system, about 40 PB/year in 2022, and is expected to operate for 30 years. The EISCAT_3D data e-Infrastructure plans to use high performance computers for central-site data processing and high throughput computers for mirror-site data processing. Downloading the full data is not time-critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center, and a real-time link from the operation center to the sites to set the radar's mode of operation with immediate effect. See the figure in the full description.

Futures: ENVRI’s common environment will empower the users of the collaborating environmental research infrastructures and enable multidisciplinary scientists to access, study and correlate data from multiple domains for "system level" research. It provides big data requirements arising from interdisciplinary research. As shown in Figure 1 of the full description, an analysis of the computational characteristics of the 6 ESFRI Environmental Research Infrastructures identified 5 common subsystems. They are defined in the ENVRI Reference Model, www.envri.eu/rm:
Data acquisition: collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system.
Data curation: facilitates quality control and preservation of scientific data. It is typically operated at a data centre.
Data access: enables discovery and retrieval of data housed in data resources managed by a data curation subsystem.
Data processing: aggregates the data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments.
Community support: manages, controls and tracks users' activities and supports users in carrying out their roles in communities.
As shown in Figure 2 of the full description, the 5 subsystems map well to the architectures of the ESFRI Environmental Research Infrastructures.
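To put the EISCAT_3D figure of 40 PB/year into transfer terms, the arithmetic below converts it to an average sustained rate and a 30-year total. Assumptions not stated in the use case: decimal petabytes, data arriving uniformly over a Julian year, and the 40 PB/year rate holding for the full 30-year lifetime.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, 31,557,600 s

def sustained_rate(bytes_per_year: float) -> float:
    """Average ingest rate in bytes/s if data arrive uniformly through the year."""
    return bytes_per_year / SECONDS_PER_YEAR

annual = 40 * 10**15                # 40 PB/year, decimal petabytes (assumed)
rate = sustained_rate(annual)
lifetime_eb = annual * 30 / 10**18  # assumes 40 PB/year for all 30 years

print(f"{rate / 1e9:.2f} GB/s")  # 1.27 GB/s
print(f"{lifetime_eb:.1f} EB")   # 1.2 EB
```

An average of roughly 1.3 GB/s is why the use case separates bulk download (not time-critical) from the small real-time event streams between the sites and the operation center.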

Page 61: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures: Materials informatics is an area in which the new tools of data science can have major impact by predicting the performance of real materials (gram to ton quantities) starting at the atomistic, nanometer, and/or micrometer level of description. One must establish materials data repositories beyond the existing ones that focus on fundamental data; one must develop internationally accepted data recording standards that can be used by a very diverse materials community, including developers of materials test standards (such as ASTM and ISO), testing companies, materials producers, and R&D labs; one needs tools and procedures to help organizations wishing to deposit proprietary materials data in repositories to mask proprietary information while maintaining the usability of the data; and one needs multi-variable materials data visualization tools, in which the number of variables can be quite high.
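The materials use case calls for tools that mask proprietary information while keeping deposited data usable. One simple approach, offered here as an assumption rather than anything the use case prescribes, replaces the proprietary name with an opaque code held privately by the depositor and coarsens the composition, while measured properties are kept as-is. The alloy name, composition and yield strength below are hypothetical, and rounding alone may not defeat inference by a determined reader.

```python
from typing import Dict

def mask_record(record: Dict, code_book: Dict[str, str], digits: int = 1) -> Dict:
    """Hide the proprietary name and coarsen composition; keep measured properties."""
    masked = dict(record)
    # The same proprietary name always maps to the same opaque code in one code book.
    masked["alloy"] = code_book.setdefault(record["alloy"], f"ALLOY-{len(code_book) + 1:03d}")
    masked["composition_wt_pct"] = {
        element: round(value, digits)
        for element, value in record["composition_wt_pct"].items()
    }
    return masked

code_book: Dict[str, str] = {}  # kept private by the depositing organization
record = {
    "alloy": "AcmeAlloy X7",  # hypothetical proprietary name
    "composition_wt_pct": {"Ni": 58.234, "Cr": 21.517, "Mo": 9.046},
    "yield_strength_MPa": 460,  # hypothetical value
}
print(mask_record(record, code_book))
```

The `digits` parameter is the usability/secrecy trade-off: fewer digits hide more of the recipe but make the deposited data less useful for modeling.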

Page 64: semanticommunity.infosemanticommunity.info/@api/deki/files/33791/NISTBigData… · XLS file · Web view... innovative forms of information processing for enhanced insight and decision

Futures: Create a cloud infrastructure for a scientific social media platform where scientists from around the world can participate and deposit the results of their experiments. Some of the issues that must be resolved before establishing such a platform are: a) How can the challenges of establishing a reusable, interdisciplinary, scalable, on-demand, use-case-oriented, and user-friendly vocabulary be minimized? b) How can one adopt an existing, or create a new, on-demand ‘data-graph’ that places information in an intuitive way, so that it integrates easily with existing ‘data-graphs’ in a federated environment without requiring detailed knowledge of the data management? c) How can relevant scientific data be found without spending too much time searching the internet? Start with resources such as the Open Government movement, the Materials Genome Initiative, and the Protein Data Bank. This effort involves many local and networked resources. Developing an infrastructure that automatically integrates information from all these resources using data-graphs is a challenge that we are trying to solve. Good database tools and servers for data-graph manipulation are needed.
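
Question b) is essentially about merging independently built data-graphs. One common way to picture this, assuming each graph is a set of (subject, predicate, object) triples, is to map each source's local predicate names onto a shared vocabulary and then take the union. The Python sketch below illustrates that idea only; the vocabulary mapping, the prefixes such as labA: and labB:, and the helper names are hypothetical, and a production system would more likely rely on RDF tooling and a graph database.

```python
# A data-graph here is a set of (subject, predicate, object) triples.

# Hypothetical mapping from each source's local predicates to a shared vocabulary.
SHARED_VOCAB = {
    "labA:formula": "chem:formula",
    "labB:chemFormula": "chem:formula",
    "labA:bandGapEV": "phys:bandGapEV",
}


def align(graph, vocab):
    """Rewrite predicates into the shared vocabulary; unmapped predicates pass through."""
    return {(s, vocab.get(p, p), o) for s, p, o in graph}


def federate(*graphs, vocab=SHARED_VOCAB):
    """Merge several data-graphs after aligning them to the shared vocabulary."""
    merged = set()
    for graph in graphs:
        merged |= align(graph, vocab)
    return merged


def find_subjects(graph, predicate, obj):
    """Return, sorted, every subject that has the given predicate/object pair."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


lab_a = {
    ("labA:sample-1", "labA:formula", "TiO2"),
    ("labA:sample-1", "labA:bandGapEV", "3.2"),
}
lab_b = {("labB:sample-7", "labB:chemFormula", "TiO2")}

merged = federate(lab_a, lab_b)
print(find_subjects(merged, "chem:formula", "TiO2"))  # ['labA:sample-1', 'labB:sample-7']
```

Because the query is written against chem:formula, it finds matching records from both sources even though neither source used that predicate name, which is the kind of integration without detailed knowledge of each source's data management that question b) asks about.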

Futures: In the past, the particle physics community could rely on industry to deliver exponential increases in performance per unit cost over time, as described by Moore's Law. However, the available performance will be much more difficult to exploit in the future, because technology limitations, in particular power consumption, have led to profound changes in the architecture of modern CPU chips. Previously, software could run unchanged on successive processor generations and still achieve performance gains that followed Moore's Law, thanks to the regular increase in clock rate that continued until 2006. The era of scaling sequential HEP applications is now over. Changes in CPU architectures require significantly more software parallelism as well as exploitation of specialized floating-point capabilities. The structure and performance of HEP data processing software must change so that it can continue to be adapted and further developed to run efficiently on new hardware. This represents a major paradigm shift in HEP software design and implies large-scale re-engineering of data structures and algorithms. Parallelism needs to be added at all levels simultaneously: the event level, the algorithm level, and the sub-algorithm level. Components at all levels of the software stack need to interoperate, so the goal is to standardize as much as possible on basic design patterns and on the choice of a concurrency model. This will also help ensure efficient and balanced use of resources.
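
Event-level parallelism, the outermost of the three levels named above, works because collision events can be processed independently. The Python sketch below illustrates only that level, under simplifying assumptions: an event is a hypothetical list of (px, py) track momenta, and the per-event "reconstruction" is a toy scalar sum of transverse momenta. It uses concurrent.futures.ThreadPoolExecutor to stay self-contained; in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so ProcessPoolExecutor (or native threads in compiled code) would be the realistic choice.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Toy per-event algorithm: scalar sum of transverse momentum over all tracks."""
    return sum((math.hypot(px, py) for px, py in event), 0.0)


def process_sequential(events):
    """Baseline: one event after another, as in legacy sequential code."""
    return [reconstruct(e) for e in events]


def process_event_parallel(events, workers=4):
    """Event-level parallelism: independent events are handed to a pool of workers.

    Executor.map returns results in input order, so the output lines up
    with the sequential version event by event.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))


events = [[(3.0, 4.0)], [(6.0, 8.0), (0.0, 1.0)], []]
parallel = process_event_parallel(events)
print(parallel)  # [5.0, 11.0, 0.0]
```

Algorithm-level and sub-algorithm-level parallelism would apply a similar split inside reconstruct itself, which is why a single concurrency model shared across the software stack matters.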

Use Case Category (NIST Number) | URL
Government Operations (4) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.2_Government_Operation
Commercial (8) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.3_Commercial
Defense (3) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.4_Defense
Healthcare and Life Sciences (10) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.5_Health_Care_and_Life_Sciences
Deep Learning and Social Media (6) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.6_Deep_Learning_and_Social_Media
The Ecosystem for Research (4) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.7_The_Ecosystem_for_Research
Astronomy and Physics (5) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.8_Astronomy_and_Physics
Earth, Environmental and Polar Science (…) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.9_Earth.2C_Environmental.2C_and_Polar_Science
Energy (1) | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#2.10_Energy

FBDWGM Example | URL
Data Science for Agency Initiatives 2015 | http://semanticommunity.info/Data_Science/Data_Science_for_Agency_Initiatives_2015
2015 Wharton DC Innovation Summit, April 28-29: FBDWGM Workshop 4/29 1-2:30 p.m. | http://www.meetup.com/Federal-Big-Data-Working-Group/events/222105461/
Data Science for the DTIC Data Ecosystem RFI | http://semanticommunity.info/Data_Science/Data_Science_for_DTIC_Data_Ecosystem
Data Science for Affordable Care Act Data | http://semanticommunity.info/Data_Science/Data_Science_for_ACA
Big Data from Everywhere for Families and Community Service and Data Science for MyFamilySearch.org | http://semanticommunity.info/My_Stories_and_Lessons#Big_Data_from_Everywhere_for_Families_and_Community_Service
Data Science for the National Big Data R and D Initiative | http://semanticommunity.info/Data_Science/Data_Science_for_the_National_Big_Data_R_and_D_Initiative
Data Science for Cyber Physical Systems-Internet of Things | http://semanticommunity.info/Data_Science/Data_Science_for_Cyber_Physical_Systems-Internet_of_Things
Data Science for EPA Big Data Analytics | http://semanticommunity.info/Data_Science/Data_Science_for_EPA_Big_Data_Analytics
Data Science for USGS Minerals Big Data | http://semanticommunity.info/Data_Science/Data_Science_for_USGS_Minerals_Big_Data

Meetup Date | URL
August 3rd Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/222610841/
The Wharton DC Alumni Innovation Summit, April | http://www.whartondcinnovation.com/
January 27th Meetup | http://www.meetup.com/Virginia-Big-Data-Meetup/events/219296442/
July 20th Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/222369666/
February 13th Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/220271343/
February 2nd Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/218868025/
June 29th Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/222263009/
April 20th Meetup | http://www.meetup.com/Federal-Big-Data-Working-Group/events/220799665/
May 29th EarthCube and June 15th Meetup | http://earthcube.org/forum/earthcube-data-science-publications/data-science-publication-usgs-minerals-big-data

Concept | Author | Definition | URL
4Vs (Volume, Variety, Velocity, and Variability) and Engineering | Gartner[7],[8] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B8.5D
Volume | Techtarget[9] | “Although B… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B9.5D
Volume | Oxford English Dictionary (OED)[10] | “big data n… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B10.5D
Bigger Data | Annette Greiner[5] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Not Only Volume | Quentin Hardy[5] | “What’s ‘bi… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Not Only Volume | Chris Neumann[5] | “…our origi… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Big Data Engineering | IDC[11][16] | “Big data t… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B16.5D
Big Data Engineering | Hal Varian[5] | “Big data m… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Big Data Engineering | McKinsey[12] | “Big Data r… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B13.5D
Less Sampling | John Foreman[5] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Less Sampling | Peter Skomoroch[5] | “Big data o… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
New Data Types | Tom Davenport[13] | “The broad… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B13.5D
New Data Types | Mark van Rijmenam[5] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Analytics | Ryan Swanstrom[5] | “Big data u… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Data Science | Joel Gurin[5] | “Big data d… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Data Science | Josh Ferguson[5] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Value | Harlan Harris[5] | “To me, ‘bi… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Value | Jessica Kirkpatrick[5] | “Big data r… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Value | Hilary Mason[5] | “Big data i… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Value | Gregory Piatetsky-Shapiro[5] | “The best d… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Cultural Change | Drew Conway[5] | “Big data,… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Cultural Change | Daniel Gillick[5] | “‘Big data’… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D
Cultural Change | Cathy O’Neil[5] | “‘Big data’… | http://semanticommunity.info/Data_Science/Data_Science_for_NIST_Big_Data_Framework#.5B5.5D

Number | Use Case | URL | Name | Volume | Velocity | Variety | Software | Analytics
1 | M0147 | http://bigdatawg.nist.gov/_uploadfiles/M0147_v1_9011190023.docx | Census 200 | 380 TB | Static for | Scanned d | Robust arch | None for 75 years
2 | M0148 | http://bigdatawg.nist.gov/_uploadfiles/M0148_v1_1457436047.docx | NARA: Searc | Hundreds o | Data loaded | Unstructure | Custom sof | Crawl/index, search, ranking, predictive search; data categorization (sensitive, confidential, etc.); personally identifiable information (PII) detection and flagging
3 | M0219 | http://bigdatawg.nist.gov/_uploadfiles/M0219_v1_1106458060.docx | Statistica | Approximat | Variable, f | Strings and | Hadoop, Sp | Recommendation systems, continued monitoring
4 | M0222 | http://bigdatawg.nist.gov/_uploadfiles/M0222_v1_8823653701.docx | Non-Tradit | — | — | Survey data | Hadoop, Sp | New analytics to create reliable information from non-traditional disparate sources
5 | M0175 | http://bigdatawg.nist.gov/_uploadfiles/M0175_v1_1361846645.doc | Cloud Eco- | — | Real time | — | Hadoop RD | Fraud detection
6 | M0161 | http://bigdatawg.nist.gov/_uploadfiles/M0161_v1_8712614971.docx | Mendeley | 15 TB pres | Currently | PDF documen | Hadoop, Sc | Standard libraries for machine learning and analytics, LDA, custom-built reporting tools for aggregating readership and social activities per document
7 | M0164 | http://bigdatawg.nist.gov/_uploadfiles/M0164_v1_8073380462.docx | Netflix Mov | Summer 2012 | Media (vid | Data vary f | Hadoop and | Personalized recommender systems using logistic/linear regression, elastic nets, matrix factorization, clustering, LDA, association rules, gradient-boosted decision trees, and others; streaming video delivery
8 | M0165 | http://bigdatawg.nist.gov/_uploadfiles/M0165_v1_9206577703.docx | Web Searc | 45 billion | Real-time u | Multiple m | MapReduce | Crawling; searching, including topic-based searches; ranking; recommending
9 | M0137 | http://bigdatawg.nist.gov/_uploadfiles/M0137_v1_9902753113.doc | Business C | Terabytes | Can be real | Must work f | Hadoop, Ma | Robust backup
10 | M0103 | http://bigdatawg.nist.gov/_uploadfiles/M0103_v1_9862181899.docx | Cargo Ship | — | Needs to b | Event-base | — | Distributed event analysis identifying problems
11 | M0162 | http://bigdatawg.nist.gov/_uploadfiles/M0162_v1_8977322730.docx | Materials | 500,000 ma | Ongoing in | Many datas | National pr | No broadly applicable analytics
12 | M0176 | http://bigdatawg.nist.gov/_uploadfiles/M0176_v1_7714944584.docx | Simulation | 100 TB (cur | Regular da | Varied data | MongoDB, G | MapReduce and search that join simulation and experimental data
13 | M0213 | http://bigdatawg.nist.gov/_uploadfiles/M0213_v1_5447164009.docx | Large-Scale | Imagery – h | Vectors tra | Imagery, ve | Geospatial | Closest point of approach, deviation from route, point density over time, PCA and ICA
14 | M0214 | http://bigdatawg.nist.gov/_uploadfiles/M0214_v1_5406533104.docx | Object Iden | FMV – 30–60 | Real time | A few stan | Custom soft | Visualization as overlays on a GIS, basic object detection analytics and integration with sophisticated situation awareness tools with data fusion
15 | M0215 | http://bigdatawg.nist.gov/_uploadfiles/M0215_v1_1579991796.docx | Intelligenc | Tens of ter | Much real-t | Text files, | Hadoop, Ac | Near real-time alerts based on patterns and baseline changes, link analysis, geospatial analysis, text analytics (sentiment, entity extraction, etc.)
16 | M0177 | http://bigdatawg.nist.gov/_uploadfiles/M0177_v1_1133239355.docx | Electronic | 12 million | 0.5 – 1.5 m | Broad varie | Teradata, | Information retrieval methods (tf-idf), NLP, maximum likelihood estimators, Bayesian networks
17 | M0089 | http://bigdatawg.nist.gov/_uploadfiles/M0089_v1_7814086875.docx | Pathology | 1 GB raw im | Once gener | Images | MPI for ima | Image analysis, spatial queries and analytics, feature clustering and classification
18 | M0191 | http://bigdatawg.nist.gov/_uploadfiles/M0191_v2_5659292903.docx | Computatio | Medical dia | Volume of | Multi-moda | Scalable k | Machine learning (support vector machine [SVM] and random forest [RF]) for classification and recommendation services
19 | M0078 | http://bigdatawg.nist.gov/_uploadfiles/M0078_v1_8198680934.docx | Genomic M | >100 TB in | ~300 GB of | File format | Open-sourc | Processing of raw data to produce variant calls, clinical interpretation of variants
20 | M0188 | http://bigdatawg.nist.gov/_uploadfiles/M0188_v1_8691012255.docx | Comparativ | 50 TB | New sequen | Biological | Standard bi | Descriptive statistics, statistical significance in hypothesis testing, data clustering and classification
21 | M0140 | http://bigdatawg.nist.gov/_uploadfiles/M0140_v1_5675248635.docx | Individual | 5 million p | Not real ti | 100 contro | HDFS suppl | Integration of data into semantic graphs, using graph traverse to replace SQL join; development of semantic graph-mining algorithms to identify graph patterns, index graph, and search graph; indexed Hbase; custom code to develop new patient properties from stored data
22 | M0174 | http://bigdatawg.nist.gov/_uploadfiles/M0174_v1_8098597993.docx | Statistical | Hundreds of | Constant up | Critical fe | Mainly Java | Relational probabilistic models (Statistical Relational AI) learned from multiple data types
23 | M0172 | http://bigdatawg.nist.gov/_uploadfiles/M0172_v1_8972697421.docx | World Popu | 100 TB | Low number | Can be rich | Charm++, | Simulations on a synthetic population
24 | M0173 | http://bigdatawg.nist.gov/_uploadfiles/M0173_v1_3577651730.docx | Social Con | Tens of ter | During soci | Big issues | Specialize | Models of behavior of humans and hard infrastructures, models of their interactions, visualization of results
25 | M0141 | http://bigdatawg.nist.gov/_uploadfiles/M0141_v1_5563475154.docx | Biodiversit | N/A | Real-time p | Rich varie | RDBMS | Requires advanced and rich visualization
26 | M0136 | http://bigdatawg.nist.gov/_uploadfiles/M0136_v1_5489292512.docx | Large-Scal | Current dat | Much faster | Neural net | In-house G | Small degree of batch statistical pre-processing, all other data analysis performed by the learning algorithm itself
27 | M0171 | http://bigdatawg.nist.gov/_uploadfiles/M0171_v1_7185377580.docx | Organizing | 500+ billio | Over 500 m | Images and | Hadoop Map | Robust non-linear least squares optimization problem, SVM
28 | M0160 | http://bigdatawg.nist.gov/_uploadfiles/M0160_v1_6667987957.docx | Truthy Twi | 30 TB/year | Near real-t | Schema prov | Hadoop Ind | Anomaly detection, stream clustering, signal classification, online learning; information diffusion, clustering, dynamic network visualization
29 | M0211 | http://bigdatawg.nist.gov/_uploadfiles/M0211_v2_3994987602.docx | Crowd Sour | Gigabytes ( | Data conti | So far mos | XML technol | Pattern recognition (e.g., speech recognition, automatic audio-visual analysis, cultural patterns), identification of structures (lexical units, linguistic rules, etc.)
30 | M0158 | http://bigdatawg.nist.gov/_uploadfiles/M0158_v1_1209297717.docx | CINET for | Can be hun | Dynamic ne | Many types | Graph libr | Network visualization
31 | M0190 | http://bigdatawg.nist.gov/_uploadfiles/M0190_v1_2052764107.docx | NIST Inform | >900 millio | Legacy eval | Wide variet | PERL, Pyth | Information extraction, filtering, search, and summarization; image and voice biometrics; speech recognition and understanding; machine translation; video person/object detection and tracking; event detection; imagery/document matching; novelty detection; structural semantic temporal analytics
32 | M0130 | http://bigdatawg.nist.gov/_uploadfiles/M0130_v1_3759224345.docx | DataNet (i | Petabytes, | Real time | Rich | iRODS | Supports general analysis workflows
33 | M0163 | http://bigdatawg.nist.gov/_uploadfiles/M0163_v1_6644793897.docx | The Discin | Small as m | Real time | Can tackle | Symfony-PH | --
34 | M0131 | http://bigdatawg.nist.gov/_uploadfiles/M0131_v1_9568192535.docx | Semantic G | A few tera | Evolving in | Rich | Database | Data graph processing
35 | M0189 | http://bigdatawg.nist.gov/_uploadfiles/M0189_v1_1536495869.docx | Light Sour | 50–400 GB | Continuous | Images | Octopus for Tomographic Reconstruction, Avizo (http://vsg3d.com) and FIJI (a distribution of ImageJ) | Volume reconstruction, feature identification, etc.
36 | M0170 | http://bigdatawg.nist.gov/_uploadfiles/M0170_v1_5720273656.docx | Catalina Re | ~100 TB tot | Nightly upd | Images, spe | Custom data | Detection of rare events and relation to existing diverse data
37 | M0185 | http://bigdatawg.nist.gov/_uploadfiles/M0185_v1_4843821869.docx | DOE Extrem | Several pet | Analysis d | Image and | MPI, FFTW, | New analytics needed to analyze simulation results
38 | M0209 | http://bigdatawg.nist.gov/_uploadfiles/M0209_v1_4702199454.docx | Large Surv | Petabytes | 400 images | Images | Linux clust | Machine learning to find optical transients, Cholesky decomposition for thousands of simulations with matrices of order 1 million on a side and parallel image storage
39 | M0166 | http://bigdatawg.nist.gov/_uploadfiles/M0166_v3_2675550648.DOCX | Particle Ph | 15 PB of d | Data update | Different f | Grid-based | Sophisticated specialized data analysis code followed by basic exploratory statistics (histogram) with complex detector efficiency corrections
40 | M0210 | http://bigdatawg.nist.gov/_uploadfiles/M0210_v1_7474668890.docx | Belle II Hi | Eventually | Data update | Different f | DIRAC Grid | Sophisticated specialized data analysis code followed by basic exploratory statistics (histogram) with complex detector efficiency corrections
41 | M0155 | http://bigdatawg.nist.gov/_uploadfiles/M0155_v1_3537561150.docx | EISCAT 3D | Terabytes/y | Data update | Big data u | Custom anal | Pattern recognition, demanding correlation routines, high-level parameter extraction
42 | M0157 | http://bigdatawg.nist.gov/_uploadfiles/M0157_v1_6396188402.docx | ENVRI Envi | Low volume | Mainly rea | Six separat | R and Pytho | Data assimilation, (statistical) analysis, data mining, data extraction, scientific modeling and simulation, scientific workflow
43 | M0167 | http://bigdatawg.nist.gov/_uploadfiles/M0167_v1_7320744610.docx | CReSIS Rem | Around 1 PB | Data taken | Raw data, i | Matlab for | Custom signal processing to produce radar images that are analyzed by image processing to find layers
44 | M0127 | http://bigdatawg.nist.gov/_uploadfiles/M0127_v1_8374144249.docx | UAVSAR Dat | 110 TB raw | Data come | Image and a | ROI_PAC, G | Process raw data to get images that are run through image processing tools and accessed from GIS
45 | M0182 | http://bigdatawg.nist.gov/_uploadfiles/M0182_v1_3824910269.docx | NASA LARC | MERRA colle | Periodic u | Many appli | SGE Univa G | Federation software
46 | M0129 | http://bigdatawg.nist.gov/_uploadfiles/M0129_v1_8721988256.pdf | MERRA Anal | 480 TB fr | Increases | Applicatio | Cloudera, | Climate Analytics-as-a-Service (CAaaS)
47 | M0090 | http://bigdatawg.nist.gov/_uploadfiles/M0090_v1_7386661507.docx | Atmospheri | 200 TB (cur | Data analy | Re-analysis | MapReduce o | Data mining customized for specific event types
48 | M0186 | http://bigdatawg.nist.gov/_uploadfiles/M0186_v1_2893359960.docx | Climate St | Up to 30 P | 42 GB/seco | Variety ac | National Ce | Need analytics next to data storage
49 | M0183 | http://bigdatawg.nist.gov/_uploadfiles/M0183_v2_2632549904.docx | DOE-BER Su | — | — | From omics | PFLOWTran, | Data mining, data quality assessment, cross-correlation across datasets, reduced model development, statistics, quality assessment, data fusion
50 | M0184 | http://bigdatawg.nist.gov/_uploadfiles/M0184_v1_8925840651.docx | DOE-BER A | — | Streaming | Flux data m | EddyPro, c | Data mining, data quality assessment, cross-correlation across datasets, data assimilation, data interpolation, statistics, quality assessment, data fusion
51 | M0223 | http://bigdatawg.nist.gov/_uploadfiles/M0223_v1_9531843932.docx | Consumptio | 4 TB/year f | Streaming d | Tuple-base | R/Matlab, | Forecasting models, machine learning models, time series analysis, clustering, motif detection, complex event processing, visual network analysis

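Use case 16 (Electronic Medical Record) lists tf-idf among its information retrieval methods. As a rough illustration of what that weighting computes, here is a minimal sketch in plain Python. It assumes the documents are already tokenized and uses the common unsmoothed variant tf = count / document length and idf = ln(N / df); it is not the use case's own implementation, and the sample notes are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of tokens in the document
    idf = ln(N / df), where N is the number of documents and df is the number
          of documents that contain the term
    """
    n_docs = len(docs)
    # Document frequency: each term counts at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        # A term that appears in every document gets idf = ln(1) = 0.
        weights.append(
            {term: (c / length) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    # Hypothetical, already-tokenized notes; real clinical text would need
    # proper tokenization and de-identification first.
    notes = [
        ["patient", "reports", "chest", "pain"],
        ["patient", "denies", "chest", "pain"],
        ["patient", "has", "diabetes"],
    ]
    for i, w in enumerate(tf_idf(notes)):
        ranked = sorted(w.items(), key=lambda kv: kv[1], reverse=True)
        print(i, [(t, round(s, 3)) for t, s in ranked])
```

Terms shared by every note (here "patient") receive zero weight, while rarer terms rank highest. Library implementations often smooth the idf term to avoid zero weights; the unsmoothed form is kept here for clarity.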

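Use case 13 (Large-Scale Geospatial Analysis) lists closest point of approach among its analytics. For two tracks moving at constant velocity, the squared separation is a quadratic in time, so the time of closest approach has a closed form. The sketch below assumes planar x/y coordinates in consistent units (for example, projected metres and metres per second); real geospatial data would first need a suitable map projection. The function and parameter names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def closest_point_of_approach(p1: Vec, v1: Vec, p2: Vec, v2: Vec) -> Tuple[float, float]:
    """Return (t_cpa, d_cpa) for two constant-velocity tracks on a flat x/y plane.

    p1, p2 are current positions and v1, v2 are velocities in consistent units.
    The relative position is w(t) = (p2 - p1) + (v2 - v1) * t, and |w(t)|^2 is
    a quadratic in t minimized at t = -(w0 . dv) / |dv|^2.
    """
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    dv2 = dvx * dvx + dvy * dvy
    if dv2 == 0.0:
        # Identical velocities: the separation never changes.
        t = 0.0
    else:
        # Clamp to now: if the minimum lies in the past, the tracks are diverging.
        t = max(0.0, -(wx * dvx + wy * dvy) / dv2)
    return t, math.hypot(wx + dvx * t, wy + dvy * t)


if __name__ == "__main__":
    # Hypothetical vessels: A heads east from the origin, B heads west 5 units north.
    t, d = closest_point_of_approach((0.0, 0.0), (1.0, 0.0), (10.0, 5.0), (-1.0, 0.0))
    print(f"closest approach at t={t}, distance={d}")  # closest approach at t=5.0, distance=5.0
```

Clamping negative times to zero treats diverging tracks as already past their closest approach, which matches the usual alerting question of whether two objects will come too close in the future.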

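Use case 38 (Large Survey Data) mentions Cholesky decomposition of very large matrices. The factorization writes a symmetric positive-definite matrix A as L L^T, with L lower-triangular. The pure-Python sketch below uses the Cholesky-Banachiewicz (row-by-row) ordering and is only meant to show the recurrence on small matrices; at the scale the table describes, one would rely on optimized or distributed libraries such as LAPACK or ScaLAPACK (the latter is named under use case 26).

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L such that L @ L^T == a.

    `a` must be symmetric positive-definite; only its lower triangle is read.
    Cholesky-Banachiewicz order: compute L one row at a time. Work is O(n^3).
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    print(cholesky(a))
```

A non-positive pivot on the diagonal is the standard signal that the input is not positive-definite, so the sketch raises instead of taking the square root of a negative number.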
Number Use Case URL Name Data Sources Data Transformation Capabilities

1 Census 201 1. Large docume -- 1. Large centra

2 NARA: Searc 1. Distributed 1. Crawl and index from 1. Large data 2. Large data s 2. Various analytics proc 2. Various sto 3. Bursty data 3. Data pre-processing 4. Wide variety 4. Long-term preservation management of large varied datasets 5. Distributed d 5. Huge numbers of data with high relevancy and recall

3 Statistica 1. Data size of 1. Analytics for recomm 1. Hadoop, Spa

4 Non-Traditi -- 1. Analytics to create r 1. Hadoop, Spa

5 Cloud Eco- 1. Real-time ing 1. Real-time analytics --

6 Mendeley 1. File-based 1. Standard machine lear 1. Amazon Elas 2. Variety of fi 2. Efficient scalable a 2. S3 (storage)

3. Third-party annotati 3. Hadoop (platform) 4. Scribe, Hive, Mahout, Python (language) 5. Moderate storage (15 TB with 1 TB/month) 6. Batch and real-time processing

7 Netflix Mov 1. User profile 1. Streaming video conte 1. Hadoop (pla 2. Analytic processing fo 2. Pig (language) 3. Various analytic proc 3. Cassandra and Hive 4. Robust learning algor 4. Huge numbers of subscribers, ratings, and searches per day (DB) 5. Continued analytic p 5. Huge amounts of storage (2 PB)

6. I/O intensive processing

8 Web Searc 1. Distributed 1. Dynamic fetching con 1. Petabytes o

2. Streaming da 2. Linking of user profiles and social 3. Multimedia content

9 Business Co -- 1. Robust backup algori 1. Hadoop 2. Replication of recent 2. Commercial cloud services

10 Cargo Ship 1. Centralized a 1. Tracking items based 1. Internet con 2. Real-time updates on tracking items

11 Materials D 1. Distributed 1. Hundreds of independe -- 2. Many varieties of datasets 3. Text, graphics, and images

12 Simulation 1. Data streams 1. High-throughput comp 1. Massive (15 2. Distributed 2. Mashup of simulation 2. GPFS (storage)

3. Search and crowd-driv 3. MongoDB systems (platform) 4. MapReduce and search 4. 10 GB networking

5. Various analytic tools such as PyMatGen, FireWorks, VASP, ABINIT, NWChem, BerkeleyGW, varied community codes 6. Large storage (storage) 7. Scalable key-value and object store (platform) 8. Data streams from peta/exascale centralized simulation systems

13 Large-Scale 1. Unique appro 1. Analytics: closest po 1. Geospatiall

M0147 http://bigdatawg.nist.gov/_uploadfiles/M0147_v1_9011190023.docx

M0148 http://bigdatawg.nist.gov/_uploadfiles/M0148_v1_1457436047.docx

M0219 http://bigdatawg.nist.gov/_uploadfiles/M0219_v1_1106458060.docx

M0222 http://bigdatawg.nist.gov/_uploadfiles/M0222_v1_8823653701.docx

M0175 http://bigdatawg.nist.gov/_uploadfiles/M0175_v1_1361846645.doc

M0161 http://bigdatawg.nist.gov/_uploadfiles/M0161_v1_8712614971.docx

M0164 http://bigdatawg.nist.gov/_uploadfiles/M0164_v1_8073380462.docx

M0165 http://bigdatawg.nist.gov/_uploadfiles/M0165_v1_9206577703.docx

M0137 http://bigdatawg.nist.gov/_uploadfiles/M0137_v1_9902753113.doc

M0103 http://bigdatawg.nist.gov/_uploadfiles/M0103_v1_9862181899.docx

M0162 http://bigdatawg.nist.gov/_uploadfiles/M0162_v1_8977322730.docx

M0176 http://bigdatawg.nist.gov/_uploadfiles/M0176_v1_7714944584.docx

M0213 http://bigdatawg.nist.gov/_uploadfiles/M0213_v1_5447164009.docx


2. Unique approaches to indexing and distributed analysis required for geospatial data

14 Object Iden 1. Real-time da 1. Rich analytics with ob 1. Wide range

2. Several ne 3. GPU usage important

15 Intelligenc 1. Much real-ti 1. Analytics: Near Real 1. Tolerance o 2. Data in disparate silos, must be acce 2. Up to hundreds of petabytes of data supported by modest to large clusters and clouds 3. Diverse data: text files, raw media, 3. Hadoop, Accumulo (Big Table), Solr, NLP (several variants), Puppet (for deployment and security), Storm, custom applications, visualization tools

16 Electronic 1. Heterogeneou 1. A comprehensive and 1. Hadoop, Hiv 2. Volume: > 12 2. Analytic techniques: 2. Cray supercomputer 3. Velocity: 500,000–1.5 million new tr 3. Teradata, PostgreSQL, Mong 4. Variety: formats include numeric, str 4. Various, with significant I/O i 5. Data evolve over time in a highly variable fashion

17 Pathology 1. High-resoluti 1. High-performance imag 1. Legacy syst 2. Various imag 2. Spatial queries and an 2. Huge legacy and new storage such as storage area network (SAN) or HDFS (storage) 3. Various image 3. Analytic processing o 3. High-throughput network link (networking) 4. Image analysis, spatial queries and an 4. MPI image analysis, MapReduce, Hive with spatial extension (software packages)

18 Computatio 1. Distributed 1. High-throughput comp 1. ImageJ, OM 2. 50 TB of dat 2. Segmentation of regio 2. NERSC’s Hopper infrastructure

3. Advanced biosciences 3. database and image collections 4. Massive data analysi 4. 10 GB and future 100 GB and advanced networking (software-defined networking [SDN])

19 Genomic M 1. High-throug 1. Processing raw data in 1. Legacy comp 2. Distributed 2. Challenge: characteri 2. Huge data storage in PB range (storage) 3. Various file formats with both struc 3. Unix-based legacy sequencing bioinformatics software (software package)

20 Comparativ 1. Multiple cen 2. Scalable RDBMS for h 1. Huge data s 2. Proteins and 2. Real-time rapid and parallel bulk lo 3. Front real-t 3. Oracle RDBMS, SQLite files, flat te 4. Heterogeneou 4. Linux cluster, Oracle RDBMS server 5. Metagenomic 5. Sequencing and comparative analysi

6. Descriptive statistics

21 Individual 1. Distributed 1. Data integration usi 1. data wareh

2. Over 5 milli 2. Parallel retrieval alg 2. supercomputers, cloud and pa 3. Each record: 3. Distributed graph min 3. I/O intensive processing 4. No real-time 4. Robust statistical ana 4. HDFS storage 5. Two main cat 5. Semantic graph mining 5. custom code to develop new properties from stored data. 6. Data consist 6. Semantic graph traversal

22 Statistical 1. Centralized 1. Relational probabilis 1. Java, some 2. Range from h 2. Robust and accurate 2. Cloud and parallel computing 3. Both constan 3. Learning algorithms to 3. High-performance computer, 48 GB RAM (to perform analysis for a moderate sample size) 4. Large, multi- 4. Generalized and refin 4. Clusters for large datasets 5. Rich relatio 5. Challenge: acceptance 5. 200 GB–1 TB hard drive for test data 6. Unpredictable arrival rates, often real time

23 World Popul 1. File-based sy 1. Compute-intensive an 1. Movement of 2. Large volume 2. Unstructured and irre 2. Distributed MPI-based simulat 3. Variety of o 3. Summary of various r 3. Charm++ on multi-nodes (software)

4. Network file system (storage) 5. Infiniband network (networking)

24 Social Cont 1. Traditional 1. Large-scale modeling 1. Computing i 2. Fine-resolut 2. Scalable fusion betw 2. File server

M0214 http://bigdatawg.nist.gov/_uploadfiles/M0214_v1_5406533104.docx

M0215 http://bigdatawg.nist.gov/_uploadfiles/M0215_v1_1579991796.docx

M0177 http://bigdatawg.nist.gov/_uploadfiles/M0177_v1_1133239355.docx

M0089 http://bigdatawg.nist.gov/_uploadfiles/M0089_v1_7814086875.docx

M0191 http://bigdatawg.nist.gov/_uploadfiles/M0191_v2_5659292903.docx

M0078 http://bigdatawg.nist.gov/_uploadfiles/M0078_v1_8198680934.docx

M0188 http://bigdatawg.nist.gov/_uploadfiles/M0188_v1_8691012255.docx

M0140 http://bigdatawg.nist.gov/_uploadfiles/M0140_v1_5675248635.docx

M0174 http://bigdatawg.nist.gov/_uploadfiles/M0174_v1_8098597993.docx

M0172 http://bigdatawg.nist.gov/_uploadfiles/M0172_v1_8972697421.docx

M0173 http://bigdatawg.nist.gov/_uploadfiles/M0173_v1_3577651730.docx


3. Huge data st 3. Multi-level analysis w 3. Ethernet and Infiniband networking (networking) 4. Specialized simulators, open source software, and proprietary modeling (application) 5. Huge user accounts across country boundaries (networking)

25 Biodiversit 1. Special dedi 1. Web-based services, g 1. Expandable 2. Storage: dist 2. Personalized virtual l 2. Cloud comm 3. Distributed d 3. Grid- and cloud-based resources 4. Wide variety 4. Data analyzed incrementally and/or in real time at varying rates owing to var 5. Multi-type d 5. A variety of data and analytical and modeling tools to support analytics for d 6. Data streami 6. Parallel data streams and streaming analytics

7. Access and integration of multiple distributed databases

26 Large-Scal -- -- 1. GPU

2. High-performance MPI and HPC Infiniband cluster 3. Libraries for single-machine or single-GPU computation – available (e.g., BLAS, CuBLAS, MAGMA, etc.); distributed computation of dense BLAS-like or LAPACK-like operations on GPUs – poorly developed; existing solutions (e.g., ScaLapack for CPUs) – not well-integrated with higher-level languages and require low-level programming, lengthening experiment and development time

27 Organizing 1. Over 500 mil 1. Classifier (e.g. an SVM 1. Hadoop or 2. Features seen in many large-scale image processing problems

28 Truthy Twit 1. Distributed 1. Various real-time data 1. Hadoop and 2. Large volume of real-time streaming 2. IndexedHBas 3. Raw data in compressed formats 3. In-memory 4. Fully structured data in JSON, user 4. High-speed Infiniband network (networking) 5. Multiple data schemas

29 Crowd Sour -- 1. Digitize existing aud -- 2. Analytics: pattern recognition of all kinds (e.g., speech recognition, automatic A&V analysis, cultural patterns), identification of structures (lexical units, linguistic rules, etc.)

30 CINET for 1. A set of net 1. Environments to run 1. Large file s 2. Asynchronou 2. Dynamic growth of t 2. Various network connectivity (networking)

3. Asynchronous and rea 3. Existing computing cluster 4. Different parallel alg 4. EC2 computing cluster

5. Various graph libraries, management tools, databases, semantic web tools

31 NIST Inform 1. Large amount 1. Test analytic algorith 1. PERL, Pytho

2. Scaling ground-truthing to larger data, intrinsic and annotation uncertainty measurement, performance measurement for incompletely annotated data, measuring analytic performance for heterogeneous data and analytic flows involving users

32 DataNet (i 1. Process key 1. Provision of general 1. iRODS data

2. Real-time and batch data 2. interoperability across stora

33 The Discinn 1. Integration -- 1. Software:

34 Semantic G 1. All data type 1. Data graph processin 1. Cloud comm 2. RDBMS

35 Light sourc 1. Multiple str 1. Standard bioinformati 1. High-volume 2. Sample data to be analyzed in real time

36 Catalina Re 1. ~0.1 TB per d 1. A wide variety of the -- 2. Automated classification with machine learning tools given the very sparse and heterogeneous data, dynamically evolving in time as more data come in, with follow-up decision making reflecting limited follow-up resources

37 DOE Extrem 1. ~1 PB/year b 1. Advanced analysis and 1. MPI, OpenMP 2. Methods/tools to address supercomputer I/O subsystem limitations

38 Large Surv 1. 20 TB of dat 1. Analysis on both the 1. Standard as 2. Techniques for handli 2. Oracle RDBMS, Postgres psql, GPFS and Lustre file systems and tape archives

3. Parallel image storage

39 Particle Ph 1. Real-time da 1. Experimental data fr 1. Legacy comp

2. Asynchroniza 2. Histograms, scatter-pl 2. Distributed cached files (storage) 3. Calibration o 3. Monte-Carlo computa 3. Object databases (software package)

40 Belle II Hi 1. 120 PB of ra -- 1. 120 PB raw

M0141 http://bigdatawg.nist.gov/_uploadfiles/M0141_v1_5563475154.docx

M0136 http://bigdatawg.nist.gov/_uploadfiles/M0136_v1_5489292512.docx

M0171 http://bigdatawg.nist.gov/_uploadfiles/M0171_v1_7185377580.docx

M0160 http://bigdatawg.nist.gov/_uploadfiles/M0160_v1_6667987957.docx

M0211 http://bigdatawg.nist.gov/_uploadfiles/M0211_v2_3994987602.docx

M0158 http://bigdatawg.nist.gov/_uploadfiles/M0158_v1_1209297717.docx

M0190 http://bigdatawg.nist.gov/_uploadfiles/M0190_v1_2052764107.docx

M0130 http://bigdatawg.nist.gov/_uploadfiles/M0130_v1_3759224345.docx

M0163 http://bigdatawg.nist.gov/_uploadfiles/M0163_v1_6644793897.docx

M0131 http://bigdatawg.nist.gov/_uploadfiles/M0131_v1_9568192535.docx

M0189 http://bigdatawg.nist.gov/_uploadfiles/M0189_v1_1536495869.docx

M0170 http://bigdatawg.nist.gov/_uploadfiles/M0170_v1_5720273656.docx

M0185 http://bigdatawg.nist.gov/_uploadfiles/M0185_v1_4843821869.docx

M0209 http://bigdatawg.nist.gov/_uploadfiles/M0209_v1_4702199454.docx

M0166 http://bigdatawg.nist.gov/_uploadfiles/M0166_v3_2675550648.DOCX

M0210 http://bigdatawg.nist.gov/_uploadfiles/M0210_v1_7474668890.docx


2. International distributed computing model to augment that at accelerator (Japan)3. Data transfer of ~20 GB/ second at designed luminosity between Japan and United States4. Software from Open Science Grid, Geant4, DIRAC, FTS, Belle II framework

41 EISCAT 3D 1. Remote sites1. Queen Bea architectur1. Architectur2. Hierarchical 2. Real-time monitoring of equipment by partial streaming analysis3. Visualization3. Hosting needed for rich set of radar image processing services using machine learning, statistical modelling, and graph algorithms

42 ENVRI Envi1. Huge volume 1. Diversified analytics t 1. Variety of 2. Variety of instrumentation datasets 2. Scattered re

43 CReSIS Rem1. Provision of 1. Legacy software (Matl1. ~0.5 PB/yea2. Data gatherin2. Signal processing an 2. Transfer co3. Varieties of datasets 3. MapReduce or MPI plus language binding for C/Java

44 UAVSAR Dat1. Angular and 1. Geolocated data that 1. Support for2. Compatibility2. Significant human int 2. Hosting of rich set of radar image processing services

3. Hosting of rich set of 3. ROI_PAC, GeoServer, GDAL, GeoTIFF-supporting tools4. ROI_PAC, GeoServer, 4. Compatibility with other NASA radar systems and repositories (Alaska Satellite Facility)

45 NASA LARC1. Federate dis 1. CAaaS on clouds 1. Support vir2. GPFS parallel file system integrated with Hadoop3. iRODS

46 MERRA Anal1. Integrate si 1. CAaaS on clouds 1. NetCDF awa2. Real-time and batch mode needed 2. MapReduce3. Interoperable use of AWS and local c3. Interoperable use of AWS and local clusters4. iRODS data management

47 Atmospheri1. Real-time di 1. MapReduce, SciDB, an1. Other legac2. Various form2. Continuous computing2. High-throughput data transmission over the network

3. Event specification language for data mining and event searching4. Semantics interpretation and optimal structuring for 4D data mining and predictive analysis

48 Climate Stu1. ~100 PB data1. Data analytics close t 1. Extension of2. Integration of large-scale distributed data from si3. Linking of diverse data to novel HPC simulation

49 DOE-BER Su1. Heterogeneou-- 1. Postgres, 2. Synthesis of diverse and disparate field, laboratory, omic, and simulation datasets across different semantic, spatial, and temporal scales3. Linking of diverse data to novel HPC simulation

50 DOE-BER Am1. Heterogeneou1. Custom software such1. Custom soft2. Link to many other environment and 2. Analytics including data mining, data quality assessment, cross-correlation across datasets, data assimilation, data interpolation, statistics, quality assessment, data fusion, etc.3. Link to HPC climate and other simulations4. Link to European data sources and projects5. Access to data from 500 distributed sources

51 Consumptio1. Diverse data 1. New machine learning1. SQL databas2. Data updated every 15 minutes 2. R/Matlab, Weka, Hadoop (platform)

M0155 http://bigdatawg.nist.gov/_uploadfiles/M0155_v1_3537561150.docx

M0157 http://bigdatawg.nist.gov/_uploadfiles/M0157_v1_6396188402.docx

M0167 http://bigdatawg.nist.gov/_uploadfiles/M0167_v1_7320744610.docx

M0127 http://bigdatawg.nist.gov/_uploadfiles/M0127_v1_8374144249.docx

M0182 http://bigdatawg.nist.gov/_uploadfiles/M0182_v1_3824910269.docx

M0129 http://bigdatawg.nist.gov/_uploadfiles/M0129_v1_8721988256.pdf

M0090 http://bigdatawg.nist.gov/_uploadfiles/M0090_v1_7386661507.docx

M0186 http://bigdatawg.nist.gov/_uploadfiles/M0186_v1_2893359960.docx

M0183 http://bigdatawg.nist.gov/_uploadfiles/M0183_v2_2632549904.docx

M0184 http://bigdatawg.nist.gov/_uploadfiles/M0184_v1_8925840651.docx

M0223 http://bigdatawg.nist.gov/_uploadfiles/M0223_v1_9531843932.docx


Data Consumer Other-- 1. Title 13 data 1. Long-term preservation o--

2. Long-term preservation at the bit level3. Curation process including format transformation4. Access and analytics processing after 75 years5. No data loss

1. High relevancy 1. Security policy 1. Pre-process for virus sc 1. Mobile search with similar interfaces/ results from desktop2. High accuracy from categorization of r 2. File format identification3. Various storage systems such as NetApp3. Indexing

4. Long-term preservation management of large varied datasets 4. Records categorization5. Huge numbers of data with high relevancy and recall

1. Data visualizati 1. Improved recommendat1. High veracity on data a 1. Mobile access2. Confidential and secure data; processes that are auditable for security and confidentiality as required by various legal statutes

1. Data visualizati 1. Confidential and secu1. High veracity on data a --

-- 1. Strong security and p-- 1. Mobile access

1. Custom-built re1. Access controls for 1. Metadata management f1. Windows, Android, and iOS mobile devices for content deliverables from Windows desktops2. Visualization tools such as networking g2. Identification of document duplication

3. Hadoop (platform) 3. Persistent identifier4. Scribe, Hive, Mahout, Python (language) 4. Metadata correlation between data repositories such as CrossRef, PubMed, and arXiv5. Moderate storage (15 TB with 1 TB/month)6. Batch and real-time processing

1. Streaming and 1. Preservation of users1. Continued ranking and u1. Smart interface accessing movie content on mobile platforms2. Pig (language)3. Cassandra and Hive4. Huge numbers of subscribers, ratings, and searches per day (DB)5. Huge amounts of storage (2 PB)6. I/O intensive processing

1. Search time of 1. Access control 1. Data purge after certain1. Mobile search and rendering2. Top 10 ranked r2. Protection of sensiti 2. Data cleaning3. Page layout (visual)-- 1. Strong security for m-- --

2. Commercial cloud services-- 1. Security policy -- --

2. Real-time updates on tracking items

1. Visualization f 1. Protection of proprie1. Handle data quality (cur--2. Visualization to 2. Tools to mask proprietary information

1. Browser-based s1. Sandbox as independ1. Validation and uncertai 1. Mobile applications (apps) to access materials genomics information2. GPFS (storage) 2. Policy-driven federat2. UQ in results from multiple datasets3. MongoDB systems (platform)4. 10 GB networking5. Various analytic tools such as PyMatGen, FireWorks, VASP, ABINIT, NWChem, BerkeleyGW, varied community codes6. Large storage (storage)7. Scalable key-value and object store (platform)8. Data streams from peta/exascale centralized simulation systems

1. Visualization w 1. Complete security of -- --

Security and Privacy Lifecycle Management


2. Unique approaches to indexing and distributed analysis required for geospatial data

1. Visualization o 1. Significant security 1. Veracity of extracted ob--2. Output in the form of Open Geospatial Consortium (OGC)-compliant web features or standard geospatial files (shapefiles, KML)

3. GPU usage important

1. Geospatial over1. Protection of data a 1. Data provenance (e.g. tr--

2. Up to hundreds of petabytes of data supported by modest to large clusters and clouds3. Hadoop, Accumulo (Big Table), Solr, NLP (several variants), Puppet (for deployment and security), Storm, custom applications, visualization tools

1. Results of analy1. Data consumer direct 1. Standardize, aggregate,1. Security across mobile devices2. Cray supercomputer 2. Protection of all he 2. Reduce errors and bias3. Teradata, PostgreSQL, Mong 3. Protection of data in 3. Common nomenclature and classification of content across disparate sources—particularly challenging in the health IT space, as the taxonomies continue to evolve— SNOMED, International Classification of Diseases (ICD) 9 and future ICD 10, etc.4. Various, with significant I/O i 4. Security and privacy policies unique to a data subset

5. Robust security to prevent data breaches

1. Visualization fo 1. Security and privacy 1. Human annotations for 1. 3D visualization and rendering on mobile platforms

2. Huge legacy and new storage such as storage area network (SAN) or HDFS (storage)3. High-throughput network link (networking)4. MPI image analysis, MapReduce, Hive with spatial extension (software packages)

1. 3D structural m1. Significant but optio 1. Workflow components in--2. NERSC’s Hopper infrastructure3. database and image collections4. 10 GB and future 100 GB and advanced networking (software-defined networking [SDN])

1. Data format fo 1. Security and privacy -- 1. Mobile platforms for physicians accessing genomic data (mobile device)2. Huge data storage in PB range (storage)3. Unix-based legacy sequencing bioinformatics software (software package)

1. Real-time intera1. Login security: use 1. Methods to improve dat--2. Interactive We 2. Creation of user acc 2. Data clustering, classification, reduction3. Download of ass3. Single sign-on capabil3. Integration of new data/content into the system’s data store and data annotation4. Ability to query and browse data via interactive web UI5. Visualize data structure at different levels of resolution; ability to view abstract representations of highly similar data

1. Efficient data 1. Protection of health 1. Data annotated based o1. Mobile access2. supercomputers, cloud and pa2. Security policies for 2. Traceability of data from origin (initial point of collection) through use3. I/O intensive processing 3. Data conversion from existing data warehouse into RDF triples4. HDFS storage5. custom code to develop new properties from stored data.

1. Visualization of 1. Secure handling and 1. Merging multiple tables--2. Cloud and parallel computing 2. Methods to validate data to minimize errors3. High-performance computer, 48 GB RAM (to perform analysis for a moderate sample size)4. Clusters for large datasets5. 200 GB–1 TB hard drive for test data

1. Visualization 1. Protection of PII on 1. Data quality, ability to --2. Distributed MPI-based simulat2. Data protection and secure platform for computation3. Charm++ on multi-nodes (software)4. Network file system (storage)5. Infiniband network (networking)

1. Multi-level det 1. Protection of PII of 1. Data fusion from variety 1. Efficient method of moving data2. Visualization wi2. Data protection and 2. Data consistency and no corruption


3. Ethernet and Infiniband networking (networking) 3. Preprocessing of raw data4. Specialized simulators, open source software, and proprietary modeling (application)5. Huge user accounts across country boundaries (networking)

1. Access by mobil1. Federated identity 1. Data storage and archiv--2. Advanced/ rich/2. Access control and a 2. Data lifecycle management: data provenance, referral integrity and identification traceability back to initial observational data3. 4D visualization computational models 3. Processed (secondary) data storage (in addition to original source data) for future uses

4. Data analyzed incrementally and/or in real time at varying rates owing to var 4. Provenance (and persistent identification [PID]) control of data, algorithms, and workflows5. A variety of data and analytical and modeling tools to support analytics for d 5. Curated (authorized) reference data (e.g. species name lists), algorithms, software code, workflows6. Parallel data streams and streaming analytics7. Access and integration of multiple distributed databases

-- -- -- --2. High-performance MPI and HPC Infiniband cluster3. Libraries for single-machine or single-GPU computation are available (e.g., BLAS, cuBLAS, MAGMA); distributed computation of dense BLAS-like or LAPACK-like operations on GPUs is poorly developed; existing solutions (e.g., ScaLAPACK for CPUs) are not well integrated with higher-level languages and require low-level programming, lengthening experiment and development time

1. Visualize large 1. Preserve privacy for -- --2. Features seen in many large-scale image processing problems

1. Data retrieval 1. Security and privacy 1. Standardized data struc1. Low-level data storage infrastructure for efficient mobile access to data2. Data-driven interactive web interfaces3. API for data query

4. High-speed Infiniband network (networking)

-- 1. Privacy issues in pr -- --2. Analytics: pattern recognition of all kinds (e.g., speech recognition, automatic A&V analysis, cultural patterns), identification of structures (lexical units, linguistic rules, etc.)

1. Client-side visua-- -- --2. Various network connectivity (networking)3. Existing computing cluster4. EC2 computing cluster5. Various graph libraries, management tools, databases, semantic web tools

1. Analytic flows i 1. Security requirement-- --2. Scaling ground-truthing to larger data, intrinsic and annotation uncertainty measurement, performance measurement for incompletely annotated data, measuring analytic performance for heterogeneous data and analytic flows involving users

1. General visuali 1. Federate across exis -- --2. interoperability across stora 2. Access controls on files independent of the storage location

-- 1. Significant but optio 1. Integration of metadata--

1. Efficient data- -- -- --

-- 1. Multiple security and-- --

1. Visualization m -- -- --2. Automated classification with machine learning tools given the very sparse and heterogeneous data, dynamically evolving in time as more data come in, with follow-up decision making reflecting limited follow-up resources

1. Interpretation o-- -- --2. Methods/ tools to address supercomputer I/O subsystem limitations

-- -- 1. Links between remote te--2. Oracle RDBMS, Postgres psql, GPFS and Lustre file systems and tape archives3. Parallel image storage

1. Histograms and 1. Data protection 1. Data quality on comple --2. Distributed cached files (storage)3. Object databases (software package)

-- 1. Standard grid authen-- --


1. Support needed -- 1. Preservation of data an 1. Support needed for real-time monitoring of equipment by partial streaming analysis2. Real-time monitoring of equipment by partial streaming analysis3. Hosting needed for rich set of radar image processing services using machine learning, statistical modelling, and graph algorithms

1. Graph plotting t1. Open data policy with1. High data quality 1. Various kinds of mobile sensor devices for data acquisition2. Time series interactive tools 2. Mirror archives3. Browser-based Flash playback 3. Various metadata frameworks4. Earth high-resolution map display 4. Scattered repositories and data curation5. Visual tools for quality comparisons

1. GIS user interfa1. Security and privacy o1. Data quality assurance 1. Monitoring data collection instruments/ sensors2. Rich user interf 2. Dynamic security and privacy policy mechanisms

3. MapReduce or MPI plus language binding for C/Java

1. Support for fie -- 1. Significant human inter 1. Support for field expedition users with phone/tablet interface and low-resolution downloads

2. Hosting of rich set of radar image processing services2. Rich robust provenance defining complex machine/human processing3. ROI_PAC, GeoServer, GDAL, GeoTIFF-supporting tools4. Compatibility with other NASA radar systems and repositories (Alaska Satellite Facility)

1. Support needed -- -- --2. GPFS parallel file system integrated with Hadoop

1. High-end distrib-- -- 1. Smart phone and tablet access required2. MapReduce 2. iRODS data management3. Interoperable use of AWS and local clusters

1. Visualization to -- 1. Validation for output pr--2. High-throughput data transmission over the network

3. Event specification language for data mining and event searching4. Semantics interpretation and optimal structuring for 4D data mining and predictive analysis

1. Worldwide clim-- -- 1. Phone-based input and access2. High-end distributed visualization

1. Phone-based in-- -- 1. Phone-based input and access2. Synthesis of diverse and disparate field, laboratory, omic, and simulation datasets across different semantic, spatial, and temporal scales

1. Phone-based in-- -- 1. Phone-based input and access2. Analytics including data mining, data quality assessment, cross-correlation across datasets, data assimilation, data interpolation, statistics, quality assessment, data fusion, etc.

-- 1. Privacy and anonymiz-- 1. Mobile access for clients2. R/Matlab, Weka, Hadoop (platform)


Activities and Descriptions

System Orchestrator

Several security functions have been mapped to the System Orchestrator block because they require architecture-level decisions and awareness. Aspects of these functions are strongly related to the Security Fabric and thus touch the entire architecture at various points and in different forms of operational detail. Such security functions include nation-specific compliance requirements, vastly expanded demand for forensics, and domain-specific, privacy-aware business risk models.

· Policy Enforcement
· Security Metadata Model
· Data Loss Prevention, Detection
· Data Lifecycle Management
· Threat and Vulnerability Management
· Mitigation
· Configuration Management
· Monitoring, Alerting
· Malware Surveillance and Remediation
· Resiliency, Redundancy and Recovery
· Accountability
· Compliance
· Forensics
· Business Risk Model

Data Provider

Data Providers are responsible for guaranteeing the authenticity of their data and, in turn, require that sensitive, copyrighted, or valuable data be adequately protected. This leads to operational aspects of entity registration and identity ecosystems.

· Device, User, Asset, Services, Applications Registration
· Application Layer Identity
· End User Layer Identity Management
· End Point Input Validation
· Digital Rights Management
· Monitoring, Alerting

Data Consumer

Data Consumers exhibit a duality with Data Providers in terms of obligations and requirements; the difference is that Data Consumers face the access/visualization aspects of the Application Provider.

· Application Layer Identity
· End User Layer Identity Management
· Web Services Gateway
· Digital Rights Management
· Monitoring, Alerting

Application Provider

The Application Provider interfaces between the Data Provider and the Data Consumer. It takes part in all the secure interface protocols with these blocks and maintains secure interaction with the Framework Provider.

· Application Layer Identity
· Web Services Gateway
· Data Transformation
· Digital Rights Management
· Monitoring, Alerting

Framework Provider

The Framework Provider is responsible for the security of data and computations for a significant portion of the data lifecycle. This includes security of data at rest through encryption and access control, security of computations via isolation/virtualization, and security of communication with the Application Provider.

· Virtualization Layer Identity
· Identity Provider
· Encryption and Key Management
· Isolation/Containerization
· Storage Security
· Network Boundary Control
· Monitoring, Alerting
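The activity lists above include registration of devices and users and end-point input validation, but do not prescribe a mechanism. As a minimal sketch, assuming each registered device shares a secret key with the receiving endpoint, an endpoint could reject records from unregistered senders, records whose HMAC-SHA256 tag does not verify (spoofed or altered content), and records that do not match an expected schema. `DEVICE_KEYS`, `REQUIRED_FIELDS`, `sign`, and `validate_input` are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical registry of registered devices and their shared secret keys.
# A real deployment would fetch keys from a key-management service.
DEVICE_KEYS = {"sensor-17": b"example-shared-secret"}

# Hypothetical schema: each required field and the type(s) it must have.
REQUIRED_FIELDS = {"device_id": str, "timestamp": int, "value": (int, float)}


def sign(payload: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def validate_input(payload: dict, tag: str) -> bool:
    """Accept a record only from a registered sender, with a valid tag,
    and with exactly the expected fields and types."""
    device_id = payload.get("device_id")
    if not isinstance(device_id, str) or device_id not in DEVICE_KEYS:
        return False  # unknown or malformed sender identity
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(payload, DEVICE_KEYS[device_id]), tag):
        return False  # spoofed sender or altered content
    if set(payload) != set(REQUIRED_FIELDS):
        return False  # missing or unexpected fields
    return all(isinstance(payload[name], kinds)
               for name, kinds in REQUIRED_FIELDS.items())


record = {"device_id": "sensor-17", "timestamp": 1700000000, "value": 21.5}
tag = sign(record, DEVICE_KEYS["sensor-17"])
print(validate_input(record, tag))                     # True
print(validate_input(dict(record, value=99.0), tag))   # False: content altered
```

A shared-key MAC only proves the record came from some holder of that key; where non-repudiation matters, public-key signatures (as in S/MIME) would be the usual substitute.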


NBDRA Component and Interfaces | Security and Privacy Topic | Use Case Mapping
Data Provider → Application Provider | End-point input validation | Varies and is vendor-dependent. Spoofing is possible. For example, protections afforded by securing Microsoft Rights Management Services.[49] Secure/Multipurpose Internet Mail Extensions (S/MIME)
Data Provider → Application Provider | Real-time security monitoring | Content creation security
Data Provider → Application Provider | Data discovery and classification | Discovery/classification is possible across media, populations, and channels
Data Provider → Application Provider | Secure data aggregation | Vendor-supplied aggregation services; security practices are opaque
Application Provider → Data Consumer | Privacy-preserving data analytics | Aggregate reporting to content owners
Application Provider → Data Consumer | Compliance with regulations | PII disclosure issues abound
Application Provider → Data Consumer | Government access to data and freedom of expression concerns | Various issues; for example, playing terrorist podcast and illegal playback
Data Provider ↔ Framework Provider | Data-centric security such as identity/policy-based encryption | Unknown
Framework Provider | Policy management for access control | User, playback administrator, library maintenance, and auditor
Framework Provider | Computing on the encrypted data: searching/filtering/deduplicate/fully homomorphic encryption | Unknown
Framework Provider | Audits | Audit DRM usage for royalties
Framework Provider | Securing data storage and transaction logs | Unknown
Framework Provider | Key management | Unknown
Framework Provider | Security best practices for non-relational data stores | Unknown
Framework Provider | Security against DoS attacks | N/A
Framework Provider | Data provenance | Traceability to data owners, producers, consumers is preserved
Fabric | Analytics for security intelligence | Machine intelligence for unsanctioned use/access
Fabric | Event detection | "Playback" granularity defined
Fabric | Forensics | Subpoena of playback records in legal disputes
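Several topics in this mapping (audits, securing data storage and transaction logs, data provenance, forensics) depend on a trustworthy record of who did what, for example playback records that may later be subpoenaed. One common technique for making such a log tamper-evident is hash chaining, sketched below under stated assumptions: a single writer, SHA-256 over canonical JSON, and an all-zero hash for the first entry. The use case itself lists transaction-log security as "Unknown", so this is an illustration, not its mechanism; `AuditLog`, `GENESIS`, `append`, and `verify` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    """Append-only log in which every entry commits to the previous entry's hash,
    so altering, reordering, or removing an earlier entry breaks verification."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = self._digest(prev_hash, event)
        # Store a shallow copy so later changes to the caller's dict are not logged.
        self.entries.append({"event": dict(event), "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False  # chain link broken (entry removed or reordered)
            if self._digest(prev_hash, entry["event"]) != entry["hash"]:
                return False  # entry content altered after it was logged
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"user": "alice", "action": "playback", "item": "episode-42"})
log.append({"user": "bob", "action": "playback", "item": "episode-7"})
print(log.verify())                      # True
log.entries[0]["event"]["user"] = "mallory"
print(log.verify())                      # False
```

A chain alone cannot detect deletion of the newest entries; anchoring the latest hash somewhere the log writer cannot modify (a separate audit store, for instance) closes that gap.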


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Device-specific keys from digital sources; receipt sources scanned internally and reconciled to family ID (role issues)
Real-time security monitoring: None
Data discovery and classification: Classifications based on data sources (e.g., retail outlets, devices, and paper sources)
Secure data aggregation: Aggregated into demographic crosstabs. Internal analysts had access to PII
Privacy-preserving data analytics: Aggregated to (sometimes) product-specific, statistically valid independent variables
Compliance with regulations: Panel data rights secured in advance and enforced through organizational controls
Government access to data and freedom of expression concerns: N/A
Data-centric security such as identity/policy-based encryption: Encryption not employed in place; only for data-center-to-data-center transfers. XML (Extensible Markup Language) cube security mapped to Sybase IQ and reporting tools
Policy management for access control: Extensive role-based controls
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): N/A
Audits: Schematron and process step audits
Securing data storage and transaction logs: Project-specific audits secured by infrastructure team
Key management: Managed by project chief security officer (CSO). Separate key pairs issued for customers and internal users
Security best practices for non-relational data stores: Regular data integrity checks via XML schema validation
Security against DoS attacks: Industry-standard webhost protection provided for query subsystem
Data provenance: Unique
Analytics for security intelligence: No project-specific initiatives
Event detection: N/A
Forensics: Usage, cube-creation, and device merge audit records were retained for forensics and billing


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Device-dependent. Spoofing is often easy
Real-time security monitoring: Web server monitoring
Data discovery and classification: Some geospatial attribution
Secure data aggregation: Aggregation to device, visitor, button, web event, and others
Privacy-preserving data analytics: IP anonymizing and timestamp degrading. Content-specific opt-out
Compliance with regulations: Anonymization may be required for EU compliance. Opt-out honoring
Government access to data and freedom of expression concerns: Yes
Data-centric security such as identity/policy-based encryption: Varies depending on archivist
Policy management for access control: System- and application-level access controls
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): Unknown
Audits: Customer audits for accuracy and integrity are supported
Securing data storage and transaction logs: Storage archiving; this is a big issue
Key management: CSO and applications
Security best practices for non-relational data stores: Unknown
Security against DoS attacks: Standard
Data provenance: Server, application, IP-like identity, page point-in-time Document Object Model (DOM), and point-in-time marketing events
Analytics for security intelligence: Access to web logs often requires privilege elevation
Event detection: Can infer; for example, numerous sales, marketing, and overall web health events
Forensics: See the SIEM use case
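The web traffic entry lists "IP anonymizing and timestamp degrading" as its privacy-preserving analytics control. The following is a minimal Python sketch of both transformations, assuming a /24 prefix for IPv4, a /48 prefix for IPv6, and hour-level timestamps; these granularities and the event field names (`ip`, `ts`, `page`) are illustrative assumptions, not values from the source. Coarsening alone does not guarantee anonymity, because coarsened records can still be linked with other data.

```python
from datetime import datetime
import ipaddress


def anonymize_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an IP address, keeping only a network prefix."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def degrade_timestamp(ts: datetime) -> datetime:
    """Coarsen a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def anonymize_event(event: dict) -> dict:
    """Return a copy of a web log event (hypothetical fields) with IP and time coarsened."""
    out = dict(event)
    out["ip"] = anonymize_ip(event["ip"])
    out["ts"] = degrade_timestamp(event["ts"])
    return out


if __name__ == "__main__":
    e = {"ip": "203.0.113.77", "ts": datetime(2015, 4, 6, 14, 37, 12), "page": "/checkout"}
    print(anonymize_event(e))
```

Widening the prefix or the time bucket trades analytic precision for privacy; the right granularity depends on the applicable regulation and the analyses the data must still support.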


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Strong authentication, perhaps through X.509v3 certificates; potential leverage of the SAFE (Signatures & Authentication for Everything[51]) bridge in lieu of general PKI
Real-time security monitoring: Validation of incoming records to assure integrity through signature validation and to assure HIPAA privacy by ensuring PHI is encrypted. May need to check for evidence of informed consent
Data discovery and classification: Leverage Health Level Seven (HL7) and other standard formats opportunistically, but avoid attempts at schema normalization. Some columns will be strongly encrypted while others will be specially encrypted (or associated with cryptographic metadata) to enable discovery and classification. May need to perform column filtering based on the policies of the data source or the HIE service provider
Secure data aggregation: Clear text columns can be deduplicated, as can, perhaps, columns with deterministic encryption. Other columns may have cryptographic metadata to facilitate aggregation and deduplication. Retention rules are assumed, but disposition rules are not assumed in the related areas of compliance
Privacy-preserving data analytics: Searching on encrypted data and proofs of data possession. Identification of potential adverse experience due to clinical trial participation. Identification of potential professional patients. Trends and epidemics, and correlations of these to environmental and other effects. Determination of whether the drug to be administered will generate an adverse reaction, without breaking the double blind. Patients will need to be provided with a detailed accounting of accesses to, and uses of, their EHR data
Compliance with regulations: HIPAA security and privacy will require detailed accounting of access to EHR data. Facilitating this, and the logging and alerts, will require federated identity integration with data consumers
Government access to data and freedom of expression concerns: CDC, law enforcement, subpoenas, and warrants. Access may be toggled based on the occurrence of a pandemic (e.g., CDC) or receipt of a warrant (e.g., law enforcement)
Data-centric security such as identity/policy-based encryption: Row-level and column-level access control
Policy management for access control: Role-based and claim-based. Defined for PHI cells
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): Privacy-preserving access to relevant events, anomalies, and trends for CDC and other relevant health organizations
Audits: Facilitate HIPAA readiness and HHS audits
Securing data storage and transaction logs: Need to be protected for integrity and privacy, but also for establishing completeness, with an emphasis on availability
Key management: Federated across covered entities, with the need to manage key life cycles across multiple covered entities that are data sources
Security best practices for non-relational data stores: End-to-end encryption, with scenario-specific schemes that respect min-entropy to provide richer query operations without compromising patient privacy
Security against distributed denial-of-service (DDoS) attacks: A mandatory requirement: systems must survive DDoS attacks
Data provenance: Completeness and integrity of data with records of all accesses and modifications. This information could be as sensitive as the data and is subject to commensurate access policies
Analytics for security intelligence: Monitoring of informed patient consent, authorized and unauthorized transfers, and accesses and modifications
Event detection: Transfer of record custody, addition/modification of record (or cell), authorized queries, unauthorized queries, and modification attempts
Forensics: Tamper-resistant logs, with evidence of tampering events. Ability to identify record-level transfers of custody and cell-level access or modification
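The health information exchange mapping calls for tamper-resistant logs that provide evidence of tampering. One common way to obtain tamper evidence is a hash chain, in which each entry's SHA-256 digest also covers the previous entry's digest. The Python sketch below illustrates the idea under stated assumptions: an in-memory log and hypothetical event fields (`event`, `record`, `cell`). It detects tampering only if the latest digest is also kept somewhere an attacker cannot rewrite, because anyone who can edit the whole log could otherwise recompute every later digest.

```python
import hashlib
import json

GENESIS = "0" * 64  # digest used as the "previous" value for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always serializes, and hashes, identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class HashChainedLog:
    """Append-only log in which each entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = _entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest  # the caller can anchor this head digest externally

    def verify(self) -> int:
        """Return the index of the first tampered entry, or -1 if the chain is intact."""
        prev = GENESIS
        for i, (record, stored) in enumerate(self.entries):
            if _entry_hash(prev, record) != stored:
                return i
            prev = stored
        return -1


if __name__ == "__main__":
    log = HashChainedLog()
    log.append({"event": "custody_transfer", "record": "r1"})
    log.append({"event": "cell_read", "record": "r1", "cell": "dx"})
    log.entries[0][0]["record"] = "r2"  # simulate an in-place edit
    print(log.verify())
```

Because verification reports the first broken link, the same structure also supports identifying which record-level or cell-level event was altered.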

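The health information exchange mapping notes that clear text columns, and perhaps columns with deterministic encryption, can be deduplicated. What makes this possible is that equal inputs yield equal outputs under one key. The Python sketch below substitutes a keyed HMAC-SHA256 token for deterministic encryption: it shares that equality property but, unlike encryption, cannot be reversed. The column name `mrn`, the normalization step, and the inline key are hypothetical. Any deterministic scheme reveals which rows are equal, which is why low-entropy columns need the care the mapping describes as schemes that "respect min-entropy".

```python
import hashlib
import hmac


def det_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal inputs under one key give equal tokens."""
    # Light normalization (hypothetical choice) so trivially different spellings collide.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def dedupe(rows: list[dict], column: str, key: bytes) -> list[dict]:
    """Keep the first row for each distinct tokenized value of `column`."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        tok = det_token(key, row[column])
        if tok not in seen:
            seen.add(tok)
            kept.append({**row, column: tok})  # store the token, not the plaintext
    return kept


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    rows = [
        {"mrn": "A123", "src": "clinic1"},
        {"mrn": " a123 ", "src": "clinic2"},
        {"mrn": "B456", "src": "clinic1"},
    ]
    for r in dedupe(rows, "mrn", key):
        print(r)
```

Since deduplication happens only on tokens, the aggregator never needs the plaintext identifiers, but every data source must use the same key and normalization for matches to occur.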

NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic:
End-point input validation
Real-time security monitoring
Data discovery and classification
Secure data aggregation
Privacy-preserving data analytics
Compliance with regulations
Government access to data and freedom of expression concerns
Data-centric security such as identity/policy-based encryption
Policy management for access control
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption)
Audits
Securing data storage and transaction logs
Key management
Security best practices for non-relational data stores
Security against DoS attacks
Data provenance
Analytics for security intelligence
Event detection
Forensics


Use Case Mapping:
Opaque (company-specific)
None
Opaque (company-specific)
Third-party aggregator
Data to be reported in aggregate but preserving potentially small-cell demographics
Responsible developer and third-party custodian
Limited use in research community, but there are possible future public health data concerns. Clinical study reports only, but possibly selectively at the study- and patient-levels
TBD
Internal roles; third-party custodian roles; researcher roles; participating patients’ physicians
TBD
Release audit by a third party
TBD
Internal varies by firm; external TBD
TBD
Unlikely to become public
TBD (critical issue)
TBD
TBD


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Software-supplier specific; refer to commercially available end-point validation[52]
Real-time security monitoring: ---
Data discovery and classification: Varies by tool, but classified based on security semantics and sources
Secure data aggregation: Aggregates by subnet, workstation, and server
Privacy-preserving data analytics: Platform-specific
Compliance with regulations: Applicable, but regulated events are not readily visible to analysts
Government access to data and freedom of expression concerns: NSA and FBI have access on demand
Data-centric security such as identity/policy-based encryption: Usually a feature of the operating system
Policy management for access control: For example, a group policy for an event log
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): Vendor and platform-specific
Audits: Complex; audits are possible throughout
Securing data storage and transaction logs: Vendor and platform-specific
Key management: Chief Security Officer and SIEM product keys
Security best practices for non-relational data stores: TBD
Security against DDoS attacks: Big Data application layer DDoS attacks can be mitigated using combinations of traffic analytics and correlation analysis
Data provenance: For example, how to know an intrusion record was actually associated with a specific workstation
Analytics for security intelligence: Feature of current SIEMs
Event detection: Feature of current SIEMs
Forensics: Feature of current SIEMs
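The SIEM mapping says application-layer DDoS attacks can be mitigated with traffic analytics and correlation analysis. The simplest traffic-analytics building block is a per-source request rate over a sliding time window; the Python sketch below flags a source once its count in the window exceeds a limit. The window length, limit, and source identifiers are hypothetical, and timestamps are assumed to arrive in nondecreasing order for each source. A distributed attack spread thinly across many sources stays under any per-source limit, which is where correlation across sources comes in; that part is not shown.

```python
from collections import defaultdict, deque


class SlidingWindowRateMonitor:
    """Flag sources whose request count in the last `window` seconds exceeds `limit`."""

    def __init__(self, window: float, limit: int) -> None:
        self.window = window
        self.limit = limit
        # One queue of recent request timestamps per source (e.g., client IP).
        self._events: dict[str, deque] = defaultdict(deque)

    def observe(self, source: str, t: float) -> bool:
        """Record a request at time t (seconds); return True if `source` is over the limit."""
        q = self._events[source]
        q.append(t)
        # Evict timestamps that have fallen out of the window (t - window, t].
        while q and q[0] <= t - self.window:
            q.popleft()
        return len(q) > self.limit


if __name__ == "__main__":
    monitor = SlidingWindowRateMonitor(window=10.0, limit=3)
    for t in (0, 1, 2, 3):
        print(t, monitor.observe("10.0.0.5", t))
```

Flags from a detector like this would typically feed a SIEM as one signal among many rather than trigger blocking on their own.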


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Need to secure the sensor (e.g., camera) to prevent spoofing/stolen sensor streams. There are new transceivers and protocols in the DOD pipeline. Sensor streams will include smartphone and tablet sources
Real-time security monitoring: Onboard and control station secondary sensor security monitoring
Data discovery and classification: Varies from media-specific encoding to sophisticated situation-awareness-enhancing fusion schemes
Secure data aggregation: Fusion challenges range from simple to complex. Video streams may be used[54] unsecured or unaggregated
Privacy-preserving data analytics: Geospatial constraints: cannot surveil beyond Universal Transverse Mercator (UTM). Military secrecy: target and point of origin privacy
Compliance with regulations: Numerous. There are also standards issues
Government access to data and freedom of expression concerns: For example, the Google lawsuit over Street View
Data-centric security such as identity/policy-based encryption: Policy-based encryption, often dictated by legacy channel capacity/type
Policy management for access control: Transformations tend to be made within DOD/contractor-devised system schemes
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): Sometimes performed within vendor-supplied architectures, or by image-processing parallel architectures
Audits: CSO and Inspector General (IG) audits
Securing data storage and transaction logs: The usual, plus data center security levels are tightly managed (e.g., field vs. battalion vs. headquarters)
Key management: CSO; chain of command
Security best practices for non-relational data stores: Not handled differently at present; this is changing in DOD
Security against DoS attacks: DOD anti-jamming e-measures
Data provenance: Must track to sensor point-in-time configuration and metadata
Analytics for security intelligence: DOD develops specific field-of-battle security software intelligence (event-driven and monitoring) that is often remote
Event detection: For example, target identification in a video stream infers the height of a target from its shadow. Fuse data from satellite infrared with a separate sensor stream
Forensics: Used for after action review (AAR); desirable to have full playback of sensor streams


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic / Use Case Mapping
End-point input validation: Application-dependent. Spoofing is possible
Real-time security monitoring: Vendor-specific monitoring of tests, test-takers, administrators, and data
Data discovery and classification: Unknown
Secure data aggregation: Typical: classroom-level
Privacy-preserving data analytics: Various: for example, teacher-level analytics across all same-grade classrooms
Compliance with regulations: Parent, student, and taxpayer disclosure and privacy rules apply
Government access to data and freedom of expression concerns: Yes. May be required for grants, funding, and performance metrics for teachers, administrators, and districts
Data-centric security such as identity/policy-based encryption: Support both individual access (student) and partitioned aggregate
Policy management for access control: Vendor (e.g., Pearson) controls, state-level policies, federal-level policies; probably 20-50 different roles are spelled out at present
Computing on the encrypted data (searching/filtering/deduplicate/fully homomorphic encryption): Proposed [55]
Audits: Support both internal and third-party audits by unions and state agencies, as well as responses to subpoenas
Securing data storage and transaction logs: Large enterprise security, transaction-level controls from the classroom to the federal government
Key management: CSOs from the classroom level to the national level
Security best practices for non-relational data stores: ---
Security against DDoS attacks: Standard
Data provenance: Traceability to measurement event requires capturing tests at a point in time, which may itself require a Big Data platform
Analytics for security intelligence: Various commercial security applications
Event detection: Various commercial security applications
Forensics: Various commercial security applications


NBDRA Component and Interfaces: Data Provider → Application Provider; Application Provider → Data Consumer; Data Provider ↔ Framework Provider; Framework Provider; Fabric


Security and Privacy Topic | Use Case Mapping
End-point input validation | Ensuring integrity of data collected from sensors
Real-time security monitoring | Sensors can detect abnormal temperature/environmental conditions for packages with special requirements. They can also detect leaks/radiation.
Data discovery and classification | ---
Secure data aggregation | Securely aggregating data from sensors
Privacy-preserving data analytics | Sensor-collected data can be private and can reveal information about the package and geo-information. The revealing of such information needs to preserve privacy.
Compliance with regulations | ---
Government access to data and freedom of expression concerns | The U.S. Department of Homeland Security may monitor suspicious packages moving into/out of the country.
Data-centric security such as identity/policy-based encryption | ---
Policy management for access control | Private, sensitive sensor data and package data should only be available to authorized individuals. Third-party commercial offerings may implement low-level access to the data.
Computing on the encrypted data: searching/filtering/deduplicate/fully homomorphic encryption | See above section on “Transformation”
Audits | ---
Securing data storage and transaction logs | Logging sensor data is essential for tracking packages. Sensor data at rest should be kept in secure data stores.
Key management | For encrypted data
Security best practices for non-relational data stores | The diversity of sensor types and data types may necessitate the use of non-relational data stores.
Security against DoS attacks | ---
Data provenance | Metadata should be cryptographically attached to the collected data so that the integrity of origin and progress can be assured. Complete preservation of provenance will sometimes mandate a separate Big Data application.
Analytics for security intelligence | Anomalies in sensor data can indicate tampering/fraudulent insertion of data traffic.
Event detection | Abnormal events such as cargo moving out of the way or being stationary for unwarranted periods can be detected.
Forensics | Analysis of logged data can reveal details of incidents after they occur.
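The data provenance entry calls for metadata to be cryptographically attached to collected sensor data. The following is a minimal Python sketch of one way to do that. It assumes a shared-secret HMAC-SHA256 tag as a stand-in for whatever signature scheme a real deployment would use. Each record binds a reading to its metadata and to the previous record's tag. As a result, altering a reading, its metadata, or the order of records causes verification to fail. The key, sensor ID, and package ID are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real system would obtain this from a key management service.
SENSOR_KEY = b"example-sensor-key"

def _tag(body, key):
    # Canonical JSON (sorted keys) so the same content always yields the same bytes.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def attach_provenance(reading, metadata, prev_tag, key=SENSOR_KEY):
    """Bind metadata to a reading and chain the record to its predecessor's tag."""
    body = {"reading": reading, "metadata": metadata, "prev": prev_tag}
    return {**body, "tag": _tag(body, key)}

def verify_chain(records, key=SENSOR_KEY):
    """Return True only if every tag is valid and each record links to the one before it."""
    prev = ""
    for rec in records:
        body = {"reading": rec["reading"], "metadata": rec["metadata"], "prev": rec["prev"]}
        if rec["prev"] != prev or not hmac.compare_digest(_tag(body, key), rec["tag"]):
            return False
        prev = rec["tag"]
    return True

log = []
prev = ""
for temp in (4.1, 4.3, 9.8):
    rec = attach_provenance({"temp_c": temp}, {"sensor_id": "S-17", "package": "PKG-001"}, prev)
    log.append(rec)
    prev = rec["tag"]

print(verify_chain(log))            # True
log[1]["reading"]["temp_c"] = 4.0   # simulate tampering with a stored reading
print(verify_chain(log))            # False
```

Chaining each tag to the previous one covers the "progress" part of the requirement: deleting or reordering records is detectable, not just editing them.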


Category | Database Types | Description
Databases | Analytics Databases | In general, these are highly optimized for read-only interactions, and it is typically acceptable for database responses to have high latency (e.g., when invoking scalable batch processing over large data sets).
Databases | Operational Databases | In general, these support efficient write and read operations. NoSQL databases are often used in Big Data architectures in this capacity. Data can later be transformed and loaded into analytic databases to support analytic applications.
Databases | In-Memory Data Grids | These high-performance data caches and stores minimize writing to disk. They can be used for large-scale real-time applications requiring transparent access to data.
Analytics and Database Interfaces | Batch Analytics and Database Interfaces | These interfaces use scalable batch processing (e.g., MapReduce) to access data in scalable data stores (e.g., the Hadoop Distributed File System). These interfaces can be SQL-like (e.g., Hive) or programmatic (e.g., Pig).
Analytics and Database Interfaces | Interactive Analytics and Interfaces | These interfaces avoid direct access to data stores in order to provide interactive responses to end users. The data stores can be horizontally scalable databases tuned for interactive responses (e.g., HBase) or query languages tuned to data models (e.g., Drill for nested data).
Analytics and Database Interfaces | Real-Time Analytics and Interfaces | Some applications require real-time responses to events occurring within large data streams (e.g., algorithmic trading). This complex event processing uses machine-based analytics, which require very high-performance data access to streams and data stores.
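The batch analytics interfaces rely on MapReduce-style processing. The following single-process Python sketch shows only the map, shuffle, and reduce phases, using a hypothetical word-count job. A framework such as Hadoop would distribute these phases across many nodes and read input from a distributed file system; this sketch makes no attempt to do either.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in every input record.
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine all values that share a key into a single result.
    return {key: sum(values) for key, values in groups.items()}

def run_job(records: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(records)))

counts = run_job(["big data big insight", "data at rest"])
print(counts["big"], counts["data"])  # 2 2
```

Each reducer needs only the values for one key. That is why a framework can run reducers for different keys in parallel, and it is what lets this pattern scale for batch workloads.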


Data Transformation Steps | Functional Description
Data Collection | Data can be collected in different types and forms; similar sources and structures result in uniform security considerations and policies, and allow the creation of initial metadata.
Aggregation | Microsoft defines this as the step in which sets of existing data with easily correlated metadata (e.g., identical keys) are collected and then aggregated into a larger collection, thus enriching the number of objects as the collection grows.
Matching | This is defined as the step in which sets of existing data collections with dissimilar metadata (e.g., keys) are aggregated into a larger collection. Similar to aggregation, this stage also enhances information about each object.
Data Mining | Microsoft describes this as the process of analyzing data from many dimensions or perspectives and then producing a summary of the information in a useful form that identifies relationships within the data. There are two types of data mining: descriptive, which gives information about existing data, and predictive, which makes forecasts based on the data.
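The following small Python sketch, built over hypothetical shipment records, illustrates the difference between aggregation, matching, and descriptive data mining:

- Aggregation combines collections that share the same key field ("shipment_id").
- Matching joins a collection whose key is named differently ("ref"), which enriches each object.
- The descriptive step summarizes the result.

The field names and values are invented for illustration, and predictive mining is not shown.

```python
from collections import Counter

# Hypothetical inputs: two carrier feeds share the key "shipment_id";
# a customs feed describes the same shipments under a differently named key, "ref".
carrier_a = [{"shipment_id": "S1", "origin": "Boston"}]
carrier_b = [{"shipment_id": "S2", "origin": "Denver"}]
customs = [{"ref": "S1", "status": "cleared"}, {"ref": "S2", "status": "held"}]

def aggregate(*collections):
    """Aggregation: collections with identical keys grow into one larger collection."""
    merged = []
    for coll in collections:
        merged.extend(coll)
    return merged

def match(left, right, left_key, right_key):
    """Matching: join collections with dissimilar key names, enriching each object."""
    index = {row[right_key]: row for row in right}
    enriched = []
    for row in left:
        extra = index.get(row[left_key], {})
        # Copy the counterpart's attributes, except its redundant key field.
        enriched.append({**row, **{k: v for k, v in extra.items() if k != right_key}})
    return enriched

def describe(records, field):
    """Descriptive mining: summarize how the values of one field are distributed."""
    return Counter(r.get(field) for r in records)

shipments = aggregate(carrier_a, carrier_b)  # more objects: 2 records
enriched = match(shipments, customs, "shipment_id", "ref")  # more attributes per object
print(describe(enriched, "status"))
```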


Use Case Characterization Categories → Reference Architecture Components and Fabrics
Data sources → Data Provider
Data transformation → Big Data Application Provider
Capabilities → Big Data Framework Provider
Data consumer → Data Consumer
Security and privacy → Security and Privacy Fabric
Lifecycle management → System Orchestrator; Management Fabric
Other requirements → To all components and fabrics


Dense Linear Algebra* | Combinational Logic
Sparse Linear Algebra* | Graph Traversal
Spectral Methods | Dynamic Programming
N-Body Methods | Backtrack and Branch-and-Bound
Structured Grids* | Graphical Models
Unstructured Grids* | Finite State Machines
Map/Reduce


Standard Name/Number
ISO/IEC 9075-*
ISO/IEC Technical Report (TR) 9789
ISO/IEC 11179-*
ISO/IEC 10728-*
ISO/IEC 13249-*
ISO/IEC TR 19075-*
ISO/IEC 19503
ISO/IEC 19773
ISO/IEC TR 20943
ISO/IEC 19763-*
ISO/IEC 9281:1990
ISO/IEC 10918:1994
ISO/IEC 11172:1993
ISO/IEC 13818:2013
ISO/IEC 14496:2010
ISO/IEC 15444:2011
ISO/IEC 21000:2003
ISO 6709:2008
ISO 19115-*
ISO 19110
ISO 19139
ISO 19119
ISO 19157
ISO 19114
IEEE 21451-*


IEEE 2200-2012
ISO/IEC 15408-2009
ISO/IEC 27010:2012
ISO/IEC 27033-1:2009
ISO/IEC TR 14516:2002
ISO/IEC 29100:2011
ISO/IEC 9798:2010
ISO/IEC 11770:2010
ISO/IEC 27035:2011
ISO/IEC 27037:2012
JSR (Java Specification Request) 221 (developed by the Java Community Process)
W3C XML
W3C Resource Description Framework (RDF)

W3C Document Object Model (DOM) Level 1 Specification
W3C XQuery 3.0
W3C XProc
W3C XML Encryption Syntax and Processing Version 1.1
W3C XML Signature Syntax and Processing Version 1.1
W3C XPath 3.0
W3C XSL Transformations (XSLT) Version 2.0
W3C Efficient XML Interchange (EXI) Format 1.0 (Second Edition)
W3C RDF Data Cube Vocabulary
W3C Data Catalog Vocabulary (DCAT)
W3C HTML5: A vocabulary and associated APIs for HTML and XHTML
W3C Internationalization Tag Set (ITS) 2.0
W3C OWL 2 Web Ontology Language
W3C Platform for Privacy Preferences (P3P) 1.0
W3C Protocol for Web Description Resources (POWDER)
W3C Provenance
W3C Rule Interchange Format (RIF)

W3C Simple Knowledge Organization System Reference (SKOS)
W3C Simple Object Access Protocol (SOAP) 1.2
W3C SPARQL 1.1

W3C XML Key Management Specification (XKMS) 2.0

ISO Metadata Application Profile

W3C JavaScript Object Notation (JSON)-LD 1.0

W3C Service Modeling Language (SML) 1.1

W3C Web Service Description Language (WSDL) 2.0

OGC® OpenGIS® Catalogue Services Specification 2.0.2 -

OGC® OpenGIS® GeoAPI
OGC® OpenGIS® GeoSPARQL
OGC® OpenGIS® Geography Markup Language (GML) Encoding Standard
OGC® Geospatial eXtensible Access Control Markup Language (GeoXACML) Version 1
OGC® network Common Data Form (netCDF)


OASIS AS4 Profile of ebMS 3.0 v1.0
OASIS Advanced Message Queuing Protocol (AMQP) Version 1.0
OASIS Application Vulnerability Description Language (AVDL) v1.0

OASIS Content Management Interoperability Services (CMIS)
OASIS Digital Signature Service (DSS)
OASIS Directory Services Markup Language (DSML) v2.0
OASIS ebXML Messaging Services
OASIS ebXML RegRep
OASIS ebXML Registry Information Model
OASIS ebXML Registry Services Specification
OASIS eXtensible Access Control Markup Language (XACML)
OASIS Message Queuing Telemetry Transport (MQTT)
OASIS Open Data (OData) Protocol
OASIS Search Web Services (SWS)
OASIS Security Assertion Markup Language (SAML) v2.0
OASIS SOAP-over-UDP (User Datagram Protocol) v1.1
OASIS Solution Deployment Descriptor Specification v1.0
OASIS Symptoms Automation Framework (SAF) Version 1.0
OASIS Topology and Orchestration Specification for Cloud Applications Version 1.0
OASIS Universal Business Language (UBL) v2.1
OASIS Universal Description, Discovery and Integration (UDDI) v3.0.2
OASIS Unstructured Information Management Architecture (UIMA) v1.0
OASIS Unstructured Operation Markup Language (UOML) v1.0
OASIS/W3C WebCGM v2.1
OASIS Web Services Business Process Execution Language (WS-BPEL) v2.0
OASIS/W3C - Web Services Distributed Management (WSDM): Management Using Web Services (MUWS) v1.1
OASIS WSDM: Management of Web Services (MOWS) v1.1
OASIS Web Services Dynamic Discovery (WS-Discovery) v1.1
OASIS Web Services Federation Language (WS-Federation) v1.2
OASIS Web Services Notification (WSN) v1.3
IETF Simple Network Management Protocol (SNMP) v3
IETF Extensible Provisioning Protocol (EPP)

OGC® Open Modelling Interface Standard (OpenMI)
OGC® OpenSearch Geo and Time Extensions
OGC® Web Services Context Document (OWS Context)
OGC® Sensor Web Enablement (SWE)
OGC® OpenGIS® Simple Features Access (SFA)
OGC® OpenGIS® Georeferenced Table Joining Service (TJS) Implementation Standard
OGC® OpenGIS® Web Coverage Processing Service Interface (WCPS) Standard
OGC® OpenGIS® Web Coverage Service (WCS)
OGC® Web Feature Service (WFS) 2.0 Interface Standard
OGC® OpenGIS® Web Map Service (WMS) Interface Standard
OGC® OpenGIS® Web Processing Service (WPS) Interface Standard

OASIS Biometric Identity Assurance Services (BIAS) Simple Object Access Protocol (SOAP) Profile v1.0


Descriptio SO DP DC BDAP BDFP S&P MISO/IEC 9075 defines I I/U U I/U U UGuidelines for the Or I/U I/U I/U I/UThe 11179 standard is I I/U I/U U

Information Resource Dictionary System Services InterfaceDatabase Languages –I I/U U I/UThis is a series of TR I I/U U I/U

Extensible Markup LaI I/U U I/U UMetadata Registries I I/U U I/U I/UMetadata Registry CoI I/U U I/U U UInformation TechnologI I/U U U

Information Technol I U I/U I/UInformation TechnoloI U I/U I/UInformation TechnologI U I/U I/UInformation TechnoloI U I/U I/UInformation TechnoloI U I/U I/UInformation TechnoloI U I/U I/UInformation Technol I U I/U I/UStandard RepresentatiI U I/U I/UGeographic MetadataI U I/U UGeographic InformatiI U I/UGeographic MetadataI U I/UGeographic InformatiI U I/UGeographic InformatiI U I/U UGeographic Information—Quality EvaluatioIInformation TechnoloI U

· Part 1: Framework
· Part 2: Classification
· Part 3: Registry metamodel and basic attributes
· Part 4: Formulation of data definitions
· Part 5: Naming and identification principles
· Part 6: Registration

· Part 1: XQuery
· Part 2: SQL Support for Time-Related Information
· Part 3: Programs Using the Java Programming Language
· Part 4: Routines and Types Using the Java Programming Language

· Part 1: Reference model
· Part 3: Metamodel for ontology registration
· Part 5: Metamodel for process model registration
· Part 6: Registry Summary
· Part 7: Metamodel for service registration
· Part 8: Metamodel for role and goal registration
· Part 9: On Demand Model Selection (ODMS) TR
· Part 10: Core model and basic mapping
· Part 12: Metamodel for information model registration
· Part 13: Metamodel for forms registration
· Part 14: Metamodel for dataset registration
· Part 15: Metamodel for data provenance registration


Standard Protocol fo I U I/UInformatioU IInformation TechnoloI U I/UInformation Technol I/U I/U I/U IInformatioU UInformation Technology—Security Techniques—Privacy Framew IInformation TechnoloI/U U U U I/UInformation Technol I/U U U U I/UInformatioU IInformationU IJDBC™ 4.0 ApplicationI/U I/U I/U I/UXML 1.0 (F I/U I/U I/U I/U I/U I/U I/UThe RDF is a frameworI U I/U I/UJSON-LD 1.0 A JSON-bI U I/U I/UThis series of specif I U I/U I/UThe XQuery specificatI U I/U I/U

I I U I/U I/UThis specification cov I U I/UThis specification cov I U I/UXPath 3.0 is an expre I U I/U I/UThis specification de I U I/U I/UThis specification cov I U I/UThe Data Cube vocabulI U I/U I/UDCAT is an RDF vocabuI U I/UThis specification de I U I/UThe ITS 2.0 specifica I U I/U I/UThe OWL 2 Web OntoloI U I/U I/UThe P3P enables Web siI U I/U I/UPOWDER—the Protocol I U I/UProvenance is informaI U I/U I/U URIF is a series of sta I U I/U I/UThis specif I/U I U I/UThis document definesI U I/USOAP is a protocol sp I U I/USPARQL is a language I U I/U I/UThis specif U I U I/UThis standaU I U I/UThis series of standa I U I/U

The GeoAPI Standard dI U I/U I/UI U I/U I/U

The GML is an XML graI U I/U I/UThe Policy Language i I U I/U I/U I/UnetCDF is a set of sof I U I/U

· Part 1: Network Capable Application Processor (NCAP) information model
· Part 2: Transducer to microprocessor communication protocols and Transducer Electronic Data Sheet (TEDS) formats
· Part 4: Mixed-mode communication protocols and TEDS formats
· Part 7: Transducer to radio frequency identification (RFID) systems communication protocols and TEDS formats

This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

The OGC® GeoSPARQL standard supports representing and querying geospatial data on the Semantic Web. GeoSPARQL defines a vocabulary for representing geospatial data in RDF, and it defines an extension to the SPARQL query language for processing geospatial data.


The purpose of the OpI U I/U I/UThis OGC standard speI U I/U I

I U I/U IThis series of standa I U I/UDescribes the common I U I/U I/UThis standard is the s I U I/U I/UDefines a protocol-indI U I/U IThis document specifiI U I/U IThe WFS standard provI U I/U IThe OpenGIS® WMS InteI U I/U IThe OpenGIS® WPS InterI U I/U IStandard for businessI U I/UThe AMQP is an open iI U U IThis specification des I U I UThis OASIS BIAS profi I U I/U UThe CMIS standard defI U I/U IThis specification de I U I/UThe DSML provides a mI U I/U IThese specifications I U I/UebXML RegRep is a stanI U I/U IThe Registry InformatI U I/UAn ebXML Registry is I U I/UThe standard defines I U I/U I/U I/UMQTT is a Client Serv I U I/UThe OData Protocol is I U I/U I/UThe OASIS SWS initiatiI U I/UThe SAML defines the I U I/U I/U I/UThis specification de I U I/UThis specif U I/UThis standard defines reference architecture for the Symptoms Automation I/UThe conceptI/U U I I/UThe OASIS UBL definesI U I/U U

I U I/U UThe UIMA specification defines platform-inU IUOML is interface staI U I/U IComputer Graphics MetI U I/U IThis standaU IMUWS definU I I U UThis part U I I U UThis specif U I U I/U U

I U I/U UWSN is a family of rel I U I/USNMP is a series of IETF sponsored standar I I I/U UThis IETF s U I/U

The OGC® OWS Context was created to allow a set of configured information resources (service set) to be passed between applications primarily as a collection of services.

The focus of UDDI is the definition of a set of services supporting the description and discovery of (1) businesses, organizations, and other Web services providers, (2) the Web services they make available, and (3) the technical interfaces which may be used to access those services.

This specification defines mechanisms to allow different security realms to federate, such that authorized access to resources managed in one realm can be provided to security principals whose identities and attributes are managed in other realms.
