Big Data Now: Current Perspectives from O'Reilly Radar

  • 7/31/2019 Big Data Now Current Perspectives From OReilly Radar Copy

    Big Data Now

    Beijing Cambridge Farnham Köln Sebastopol Tokyo

    Table of Contents

    Foreword

    1. Data Science and Data Tools

    2. Data Issues

    3. The Application of Data: Products and Processes

    4. The Business of Data

    Foreword

    CHAPTER 1

    Data Science and Data Tools

    What is data science?

    What is data science?

    Flu trends

    Where data comes from

    1956 disk drive

    Working with data at scale

    Making data tell its story

    Data scientists

    Hiring trends for data science

    The SMAQ stack for big data

    MapReduce

    Hadoop MapReduce

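    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: for each input line, emit (word, 1) for every whitespace-separated token
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: receives every count emitted for one word and sums them
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }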
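
The Map and Reduce classes above only make full sense once the framework's shuffle step is pictured between them. What follows is a minimal plain-Java sketch of the same data flow, assuming everything fits in memory on one machine; LocalWordCount and its methods are hypothetical names for illustration and are not part of Hadoop's API. It tokenizes like the Map class, groups the (word, 1) pairs by key as Hadoop does between the two phases, and sums them like the Reduce class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-machine sketch of the word-count data flow (no Hadoop classes used)
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every token, as the Map class does
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase (performed by the framework in Hadoop): group values by key.
    // A TreeMap keeps the keys sorted, standing in for Hadoop's sort before reducing.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values for one key, as the Reduce class does
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run map over every line, shuffle all pairs, then reduce each group
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
    }
}
```

On a cluster the grouping step moves data between machines, which is the expensive part this sketch deliberately leaves out; the map and reduce functions themselves stay this simple.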

    Other implementations

    Storage

    Hadoop Distributed File System

    HBase, the Hadoop Database

    Hive

    Cassandra and Hypertable

    NoSQL database implementations of MapReduce

    Integration with SQL databases

    Integration with streaming data sources

    Commercial SMAQ solutions

    Query

    Pig

    input = LOAD 'input/sentences.txt' USING TextLoader();
    words = FOREACH input GENERATE FLATTEN(TOKENIZE($0));
    grouped = GROUP words BY $0;
    counts = FOREACH grouped GENERATE group, COUNT(words);
    ordered = ORDER counts BY $0;
    STORE ordered INTO 'output/wordCount' USING PigStorage();
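
Each statement in the Pig script above is a step over the whole data set rather than a per-record callback. As a rough sketch, assuming the input fits in memory and splitting on whitespace only (Pig's TOKENIZE also treats some punctuation as delimiters), the same pipeline can be written with Java streams; PigStepsSketch and wordCount are hypothetical names, and the comments mark which Pig statement each stage stands in for.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of the Pig word-count script
class PigStepsSketch {

    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // words = FOREACH input GENERATE FLATTEN(TOKENIZE($0));
                // assumption: whitespace-only tokenizing, unlike Pig's TOKENIZE
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // grouped = GROUP words BY $0; counts = ... COUNT(words);
                // ordered = ORDER counts BY $0; (the TreeMap keeps words sorted)
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // input = LOAD ...; here the "file" is an in-memory list of lines
        List<String> input = List.of("to be or", "not to be");
        // STORE ... USING PigStorage() writes tab-separated fields; print the same shape
        wordCount(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Here groupingBy with counting() performs GROUP and COUNT in a single pass, and the TreeMap factory supplies the ordering that ORDER counts BY $0 asks for.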

    (defmapcatop split [sentence]
      (seq (.split sentence "\\s+")))

    (?<- (stdout) [?word ?count]
      (sentence ?s) (split ?s :> ?word)
      (c/count ?count))

    Search with Solr

    Conclusion

    Scraping, cleaning, and selling big data

    Data hand tools

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | head -2
    VR
    MO
    ...

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | sort |\
    uniq | head -2
    AD
    AL

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | colrm 1 72 | sort | uniq | wc
    38 38 342

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq
    14000 AD
    14000 AL
    14000 AN
    ...

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq | grep 21000 | wc
    20 40 180

    $ grep '599 [A-Z][A-Z]' rudx-log.txt | awk '{print $2 " " $11}' |\
    sort | uniq | grep 14000 | wc
    26 52 234

    ...

    $ grep '599 [A-Z][A-Z]' `find . -name rudx-log.txt -print` |\
    awk '{print $2 " " $11}' | sort | uniq | grep 14000 | wc
    48 96 432

    ...

    ./2008/rudx-log.txt:QSO: 14000 CW 2008-03-15 1526 W1JQ 599 0054 \
    UA6YW 599 AD
    ./2009/rudx-log.txt:QSO: 14000 CW 2009-03-21 1225 W1JQ 599 0015 \
    RG3K 599 VR
    ...

    $ find . -name rudx-log.txt -print | xargs grep '599 [A-Z][A-Z]' |\
    awk '{print $2 " " $11}' | grep 14000 | sort | uniq | wc
    48 96 432

    $ find . -name rudx-log.txt -print | xargs grep '599 [A-Z][A-Z]' |\
    awk '{print $2 " " $11}' | pv | grep 14000 | sort | uniq | wc
    3.41kB 0:00:00 [ 20kB/s] [
    48 96 432

    Hadoop: What it is, how it works, and what it can do

    Four free data tools for journalists (and snoops)

    bit.ly

    The quiet rise of machine learning

    Where the semantic web stumbled, linked data will succeed

    Social data is an oracle waiting for a question

    The challenges of streaming real-time data

    CHAPTER 2

    Data Issues

    There's no definition

    Time for the community to rally

    Why you can't really anonymize your data

    CHAPTER 3

    The Application of Data: Products and Processes

    How the Library of Congress is building the Twitter archive

    Data journalism, data tools, and the newsroom stack

    Data journalism and data tools

    The newsroom stack

    The data analysis path is built on curiosity, followed by action

    How data and analytics can improve education

    Data science is a pipeline between academic disciplines

    Big data and open source unlock genetic secrets

    Visualization deconstructed: Mapping Facebook's friendships

    Mapping Facebook's friendships

    Data science democratized

    CHAPTER 4

    The Business of Data

    There's no such thing as big data

    Big data and the innovator's dilemma

    Building data startups: Fast, big, and focused

    Setting the stage: The attack of the exponentials

    Leveraging the big data stack

    Fast data

    Big analytics

    Focused services

    Democratizing big data

    Data markets aren't coming: They're already here

    An iTunes model for data

    Data is a currency

    Big data: An opportunity in search of a metaphor

    Data and the human-machine connection
