Microservices, Containers, and Machine Learning


Transcript of Microservices, Containers, and Machine Learning

  • Microservices, Containers, and Machine Learning

    Paco Nathan, @pacoid

  • Downloads

  • oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

    follow the license agreement instructions

    then click the download for your OS

    need JDK instead of JRE (for Maven, etc.)

    JDK 6, 7, or 8 is fine

    Downloads: Java JDK

  • For Python 2.7, check out Anaconda by Continuum Analytics for a full-featured platform:

    store.continuum.io/cshop/anaconda/

    Downloads: Python

  • Let's get started using Apache Spark in just a few easy steps. Download code from:

    databricks.com/spark-training-resources#itas

    or for a fallback: spark.apache.org/downloads.html


    Also, the GitHub project:

    github.com/ceteri/spark-exercises/tree/master/exsto

    Downloads: Spark

  • Connect into the extracted Spark directory, then run:

    ./bin/spark-shell

    Downloads: Spark
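    Once the REPL comes up, a quick sanity check (just an illustration, not part of the training materials) is to run a trivial job:

    scala> sc.parallelize(1 to 1000).count()
    res0: Long = 1000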

  • Spark Deconstructed

  • // load error messages from a log into memory
    // then interactively search for various patterns
    // https://gist.github.com/ceteri/8ae5b9509a08c08a1132

    // base RDD
    val lines = sc.textFile("hdfs://...")

    // transformed RDDs
    val errors = lines.filter(_.startsWith("ERROR"))
    val messages = errors.map(_.split("\t")).map(r => r(1))
    messages.cache()

    // action 1
    messages.filter(_.contains("mysql")).count()

    // action 2
    messages.filter(_.contains("php")).count()

    Spark Deconstructed: Log Mining Example

  • [Diagram: a Driver node coordinating three Worker nodes]

    Spark Deconstructed: Log Mining Example

    We start with Spark running on a cluster, submitting code to be evaluated on it:

  • [Same code as above, shown again while discussing the other part]

    Spark Deconstructed: Log Mining Example

  • Spark Deconstructed: Log Mining Example

    scala> messages.toDebugString
    res5: String =
    MappedRDD[4] at map at <console>:16 (3 partitions)
      MappedRDD[3] at map at <console>:16 (3 partitions)
        FilteredRDD[2] at filter at <console>:14 (3 partitions)
          MappedRDD[1] at textFile at <console>:12 (3 partitions)
            HadoopRDD[0] at textFile at <console>:12 (3 partitions)

    At this point, take a look at the transformed RDD operator graph:

  • [Diagram: Driver and three Workers]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • [Diagram: Driver and three Workers, with HDFS blocks 1-3 alongside the Workers]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]


  • [Diagram: each Worker reads its HDFS block (block 1, block 2, block 3)]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • [Diagram: each Worker processes its block and caches the data (cache 1, cache 2, cache 3)]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • [Diagram: Driver and Workers, with the data now cached on each Worker]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • [Same code and diagram as above, while discussing the other part]

    Spark Deconstructed: Log Mining Example

  • [Diagram: each Worker processes from its cache for the next action]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • [Diagram: Driver and Workers, with the cached data reused]

    Spark Deconstructed: Log Mining Example

    [Same code as above, while discussing the other part]

  • GraphX

  • GraphX

    spark.apache.org/docs/latest/graphx-programming-guide.html

    Key Points:

    graph-parallel systems

    importance of workflows

    optimizations

  • PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs J. Gonzalez, Y. Low, H. Gu, D. Bickson, C. Guestrin graphlab.org/files/osdi2012-gonzalez-low-gu-bickson-guestrin.pdf

    Pregel: Large-scale graph computing at Google Grzegorz Czajkowski, et al. googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html

    GraphX: Unified Graph Analytics on Spark Ankur Dave, Databricks databricks-training.s3.amazonaws.com/slides/graphx@sparksummit_2014-07.pdf

    Advanced Exercises: GraphX databricks-training.s3.amazonaws.com/graph-analytics-with-graphx.html

    GraphX

  • // http://spark.apache.org/docs/latest/graphx-programming-guide.html

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    case class Peep(name: String, age: Int)

    val nodeArray = Array(
      (1L, Peep("Kim", 23)), (2L, Peep("Pat", 31)),
      (3L, Peep("Chris", 52)), (4L, Peep("Kelly", 39)),
      (5L, Peep("Leslie", 45))
    )
    val edgeArray = Array(
      Edge(2L, 1L, 7), Edge(2L, 4L, 2),
      Edge(3L, 2L, 4), Edge(3L, 5L, 3),
      Edge(4L, 1L, 1), Edge(5L, 3L, 9)
    )

    val nodeRDD: RDD[(Long, Peep)] = sc.parallelize(nodeArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
    val g: Graph[Peep, Int] = Graph(nodeRDD, edgeRDD)

    val results = g.triplets.filter(t => t.attr > 7)

    // print each triplet that passes the filter
    for (triplet <- results.collect) {
      println(s"${triplet.srcAttr.name} -> ${triplet.dstAttr.name}")
    }

  • TextRank Demo:

    cdn.liber118.com/spark/ipynb/textrank/PySparkTextRank.ipynb

    IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark

    GraphX: demo

  • Workflows

    Typical Workflows:

    [Diagram, circa 2010: data is ETL'd into a cluster/cloud, then flows through Data Prep, Features, and Learners/Parameters, with Unsupervised Learning and Explore stages; train and test sets produce models, which feed Evaluate and Optimize loops (representation, evaluation, optimization); Scoring runs in production against use cases; data pipelines drive visualization/reporting and actionable results, decisions, feedback; "foo algorithms" sit on one side, "bar developers" on the other]

  • Workflows: Scraper pipeline

    Typical data rates, e.g., for [email protected]:

    ~2K msgs/month

    ~6 MB as JSON

    ~13 MB parsed

    Three months of list activity represents a graph of:

    1061 senders

    753,400 nodes

    1,027,806 edges

    A big graph! However, it satisfies the definition of a graph-parallel system; there is lots of data locality to leverage.

  • Workflows: A Few Notes about Microservices and Containers

    The Strengths and Weaknesses of Microservices Abel Avram http://www.infoq.com/news/2014/05/microservices

    DockerCon EU Keynote: State of the Art in Microservices Adrian Cockcroft https://blog.docker.com/2014/12/dockercon-europe-keynote-state-of-the-art-in-microservices-by-adrian-cockcroft-battery-ventures/

    Microservices Architecture Martin Fowler http://martinfowler.com/articles/microservices.html

  • Workflows: An Example

    Python-based service in a Docker container?

    Just Enough Math, IPython+Docker Paco Nathan, Andrew Odewahn, Kyle Kelly https://github.com/ceteri/jem-docker https://registry.hub.docker.com/u/ceteri/jem/

    Docker Jumpstart Andrew Odewahn http://odewahn.github.io/docker-jumpstart/

  • Workflows: A Brief Note about ETL in SparkSQL

    Spark SQL Data Sources API: Unified Data Access for the Spark Platform Michael Armbrust databricks.com/blog/2015/01/09/spark-sql-data-sources-api-unified-data-access-for-the-spark-platform.html
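    As a rough sketch of that ETL step (the file names below are illustrative placeholders, not the actual exsto files; jsonFile and saveAsParquetFile are the Spark 1.x SQLContext/SchemaRDD calls):

    val sqlCtx = new org.apache.spark.sql.SQLContext(sc)

    // infer a schema directly from the parsed JSON messages
    val parsed = sqlCtx.jsonFile("parsed_messages.json")
    parsed.printSchema()

    // persist as Parquet for the downstream SparkSQL/GraphX stages
    parsed.saveAsParquetFile("parsed_messages.parquet")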

  • This Workflow: Microservices meet Parallel Processing

    [Diagram: a Scraper/Parser pulls the email archives; SparkSQL handles Data Prep; Features and Explore stages use NLTK data and unique Word IDs; TextRank, Word2Vec, etc. produce community insights and leaderboards, exposed as services; "not so big data, relatively big compute"]

  • Workflows: Scraper pipeline

    [Diagram: urllib2 crawls the Apache email list archive, monthly lists by date; Python stages filter quoted content and segment paragraphs, emitting message JSON]

  • Workflows: Scraper pipeline

    [Same scraper pipeline diagram as above, with a sample message JSON record:]

    {
      "date": "2014-10-01T00:16:08+00:00",
      "id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
      "next_thread": "CALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg",
      "next_url": "http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3cCALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg@mail.gmail.com%3e",
      "prev_thread": "",
      "sender": "Debasish Das ",
      "subject": "Re: memory vs data_size",
      "text": "\nOnly fit the data in memory where you want to run the iterative\nalgorithm....\n\nFor map-reduce operations, it's better not to cache if
    }

  • Workflows: Parser pipeline

    [Diagram: message JSON flows through TextBlob stages to segment sentences, tag and lemmatize words (using the Treebank and WordNet data), and run sentiment analysis, plus a Python stage to generate skip-grams, emitting parsed JSON]

  • Workflows: Parser pipeline

    [Same parser pipeline diagram as above, with the message JSON input shown earlier and the resulting parsed JSON output:]

    {
      "graf": [ [1, "Only", "only", "RB", 1, 0], [2, "fit", "fit", "VBP", 1, 1 ] ... ],
      "id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
      "polr": 0.2,
      "sha1": "178b7a57ec6168f20a8a4f705fb8b0b04e59eeb7",
      "size": 14,
      "subj": 0.7,
      "tile": [ [1, 2], [2, 3], [3, 4] ... ]
    }

  • Workflows: TextRank pipeline

    [Diagram: parsed JSON → Spark creates the word graph as an RDD → GraphX runs TextRank → Spark extracts phrases → ranked phrases; NetworkX is used to visualize the word graph]

  • Workflows: TextRank pipeline

    "Compatibility of systems of linear constraints"

    [{'index': 0, 'stem': 'compat', 'tag': 'NNP', 'word': 'compatibility'},
     {'index': 1, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
     {'index': 2, 'stem': 'system', 'tag': 'NNS', 'word': 'systems'},
     {'index': 3, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
     {'index': 4, 'stem': 'linear', 'tag': 'JJ', 'word': 'linear'},
     {'index': 5, 'stem': 'constraint', 'tag': 'NNS', 'word': 'constraints'}]

    [Diagram: the resulting word graph linking the kept stems compat, system, linear, constraint]

    TextRank: Bringing Order into Texts

    Rada Mihalcea, Paul Tarau

    http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
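    To make the "create word graph" step concrete, here is a minimal sketch (plain Scala, not the actual exsto implementation) that links kept stems co-occurring within a small window; the window size of 3 is an assumed parameter:

    // kept (index, stem) pairs from the tagged sentence above
    val tokens = Seq((0, "compat"), (2, "system"), (4, "linear"), (5, "constraint"))

    // link each stem to the stems that follow it within a window of 3 positions
    val edges = for {
      (i, w1) <- tokens
      (j, w2) <- tokens
      if j > i && j - i <= 3
    } yield (w1, w2)

    edges.foreach(println)
    // -> (compat,system), (system,linear), (system,constraint), (linear,constraint)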

  • https://en.wikipedia.org/wiki/PageRank

    Workflows: TextRank how it works
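    TextRank then runs PageRank over that word graph. As a reminder, one standard form of the per-vertex update is (d is the damping factor, typically 0.85; In and Out are a vertex's in- and out-neighbors; N is the number of vertices):

    PR(v_i) = \frac{1 - d}{N} + d \sum_{v_j \in In(v_i)} \frac{PR(v_j)}{|Out(v_j)|}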

  • TextRank impl

  • TextRank impl: load parquet files

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
    import sqlCtx._

    val edge = sqlCtx.parquetFile("graf_edge.parquet")
    edge.registerTempTable("edge")

    val node = sqlCtx.parquetFile("graf_node.parquet")
    node.registerTempTable("node")

    // pick one message as an example; at scale we'd parallelize
    val msg_id = "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw"

  • TextRank impl: use SparkSQL to collect node list + edge list

    val sql = """!SELECT node_id, root !FROM node !WHERE id='%s' AND keep='1'!""".format(msg_id)!!val n = sqlCtx.sql(sql.stripMargin).distinct()!val nodes: RDD[(Long, String)] = n.map{ p =>! (p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[String])!}!nodes.collect()!!val sql = """!SELECT node0, node1 !FROM edge !WHERE id='%s'!""".format(msg_id)!!val e = sqlCtx.sql(sql.stripMargin).distinct()!val edges: RDD[Edge[Int]] = e.map{ p =>! Edge(p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[Int].toLong, 0)!}!edges.collect()

  • TextRank impl: use GraphX to run PageRank

    // run PageRank
    val g: Graph[String, Int] = Graph(nodes, edges)
    val r = g.pageRank(0.0001).vertices

    r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)

    // save the ranks
    case class Rank(id: Int, rank: Float)
    val rank = r.map(p => Rank(p._1.toInt, p._2.toFloat))
    rank.registerTempTable("rank")

    def median[T](s: Seq[T])(implicit n: Fractional[T]) = {
      import n._
      val (lower, upper) = s.sortWith(_ < _).splitAt(s.size / 2)
      if (s.size % 2 == 0) (lower.last + upper.head) / fromInt(2) else upper.head
    }

  • TextRank impl: join ranked words with parsed text

    var span:List[String] = List()
    var last_index = -1
    var rank_sum = 0.0

    var phrases:collection.mutable.Map[String, Double] = collection.mutable.Map()

    val sql = """
    SELECT n.num, n.raw, r.rank
    FROM node n JOIN rank r ON n.node_id = r.id
    WHERE n.id='%s' AND n.keep='1'
    ORDER BY n.num
    """.format(msg_id)

    val s = sqlCtx.sql(sql.stripMargin).collect()

  • TextRank impl: pull strings for the top-ranked keyphrases

    s.foreach { x =>
      //println (x)
      val index = x.getInt(0)
      val word = x.getString(1)
      val rank = x.getFloat(2)
      var isStop = false

      // test for break from past
      if (span.size > 0 && rank < min_rank) isStop = true
      if (span.size > 0 && (index - last_index > 1)) isStop = true

      // clear accumulation
      if (isStop) {
        val phrase = span.mkString(" ")
        phrases += (phrase -> rank_sum)

        span = List()
        last_index = index
        rank_sum = 0.0
      }

      // start or append
      if (rank >= min_rank) {
        span = span :+ word
        last_index = index
        rank_sum += rank
      }
    }

  • TextRank impl: report the top keyphrases

    // summarize the text as a list of ranked keyphrases
    val summary = sc.parallelize(phrases.toSeq)
      .distinct()
      .sortBy(_._2, ascending=false)
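    For example, to show the top of that list in the shell (the take(10) cut-off is arbitrary, for illustration only):

    summary.take(10).foreach { case (phrase, score) => println(f"$score%8.4f  $phrase") }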

  • Reply Graph

  • Reply Graph: load parquet files

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
    import sqlCtx._

    val edge = sqlCtx.parquetFile("reply_edge.parquet")
    edge.registerTempTable("edge")

    val node = sqlCtx.parquetFile("reply_node.parquet")
    node.registerTempTable("node")

    edge.schemaString
    node.schemaString

  • Reply Graph: use SparkSQL to collect node list + edge list

    val sql = "SELECT id, sender FROM node"!val n = sqlCtx.sql(sql).distinct()!val nodes: RDD[(Long, String)] = n.map{ p =>! (p(0).asInstanceOf[Long], p(1).asInstanceOf[String])!}!nodes.collect()!!val sql = "SELECT replier, sender, num FROM edge"!val e = sqlCtx.sql(sql).distinct()!val edges: RDD[Edge[Int]] = e.map{ p =>! Edge(p(0).asInstanceOf[Long], p(1).asInstanceOf[Long], p(2).asInstanceOf[Int])!}!edges.collect()

  • Reply Graph: use GraphX to run graph analytics

    // run graph analytics
    val g: Graph[String, Int] = Graph(nodes, edges)
    val r = g.pageRank(0.0001).vertices
    r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)

    // define a reduce operation to compute the highest degree vertex
    def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
      if (a._2 > b._2) a else b
    }

    // compute the max degrees
    val maxInDegree: (VertexId, Int) = g.inDegrees.reduce(max)
    val maxOutDegree: (VertexId, Int) = g.outDegrees.reduce(max)
    val maxDegrees: (VertexId, Int) = g.degrees.reduce(max)

    // connected components
    val scc = g.stronglyConnectedComponents(10).vertices
    node.join(scc).foreach(println)

  • Reply Graph: PageRank of top dev@spark email, 4Q2014

    (389,(22.690229478710016,Sean Owen ))
    (857,(20.832469059298248,Akhil Das ))
    (652,(13.281821379806798,Michael Armbrust ))
    (101,(9.963167550803664,Tobias Pfeiffer ))
    (471,(9.614436778460558,Steve Lewis ))
    (931,(8.217073486575732,shahab ))
    (48,(7.653814912512137,ll ))
    (1011,(7.602002681952157,Ashic Mahtab ))
    (1055,(7.572376489758199,Cheng Lian ))
    (122,(6.87247388819558,Gerard Maas ))
    (904,(6.252657820614504,Xiangrui Meng ))
    (827,(6.0941062762076115,Jianshi Huang ))
    (887,(5.835053915864531,Davies Liu ))
    (303,(5.724235650446037,Ted Yu ))
    (206,(5.430238461114108,Deep Pradhan ))
    (483,(5.332452537151523,Akshat Aranya ))
    (185,(5.259438927615685,SK ))
    (636,(5.235941228955769,Matei Zaharia ))

    // seaaaaaaaaaan

    maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (389,126)
    maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (389,170)
    maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (389,296)

  • Reply Graph: What SSSP looks like in GraphX/Pregel

    github.com/ceteri/spark-exercises/blob/master/src/main/scala/com/databricks/apps/graphx/sssp.scala
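    For reference, single-source shortest paths with the GraphX Pregel API looks roughly like the sketch below, adapted from the GraphX programming guide; the random test graph and the source vertex id 42 are placeholders, not part of the exercise code:

    import org.apache.spark.graphx._
    import org.apache.spark.graphx.util.GraphGenerators

    // a synthetic graph with Double edge weights, just for illustration
    val graph: Graph[Long, Double] =
      GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)

    val sourceId: VertexId = 42
    // initialize every vertex except the source to infinity
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),            // vertex program
      triplet => {                                               // send message
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      (a, b) => math.min(a, b)                                   // merge messages
    )

    println(sssp.vertices.collect.mkString("\n"))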

  • Look Ahead: Where is this heading?

    Feature learning with Word2Vec Matt Krzus www.yseam.com/blog/WV.html

    [Diagram: ranked phrases → GraphX runs Connected Components → MLlib runs Word2Vec → aggregated by topic → MLlib runs KMeans → topic vectors (better than LDA?); features, models, insights]
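    A rough sketch of the "run Word2Vec, then run KMeans" idea with MLlib (the input file, k=10, and 20 iterations below are placeholder choices, not values from the talk):

    import org.apache.spark.mllib.feature.Word2Vec
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // token sequences, e.g. one ranked phrase or sentence per line (placeholder input)
    val sentences = sc.textFile("phrases.txt").map(_.split(" ").toSeq)

    // learn word vectors
    val w2vModel = new Word2Vec().fit(sentences)

    // cluster the vocabulary vectors into topics with KMeans
    val vectors = sc.parallelize(
      w2vModel.getVectors.toSeq.map { case (word, arr) => Vectors.dense(arr.map(_.toDouble)) }
    )
    val kmeansModel = KMeans.train(vectors, 10, 20)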

  • Resources

  • Apache Spark developer certificate program

    http://oreilly.com/go/sparkcert

    defined by Spark experts @Databricks

    assessed by O'Reilly Media

    establishes the bar for Spark expertise

    certification:

  • MOOCs:

    Anthony Joseph UC Berkeley

    begins 2015-02-23

    edx.org/course/uc-berkeleyx/uc-berkeleyx-cs100-1x-introduction-big-6181

    Ameet Talwalkar UCLA

    begins 2015-04-14

    edx.org/course/uc-berkeleyx/uc-berkeleyx-cs190-1x-scalable-machine-6066

  • community:

    spark.apache.org/community.html

    events worldwide: goo.gl/2YqJZK

    video+preso archives: spark-summit.org

    resources: databricks.com/spark-training-resources

    workshops: databricks.com/spark-training

  • http://spark-summit.org/

  • confs: Strata CA, San Jose, Feb 18-20 strataconf.com/strata2015

    Spark Summit East NYC, Mar 18-19 spark-summit.org/east

    Big Data Tech Con Boston, Apr 26-28 bigdatatechcon.com

    Strata EU, London, May 5-7 strataconf.com/big-data-conference-uk-2015

    Spark Summit 2015 SF, Jun 15-17 spark-summit.org

  • books:

    Fast Data Processing with Spark Holden Karau, Packt (2013) shop.oreilly.com/product/9781782167068.do

    Spark in Action Chris Fregly, Manning (2015*) sparkinaction.com/

    Learning Spark Holden Karau, Andy Konwinski, Matei Zaharia, O'Reilly (2015*) shop.oreilly.com/product/0636920028512.do

  • presenter:

    Just Enough Math, O'Reilly, 2014

    justenoughmath.com, preview: youtu.be/TQ58cWgdCpA

    monthly newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/

    Enterprise Data Workflows with Cascading, O'Reilly, 2013

    shop.oreilly.com/product/0636920028536.do