Data Science in Future Tense


GalvanizeU Launch! 2014-10-29 gulaunch.splashthat.com

Paco Nathan @pacoid

Whither Data Science?

twitter.com/josh_wills/status/198093512149958656

issue: aristotelian perspectives in a non-linear world…

circa 2008: at a large ad-tech firm running one of the largest Hadoop instances in the cloud, execs said “Don’t bother” digging into analysis of geo, clustering, time series, etc.

We did anyway:

• people in SF don’t click online travel ads much, however, people in Dodge City do… a lot!

• largest customer segment: flag poles, portable generators, hammocks, sea salt, mail-order steaks, defibrillators

primary sources for the notion:

Cleveland, W. S., “Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics,” International Statistical Review (2001), 69, 21-26. http://cm.bell-labs.com/stat/doc/datascience.ps

Breiman, L., “Statistical modeling: the two cultures,” Statistical Science (2001), 16, 199-231. http://projecteuclid.org/euclid.ss/1009213726

…also good to mention John Tukey

we have a long, long way yet to go:

So many problems that we encounter in industry can be represented as graphs…

Tensors provide means for representing multiple-edge graphs, ostensibly solving for a general case…

Even so, how much time have you spent working with tensors for data science apps?
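
A minimal sketch of the tensor idea, assuming a made-up toy graph: each relation type gets its own adjacency matrix, and stacking them gives a 3-way tensor indexed by (relation, source, target). The node names, relation names, and helper functions here are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: nodes, relation types, and edges are invented for illustration.
nodes = ["ann", "bob", "cat"]
relations = ["follows", "emails"]
edges = [
    ("ann", "follows", "bob"),
    ("bob", "follows", "cat"),
    ("ann", "emails", "cat"),
    ("ann", "emails", "bob"),
]

node_ix = {name: i for i, name in enumerate(nodes)}
rel_ix = {name: k for k, name in enumerate(relations)}


def build_tensor(edge_list):
    """Return T where T[k][i][j] == 1 if node i links to node j via relation k."""
    n, m = len(nodes), len(relations)
    tensor = [[[0] * n for _ in range(n)] for _ in range(m)]
    for src, rel, dst in edge_list:
        tensor[rel_ix[rel]][node_ix[src]][node_ix[dst]] = 1
    return tensor


def collapse(tensor):
    """Sum over the relation axis: number of parallel edges between each node pair."""
    n = len(nodes)
    return [
        [sum(tensor[k][i][j] for k in range(len(tensor))) for j in range(n)]
        for i in range(n)
    ]


T = build_tensor(edges)
multi = collapse(T)
print(multi)  # one row per source node, one column per target node
```

Slicing `T[k]` recovers the ordinary adjacency matrix for one relation, while `collapse` recovers the multigraph edge counts.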

wikipedia.org

Historical Arc 1: The Alchemists…

“Who has the crystal ball?”

Arc 1: Who has the crystal ball?

TL;DR: Nods to some people who envisaged and modeled our shared future…

Theory, Eight Decades Ago: what can be computed?

Haskell Curry haskell.org

Alonzo Church wikipedia.org

John Backus acm.org

David Turner wikipedia.org

Praxis, Four Decades Ago: algebra for applicative systems

Pattie Maes MIT Media Lab

Reality, Two Decades Ago: web apps, ML, machine data

spark.apache.org

A Brief History: Functional Programming for Big Data

databricks.com/blog/2014/10/10/spark-petabyte-sort.html

A Brief History: Smashing The Previous Petabyte Sort Record

spark.apache.org

Historical Arc 2: An Oblivoir Of Origins…

“Why are we here?”

Arc 2: Why are we here?

TL;DR: We share the delightful role of… speaking truth to power

Reason 1: early 19th c. Prussian/Napoleonic “General Staff” organization => corporate IT silos
translated: too many people saying “That is not my concern.”
action: interdisciplinary teams tear down silos, surfacing insights

Reason 2: 19th-20th c. statistics emphasized defensibility in lieu of predictability
translated: defend one’s job, not boost top-line revenue
action: focus on predictability; if you need to defend your job, you should be working elsewhere

Reason 3: machine learning derives from several disciplines, but ultimately is a subset of optimization
translated: they couldn’t talk to each other very much, we have difficulty understanding them collectively
action: learn to leverage optimization theory, thoroughly

Reason 4: university math curricula are still tilted toward Cold War priorities
translated: 2-3 years calculus weeds out the better mechanical engineering candidates who can build the most cost-effective ICBMs
action: leadership must embrace how to leverage advanced math for business use cases

Reason 5: brogrammers tend to emphasize logical reasoning over analytic reasoning
translated: left-brained lopsidedness wins temporarily, then fails spectacularly
action: ask security to walk the brogrammers back to their cave

Reason 6: people can make intuitive decisions in ~4 dimensions at most, period
translated: product managers as Steve Jobs wannabes are poisonous
action: leverage data science, visualization, machine learning with distributed systems at scale to address the high dimensionality of data

Reason 7: embracing perpetual learning curves represents a promethean challenge
translated: learning is hard, and many organizations go to great lengths to minimize it
action: learn efficiently, continually, with a great thirst

Historical Arc 3: Be There Then…

“What happens next?”

Arc 3: What happens next?

TL;DR: Brace yourselves…

• Full stack… no, really

• You’ll work with functional programming and cloud-based notebooks

• Shift from modeling based on variance (batch) towards probabilistic approximation

• Early data scientists displace the old-school product managers

• IoT, drones, microsats: several orders of magnitude more data up ahead

• Leave SF – the more interesting data science work to be accomplished is not here

Full stack… no, really

from visualization to virtualization, all points in-between

source: Microsoft

You’ll work with functional programming and cloud-based notebooks

http://databricks.com/product

Shift from modeling based on variance (batch) towards probabilistic approximation

highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
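
As one illustration of probabilistic approximation, here is a minimal count-min sketch: it estimates item frequencies in fixed memory and may overcount, but never undercounts. This is a sketch under assumptions, not a production structure; the class name, the `width`/`depth` defaults, and the click data are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator: estimates can be too high, never too low."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum across rows is the best guess.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Made-up click stream, for illustration only.
sketch = CountMinSketch(width=200, depth=4)
clicks = ["dodge-city"] * 50 + ["sf"] * 3 + [f"city-{i}" for i in range(100)]
for city in clicks:
    sketch.add(city)

print(sketch.estimate("dodge-city"), sketch.estimate("sf"))
```

Memory stays at `width * depth` counters no matter how many distinct items stream through, which is the trade that makes this family of structures attractive at scale.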

Early data scientists displace the old-school product managers

IoT, drones, microsats: several orders of magnitude more data up ahead

airships: e.g., JP Aerospace, 40 km

atmosats: e.g., Titan Aerospace, 20 km

microsats: e.g., Planet Labs, 400 km

robots: e.g., Blue River, 1 m

sensors: e.g., Hortau, -0.3 m

drones: e.g., HoneyComb, 120 m

Layered Sensing Networks

Leave SF – the more interesting data science work to be accomplished is not here

Summary?

After we’ve cleaned up data, formulated workflows in terms of monoids, used graph representation, and parallelized with a wealth of linear algebra, much of the heavy-lifting that remains on the clusters is in optimization
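
To make "workflows in terms of monoids" concrete, here is a small sketch: word counts form a monoid under `Counter` addition (an associative combine with an empty identity), so partial results from separate partitions can be merged in any grouping. The partition contents and function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def empty():
    """Identity element: combining with it changes nothing."""
    return Counter()


def combine(a, b):
    """Associative merge of two partial word counts."""
    return a + b


def count_words(line):
    return Counter(line.lower().split())


def summarize(partition):
    """Fold one partition of lines down to a single partial count."""
    return reduce(combine, map(count_words, partition), empty())


# Made-up partitions, standing in for data spread across a cluster.
partitions = [
    ["spark makes big data simple", "big data"],
    ["data science in future tense"],
    [],
]

partials = [summarize(p) for p in partitions]

# Associativity means the grouping of merges does not matter.
left = reduce(combine, partials, empty())
right = reduce(lambda acc, p: combine(p, acc), reversed(partials), empty())

print(left == right, left["data"], left["big"])
```

Because the merge is associative and has an identity, an empty partition is harmless and the cluster is free to combine partial results in whatever order they arrive.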

For example, deep learning @Google uses many layers of neural nets trained with gradient descent optimization

Taming Latency Variability and Scaling Deep Learning, Jeff Dean @Google (2013) youtu.be/S9twUcX1Zp0
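
A toy sketch of gradient descent itself, assuming a one-variable least-squares fit rather than a neural net: plain batch gradient descent on mean squared error. The data, learning rate, step count, and function name are illustrative assumptions, not anything taken from the talk cited above.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

Every step touches every data point, which is why the stochastic variant (sampling a point or mini-batch per step) is the one that usually runs on clusters.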

Vector Quantization:

One advantage of quantum algorithms is to run large gradient descent problems in constant time…

Once high-ROI apps are reworked to leverage lots of ML and large clusters, SGD represents the datacenter cost basis, notably the part that scales…

Want to slash costs exponentially? Plug in quantum for a game-changer, maybe

Fast quantum algorithm for numerical gradient estimation, Stephen P. Jordan, Phys. Rev. Lett. 95, 050501 (2005) arxiv.org/abs/quant-ph/0405146

dwavesys.com
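
For a sense of the classical baseline that the Jordan paper compares against, here is a sketch of forward-difference gradient estimation, which needs d+1 black-box evaluations for a d-dimensional input (the paper's abstract contrasts this with a single quantum query). The function names, the test function, and the step size `h` are assumptions made for illustration.

```python
def forward_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x; returns (gradient, number of calls to f)."""
    calls = 0

    def counted(v):
        nonlocal calls
        calls += 1
        return f(v)

    base = counted(x)  # one evaluation at the point itself
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h  # nudge one coordinate at a time
        grad.append((counted(shifted) - base) / h)
    return grad, calls


def sum_of_squares(v):
    """Hypothetical black-box function; its true gradient is 2*v."""
    return sum(t * t for t in v)


point = [1.0, -2.0, 3.0]
grad, calls = forward_difference_gradient(sum_of_squares, point)
print(calls, [round(g, 3) for g in grad])
```

The call count grows linearly with the number of dimensions, which is the cost that scales with model size in the classical setting.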

Proposal: let’s drop clusters of quantum devices into lunar polar craters, so we can handle massive vector quantization workloads

• micro-kelvin environs

• near perpetual sunlight for energy sources

• park routers at L4

• approx. $15B to finance, i.e., ~6 days DoD budget

We’ll just put this here… a couple o’ Googly projects in progress:

qCraft: Quantum Physics In Minecraft plus.google.com/u/1/+QuantumAILab/posts/grMbaaDGChH

“We’re going back to the Moon. For good.” lunar.xprize.org

Resources

• spark.apache.org/community.html

• databricks.com/spark-training

• oreilly.com/go/sparkcert

Apache Spark community:

events:

• Strata EU, Barcelona, Nov 19-21 strataconf.com/strataeu2014

• Data Day Texas, Austin, Jan 10 datadaytexas.com

• Strata CA, San Jose, Feb 18-20 strataconf.com/strata2015

• Spark Summit East, NYC, Mar 18-19 spark-summit.org/east

• Spark Summit 2015, SF, Jun 15-17 spark-summit.org

presenter:

Just Enough Math O’Reilly, 2014

justenoughmath.com

preview: youtu.be/TQ58cWgdCpA

monthly newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/

Enterprise Data Workflows with Cascading O’Reilly, 2013

shop.oreilly.com/product/0636920028536.do
