Data Tactics Analytics Brown Bag (Aug 22, 2013)

  • date post

    18-Oct-2014
  • Category

    Education

Page 1: Data Tactics Analytics Brown Bag (Aug 22, 2013)

DT Brown Bag: A Primer in Analytics

WELCOME!

R² = 500; p < Marty’s 1-mile time

asymptotically approaching perfect

Thursday, August 22, 13

Page 2: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Outline

• EAT: Guten Appetit, Bon appétit, Buen apetito, Buon appetito!
• Words from the VP
• Why this brown bag?
• Analytics Services:
    • Team Introduction; About YOU!
    • Why Analytics!?
    • Philosophy...
• Case Studies:
    • Case Study (Nathan D.)
    • Localview (Marty A.)
    • Case Study (me)
• Core Values: Analytical Insights
• On the horizon...

Page 3: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why this brown bag??

Learning [close] at a pace similar to the pace at which we learn.

Learning and educating from/to PMs, SWEs, and OPs.

PM: Provide insights from RFIs/RFPs.
PM: Atmospherics from our customers.

SWE: Accessing data spaces.
SWE: Integrating algorithms.

OP: How do you best consume the outputs of models?
OP: What models are best to present to OPs?

PM: Program Managers, SWE: Software Engineers, OP: Operators

Page 4: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why this brown bag??

ISW, USMA, DARPA, ...

Page 5: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Data Tactics Analytics Practice

The Team: Nathan D., Shrayes R., David P., Adam VE., Andrew T., Geoffrey B., Rich H.

Graduates from top universities...

Degrees include: mathematics, computer science, aeronautical engineering, astrophysics, electrical engineering, mechanical engineering, statistics, social science(s).

Base competencies (horizontals): Clustering, Association Rules, Regression, Naive Bayesian Classifier, Decision Trees, Time-Series, Text Analysis.

Going beyond the base (verticals)...

Page 6: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Data Tactics Analytics Practice

ABOUT YOU:

28 confirmed, 18 WebEx, 14 tentative (n: 60, representing > 25% of the company)
21 confirmed within the first 60 minutes....

Monsee Wood & Steve Moccio: first
Charles Fuller & Lenesto Page: last

Chris Zilligen: 3,120 (longest resume)
Catherine Schymanski: 284 (shortest resume)

Linguistic Standard:
Jack Gustafson (FK: -126)
Shrayes Ramesh (FK: -38)

...analytics team below the company average!! :)

Page 7: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Horizontals & Verticals

Horizontals: Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis

Verticals: econometrics, spatial econometrics, graph theory algorithms, astrophysical time-series analysis, path planning algorithms, Bayesian statistics, constrained optimizations, numerical integration techniques, PCA, GLM, hierarchical models, IRT, DLISA, latent class analysis, structural equation modeling, mixture models, SVM, maxent, CART, naive Bayes classifier, ICA

Page 8: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Data Tactics Analytics Practice

[Figure: data science Venn diagram [2]. Circles: Programming & Scripting Skills; Mathematics & Statistics; Domain Expertise. Regions: DT Analytics; ML; Traditional Research; Danger Zone! (~statisticulation [1]).]

[1] Statisticulation: “How to Lie with Statistics,” Darrell Huff
[2] http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
[3] https://portal.data-tactics-corp.com/sites/analytics/Wiki/AnalyticsFAQ.aspx

Page 9: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why Analytics [Business]???

Why are analytics important? (Business, Analytics, Practical)

"We need to stop reinventing the cloud and start using it!" (Dave Boyd)

Page 10: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why Analytics [Analytics]???

Why are analytics important? (Business, Analytics, Practical)

Analytics:

No Free Lunch (NFL) theorems: no algorithm performs better than any other when their performance is averaged uniformly over all possible problems of a particular type. It follows that algorithms must be designed for a particular domain or style of problem, and that there is no such thing as a general-purpose algorithm.

Page 11: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why Analytics [Practical]???

Marty doesn’t scale - none of us do.

Data Scales
Web Scales
Academic Publications Scale
IC Scales

[Figure: growth curves; axes N and t.]

Page 12: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Why Analytics [Practical]???

Why are analytics important? (Business, Analytics, Practical)

“…the alternative to good statistics is not “no statistics,” it’s bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command, over- or under-adjusting in their eagerness to avoid anything systematic.” (Bill James)

Page 13: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Philosophy:

"companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue" (Tim O’Reilly)

Page 14: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Philosophy:

We are NOT “Data Agnostic” ...this should represent an early warning system about our culture. The IT notion of data is dead.

Page 15: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analytics in Perspective...

http://datatactics.blogspot.com/2013/07/analytics-in-perspective-inquiry-into.html

Analytics in Perspective: An Inquiry into Modes of Inquiry

Page 16: Data Tactics Analytics Brown Bag (Aug 22, 2013)

“Analytics in Perspective” reflects how people arrive at decisions.

GOOD: Induction, Abduction, Circumscription, Counterfactuals.

BAD: Deduction, Speculation, Justification, Groupthink

Analytics in Perspective...

Page 17: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Identifying Smugglers

Leveraging Big Spatio-Temporal Data

Page 18: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Background: The Strait of Hormuz

Importance:
• Oil
• Embargo
• Smuggling

Page 19: Data Tactics Analytics Brown Bag (Aug 22, 2013)

How to Catch Smugglers

In order to stop smugglers, we must identify:

1. Which boats are undertaking illicit activities
2. Where illicit activities are taking place
3. Points of departure/arrival of suspicious ships

Page 20: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Too Much Data

AIS (transponder) provides ship-level data:
• Ship location (lat-long)
• Ship speed
• Ship bearing
• Ship “purpose”
• Time stamp

About 0.5M pings from 1,300 boats between March 2012 and January 2013.

Page 21: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Too Much Data

Page 22: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Too Little Data

Individual pings or tracks not useful: no point of comparison

Similarly, small duration plots are too thin to provide analytic leverage.

Page 23: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Too Little Data.

A single boat:

Page 24: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Too Little Data.

A single day:

Page 25: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A Difficult Task: Many Types of Boats

Page 26: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Solution: Analytics

Use a statistical model to discover patterns in the data…

…then identify observations (boat-times) that do not fit those patterns.

Goal: Identify boats, place, and times that exhibit or house discrepant behavior.

Page 27: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Characteristics of a Good Model

A good model for this data should:
• Leverage all of the available data
• Take advantage of local information (not global patterns)
• Be able to accommodate a variety of patterns (shipping, fishing, etc.)
• Be able to identify ships that are only occasionally deviant
• Identify place-times where deviant activity occurs
• Be estimable with reasonable computational resources

Page 28: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

A LUBaP model?

Page 29: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

We want to compare apples to apples; that is, compare each boat only with boats that are nearby in space and time, not with far-flung ones.

Assign each observation to a geographically constrained grid square.
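
A minimal sketch of the grid assignment, assuming square cells of a fixed size in degrees. The function names, the 0.25-degree cell size, and the origin are illustrative assumptions; the talk does not state how its grid was defined.

```python
import math

def grid_cell(lat, lon, cell_deg=0.25, origin=(0.0, 0.0)):
    """Return the (row, col) index of the grid square containing a point."""
    row = math.floor((lat - origin[0]) / cell_deg)
    col = math.floor((lon - origin[1]) / cell_deg)
    return row, col

def group_by_cell(points, cell_deg=0.25, origin=(0.0, 0.0)):
    """Group (lat, lon) points by the grid square they fall in."""
    cells = {}
    for lat, lon in points:
        key = grid_cell(lat, lon, cell_deg, origin)
        cells.setdefault(key, []).append((lat, lon))
    return cells
```

Each square's observations can then be modeled on their own, so a boat is only ever compared with its geographic neighbors.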

Page 30: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

Page 31: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

Let m denote the number of observations in a particular grid square. Then, in each square, add m additional observations with the following characteristics:
• position, drawn from a bivariate uniform distribution
• speed, drawn with replacement from the empirical distribution
• time of observation, drawn from a uniform distribution

Now, the task is no longer unsupervised, but supervised.
-> Model the probability of a boat being a "real" boat.
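
The augmentation step can be sketched as follows. This is an illustration under stated assumptions, not the code used in the case study: observations are assumed to be (lat, lon, speed, time) tuples, the square's bounds and time window are passed in explicitly, and the name add_synthetic is hypothetical.

```python
import random

def add_synthetic(real_obs, bounds, t_range, rng):
    """Label m real rows 1 and append m synthetic rows labelled 0.

    real_obs: list of (lat, lon, speed, t) tuples from one grid square.
    bounds:   (lat_min, lat_max, lon_min, lon_max) of that square.
    t_range:  (t_min, t_max) of the observation window.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    speeds = [o[2] for o in real_obs]
    labelled = [(*o, 1) for o in real_obs]
    for _ in range(len(real_obs)):
        labelled.append((
            rng.uniform(lat_min, lat_max),  # position: uniform over the square
            rng.uniform(lon_min, lon_max),
            rng.choice(speeds),             # speed: resampled with replacement
            rng.uniform(*t_range),          # time: uniform over the window
            0,
        ))
    return labelled
```

Because the synthetic rows are structureless by construction, a classifier can only separate the two labels by learning where the real boats concentrate.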

Page 32: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

Page 33: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

Page 34: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

•Turned outlier detection, a poorly structured problem, into modeling a binary target, a very well-understood problem

•Now, simply model the probability that each boat is “real”

•Apply logistic regression to each grid square

•Allow the flexibility (order) of the model fit (splines, interactions) to depend on the data density in each square (more data, richer model).

•logit(Pr(“real”)) = f(speed, location, time)
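
To make the per-square fit concrete, here is a small logistic regression written from scratch. It is only a sketch: it uses linear terms and plain gradient ascent, whereas the slide describes richer fits (splines, interactions) whose details are not given. The function names and the learning-rate and epoch defaults are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient ascent on the log-likelihood; returns [intercept, w1, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability that a row with features x is a "real" observation."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

Features should be put on a comparable scale (for example, standardized) before fitting, since the fixed step size assumes it.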

Page 35: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

Problem: Predictions may be arbitrary due to random assignment and grid coarseness.

Page 36: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Model

A local, unsupervised-as-supervised learning, bagged, probability model.

Problem: Predictions may be arbitrary due to random assignment and grid coarseness.

Solution:
1. Create multiple grids with different positions.
2. Re-run the local model in each square, for each different grid.
3. Aggregate the predicted probabilities for each observation, in each grid, by averaging.
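
The three steps above can be sketched as below. How the grids were actually offset is not stated in the talk, so shifting the origin by equal fractions of a cell is an assumption, as are the function names.

```python
def shifted_origins(cell_deg, n_grids):
    """Grid origins offset by equal fractions of one cell along the diagonal."""
    step = cell_deg / n_grids
    return [(k * step, k * step) for k in range(n_grids)]

def average_over_grids(obs_ids, per_grid_probs):
    """Average each observation's predicted probability across all grids.

    per_grid_probs: list of dicts {obs_id: probability}, one dict per grid.
    """
    n = len(per_grid_probs)
    return {i: sum(g[i] for g in per_grid_probs) / n for i in obs_ids}
```

Averaging over shifted grids means no single, arbitrary cell boundary decides whether an observation looks discrepant.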

Page 37: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Computational Efficiency

Estimating a flexible model in each of ~300 grid squares, for each of 6 grids, means estimating ~1,800 logistic models!

Not a problem, because:
• each one has a limited amount of data (fitting time for most algorithms grows faster than linearly with data size)
• each local model is separate, allowing for parallel processing

Computation on my laptop takes ~4 minutes after simple parallelization across cores.
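
Since every (grid, square) fit is independent, the work maps directly onto a pool of workers. The sketch below uses Python's concurrent.futures; fit_cell is a stand-in for the real per-square fit, and make_tasks builds a hypothetical payload purely for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fit_cell(task):
    """Stand-in for one local model fit: the share of "real" rows in the square."""
    grid_id, cell_id, labels = task
    return grid_id, cell_id, sum(labels) / len(labels)

def make_tasks(n_grids=6, n_cells=300):
    """Hypothetical payload: each square holds two real and two synthetic rows."""
    return [(g, c, [1, 1, 0, 0]) for g in range(n_grids) for c in range(n_cells)]

def fit_all(tasks, executor=None):
    """Run every task serially, or through a concurrent.futures executor."""
    if executor is None:
        return [fit_cell(t) for t in tasks]
    return list(executor.map(fit_cell, tasks))
```

Passing a ProcessPoolExecutor as executor (created under an `if __name__ == "__main__":` guard) spreads the 6 x 300 = 1,800 fits across cores; a ThreadPoolExecutor is convenient for quick checks.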

Page 38: Data Tactics Analytics Brown Bag (Aug 22, 2013)

What is the Output from this Model?

•Predicted probability of each boat-time (i.e. observation) being a real boat.

•High probabilities indicate observations doing something “normal” or “predictable.”

•Low probabilities indicate observations doing something “discrepant.”

Ship ID   Lat      Long     Speed   Timestamp    Pr
623432    24.546   55.005    9.8    1203221230   0.78
874627    24.716   55.108   12.4    1209242230   0.08
523881    25.128   54.807    4.2    1206120947   0.64

Page 39: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Value I: Location of Illicit Activities

Page 40: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Value II: Identify Devious Boats

Page 41: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Value III: Prioritized List of Suspect Boats

•Model generates probabilities on an interval scale

•Facilitates efficient use of scarce enforcement resources
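
One way to turn the scored observations into a prioritized list is sketched below. Ranking each ship by its single lowest probability is an assumption, chosen so that occasionally deviant ships still surface; the talk does not say how boats were ranked.

```python
def prioritize(scored, top_n=10):
    """Rank ships by their lowest predicted probability of being "real".

    scored: iterable of (ship_id, probability) pairs, one per observation.
    Returns up to top_n (ship_id, lowest_probability) pairs, most suspect first.
    """
    worst = {}
    for ship_id, pr in scored:
        worst[ship_id] = min(pr, worst.get(ship_id, 1.0))
    return sorted(worst.items(), key=lambda kv: kv[1])[:top_n]
```

Applied to the three sample rows from the output table, prioritize([(623432, 0.78), (874627, 0.08), (523881, 0.64)]) puts ship 874627 first.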

Page 42: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Lessons Learned

Analytics is a powerful tool for identifying patterns in big data.

Identifying outliers is predicated on identifying patterns.

LUBaP models are a powerful tool for outlier detection.

This model utilizes no subject matter expertise and a simple probability model (implications: portable across domains; fast)

Page 43: Data Tactics Analytics Brown Bag (Aug 22, 2013)

What’s the Next Hot Thing?

Unsupervised Scaling of Text Data

Page 44: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analyzing Text is Important

The preponderance of data created today is free text, not structured numerical data.

One thing people want to do with text is “scale” it; that is, rank order it according to an underlying continuum.

Examples:
- put a numerical value on what each product reviewer thinks of a particular product
- generate a measure of the extremism of Iranian clerics based on their writings

Page 45: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analyzing Text is Difficult

Text data is unstructured, and messy.

“I thought I would love the iPhone, but it’s actually not that great.”

Standard approaches:

1. Dictionary: Create a numeric value for many content-laden words; compare texts to the dictionary.

2. Estimation: Hand-score many texts; use the scores as a basis for training a statistical model for other texts.
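
A toy version of the dictionary approach shows why it struggles with the iPhone sentence. The lexicon and its weights are made up for illustration.

```python
import re

# Hypothetical sentiment weights for a handful of content-laden words.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "terrible": -2, "hate": -2}

def dictionary_score(text, lexicon=LEXICON):
    """Sum the lexicon weights of the words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(t, 0) for t in tokens)
```

The lukewarm review scores a strongly positive 4, because "love" and "great" are counted at face value and the hedging around them ("thought I would", "not that") is ignored.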

Page 46: Data Tactics Analytics Brown Bag (Aug 22, 2013)

A New Approach

Each author’s use of a word implies they “support” that word, as opposed to words they don’t use. The model, developed for scaling ideological positions of legislators from votes, can be applied to word use.

Benefits:
1: No dictionary!
2: Language invariant!

https://github.com/DataTacticsCorp/text-analysis

Page 47: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Preliminary Example

Pulled down 2000 tweets, 1000 each with the hashtags #prolife and #prochoice.

Drop the hashtags (no cheating!), pre-process the text data, and run the model.
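
The pre-processing might look like the sketch below. The exact steps used for the example are not described, so lowercasing, link removal, and letters-only tokens are assumptions.

```python
import re
from collections import Counter

def preprocess(tweet, banned=("#prolife", "#prochoice")):
    """Lowercase a tweet, drop the label hashtags and links, and tokenize."""
    text = tweet.lower()
    for tag in banned:
        text = text.replace(tag, " ")          # no cheating: remove the labels
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    return re.findall(r"[a-z]+", text)

def term_counts(tweets):
    """One word-count table per tweet, the input a scaling model works from."""
    return [Counter(preprocess(t)) for t in tweets]
```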

Page 48: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Output

Page 49: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Output

Page 50: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Output

Page 51: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Local Events, Worldwide Impact

Page 52: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Localview

Localview, also known as “Lv”, is a proprietary cloud/web-based dashboard with an advanced analytics framework. The desired end state is integrated data mining, knowledge discovery, and pattern recognition of social and spatial patterning. Lv will provide end users with globally and locally available historical information as well as globally and locally available real-time social media data feeds. This service includes: news, on-the-spot statistics using a proprietary Data Tactics tool called “ZoomStat”©, historical facts, social media, economics, security, military, infrastructure, health, aid, natural disasters, war, entertainment, weather, transportation, and travel. All results will be analyzed, ingested, normalized, and then plotted on a dynamic and interactive global map.

Page 53: Data Tactics Analytics Brown Bag (Aug 22, 2013)

...by the numbers

7 volunteer, part-time team members (NO OVERHEAD)

first DEMO delivered in 86 days

832 hours of research & development time

Page 54: Data Tactics Analytics Brown Bag (Aug 22, 2013)

The Team

backend development | frontend development | data analysis development

Marty A, Joe A, Joon K, Annie W, Dave P, Rich H, Shenoa H

Page 55: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Evolution:

Page 56: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Evolution:

Page 57: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Lv Development Process

Page 58: Data Tactics Analytics Brown Bag (Aug 22, 2013)

End-Users:

Law Enforcement

IC & DoD

Commercial

Page 59: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

Base-Rate Fallacy

Page 60: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

Data Tactics has been working on a set of problems that require considered solutions. The following method compares distributions at two points in time, with a particular focus on changes in the overall morphology of the distribution, on the mobility of individual observations within the distribution over that same period, and on contextually accounting for neighborhood effects. These dynamics are illuminating: they communicate time and explicitly account for the underlying spatial dimension (Wy). Based on the integration of a dynamic local space-time indicator together with directional statistics, these methods provide insights on the role of spatial dependence and uncontrolled variance over time and space.

Page 61: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

This analysis demonstrates the utility of directional space time analytics on regional stability distribution dynamics. Drawing on recent advances in geovisualization [1], we suggest a spatially explicit view of mobility. Based on the integration of a dynamic local indicator of spatial association together with directional statistics, with data points mapped to each observation, this framework provides new insights on the role of spatial dependence in regional stability and change. These approaches have been illustrated with state-level incomes in the U.S. (1969-2008), Gross Domestic Product (1960-2011), Failed State Index (2010-2012), and GMTI data (t0, t1).

[1] Murray, A. T., Liu, Y., Rey, S. J., and Anselin, L. (2010). Exploring movement object patterns.

Page 62: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Per Capita Gross Domestic Product

A measure of the total output of a country that takes the gross domestic product (GDP) and divides it by the number of people in the country. The per capita GDP is especially useful when comparing one country to another because it shows the relative performance of the countries. A rise in per capita GDP signals growth in the economy and tends to translate as an increase in productivity.

GDP is widely used by economists to gauge economic recession and recovery and an economy's general monetary ability to address externalities. It is not meant to measure externalities. It serves as a general metric for a nominal monetary standard of living and is not adjusted for costs of living within a region.

Gross Domestic Product

GDP = private consumption + gross investment + government spending + (exports − imports), or GDP = C + I + G + (X − M)

Page 63: Data Tactics Analytics Brown Bag (Aug 22, 2013)

GDP per Capita

Time Span: 1960 to 2011 (51 temporal bin(s), 1 year intervals); 2000 to 2011 (12 temporal bin(s), 1 year intervals);

Spatial Area: Global;

Original Sample: 202 obs;

Data processing: imputation;

Pruned Sample: 145 observations;

Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH);

Spatial Weights: knn4;
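
The pieces named above (k-nearest-neighbor weights, the Moran scatterplot classes, and the direction of movement between two dates) can be sketched as follows. This is a simplified illustration under stated assumptions, not the DLISA code behind the slides: distances are planar, ties are broken by index, and all function names are hypothetical.

```python
import math

def knn_neighbors(coords, k=4):
    """For each point, the indices of its k nearest neighbors (planar distance)."""
    nbrs = {}
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        nbrs[i] = [j for _, j in dists[:k]]
    return nbrs

def standardize(values):
    """Z-scores using the population standard deviation (assumes non-constant data)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def moran_scatter(values, nbrs):
    """(z_i, lag_i) per observation; lag_i is the mean z of its neighbors (Wy)."""
    z = standardize(values)
    return [(z[i], sum(z[j] for j in nbrs[i]) / len(nbrs[i])) for i in range(len(z))]

def quadrant(z, lag):
    """Moran scatterplot class: own value first, neighbors' value second."""
    if z >= 0:
        return "HH" if lag >= 0 else "HL"
    return "LH" if lag >= 0 else "LL"

def directional_moves(values_t0, values_t1, nbrs):
    """Angle (radians) of each observation's move in the Moran scatterplot."""
    p0 = moran_scatter(values_t0, nbrs)
    p1 = moran_scatter(values_t1, nbrs)
    return [math.atan2(l1 - l0, z1 - z0) for (z0, l0), (z1, l1) in zip(p0, p1)]
```

The local Moran statistic for observation i is the product z_i * lag_i; the angles summarize whether an observation and its neighborhood moved together or apart between the two dates.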

Page 64: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

> describe(dlisa$yr2000)
> describe(dlisa$yr2011)

V. Name    n   mean     sd  median   mad  min     max   range  skew  kurtosis
yr2000   145   5759   9534    1491  1831   87   46453   46366  2.12      3.72
yr2011   145  13292  20621    4666  5841  231  114232  114001  2.46      6.54

Page 65: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

https://vimeo.com/69775085

Page 66: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

2000:2011 (12 temporal bin(s), 1 year intervals);

Page 67: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

What is wrong with Vermont[1]?

- Seemingly nothing!
- Lies within the head of an approximately normal distribution
- Not an outlier in a classical statistical sense
- Vermont remains below the US average but is closing the gap.

[1] State Median Income

Page 68: Data Tactics Analytics Brown Bag (Aug 22, 2013)

State Median Income

Time Span: 1969 to 2008 (40 temporal bin(s), 1 year intervals)

Spatial Area: Contiguous United States;

Original Sample: 48 obs;

Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH);

Spatial Weights: Rook Contiguity;

Page 69: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

1969:2008 (40 temporal bin(s), 1 year intervals)

Page 70: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

1969:2008 (40 temporal bin(s), 1 year intervals)

Page 71: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

1969:2008 (40 temporal bin(s), 1 year intervals)

Page 72: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Directional Space Time Analytics

Page 73: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Core Values:

Localview as an ecosystem:

Most existing big data analyses of social media are confined to a single platform. However, most of the topics of interest to such studies, such as influence or information flow, can rarely be confined to the Internet, let alone to a single platform. The understandable difficulty in obtaining high-quality multi-platform data does not mean that we can treat a single platform as a closed and insular system, as if human information flows were all gases in a chamber.

“Shapes of stories into computers...” Kurt Vonnegut

Nate Silver - Cognition2; Small Multiples; Tukey vs. Tufte

http://kottke.org/11/09/kurt-vonnegut-explains-the-shapes-of-stories

Page 74: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Core Values:

Open-source software where possible.
- Bigger data means bigger cost.
- Scientific Python and the R computing language reached maturity years ago.

Data = Rough + Smooth Qualities
Rough = impulsive, spiky signal (outliers); Smooth = pervasive. Leverage analytics to help understand patterns in data as well as outliers, the so-called rough and smooth elements of data. Both the “smooth” and the “rough” patterns in data are informative, depending on the specific questions customers have.

Local, as opposed to global or whole-map, statistics:
We believe that micro-level, local patterns are often of key interest, and can be obscured or distorted by attempts to fit global models to local data.

Analytical Pluralism:
Multi-method approaches dominate single-method approaches. Rather than craft a single statistical model to answer a customer question, we attack problems from several angles simultaneously, deriving insights from areas of overlap and divergence in the pattern of findings.

Methodological pathways:
Blend nomothetic and idiographic approaches.
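
The "Data = Rough + Smooth" idea listed among the core values can be illustrated with a running median, one common resistant smoother. The window size and function names here are assumptions, and this is only one of many ways to split a series.

```python
from statistics import median

def running_median(values, window=3):
    """Smooth a series with a centered running median (shorter window at the ends)."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

def rough_and_smooth(values, window=3):
    """Split a series so that values[i] == smooth[i] + rough[i]."""
    smooth = running_median(values, window)
    rough = [v - s for v, s in zip(values, smooth)]
    return smooth, rough
```

For the series [1, 2, 3, 50, 5, 6, 7] the smooth part at the spike is 5 and the rough part is 45, so the outlier is isolated in the rough while the trend stays in the smooth.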

Page 75: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Core Values:

Page 76: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analytical Resources:

https://portal.data-tactics-corp.com/sites/analytics/SitePages/Home.aspx

Page 77: Data Tactics Analytics Brown Bag (Aug 22, 2013)

https://github.com/DataTacticsCorp

Analytical Resources:

Page 78: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analytical Resources:

http://datatactics.blogspot.com

Page 79: Data Tactics Analytics Brown Bag (Aug 22, 2013)

On the Horizon:

DT & the USMA Department of Systems Engineering will partner and leverage the Advanced Individual Academic Development Program.

Rstudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432

https://github.com/rheimann/kiva-master

Page 80: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Data Tactics & US Military Academy: A Primer in Microfinance using KIVA

Rstudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432

Understanding the complex nature of microfinance more completely: The US military is directly involved in microfinance (Iraq & Afghanistan), working primarily through Provincial Reconstruction Teams (PRTs). These are funded by the DoD and DoS; the operational requirements of these agencies create a need to demonstrate quick impact on economic recovery, and therefore the goal is to report high numbers of loans.

Technical complexities separate this data from other datasets: heterogeneous forms (structured/unstructured, nominal/ordinal/quantitative, temporal, geographic, multi-lingual), multiple relationships (lenders to recipients), multiple sectors, missing data. Data cleansing is hard!

Big Data(ish): $420M (USD), 1.1 million lenders, 580,000 loans, 250 partners, 4.1M transactions, 3 WHOLE GBs. (https://vimeo.com/28413747)

Broad appeal: ...government to defense to finance to banking to non-profit organizations to THE POOR.

https://github.com/rheimann/kiva-master

Page 81: Data Tactics Analytics Brown Bag (Aug 22, 2013)

On the Horizon:

DT & The Institute for the Study of War will collaborate in a balanced but largely quantitative approach to analyzing revolutions and the role social media plays with particular focus on the Iraq Spring.

Page 82: Data Tactics Analytics Brown Bag (Aug 22, 2013)

On the Horizon:

Data Science for Program Managers (late September / early October)

Analytics Brown Bag Volume II (October / Early November)

Page 83: Data Tactics Analytics Brown Bag (Aug 22, 2013)

Thank you...

Questions?

Homepage: http://www.data-tactics.com
Blog: http://datatactics.blogspot.com
Twitter: https://twitter.com/DataTactics

Or, me (Rich Heimann) at [email protected]
