Lenddo - Data Driven NYC (27)

Empowering the Emerging Market Middle Class. Big Data is not Big Database. Jeff Stewart (CEO) and Naveen Agnihotri, PhD (CTO).

description

Lenddo's CEO and CTO, Jeff Stewart and Naveen Agnihotri, presented at the May edition of Data Driven NYC, which focused on P2P lending.

Transcript of Lenddo - Data Driven NYC (27)

Page 1: Lenddo - Data Driven NYC (27)

Empowering the Emerging Market Middle Class

Big Data is not Big Database

Jeff Stewart - CEO
Naveen Agnihotri, PhD - CTO

Page 2: Lenddo - Data Driven NYC (27)

“If you look 5 years out, every industry is going to be rethought in a social way.”

-Mark Zuckerberg, 2010

Page 3: Lenddo - Data Driven NYC (27)

LENDDO TECH FACTS

● Founded in January 2011
● Over 500K members around the world
● Integrated with Facebook, LinkedIn, Google, Yahoo, Twitter
● Service-oriented architecture (LAMP)
  ○ Front end (clients) in PHP
  ○ Services in PHP and Python
● Technical team based in NY and PH
● Data science team based in NY

Page 4: Lenddo - Data Driven NYC (27)

Finance in the Age of Social Networks

Lenddo maintains the world's largest opt-in TrustGraph for trustworthiness and risk management

Lenddo is…

Social

Social sourcing & screening
Peer enforcement

New data sets

Algorithms

Unprecedented processing power
Real-time / ongoing risk management
Targeting, underwriting & collections

Cloud

Rich risk analytic data set
Unprecedented processing power

Global

Mobile

New datasets
24/7 engagement

New cost structures

Page 5: Lenddo - Data Driven NYC (27)

Why Finance Works Better with Lenddo

DEMAND GENERATION

Traditional
• Negative selection bias
• Costly

With Lenddo
• Digital, fast and potentially viral
• Less expensive
• Social nature causes positive selection bias

UNDERWRITING

Traditional
• Fact verification is time consuming
• Scores incomplete or unavailable

With Lenddo
• Reduced fraud and default
• Big data and powerful algorithms
• Larger addressable market
• Easily automatable

COLLECTIONS

Traditional
• No peer enforcement
• Labor intensive
• Hard to maintain contact

With Lenddo
• Potential for peer enforcement
• Lower cost
• More points of contact

Page 6: Lenddo - Data Driven NYC (27)

Source: http://www.kpcb.com/insights/2013-internet-trends

ID Verification is easier online

Page 7: Lenddo - Data Driven NYC (27)
Page 8: Lenddo - Data Driven NYC (27)

LENDDO SOCIAL DATA

● 100% infrastructure on AWS
● Store social data from all online social networks
● Opt-in social data storage grows about 10 times faster than member data
● Social data currently about 3.5 TB
● Largest table (dataset) is > 2 billion records

Page 9: Lenddo - Data Driven NYC (27)

GOOD AND BAD BORROWERS

n=1347

Page 10: Lenddo - Data Driven NYC (27)

CLUSTERS

Page 11: Lenddo - Data Driven NYC (27)

LOAN SCORE IMPROVEMENT

No NLP or network

Page 12: Lenddo - Data Driven NYC (27)

LOAN SCORE IMPROVEMENT

No NLP or network
With NLP and network

Page 13: Lenddo - Data Driven NYC (27)

WORD CLUSTERS

Words associate closely with one another, and can be commonly associated with good or bad loans.

Page 14: Lenddo - Data Driven NYC (27)

WORDS AND LOAN QUALITY

% Association with BAD loans

% Association with GOOD loans
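
One possible reading of the "% association" axes is the share of loans mentioning a word that turned out good or bad. The sketch below assumes that definition; it is not taken from the talk. The function name `word_loan_association`, the input shape, and the placeholder words are all hypothetical and made up for illustration.

```python
from collections import Counter


def word_loan_association(loans):
    """Return {word: (pct_good, pct_bad)} over loans whose text contains the word.

    `loans` is an iterable of (text, is_good) pairs. Both the input shape and
    the percentage definition are assumptions made for this illustration.
    """
    good = Counter()
    bad = Counter()
    for text, is_good in loans:
        # Count each word once per loan so long texts do not dominate.
        for word in set(text.lower().split()):
            (good if is_good else bad)[word] += 1

    result = {}
    for word in set(good) | set(bad):
        total = good[word] + bad[word]
        result[word] = (100.0 * good[word] / total, 100.0 * bad[word] / total)
    return result


# Placeholder records with neutral tokens, for illustration only.
sample = [
    ("alpha beta", True),
    ("beta gamma", True),
    ("gamma delta", False),
    ("delta alpha delta", False),
]
assoc = word_loan_association(sample)
print(assoc["beta"])   # appears only in good loans in this toy sample
print(assoc["delta"])  # appears only in bad loans in this toy sample
```

Counting a word once per loan rather than once per occurrence is a design choice in this sketch, so that a single long message cannot skew the percentages.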

Page 15: Lenddo - Data Driven NYC (27)

SOCIAL DATA STORAGE HISTORY

● Started with MongoDB for social data storage
● As use cases grew, we added indexes

Page 16: Lenddo - Data Driven NYC (27)

SOCIAL DATA STORAGE

User data

Social data

Page 17: Lenddo - Data Driven NYC (27)

SOCIAL DATA STORAGE

Social data User data

Page 18: Lenddo - Data Driven NYC (27)

SOCIAL DATA STORAGE HISTORY

● We moved to larger and larger servers
  ○ At last iteration, used cr1.8xlarge server
  ○ 32 CPUs, 244 GB RAM
  ○ Still couldn’t keep up with index size
● Data acquisition speeds increased
  ○ Provisioned IOPS to the rescue!
● Total cost of social data storage: > $10,000 per month
● And we want to grow faster!

Page 19: Lenddo - Data Driven NYC (27)

SOCIAL DATA STORAGE HISTORY

● Simple queries (by index)
● Complex queries (by multiple indexes)
● Pull out all data for a member
● Aggregate all data for a member
● Calculate score for a member
● Aggregate all data for all members
● Calculate score for all members

Page 20: Lenddo - Data Driven NYC (27)

?

REVELATION: 2013

Page 21: Lenddo - Data Driven NYC (27)

REVELATION: 2013

It’s “BIG DATA”, not “BIG DATABASE”

Page 22: Lenddo - Data Driven NYC (27)

SOCIAL DATA REVAMP - 2013

● Moved all data to Amazon S3
● Data model remains largely unchanged
● Hadoop compatible storage format
  ○ Avro format
  ○ Snappy compressed, chunked
● Created a small ‘cache’ type MongoDB
  ○ Stores recent data temporarily
● Using DynamoDB for longer-lived data that needs to be queried all the time
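
The "compressed, chunked" storage idea can be illustrated with a standard-library sketch. Avro and Snappy both need third-party libraries, so this example swaps in JSON records and zlib compression. It is not Avro, only an illustration of the same principle: each chunk is compressed independently, so a Hadoop-style job can read one chunk without decompressing the rest. All names (`write_chunks`, `read_chunk`, `CHUNK_SIZE`) and the record fields are hypothetical.

```python
import json
import zlib

CHUNK_SIZE = 2  # records per chunk; tiny here so the example is easy to follow


def write_chunks(records, chunk_size=CHUNK_SIZE):
    """Split records into chunks and compress each chunk independently."""
    chunks = []
    for start in range(0, len(records), chunk_size):
        batch = records[start:start + chunk_size]
        payload = "\n".join(json.dumps(r, sort_keys=True) for r in batch)
        chunks.append(zlib.compress(payload.encode("utf-8")))
    return chunks


def read_chunk(chunk):
    """Decompress one chunk without touching any other chunk."""
    text = zlib.decompress(chunk).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical records; the field names are placeholders.
records = [{"member_id": i, "network": "example"} for i in range(5)]
chunks = write_chunks(records)
print(len(chunks))            # 5 records at 2 per chunk gives 3 chunks
print(read_chunk(chunks[1]))  # only the second chunk is decompressed
```

In a real pipeline the chunk size would be far larger, and the Avro container format would also carry a schema alongside the data, which this sketch leaves out.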

Page 23: Lenddo - Data Driven NYC (27)

NEW SOCIAL DATA USAGE

● Use the cache for data when it first arrives
  ○ Data is available for quick computations and
● Move data from cache to S3 at the end of the day
● Use EMR over S3 data for all aggregations
● Created an EMR-based map-reduce framework for data science team
● Standard EMR jobs for common queries:
  ○ All social data for a member
  ○ Score one member
  ○ Score all members
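
The flow on this slide can be sketched in plain Python. This is a toy, in-memory stand-in rather than the actual system: one dict plays the MongoDB cache, another plays S3, and a small map and reduce pair plays the EMR job. The "score" here is just a record count per member, a placeholder and not Lenddo's scoring. All function names, field names, and dates are hypothetical.

```python
from collections import defaultdict

cache = defaultdict(list)    # stand-in for the small MongoDB cache
archive = defaultdict(list)  # stand-in for S3, keyed by day


def ingest(day, record):
    """New data lands in the cache first, where it is quick to query."""
    cache[day].append(record)


def end_of_day_flush(day):
    """Move one day's records from the cache to the archive."""
    archive[day].extend(cache.pop(day, []))


def map_phase(records):
    """Emit (member_id, 1) for every archived record."""
    for record in records:
        yield record["member_id"], 1


def reduce_phase(pairs):
    """Sum the emitted values per member (placeholder for a real score)."""
    totals = defaultdict(int)
    for member_id, value in pairs:
        totals[member_id] += value
    return dict(totals)


def score_all_members():
    """Run the map and reduce steps over everything in the archive."""
    all_records = (r for day in sorted(archive) for r in archive[day])
    return reduce_phase(map_phase(all_records))


ingest("2013-05-01", {"member_id": "a", "network": "example"})
ingest("2013-05-01", {"member_id": "b", "network": "example"})
ingest("2013-05-02", {"member_id": "a", "network": "example"})
end_of_day_flush("2013-05-01")
end_of_day_flush("2013-05-02")
scores = score_all_members()
print(scores)
```

Scoring a single member would be the same job with a filter on `member_id` in the map step; the batch path stays identical whether it covers one member or all of them.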

Page 24: Lenddo - Data Driven NYC (27)

WHAT DID WE GAIN?

● Peace of mind
  ○ No more database maintenance
  ○ No more periodic server upgrades
● Scalability
  ○ Storage and access remain identical for the next 10x growth
● $$$
  ○ New cost: < $3000 per month: 70% less!
  ○ Includes EMR clusters running routine jobs
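
The 70% figure follows from the two monthly costs quoted in the talk. A quick check, treating the stated bounds (> $10,000 before, < $3,000 after) as exact values for the arithmetic; the variable names are only for illustration:

```python
old_monthly_cost = 10_000  # "> $10,000 per month" before the move
new_monthly_cost = 3_000   # "< $3000 per month" after the move

saving = old_monthly_cost - new_monthly_cost
saving_pct = 100 * saving / old_monthly_cost
print(f"{saving_pct:.0f}% less")  # prints "70% less"
```

Because the old cost was above $10,000 and the new cost is below $3,000, 70% is a lower bound on the saving.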

Page 25: Lenddo - Data Driven NYC (27)

Thanks!