Implementation challenges in Big Data - Dr. Nilesh Karnik

Transcript
Page 1: Implementation challenges in Big Data - Dr. Nilesh Karnik

Aureus Claims Solution

Copyright 2013 RESTRICTED CIRCULATION

Implementation Challenges in Big Data Analytics

• Dr. Nilesh N. Karnik

Page 2: Implementation challenges in Big Data - Dr. Nilesh Karnik

What we will discuss

• The Challenge of BIG Data

• ADVANCED Analytics

• SOLUTIONS in the Pipeline

Page 3: Implementation challenges in Big Data - Dr. Nilesh Karnik

Big Data: Distributed Processing

[Figure: Old idea vs. New idea]
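Distributed processing is usually illustrated with the MapReduce model that the later slides discuss. The sketch below is a minimal, single-process simulation under that assumption: the input is cut into splits, each split is mapped independently, the intermediate pairs are grouped by key (the shuffle), and each key is reduced independently. The function names are hypothetical, and the list of splits only stands in for data spread across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key independently of other keys."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split could be mapped on a different machine; here they run in one loop.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(mapped))


splits = ["big data big ideas", "data beats ideas", "big wins"]
print(word_count(splits))
# {'big': 3, 'data': 2, 'ideas': 2, 'beats': 1, 'wins': 1}
```

Because neither the map step nor the reduce step needs to see other splits or other keys, both can be spread across as many machines as there are splits or keys.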

Page 4: Implementation challenges in Big Data - Dr. Nilesh Karnik

EXAMPLE 1: Task of storing books on a shelf

Simple, right?

Image source: Flickr. Image copyright belongs with the original artist.

Page 5: Implementation challenges in Big Data - Dr. Nilesh Karnik

EXAMPLE 1: Task of storing books on a shelf

And now?
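One way to read the bookshelf example: once the books no longer fit on one shelf, we need a rule that decides which shelf each book goes on, so that anyone can find it again without a central catalogue. A minimal sketch, assuming hash partitioning (a common technique in distributed stores, not something the slides specify); `shelf_for` and `place_books` are hypothetical names, and "shelves" stand in for storage nodes.

```python
import hashlib


def shelf_for(title: str, num_shelves: int) -> int:
    """Pick a shelf by hashing the title; anyone can recompute this without an index."""
    # hashlib gives a stable hash; Python's built-in hash() for str changes per process.
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shelves


def place_books(titles, num_shelves):
    """Distribute titles across shelves (standing in for storage nodes)."""
    shelves = {i: [] for i in range(num_shelves)}
    for title in titles:
        shelves[shelf_for(title, num_shelves)].append(title)
    return shelves


books = [f"Book {i}" for i in range(1000)]  # hypothetical titles
shelves = place_books(books, num_shelves=4)
print({shelf: len(items) for shelf, items in shelves.items()})
```

The design choice here is that location is computed, not looked up. A simple `hash % n` scheme does move most books when the number of shelves changes, which is why production systems often use consistent hashing instead.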

Page 6: Implementation challenges in Big Data - Dr. Nilesh Karnik

Page 7: Implementation challenges in Big Data - Dr. Nilesh Karnik

EXAMPLE 2: Summarizing a Report

SUMMER PROJECT REPORT

Simple, right?

Page 8: Implementation challenges in Big Data - Dr. Nilesh Karnik

EXAMPLE 2: Summarizing a Report

And now?
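The report example suggests a divide-and-combine pattern: summarize each chapter on its own, then combine the chapter summaries instead of rereading the whole report. For numeric data the same idea works whenever the summary can be merged. A minimal sketch, assuming the summary is count, mean, and variance; `Summary`, `summarize`, and `merge` are hypothetical names, and the merge rule is the standard pairwise update for combining means and sums of squared deviations.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        """Population variance of everything summarized so far."""
        return self.m2 / self.count if self.count else 0.0


def summarize(chunk):
    """Summarize one non-empty chunk on its own, as one worker would."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Summary(n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries without revisiting the raw data."""
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.count * b.count / n
    return Summary(n, mean, m2)


chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]  # hypothetical "chapters"
total = reduce(merge, (summarize(c) for c in chunks))
print(total.count, total.mean, total.variance)
```

Because `merge` is associative, the partial summaries can be combined in any grouping, for example in a tree across machines, and still give the same result as summarizing all the data at once (up to floating-point rounding).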

Page 9: Implementation challenges in Big Data - Dr. Nilesh Karnik

EXAMPLE 3: Baking a Cake

Simple, right?

And now?

Image source: Pinterest. Image copyright belongs with the original artist.
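One plausible reading of the cake example is that a bigger job only parallelizes where the steps do not depend on each other: the frosting can be made while the cake bakes, but the cake cannot be frosted before it is baked. A minimal sketch, assuming the job is written as a dependency graph; the recipe steps and `parallel_stages` are hypothetical, and the grouping uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical recipe: each step maps to the set of steps it depends on.
recipe = {
    "preheat oven": set(),
    "mix dry ingredients": set(),
    "whisk eggs and sugar": set(),
    "combine batter": {"mix dry ingredients", "whisk eggs and sugar"},
    "bake": {"combine batter", "preheat oven"},
    "make frosting": set(),
    "frost cake": {"bake", "make frosting"},
}


def parallel_stages(graph):
    """Group steps into stages; all steps within one stage could run at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(parallel_stages(recipe), start=1):
    print(f"Stage {i}: {', '.join(stage)}")
```

Adding more cooks only shortens the stages that have several ready steps; the chain from batter to baking to frosting stays sequential however many cooks are available.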

Page 10: Implementation challenges in Big Data - Dr. Nilesh Karnik

Advanced Analytics

• Well-developed tool set for the “small data” environment

• Challenges in the Big Data environment

Page 11: Implementation challenges in Big Data - Dr. Nilesh Karnik

Advanced Analytics: MapReduce Difficulties

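Iterative algorithms: each iteration typically runs as a separate MapReduce job that re-reads its input from disk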
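To make the iteration cost concrete, the sketch below runs 1-D k-means where every iteration is one map (assign each point to its nearest centroid) plus one reduce (average the points per centroid), and it counts how many full scans of the input that takes. This is a minimal single-process illustration under stated assumptions: `CountingDataset` is a hypothetical stand-in for reading input from distributed storage, and k-means is just a convenient example of an iterative algorithm.

```python
from collections import defaultdict


class CountingDataset:
    """Hypothetical stand-in for input on distributed storage; counts full scans."""

    def __init__(self, points):
        self._points = list(points)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._points)


def kmeans_iteration(dataset, centroids):
    """One MapReduce-style job: map assigns points, reduce recomputes centroids."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x in dataset.scan():  # map: key each point by its nearest centroid
        k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[k] += x
        counts[k] += 1
    # reduce: new centroid = mean of its points (keep the old one if it got none)
    return [sums[k] / counts[k] if counts[k] else c for k, c in enumerate(centroids)]


def kmeans(dataset, centroids, max_iters=20, tol=1e-9):
    for _ in range(max_iters):
        new = kmeans_iteration(dataset, centroids)  # a fresh full scan every time
        if max(abs(a - b) for a, b in zip(new, centroids)) < tol:
            return new
        centroids = new
    return centroids


data = CountingDataset([1.0, 1.5, 2.0, 10.0, 10.5, 11.0])
result = kmeans(data, centroids=[0.0, 5.0])
print(result, "full scans:", data.scans)
```

The input never changes between iterations, yet each one pays for a complete scan; with many iterations over a large dataset, that repeated I/O dominates the run time.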

Page 12: Implementation challenges in Big Data - Dr. Nilesh Karnik

Advanced Analytics: MapReduce Difficulties

Incremental processing requires a restart: when new data arrives, the job is re-run over the entire dataset
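The contrast between restarting and updating incrementally can be shown with a simple word count. A minimal sketch, assuming the result is a mergeable count: `full_recount` re-reads every record (the restart), while the hypothetical `IncrementalCounter` keeps its running result and folds in only the new records. The claim-like records are invented for illustration.

```python
from collections import Counter


def full_recount(all_records):
    """Batch style: rebuild the counts from scratch over every record seen so far."""
    return Counter(word for record in all_records for word in record.split())


class IncrementalCounter:
    """Incremental style: keep the running counts and fold in only new records."""

    def __init__(self):
        self.counts = Counter()

    def update(self, new_records):
        """Add a batch of new records; returns how many records had to be read."""
        for record in new_records:
            self.counts.update(record.split())
        return len(new_records)


day1 = ["claim filed", "claim approved"]  # hypothetical records
day2 = ["claim rejected"]

inc = IncrementalCounter()
inc.update(day1)
work = inc.update(day2)  # reads only day2's single record

batch = full_recount(day1 + day2)  # a restart re-reads all three records
assert inc.counts == batch
print(inc.counts.most_common(1), "incremental reads:", work, "batch reads:", len(day1 + day2))
```

Both approaches reach the same counts, but the restart's cost grows with all the data seen so far, while the incremental update's cost grows only with the new data. Counting is easy to make incremental; many analytics results are not, which is what makes this a hard problem.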

Page 13: Implementation challenges in Big Data - Dr. Nilesh Karnik

Advanced Analytics: MapReduce Difficulties

Batch learning scans all the data in one go
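A small example shows why this matters. The sketch fits y ≈ w·x two ways: batch gradient descent, where every update needs a full pass over the data, and online stochastic gradient descent, which updates after each example and so can consume the data once as a stream. It assumes noise-free synthetic data with true w = 3 and a learning rate of 0.3; both are illustrative choices, not values from the slides.

```python
def batch_gradient_descent(xs, ys, lr=0.3, steps=200):
    """Fit y ≈ w*x; every single update needs a full pass over all (x, y) pairs."""
    w, passes = 0.0, 0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        passes += 1  # one complete scan of the data per update
        w -= lr * grad
    return w, passes


def online_sgd(stream, lr=0.3):
    """Fit y ≈ w*x by updating after every example, reading the stream once."""
    w = 0.0
    for x, y in stream:
        w -= lr * 2 * (w * x - y) * x
    return w


xs = [i / 100 for i in range(1, 101)]
ys = [3 * x for x in xs]  # synthetic, noise-free data with true w = 3

w_batch, passes = batch_gradient_descent(xs, ys)
w_online = online_sgd((x, y) for x, y in zip(xs, ys))  # a one-shot generator
print(f"batch: w={w_batch:.4f} after {passes} passes; online: w={w_online:.4f} after 1 pass")
```

Both reach w ≈ 3 here, but the batch version read the dataset 200 times. On clean data with inputs in (0, 1] the single online pass is enough; on real, noisy data, online updates trade some precision per step for far fewer scans.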

Page 14: Implementation challenges in Big Data - Dr. Nilesh Karnik

Some Solutions Data Scientists are working on

New frameworks

• E.g., HaLoop*, PrIter# (extensions of Hadoop)

• Percolator$ (Proprietary Google framework)

* Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: Efficient iterative data processing on large clusters”, VLDB, 2010.

# Y. Zhang, Q. Gao, L. Gao, and C. Wang, “PrIter: A distributed framework for prioritized iterative computations”, SoCC, 2011.

$ D. Peng and F. Dabek, “Large-scale incremental processing using distributed transactions and notifications”, OSDI, 2010.

Page 15: Implementation challenges in Big Data - Dr. Nilesh Karnik

Some Solutions Data Scientists are working on

Smarter algorithms / Different implementations

• Random forest

• Parallelized Stochastic Gradient Descent
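Random forest already parallelizes naturally, because each tree is trained on its own bootstrap sample. The sketch below covers the second item. It follows the simple parameter-averaging scheme for parallelized SGD: shuffle the data, split it into disjoint partitions, run plain SGD on each partition with no communication, then average the resulting weights. It assumes a 1-D least-squares model on noise-free synthetic data. Threads stand in for machines only to keep the example self-contained, and `sgd_on_partition` and `parallel_sgd` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sgd_on_partition(partition, lr=0.3):
    """Plain SGD for y ≈ w*x on one partition; no communication with other workers."""
    w = 0.0
    for x, y in partition:
        w -= lr * 2 * (w * x - y) * x
    return w


def parallel_sgd(data, num_workers=4, seed=0):
    """Shuffle, split into disjoint partitions, train each independently, then average."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    partitions = [shuffled[i::num_workers] for i in range(num_workers)]
    # Threads keep the sketch self-contained (and are GIL-bound in CPython);
    # in a real deployment each partition would be trained on its own machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        weights = list(pool.map(sgd_on_partition, partitions))
    return sum(weights) / len(weights)


xs = [i / 400 for i in range(1, 401)]
data = [(x, 3 * x) for x in xs]  # synthetic, noise-free data with true w = 3
print(parallel_sgd(data))
```

The only communication is the single averaging step at the end, which is what makes this scheme attractive on a MapReduce-style cluster: each partition is one map task, and the average is one reduce.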

Page 16: Implementation challenges in Big Data - Dr. Nilesh Karnik

Page 17: Implementation challenges in Big Data - Dr. Nilesh Karnik

SINGAPORE
Aureus Analytics Pte. Ltd.
17, Phillip Street, #05-01, Grand Building
Singapore (048695)

INDIA
Aureus Analytics Pvt. Ltd.
706, Powai Plaza
Hiranandani Gardens, Powai
Mumbai

www.aureusanalytics.com

Thank You!