From Big Data Management to Big Data Science
(eldawy/19FCS226/slides/CS226-14-Whats...)



What is next?

Real big data is widely available

Only a few people know how to deal with it

You’re now one of them

Applications

The project is a start

Keep your hands dirty

Consider using the public cloud (e.g., AWS, Google Cloud, or Microsoft Azure)


Job Market

https://www.techicy.com/5-best-programming-languages-to-watch-out-in-2019-for-data-science.html


Data Science

Credits: Drew Conway


Data Science

https://mashimo.wordpress.com/2016/05/28/big-data-data-science-and-machine-learning-explained/


Data Scientist


Next Steps

CS

Big data tools

Python/R/Scala

Math/Stats

Linear algebra

Correlation analysis

Hypothesis tests

Collaboration with domain experts

Visualization

Prototyping
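As a small illustration of two of the Math/Stats items above (correlation analysis and hypothesis tests), here is a minimal sketch in plain Python. It is not taken from the slides: the function names `pearson` and `permutation_p_value`, the toy `hours`/`scores` data, and the choice of a permutation test are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation test of H0: xs and ys are uncorrelated."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        # Shuffling ys breaks any real pairing between xs and ys.
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Hypothetical toy data, invented for this example.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]

r = pearson(hours, scores)
p = permutation_p_value(hours, scores)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries; on real data a library routine would usually be the more practical choice.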


CS

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize


CS/Big Data

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize


Math/Stats

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize


Online Courses

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize


Data Analytics

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize


Big Data Landscape

Distributed Storage: HDFS, KV stores, LSM trees, Column stores

Query Processing: MapReduce, RDD, Hyracks

High level APIs: Pig Latin, Spark SQL, HBase

Big data packages: Algebricks, MLlib, GraphX, SparkR
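To make one entry of the landscape concrete, the MapReduce model from the query-processing layer can be sketched in a single Python process. This is only an illustration of the map, shuffle, and reduce phases on a word count; it is not how Hadoop or Spark are implemented, and the function names and the `docs` input are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical input, invented for this example.
docs = ["big data management", "big data science"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map and reduce phases run in parallel on many machines and the shuffle moves data across the network; the sketch keeps only the data flow between the phases.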
