Data Science Overview

Previously known as Think Big. Move Fast.

description

The buzzword of the past year is “Data Science”. But what does it really mean? What does a “Data Scientist” do? Which tools does Microsoft provide? And what other tools are there beyond Microsoft?

Transcript of Data Science Overview

Page 1: Data Science Overview

Previously known as

Think Big. Move Fast.

Page 3: Data Science Overview

SolidQ

• Born in 2002 in USA and Spain

• Established in 2007 in Italy

• More than 1000 customers and more than 200 consultants worldwide

• Dedicated to Data Management on the Microsoft Platform

• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors

• www.solidq.com

Page 4: Data Science Overview

Davide Mauri

• 18 Years of experience on the SQL Server Platform

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence

• Microsoft SQL Server MVP

• President of UGISS (Italian SQL Server UG)

• Mentor @ SolidQ

• Video, Book & Article Author

• Regular Speaker @ SQL Server events

• Projects, Consulting, Mentoring & Training

Page 5: Data Science Overview

Data Science

Renaissance 2.0

Page 6: Data Science Overview

“Companies are collecting mountains of information about you, to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so”

Business Week Magazine, 1994

Page 7: Data Science Overview

Data Science

• Extraction of knowledge from data

• So, what’s new?

• Nothing. Except that it’s now economical and fast.

• It’s now applicable to everything. And we have a lot of data produced every day that can be used to extract knowledge

Page 8: Data Science Overview

Data Science

Data → Information → Knowledge → Decisions

Page 9: Data Science Overview

Data Science

• A Sum Of

  • Statistics
  • Mathematics
  • Machine Learning
  • Data Mining
  • Computer Programming
  • Data Engineering
  • Visualization
  • Data Warehousing
  • High Performance Computing

• To support (Informed) Decision Making

  • Data-Driven Decisions

Page 10: Data Science Overview

Data Scientist

• IBM

  • A data scientist represents an evolution from the business or data analyst role.
  • The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.
  • What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.
  • It's almost like a Renaissance individual who really wants to learn and bring change to an organization.

Page 11: Data Science Overview

Algorithms

• Algorithms are the new gatekeepers

  • http://www.slideshare.net/socialisten/algorithms-are-the-new-gatekeepers
  • There is simply too much data for a human to analyze!
  • They decide
    • What we find
    • What we see
    • What we buy

• Data is the foundation upon which algorithms work

  • Better data leads to better results

• Data-Driven Decisions will be a MUST in the coming years!

• Data Scientists will help companies to leverage their most valuable asset: Data

Page 12: Data Science Overview

Modern Data Environment

Master Data

EDW / Data Mart

Big Data

Unstructured Data

BI Environment

Analytics Environment

Structured Data

Page 13: Data Science Overview

Big Data

The 3 V

No, the 4 V!!!

No, no, the 5 V!!!!!

6 V!!!

Page 14: Data Science Overview

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Page 15: Data Science Overview

Big Data

• Volume, Velocity, Variety, Veracity… V<your-v-here>

• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time

• Grid Computing, Parallel Computing needed

  • keep processing time reasonable
  • provide scalability
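
A minimal sketch of the idea behind parallel processing: split the data into chunks, process each chunk independently, then merge the partial results. The word-count task, the function names and the sample lines are assumptions made for illustration only. Threads are used here just to show the split/process/merge shape; on a real grid or cluster the chunks would be sent to separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, n_workers=4):
    """Run the map step on each chunk concurrently, then merge (reduce)."""
    chunks = split_into_chunks(lines, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


# Hypothetical sample input, small enough to check by hand.
sample_lines = [
    "big data needs parallel computing",
    "parallel computing keeps processing time reasonable",
    "big data big value",
]
word_counts = parallel_word_count(sample_lines, n_workers=2)
print(word_counts.most_common(3))
```

Because every chunk is processed without looking at the others, adding workers (or machines) is what provides the scalability mentioned above.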

Page 16: Data Science Overview

Big Data Data• Paradigm: “Store Now, Figure Out Later”

• Data is the new resource. Never throw it away!

• Unstructured Data

  • Text Files
  • Images
  • Sounds

• Structured/Semi-Structured Data

  • Sensors
  • Transactions
  • Logs

Page 17: Data Science Overview

Data Storage

• RDBMS

  • SQL Server

• Hadoop

  • HDInsight
  • Hortonworks Data Platform

• Distributed File (Eco)System

  • CSV
  • JSON
  • *.*
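
As a rough illustration of how files stored "as they are" can be used later, the sketch below reads the same kind of sensor reading from CSV and from JSON and applies a structure only at read time. The field names (`sensor_id`, `temperature`, `unit`) and the sample values are made up for this example, and the data is kept in memory so the snippet is self-contained.

```python
import csv
import io
import json

# Hypothetical raw files, kept in memory here so the sketch is self-contained.
RAW_CSV = "sensor_id,temperature\ns1,21.5\ns2,19.0\n"
RAW_JSON_LINES = (
    '{"sensor_id": "s1", "temperature": 22.0}\n'
    '{"sensor_id": "s3", "temperature": 18.5, "unit": "C"}\n'
)


def read_csv_records(text):
    """Parse CSV text into dicts; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_lines(text):
    """Parse one JSON object per line; fields may differ between records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_reading(record):
    """Apply the schema at read time: keep the two fields we need, typed."""
    return {
        "sensor_id": str(record["sensor_id"]),
        "temperature": float(record["temperature"]),
    }


raw_records = read_csv_records(RAW_CSV) + read_json_lines(RAW_JSON_LINES)
readings = [to_reading(record) for record in raw_records]
average_temperature = sum(r["temperature"] for r in readings) / len(readings)
print(len(readings), average_temperature)
```

The extra `unit` field in the second JSON record is simply ignored for now; it stays in the stored file and can be picked up later if it turns out to matter.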

Page 18: Data Science Overview

Data Storage

• Hadoop Ecosystem

http://hortonworks.com/hadoop-modern-data-architecture/

Page 19: Data Science Overview

Data Science & Big Data

• Data Science != Big Data

• Data Science is not only about Big Data

• Data Science can be applied to Big Data

• Data Science starts from Small Data

  • 1) find the algorithm that extracts knowledge
  • 2) measure the algorithm's results in terms of probability
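
The two steps above can be sketched on a very small data set. Everything here is an assumption for illustration: the data is made up, and the "algorithm" is just a single cut-off value. Step 1 picks the cut-off on one part of the data; step 2 estimates, on rows the algorithm has not seen, how probable it is that a prediction is correct.

```python
# Made-up small data set: (minutes spent on a product page, bought: 1 or 0).
SMALL_DATA = [
    (0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1),
    (3.0, 1), (3.5, 1), (4.0, 1), (1.2, 0), (3.2, 1),
    (2.2, 1), (2.8, 0),
]


def fit_threshold(train_rows):
    """Step 1: pick the cut-off that classifies the training rows best."""
    candidates = sorted(x for x, _ in train_rows)

    def correct(threshold):
        return sum((x >= threshold) == bool(y) for x, y in train_rows)

    return max(candidates, key=correct)


def estimate_accuracy(threshold, test_rows):
    """Step 2: estimate the probability that a prediction is correct."""
    hits = sum((x >= threshold) == bool(y) for x, y in test_rows)
    return hits / len(test_rows)


# Fit on the first rows, measure on rows that were held back.
train_rows, test_rows = SMALL_DATA[:8], SMALL_DATA[8:]
threshold = fit_threshold(train_rows)
accuracy = estimate_accuracy(threshold, test_rows)
print(threshold, accuracy)
```

Measuring on held-back rows matters: a rule that looks perfect on the rows it was fitted on can still be wrong on new ones, and that second number is the one that supports a decision.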

Page 20: Data Science Overview

Machine Learning

• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)

  • For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.

• Flavors

  • Supervised
  • Unsupervised
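
The spam example can be sketched as a tiny supervised learner. This is only one possible approach, a naive Bayes word-count classifier with Laplace smoothing, chosen here because it fits in a few lines; the training messages and labels are made up for illustration.

```python
import math
from collections import Counter

# Made-up training messages, labelled by hand (this is the "supervised" part).
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("project status and agenda", "non-spam"),
    ("lunch on monday", "non-spam"),
]


def train(examples):
    """Learn from data: count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add 1 so that an unseen word never gives a zero probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("free money now", model))
print(classify("agenda for the monday meeting", model))
```

An unsupervised method would receive the same messages without the labels and try to group similar ones on its own.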

Page 21: Data Science Overview

Data Analysis

• Common Data Scientist Tools

  • R
  • Weka
  • Octave
  • Scikit-Learn

• Common Data Scientist Languages

  • Python
  • Scala
  • F#

Page 22: Data Science Overview
Page 23: Data Science Overview

Resources

• https://www.coursera.org/

  • Data Scientist Specialization

• https://www.khanacademy.org/

  • Math

• http://www.osservatori.net/business_intelligence

  • Italian Big Data Market Analysis Resources

• http://www.solidq.com/consulting/

  • Data Science Services
  • Big Data / Business Intelligence / Data Warehousing

Page 24: Data Science Overview

Previously known as

Think Big. Move Fast.