Python in Data Science Work

Posted on 09-Jan-2017

PYTHON IN DATA SCIENCE WORK

RICK BAHAGUE, DATA SCIENTIST

RBAHAGUEJR@GMAIL.COM

Our Agenda

What is Data Science?

Introduction to Python

Python Tools for Data Science

A bit of Python for Big Data Processing

Questions

Data Science

Source: Python Data Analytics

A data scientist asks relevant real-world questions

Source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

And, hopefully, discovers actionable recommendations from data

TOOLS

WHAT IS PYTHON?

“THE NAME PYTHON COMES FROM THE SURREAL BRITISH COMEDY GROUP MONTY PYTHON, NOT FROM THE SNAKE. PYTHON PROGRAMMERS ARE AFFECTIONATELY CALLED PYTHONISTAS, AND BOTH MONTY PYTHON AND SERPENTINE REFERENCES USUALLY PEPPER PYTHON TUTORIALS AND DOCUMENTATION.”

Source: Automate the Boring Stuff with Python

import antigravity

Installing Python

https://www.continuum.io/downloads

Launching Anaconda Python Distribution

When is data ready for analysis?

Image source: http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/

GitHub: https://github.com/RickBahague/dspop

Sample data set (GitHub): https://github.com/veekun/pokedex

Pandas: Python Data Analysis Library

Import pandas library

Reading/Writing Data

Series

DataFrame

Selecting Internal Elements

Assigning Values to Elements
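The topics on this slide can be sketched in a few lines. This is a minimal illustration, not the talk's own notebook: the CSV text, the column names (name, type, attack, defense) and the values are hypothetical stand-ins loosely modelled on the Pokedex sample data, and an in-memory string is used in place of a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
CSV_TEXT = """name,type,attack,defense
bulbasaur,grass,49,49
charmander,fire,52,43
squirtle,water,48,65
"""

# Reading data: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Series: a one-dimensional labelled array.
attack = pd.Series([49, 52, 48], index=["bulbasaur", "charmander", "squirtle"])

# Selecting internal elements.
first_row = df.iloc[0]  # one row, by position
names = df["name"]      # one column, by label
fire_attack = df.loc[df["name"] == "charmander", "attack"].iloc[0]

# Assigning values to elements.
df.loc[df["name"] == "squirtle", "defense"] = 70  # overwrite a single cell
df["total"] = df["attack"] + df["defense"]        # add a derived column

# Writing data: to_csv returns a string when no path is given.
csv_out = df.to_csv(index=False)
```

Passing a file name to read_csv and to_csv instead of the in-memory string reads from and writes to disk.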

Pandas: Python Data Analysis Library

Evaluating Values (unique, isin, value_counts, NaN)

Filtering Values

Transpose

Operations between DataFrame and Series

Statistics Functions, Correlation/Covariance
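A short sketch of these operations follows. The DataFrame and its values are hypothetical (one attack value is deliberately set to NaN to show missing-data handling); only the pandas calls themselves come from the slide's topic list.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one attack value is missing on purpose.
df = pd.DataFrame(
    {
        "type": ["grass", "fire", "water", "fire"],
        "attack": [49.0, 52.0, 48.0, np.nan],
        "defense": [49.0, 43.0, 65.0, 58.0],
    },
    index=["bulbasaur", "charmander", "squirtle", "charmeleon"],
)

# Evaluating values.
types = df["type"].unique()                         # distinct values, in order of appearance
is_fire_or_water = df["type"].isin(["fire", "water"])
type_counts = df["type"].value_counts()             # frequency of each value
missing_attack = df["attack"].isna()                # True where the value is NaN

# Filtering values with a boolean mask.
strong_defense = df[df["defense"] > 50]

# Transpose: rows become columns and vice versa.
stats = df[["attack", "defense"]]
transposed = stats.T

# Operation between a DataFrame and a Series: subtract each column's mean.
# The Series returned by mean() is aligned on the column labels.
centered = stats - stats.mean()

# Statistics functions, correlation and covariance (NaN values are skipped).
summary = stats.describe()
corr = stats.corr()
cov = stats.cov()
```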

Scikit-learn & ML Basics

... learning from experience either with or without supervision of humans

Source: Mastering Machine Learning with scikit-learn

ML Flow

Image source: http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/

Machine Learning with Scikit-learn

Source: http://scikit-learn.org/stable/
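A minimal supervised-learning sketch with scikit-learn: split the data, fit a model, predict, and score. The iris dataset and the k-nearest-neighbours classifier are assumptions chosen for illustration; the slides do not say which dataset or estimator was demonstrated.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small bundled dataset: X holds the features, y the class labels.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the estimator on the training split only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and measure accuracy.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
```

Most scikit-learn estimators share the same fit and predict methods, so another classifier can usually be swapped in without changing the rest of the flow.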

A bit of Big Data Processing

Source: Python Data Analytics

Creative Commons License

Python in Data Science Work by Rick Bahague is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Based on a work at https://medium.com/@rbahaguejr.

Permissions beyond the scope of this license may be available at https://medium.com/@rbahaguejr.