Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem


Transcript of Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Page 1: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Discovering Data Science Design Patterns

with Examples from R and Python

Dmitrij Petrov

Autumn 2017

30/11/2017 Dmitrij Petrov - Master Thesis Presentation - Autumn 2017

Outlining Master Thesis

Page 2: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Motivation

• Design patterns capture best solutions to recurring issues in

  • Architecture
    • Started the Pattern Language Movement

  • Object-Oriented Programming
    • Seminal work for software analysis, design and implementation

  • Cloud Computing, Database Modelling, etc.

  • Data Science


Page 3: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Research Questions

• RQ1: What exactly do the terms software ecosystem, data science and design pattern mean?

• RQ2: Which data science-oriented design patterns can be recognized?

• RQ3: What are the specific FOSS R and Python tools that can be used for solving common data mining problems?


Page 4: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Methodology – 3D2P framework

1. Pattern prospecting

2. Pattern mining

3. Pattern writing

- Literature sources

- General Inductive Approach & Open/Axial Coding

- Discovery of patterns (i.e. best practices and their relationships)

- Follow PW guidelines for their documentation

Relevant works of: Thomas (‘06), Inventado & Scupelli (‘15), Meszaros & Doble (‘96)

Page 5: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

A Pattern Example – “Build Me Dataset”

1. Pattern Name & Sketch: “Build Me Dataset”

2. Context: you want to process data from multiple data sources/formats

3. Problem: extracting/storing data in a common data structure

4. Solution: a “table” / “data frame”

5. Consequences: can be very simple but also slow

6. Known uses: modelling, visualization…

7. Examples: from R & Python ecosystem
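
The slide does not show the thesis's own example, so the following is only a minimal sketch of how the “Build Me Dataset” idea might look in Python with pandas. The inline CSV and JSON strings, the column names and the `build_dataset` helper are all assumptions made for illustration; they stand in for whatever data sources and formats a real project would have.

```python
import io

import pandas as pd

# Hypothetical inputs: similar records arriving in two different formats.
CSV_TEXT = "id,city,temp_c\n1,Oslo,4.5\n2,Rome,15.0\n"
JSON_TEXT = '[{"id": 3, "city": "Cairo", "temp_c": 24.0}]'


def build_dataset(csv_text: str, json_text: str) -> pd.DataFrame:
    """Load both sources and stack them into one common data frame."""
    from_csv = pd.read_csv(io.StringIO(csv_text))
    from_json = pd.read_json(io.StringIO(json_text))
    # Columns are aligned by name; the row index is renumbered from 0.
    return pd.concat([from_csv, from_json], ignore_index=True)


dataset = build_dataset(CSV_TEXT, JSON_TEXT)
print(dataset)
```

Once everything sits in one data frame, the known uses listed above (modelling, visualization) can work against a single structure. The listed consequence also shows up here: the code is very short, but loading every source fully into memory can become slow for large inputs.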


Page 6: Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem

Expected Outcomes

1. Aim to formulate Data Science design patterns

2. Data Science R and Python Toolkit Matrix

  • A holistic map of tools can simplify the knowledge discovery process