Seminar Thesis - ETH Z · 1.Seminar proposal:18/03/2015 midnight 2.Seminar thesis: 06/05/2015...

Seminar Thesis
Evangelos Pournaras, Izabela Moise, Dirk Helbing

Outline

• Objective

• Setup Information

• Important Dates

• Seminar Proposal

• Assessment Criteria

• Guideline

Objective

• Learn how to define a research question/hypothesis that can be tackled with data science.

• Gain experience in choosing which data science methods to apply and how to apply them.

• Put into practice the technical skills of data science.

• Learn how to present data science work in writing and orally.

• Learn to collaborate.

Tip: You need to demonstrate that you are more than a programmer or a statistician!

Setup Information

• Form groups of 2-3 people.

• Create a GitHub repository with your project at: https://github.com/data-science-course

• The seminar proposal submission is the README file in your GitHub repository.

• The use of the provided thesis template on GitHub is obligatory.

• The seminar thesis submission is the thesis.pdf file, accompanied by all typesetting and software sources, in your GitHub repository.

Important Dates

• Submission deadlines:
  1. Seminar proposal: 18/03/2015 midnight
  2. Seminar thesis: 06/05/2015 midnight

• Oral presentations: 19/05/2015 & 26/05/2015

Meeting the deadlines is crucial! Exceptions would be unfair to others and cannot be made.

Seminar Proposal I

How can you choose a topic?

• Choose a research area/topic that you like!

• Choose a research area about which you want to learn more.

• Start with an interesting paper you read.

• Follow one of our recommendations.

• Still not sure? Come and talk to us!

Seminar Proposal II

• Proposal size: approximately 1 page

• Proposal assessment: accept, accept with minor changes, or revise

Tip: Make clear the research question/hypothesis you will study and outline why you will adopt a data science approach.

Assessment Criteria of the Seminar Thesis

• Scientific clarity - 25%

• Technical clarity - 25%

• Writing and Presentation - 25%

• Oral Presentation - 25%

• Participation in data collection - 10% Bonus!
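As a worked illustration of how the weights above could combine into a single grade, here is a minimal sketch. It assumes that every criterion is scored on a 0 to 100 scale, that the data-collection bonus is added on top of the weighted sum, and that the total is capped at 100. None of these details is specified in the slides, and the names `WEIGHTS`, `BONUS_WEIGHT` and `final_score` are hypothetical.

```python
from typing import Mapping

# Weights of the four main criteria (25% each, as listed above).
WEIGHTS = {
    "scientific_clarity": 0.25,
    "technical_clarity": 0.25,
    "writing_and_presentation": 0.25,
    "oral_presentation": 0.25,
}
# Bonus weight for participation in data collection ("10% Bonus!").
BONUS_WEIGHT = 0.10


def final_score(scores: Mapping[str, float], data_collection: float = 0.0) -> float:
    """Hypothetical grade on a 0-100 scale.

    Assumptions (not from the slides): every input is on a 0-100 scale,
    the bonus is added on top of the weighted sum, and the total is capped at 100.
    """
    base = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    bonus = BONUS_WEIGHT * data_collection
    return min(100.0, base + bonus)


# Example: criteria scores of 80, 70, 90 and 60 plus full data-collection participation.
example = final_score(
    {
        "scientific_clarity": 80,
        "technical_clarity": 70,
        "writing_and_presentation": 90,
        "oral_presentation": 60,
    },
    data_collection=100,
)
print(example)  # prints 85.0
```

Under these assumptions the bonus can lift a grade but never push it past the maximum.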

A Seminar Thesis Guideline

... for a successful project!

1. Define the challenge

2. Define the outcome and its significance

3. Reason about a data-science approach

4. Select the data sources

5. Define evaluation metrics and measurements

6. Build the data analytics pipeline

7. Perform validation and evaluation

8. Draw conclusions and future work

1. Define the challenge I

Techno-socio-economic systems: systems that involve technological, social and economic components. Examples:

• Power/water/gas networks

• The Internet

• Transportation/traffic systems

• Financial/energy markets

• Food supply chains

1. Define the challenge II

Define a research question or a hypothesis that can be addressed with data science.

Examples:
• How can power demand be adjusted to meet the available supply?
• How can power demand be adjusted to meet the available supply using renewables?
• The power supply costs in Switzerland can be minimized with a real-time tariff pricing program.
• How can we predict people’s locations without using GPS data?

2. Define the outcome of your project

• Understanding one or more phenomena.

• A more effective policy.

• A decision-making process/method.

• A mechanism, software prototype, algorithm, tool or other system that improves techno-socio-economic systems.

Tip: What is the impact and significance of this outcome, and for whom?

3. Reason about a data science approach

• Data science vs. agent-based simulation vs. analytic solutions.

• Why data science?

• Are other methods feasible as well?

• Can data science be combined with other methods?

• Which approaches are followed in the literature?

• What limitations are expected with a data science approach?

4. Select the data sources I

Availability of data sources in the course repository:

1. Compilation of existing data sources.
   – https://github.com/data-science-course/data-sources
   – Open data portals.
   – Data available upon request.

2. Data collected during the course.
   – https://github.com/data-science-course/data-collection
   – Planetary Nervous System.
   – A participatory & collective data acquisition process.

4. Select the data sources II

Tip: You may explore the data sources first, before you define a research question or hypothesis.

Criteria for data selection:

• Data type: numerical/textual, sparse/temporal, aggregated/disaggregated, network (graph).

• Data quality, missing/noisy data.

• Use/composition of different datasets.
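Data quality can be checked before committing to a dataset. The sketch below uses only the Python standard library. It reports the share of missing entries (`None` or NaN) and flags noisy points with the modified z-score based on the median absolute deviation (MAD), using the common rule-of-thumb threshold of 3.5. The function names, the threshold and the toy readings are illustrative choices, not part of the course material.

```python
import math
import statistics


def _is_missing(value) -> bool:
    """Treat None and float NaN as missing entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_ratio(values) -> float:
    """Fraction of entries in `values` that are missing."""
    if not values:
        return 0.0
    return sum(1 for v in values if _is_missing(v)) / len(values)


def mad_outliers(values, threshold: float = 3.5) -> list:
    """Indices of non-missing values whose modified z-score exceeds `threshold`.

    Modified z-score: 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation from the median.
    """
    present = [(i, v) for i, v in enumerate(values) if not _is_missing(v)]
    if not present:
        return []
    xs = [v for _, v in present]
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        return []  # at least half of the values equal the median; score undefined
    return [i for i, v in present if 0.6745 * abs(v - med) / mad > threshold]


# Toy sensor readings with one gap, one NaN and one suspicious spike.
readings = [1.2, 1.3, None, 1.1, 9.8, float("nan"), 1.25]
print(missing_ratio(readings), mad_outliers(readings))
```

A median-based score is used here because, unlike the mean and standard deviation, the median and MAD are barely affected by the outliers being searched for.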

5. Define evaluation metrics and measurements

Come up with meaningful metrics that quantify the observations and findings expected from the applied data analytics.

Example: How can power demand be adjusted to meet the available supply?

Evaluation metrics: matching distance/correlation
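One way to make "matching distance/correlation" concrete is to compare a demand time series with the available supply. The sketch assumes that both series are sampled at the same time steps and in the same unit. It defines the matching distance as the root-mean-square gap between the two series, which is one possible definition among several, and uses the Pearson correlation coefficient. The function names are hypothetical.

```python
import math


def matching_distance(demand, supply) -> float:
    """Root-mean-square gap between demand and supply at matching time steps."""
    if len(demand) != len(supply) or not demand:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((d - s) ** 2 for d, s in zip(demand, supply)) / len(demand))


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

If a demand adjustment lowers the matching distance and raises the correlation, demand tracks supply more closely after the adjustment. The distance captures the size of the mismatch, while the correlation captures whether the shapes of the two curves agree.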

Example: How can we predict people’s locations without using GPS data?

Evaluation metrics: accuracy (standard error, root mean square error), density, dispersion, probabilities, etc.
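For the location-prediction example, accuracy can be measured as the great-circle distance between predicted and true positions. The sketch below makes the following assumptions:
• Positions are (latitude, longitude) pairs in degrees.
• Distances use the haversine formula with a mean Earth radius of 6371 km.

It reports the mean error, the root mean square error and the standard error of the mean error. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_errors(predicted, actual) -> dict:
    """Mean error, RMSE and standard error of the mean error, all in km."""
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    n = len(errors)
    mean = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Sample standard deviation of the errors (needs at least two predictions).
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1)) if n > 1 else 0.0
    return {"mean_km": mean, "rmse_km": rmse, "standard_error_km": sd / math.sqrt(n)}
```

RMSE penalizes large misses more heavily than the mean error, so reporting both shows whether a method is occasionally far off.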

6. Build the data analytics pipeline

Design a sequence of I/O operations applied to the data. Examples of stages:

• Data transformation (log, square root, power, multiplicative inverse, linear map)

• Data pre-processing (filtering, noise reduction, outlier detection)

• Data mining (clustering, regression analysis, classification, neural networks, etc.)

The operations can include loops and multiple stages. In principle, they can be as complex as you need.
Tools: Weka, RapidMiner
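The three stages listed above can be chained in a few lines of code. This is a minimal sketch in plain Python rather than in Weka or RapidMiner, with one method per stage:
• Transformation: a log transformation, which assumes strictly positive target values.
• Pre-processing: a z-score outlier filter with a threshold of 2.0, an arbitrary choice.
• Mining: an ordinary least-squares line fit.

The function names are illustrative.

```python
import math
import statistics


def transform(values):
    """Stage 1, transformation: natural log (assumes strictly positive values)."""
    return [math.log(v) for v in values]


def preprocess(xs, ys, z_max=2.0):
    """Stage 2, pre-processing: drop points whose y lies more than z_max sample
    standard deviations from the mean of y."""
    mu, sd = statistics.mean(ys), statistics.stdev(ys)
    keep = [i for i, y in enumerate(ys) if sd == 0 or abs(y - mu) / sd <= z_max]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def fit_line(xs, ys):
    """Stage 3, mining: ordinary least-squares fit of y = a + b * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pipeline(xs, raw_ys):
    """Chain the stages: transform -> pre-process -> mine."""
    ys = transform(raw_ys)
    xs, ys = preprocess(xs, ys)
    return fit_line(xs, ys)
```

Each stage is a separate function, so any stage can be swapped for a more robust method without touching the others. A swap is often worthwhile for the filter: a single extreme value inflates the standard deviation that the z-score relies on, and a median-based filter avoids this.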

7. Perform validation and evaluation

• Compute the evaluation metrics.

• Make sure you have some ‘control data’ for comparison.

• Test different data corresponding to different scenarios.

• Be concrete and quantitative! E.g., “A is better than B” vs. “The performance of A is 20% higher than that of B”.
  – Example: How can we predict people’s locations without using GPS data?
  – Comparison: prediction via social media data vs. prediction via traffic data.

• Choose and apply statistical tests to evaluate significance.
  – Coefficient of determination, residual analysis.
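The validation steps above can be sketched as a few small helper functions:
• The relative improvement of a method over a control, expressed as a percentage reduction in error.
• The coefficient of determination R².
• Residuals for residual analysis.
• A paired sign-flip permutation test, as one possible significance test.

The choice of test, the 10,000 permutations and all names are assumptions for illustration. A paired t-test or a Wilcoxon signed-rank test would be common alternatives.

```python
import random
import statistics


def relative_improvement(error_method: float, error_control: float) -> float:
    """Percentage by which the method's error is below the control's error
    (positive means the method beats the control)."""
    return 100.0 * (error_control - error_method) / error_control


def r_squared(observed, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def residuals(observed, predicted) -> list:
    """Residuals o - p; plot them against the predictions to spot trends or
    non-constant variance."""
    return [o - p for o, p in zip(observed, predicted)]


def paired_permutation_p(errors_a, errors_b, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided sign-flip permutation test for the mean paired difference.

    Under the null hypothesis that both methods perform equally, the paired
    differences are symmetric around zero, so each sign is equally likely.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The errors must be paired, for example the errors of two methods for the same person and time, because the test compares the two methods case by case. A fixed seed makes the p-value reproducible.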

8. Draw conclusions and future work

State clearly the answer to the defined research question, or the confirmation/rejection of the hypothesis.
Discuss the implications and impact of the outcome of this research.
Discuss future work.

Questions?

What is next?

• A case-study following the guideline.

• Application domain: Smart Grids
