Introduction to Data Science – INFO 480 – Drexel University’s iSchool Sean P. Goggins, PhD...
Introduction to Data Science – INFO 480 – Drexel University’s iSchool
Sean P. Goggins, PhD
April 30, 2013
Week Five
What is Data Science?
Storytelling
Database Theory – How you organize your data has a big influence on what you can do with it.
Agile Manifesto – Key thing is iterative development; it’s a technology value system.
Spiral Dynamics – What we view as fact and what we desire emerges from the data presented to us.
Credit: http://www.datascientists.net/what-is-data-science
Tonight
Share Software for transformation on GitHub
Share how you approached the assignment with the class (individually)
Ask questions
Make sure you understand everyone’s approach
Help each other – the result, not the language or technique used to transform the data, is what matters
Use network scripts from week one to transform your transformed data (that’s right!) into networks.
Groups of 3
Week Five
Software Sharing #1 (Share scripts produced in week 3 using an open source software configuration management tool).
Students will refine and then share their scripts with other students.
Included in the assignment is a 500-word explanation of how their script could be improved, optimized and adapted to other data of a similar type.
The “read me” file distributed with the script will explain to another user how to apply the script to the data distributed in assignment one. This will include specific, technical specifications.
Using GitHub for Software Sharing
Creating a GitHub Account
Creating a GitHub Project
Using the GitHub Desktop client
Committing & Syncing
The Pull Request
Sharing Your Software! For my repository:
Create a directory with your name under “student Files”
Put your assignment in there
Create a “pull request”
Discuss Homework
Analysis Questions. Write up a short essay, with tables or graphs if needed, to describe how you would build a network, using the scripts from week 1, against the mention connections in this sample data. How would you build one against the Reply-To connections?
What transformations are required?
How would you filter the data?
Use the actual data to ground your thinking. Feel free to actually write or modify the R code samples from the first two weeks to experiment. Some of you will be more comfortable doing this; some will be more comfortable addressing the question conceptually. This is OK.
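One way to think about the transformation and filtering questions is sketched below in Python (the course samples are in R; Python is used here only for illustration). The records, the field names `author`, `text` and `in_reply_to`, and the `min_weight` threshold are all hypothetical and are not taken from the course data set. The sketch turns each post into directed edges and then drops weak edges.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are assumptions, not the
# actual columns of the course data set.
posts = [
    {"author": "alice", "text": "@bob did you see this? cc @carol", "in_reply_to": None},
    {"author": "bob", "text": "@alice yes, thanks", "in_reply_to": "alice"},
    {"author": "carol", "text": "RT @alice: data science!", "in_reply_to": None},
    {"author": "bob", "text": "@alice one more thing", "in_reply_to": "alice"},
]

MENTION = re.compile(r"@(\w+)")


def mention_edges(records):
    """Count directed author -> mentioned-user edges, skipping self-mentions."""
    edges = Counter()
    for rec in records:
        for target in MENTION.findall(rec["text"]):
            target = target.lower()
            if target != rec["author"]:
                edges[(rec["author"], target)] += 1
    return edges


def reply_edges(records):
    """Count directed author -> replied-to-user edges."""
    edges = Counter()
    for rec in records:
        if rec["in_reply_to"]:
            edges[(rec["author"], rec["in_reply_to"])] += 1
    return edges


def filter_edges(edges, min_weight=2):
    """Keep only edges seen at least min_weight times (assumed threshold)."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


if __name__ == "__main__":
    for (source, target), weight in sorted(mention_edges(posts).items()):
        print(source, target, weight)
```

The resulting source, target, weight rows are the kind of edge list that network scripts typically take as input; whether the week 1 scripts expect exactly this layout is an assumption to check against the actual code.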
Individual Presentations
Informally by you!
Remembering Networks
Underpants Gnomes
With much discourtesy from the US TV Program “South Park”
Motivation
Underpants Gnomes
Motivation
Addressing the Underpants Gnome Postulate
Group Informatics Described: Methodological Approach
Discussion Post
• Read
• Response
Classification
• Open Coding
• Axial Coding
Identification of Coordination Events
• Time proximity
• Topical proximity
Aggregation of Posts by Topic
Weighted Network Analysis of Interactions
• Weight connections based on time distance, grouped by topic and informed by analysis of time distance between posts.
• Identify key information brokers
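The idea of weighting connections by time distance within a topic can be sketched roughly as follows. This is a minimal illustration, not the method used in the course: the coded posts, the 60-minute window, the linear decay `1 - gap / window`, and the use of weighted degree as a stand-in for "information broker" are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical coded posts: (author, topic, timestamp in minutes).
posts = [
    ("ann", "design", 0),
    ("ben", "design", 10),
    ("cat", "design", 50),
    ("ben", "testing", 5),
    ("dan", "testing", 20),
]

WINDOW = 60  # assumed cut-off in minutes; posts further apart are not linked


def weighted_edges(records, window=WINDOW):
    """Link authors who post on the same topic, weighting by time proximity."""
    by_topic = defaultdict(list)
    for author, topic, minute in records:
        by_topic[topic].append((author, minute))

    weights = defaultdict(float)
    for items in by_topic.values():
        for (a, ta), (b, tb) in combinations(items, 2):
            gap = abs(ta - tb)
            if a != b and gap <= window:
                # Assumed linear decay: closer posts contribute more weight.
                weights[tuple(sorted((a, b)))] += 1 - gap / window
    return dict(weights)


def weighted_degree(edges):
    """Sum the edge weights touching each author (a crude broker proxy)."""
    strength = defaultdict(float)
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


if __name__ == "__main__":
    strength = weighted_degree(weighted_edges(posts))
    for author in sorted(strength, key=strength.get, reverse=True):
        print(author, round(strength[author], 3))
```

Weighted degree only shows who is most connected; a betweenness-style measure would be closer to the usual notion of a broker, so treat the ranking here as a first approximation.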
Network Transformation
Activity
Week Six
Week 6: Sharing Data Preparation Results and Tools
Readings and Assignments Due:
Presentation involves sharing data with other people in a way that is visually insightful. Students will be asked to bring an example of a visualization of data from a website or news organization, and make a short presentation about what makes the visualization insightful.
Data Visualization Example Presentation
Chapters 4-7 of “The Anarchist in the Library: How the Clash Between Freedom and Control is Hacking the Real World and Crashing the System”.