Social Web: (Big) Data Mining | summer 2014/2015 course syllabus

JAKUB RŮŽIČKA | cz.linkedin.com/in/littlerose

summer semester 2014/2015

SOCIAL WEB: (BIG) DATA MINING

bachelor's course | ISS FSV UK | JSB454

course syllabus

[version 1.1]

http://cz.linkedin.com/in/littlerose/


Outline

• General Information

• Intended Learning Outcomes

• Syllabus

• Types of Instruction

• Requirements, Examination & Assignments

• Course Literature & Documentation

General Information

Social Web: (Big) Data Mining


The course gives a professional and academic introduction to web & social media data mining.

Emphasis is put on the intersection of data science, humanities & ICT.

guarantors

• PhDr. Mgr. Ing. Petr Soukup

• Jakub Růžička

lecturers

• Jakub Růžička

• Petr Soukup

credits

• 7 ECTS

• elective course

lectures

• 1 lecture (80 min) & 1 tutorial/seminar (80 min) per week

Intended Learning Outcomes

in which way the course should make

your life better & improve your skills


Upon completion of the course, students will be able to

• understand the intersection of data science, humanities & ICT within the realm of web & social media (big) data mining

• ask meaningful questions, perform basic analytical operations on both structured & unstructured web / social media data, and draw conclusions for decision making

• understand basic concepts and conduct subsequent data preprocessing, analysis & visualization related to social network analysis, web mining, social media mining & text mining

• take a positive approach towards data science & computer programming, gain confidence in basic operations, and use or modify third-party (open) source code or an analytical procedure/tool

• describe advanced data mining methods & applications for further self-education (or subsequent institutional education) or professional/academic specialization

Syllabus

course outline | topics covered


Course Overview

lectures are followed by tutorials in order to put knowledge into practice

the exact dates & content of the lectures may be subject to change based on the pace & requirements of the course group

• Lecture #1 | Introduction to Data Mining & Data Analysis | Data Science | Digital Humanities

• Lecture #2 | Big Data | Types of Data | Data Formats | Information Retrieval | Business Intelligence | Law & Ethics of Data Mining

• Lecture #3 | Introduction to Web Technologies for Non-Tech Students | Database Systems | Web Programming | Semantic Web | APIs

• Lecture #4 | Graph Theory | Social Network Analysis | Statistical Procedures, Apps & Tools

• Lecture #5 | Pseudocoding | Introduction to Programming in Python & Comparison of Data Mining Alternatives | Data Exploration & Preprocessing

• Lecture #6 | Web Scraping | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools

• Lecture #7 | Social Media Mining | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools

• Lecture #8 | Text Mining | Natural Language Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools

• Lecture #9 | Data Visualization | Data Storytelling | Electronic Publishing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools

• Lecture #10 | Student Webinars Week | Introducing Various Free & Open Source Data Mining Software & Apps

• Lecture #11 | Machine Learning, Recommender Systems & Other More Advanced Topics | Large-Scale Datasets | MapReduce, Hadoop, NoSQL

• Lecture #12 | Course Review | Semestral Projects Consultation & Adjustments | The Remaining 99% of Data Science | Data Science Buzzwords

Types of Instruction

& workload


Types of Instruction & Workload

the course consists of

• lectures

• tutorials/seminars

• guest lectures (possibly webinars)

• student webinars

background, how-to, support & inspiration during lectures & tutorials/seminars and online course materials for self-directed students

workload | 150 hours

• lectures 16h

• tutorials/seminars 16h

• assignments

• team project 70h

• webinar 20h

• self-study 28h


Teaching Method & Related Information

storytelling

• the course topics will be tied together by obtaining real-time (& real-life) data for the decision making of a fictional political party

• teams of 2-3 students will be formed in response to the need to study a more specific area of the political campaign | teams will be differentiated by a specific topic/area of interest rather than by types of analyses

collaboration

• teamwork & knowledge sharing will be strongly encouraged & facilitated | collaboration has its downsides as well, but since there are too many 'individual work' courses & too few 'team work' courses, let's try working together for a change

BYOD (Bring Your Own Device)

• several software packages requiring installation & personalization will be used within the course

• BYOD is therefore recommended

beginner-friendly (quite =))

• the course might be challenging for students with no analytical or computing background (introductory-level courses or professional experience), but most of the time you won't be required to write your own computer code 'from scratch' (that would require another course); instead, you'll be provided with working code (explained in pseudocode) that you'll customize

• user-level knowledge of social media is assumed

Requirements, Examination & Assignments

(I.) 30% Webinar | collaborative, teams of 2-3

(II.) 70% Project/Research | collaborative, teams of 2-4

* the percentage stands for the weight of the assignment in the final grade


Grading

the grade is calculated from the WEBINAR (30%) and the PROJECT/RESEARCH defence (70%)

the course is graded A (>=85%), B (>=70%), C (>=60%), D (>=50%), or E (<50%)

A, B or C is needed to pass the course
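
As a rough illustration of the grading rule above, the following Python sketch combines the two component scores and maps the result to a letter grade. It assumes that both components are scored on a 0-100 scale and that no rounding is applied before the thresholds are checked; the syllabus specifies neither, and the function names are hypothetical.

```python
# Weights taken from the syllabus: webinar 30%, project/research defence 70%.
WEBINAR_WEIGHT = 30
PROJECT_WEIGHT = 70

# Grade thresholds from the syllabus, checked from highest to lowest.
THRESHOLDS = [(85, "A"), (70, "B"), (60, "C"), (50, "D")]


def final_percentage(webinar_pct, project_pct):
    """Weighted final score; assumes both inputs are on a 0-100 scale."""
    return (WEBINAR_WEIGHT * webinar_pct + PROJECT_WEIGHT * project_pct) / 100


def letter_grade(pct):
    """Map a final percentage to a letter grade (no rounding: an assumption)."""
    for minimum, grade in THRESHOLDS:
        if pct >= minimum:
            return grade
    return "E"


def passes(grade):
    """The syllabus states that A, B or C is needed to pass the course."""
    return grade in {"A", "B", "C"}


if __name__ == "__main__":
    score = final_percentage(webinar_pct=80, project_pct=90)
    grade = letter_grade(score)
    print(score, grade, passes(grade))
```

Because the project/research defence carries 70% of the weight, a strong webinar alone cannot lift a weak project above the pass threshold.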


(I.) Webinar 30% collaborative, teams of 2-3 students

assignment

• 1) familiarize yourself (in brief) with an assigned data mining tool or application (you might also choose your own if approved by the lecturer) and introduce it

• 2) replicate an analysis (cite your source) using the tool and explain the procedure & background information

• 3) prepare a short (5-15min) live webinar for your classmates & answer their questions (questions regarding your particular analysis only)

• 4) let them do peer assessment of your work

motivation

• the number of free & open source data science procedures, tools & applications is growing rapidly, so you definitely won't 'be done' after passing this course

• the volume of open educational resources (text, video, interactive etc.) is huge, and the tools are usually well-documented & include sample analyses provided by their creators or communities

• you'll learn most through a hands-on approach, and you'll get feedback from your peers

• 20% | brief description of the tool: what it is for, how one can use it, where one can get it & learn it

• 60% | replication of an analysis: background information, clarity of the procedure

• 20% | question responses: only questions related to the particular analysis count (one doesn't become an expert on a tool by replicating one analysis =))


(II.) Project/Research 70% collaborative, teams of 2-4 students

assignment

• 1) mine/scrape, analyze & visualize available structured & unstructured web & social media data related to your team's area of specialization within the fictional political party's campaign planning

• 2) prepare an executive summary in the form of a storyline highlighting the most important findings for decision making

• 3) defend your project/research (examination)

motivation

• preparation for conducting commercial or academic research that includes web & social media data mining & related analyses

• an opportunity to try everything out 'under supervision' & get feedback on your work

• practicing teamwork skills, organization & division of labour within a larger work group / institution

• 30% | executive summary, clarity & coherence of the data story and meeting all requirements on analyses used (see the next slide)

• 40% | appropriateness & correctness of mining procedures & analyses used and of your data interpretation, consideration of limitations of your outcomes (critical context)

• 30% | answers to questions regarding procedures, analyses & other 'technical' details of your project/research
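
To show how the three criterion weights above might roll up into a single project/research score, here is a minimal Python sketch. It assumes each criterion is marked on a 0-100 scale, which the syllabus does not state; the dictionary keys and the function name are hypothetical labels introduced only for this illustration.

```python
# Criterion weights (in percent) from the project/research rubric in the syllabus.
# The key names are hypothetical shorthand for the three rubric criteria.
PROJECT_RUBRIC = {
    "executive_summary": 30,
    "procedures_and_interpretation": 40,
    "defence_answers": 30,
}


def rubric_score(criterion_scores, rubric=PROJECT_RUBRIC):
    """Weighted average of criterion scores; assumes each score is 0-100."""
    missing = set(rubric) - set(criterion_scores)
    if missing:
        raise ValueError(f"missing criterion scores: {sorted(missing)}")
    total_weight = sum(rubric.values())
    weighted_sum = sum(rubric[name] * criterion_scores[name] for name in rubric)
    return weighted_sum / total_weight


if __name__ == "__main__":
    example = {
        "executive_summary": 100,
        "procedures_and_interpretation": 50,
        "defence_answers": 100,
    }
    print(rubric_score(example))
```

The same function could be reused for the webinar rubric by passing a different weights dictionary (20/60/20).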


Discussed within a project defence & included in a project executive summary

• the story of your data (for decision making within your specialization): visualizations, descriptions, theoretical background, interpretations & highlights

• social network analysis

• web scraping

• social media mining

• text mining & natural language processing

• critical review of the project & limitations of the generalizability of your research

• analytical appendix with a hyperlink to source tables & datasets

• 'technical' appendix: computations, programming code, requests, queries etc.

Course Literature & Documentation

• you are not required to read any of the following, but you might find it handy when looking for inspiration, reference, sample analyses or sample code, or when some part of the course takes your interest and you want to follow up with more in-depth self-directed study

• further online/paperback study resources, tutorials, libraries, applications & tools will be introduced within specific topics of the course


Books

GOLBECK, Jennifer. ANALYZING THE SOCIAL WEB. Amsterdam: Morgan Kaufmann, 2013. ISBN 01-240-5531-1.

TSVETOVAT, Maksim and Alexander KOUZNETSOV. SOCIAL NETWORK ANALYSIS FOR STARTUPS. O'Reilly, 2011. ISBN 978-144-9306-465.

HANSEN, Derek, Ben SHNEIDERMAN and Marc SMITH. ANALYZING SOCIAL MEDIA NETWORKS WITH NODEXL: INSIGHTS FROM A CONNECTED WORLD. Burlington, MA: Morgan Kaufmann, 2011. ISBN 01-238-2229-7.

MURRAY, Scott. INTERACTIVE DATA VISUALIZATION FOR THE WEB. Sebastopol, CA: O'Reilly Media, 2013. ISBN 14-493-6108-0.

STEELE, Julie and Noah ILIINSKY. BEAUTIFUL VISUALIZATION. Sebastopol, CA: O'Reilly, 2010. ISBN 14-493-7986-9.

FRY, Ben. VISUALIZING DATA. Sebastopol, CA: O'Reilly, 2007. ISBN 05-965-1455-7.

http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311

http://www.amazon.com/Social-Network-Analysis-Startups-connections/dp/1449306462

http://www.amazon.com/Analyzing-Social-Media-Networks-NodeXL/dp/0123822297/ref=sr_1_1?ie=UTF8&qid=1397333850&sr=8-1&keywords=ANALYZING+SOCIAL+MEDIA+NETWORKS+WITH+NODEXL:+INSIGHTS+FROM+A+CONNECTED+WORLD

http://www.amazon.com/Interactive-Data-Visualization-Scott-Murray/dp/1449339735

http://www.amazon.com/Beautiful-Visualization-Looking-through-Practice/dp/1449379869/ref=sr_1_1?ie=UTF8&qid=1397333872&sr=8-1&keywords=BEAUTIFUL+VISUALIZATION

http://www.amazon.com/Visualizing-Data-Explaining-Processing-Environment/dp/0596514557/ref=sr_1_1?ie=UTF8&qid=1397333882&sr=8-1&keywords=Visualizing+data


Books

MCKINNEY, Wes. PYTHON FOR DATA ANALYSIS: DATA WRANGLING WITH PANDAS, NUMPY, AND IPYTHON. Beijing: O'Reilly Media. ISBN 978-1449319793.

RUSSELL, Matthew A. MINING THE SOCIAL WEB: DATA MINING FACEBOOK, TWITTER, LINKEDIN, GOOGLE+, GITHUB, AND MORE. 2nd ed. Sebastopol: O'Reilly, 2014. ISBN 978-1-449-36761-9.

JANERT, Philipp K. DATA ANALYSIS WITH OPEN SOURCE TOOLS. Sebastopol, CA: O'Reilly. ISBN 05-968-0235-8.

LUTZ, Mark. LEARNING PYTHON. 5th ed. Beijing: O'Reilly Media, 2013. ISBN 978-1449355739.

BIRD, Steven, Ewan KLEIN and Edward LOPER. NATURAL LANGUAGE PROCESSING WITH PYTHON. Beijing: O'Reilly, 2009. ISBN 978-0596516499.

PERKINS, Jacob. PYTHON TEXT PROCESSING WITH NLTK 2.0 COOKBOOK. Birmingham, UK: Packt Publishing, 2010. ISBN 978-1849513609.

http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?ie=UTF8&qid=1397333782&sr=8-1&keywords=python+for+data+analysis

http://www.amazon.com/Mining-Social-Web-Facebook-LinkedIn/dp/1449367615/ref=sr_1_1?ie=UTF8&qid=1397333792&sr=8-1&keywords=mining+the+social+web

http://www.amazon.com/Data-Analysis-Open-Source-Tools/dp/0596802358/ref=sr_1_1?ie=UTF8&qid=1397333811&sr=8-1&keywords=data+analysis+with+open+source+tools

http://www.amazon.com/Learning-Python-Edition-Mark-Lutz/dp/1449355730

http://www.amazon.com/Natural-Language-Processing-Python-Steven/dp/0596516495

http://www.amazon.com/Python-Text-Processing-NLTK-Cookbook/dp/1849513600


Books

O'NEIL, Cathy and Rachel SCHUTT. DOING DATA SCIENCE. Sebastopol, CA: O'Reilly, 2013. ISBN 14-493-5865-9.

RAJARAMAN, Anand and Jeffrey ULLMAN. MINING OF MASSIVE DATASETS. Cambridge: Cambridge University Press, 2012. ISBN 11-070-1535-9.

NORTH, Matthew. DATA MINING FOR THE MASSES. Global Text Project, 2012. ISBN 06-156-8437-8.

PROVOST, Foster. DATA SCIENCE FOR BUSINESS: WHAT YOU NEED TO KNOW ABOUT DATA MINING AND DATA-ANALYTIC THINKING. Sebastopol, CA: O'Reilly. ISBN 978-1-449-36132-7.

MINELLI, Michael, Michele CHAMBERS and Ambiga DHIRAJ. BIG DATA BIG ANALYTICS: EMERGING BUSINESS INTELLIGENCE AND ANALYTIC TRENDS FOR TODAY'S BUSINESSES. Wiley, 2013. ISBN 111814760X.

BOSLAUGH, Sarah. STATISTICS IN A NUTSHELL. 2nd ed. Farnham, Surrey, England: O'Reilly, 2012. ISBN 14-493-1682-4.

http://www.amazon.com/Doing-Data-Science-Straight-Frontline/dp/1449358659/ref=sr_1_1?s=books&ie=UTF8&qid=1401972638&sr=1-1&keywords=doing+data+science

http://www.amazon.com/Mining-Massive-Datasets-Anand-Rajaraman/dp/1107015359/ref=sr_1_1?ie=UTF8&qid=1397333894&sr=8-1&keywords=MASSIVE+DATASETS

http://www.amazon.com/Data-Mining-Masses-Matthew-North/dp/0615684378/ref=sr_1_1?ie=UTF8&qid=1397333905&sr=8-1&keywords=DATA+MINING+FOR+THE+MASSES

http://www.amazon.com/Data-Science-Business-data-analytic-thinking/dp/1449361323/ref=sr_1_1?ie=UTF8&qid=1397333917&sr=8-1&keywords=DATA+SCIENCE+FOR+BUSINESS:+WHAT+YOU+NEED+TO+KNOW+ABOUT+DATA+MINING+AND+DATA-ANALYTIC+THINKING

http://www.amazon.com/Big-Data-Analytics-Intelligence-Businesses/dp/111814760X/ref=sr_1_1?ie=UTF8&qid=1397333935&sr=8-1&keywords=BIG+DATA+BIG+ANALYTICS:+EMERGING+BUSINESS+INTELLIGENCE+AND+ANALYTIC+TRENDS+FOR+TODAY'S+BUSINESSES

http://www.amazon.com/Statistics-Nutshell-Sarah-Boslaugh/dp/1449316824


Documentation

https://www.python.org/doc/

http://www.w3schools.com/

https://github.com/

http://stackexchange.com/sites

http://stackoverflow.com/

https://developers.facebook.com/docs/

https://dev.twitter.com/docs

https://developer.linkedin.com/apis

http://instagram.com/developer/

https://developers.google.com/+/

https://developers.pinterest.com/

https://developer.foursquare.com/

http://flowingdata.com/

http://www.informationisbeautiful.net/

http://www.reddit.com/

https://www.statsoft.com/textbook

http://learnpythonthehardway.org/book/

http://www.programmableweb.com/

http://www.pythonapi.com/



self-directed learners, those who prefer distance/blended learning, those who want to know more, or those who don't want to rely on only one source of information might want to complement/substitute different parts of the course on

• Coursera

• MIT OpenCourseWare

• Stanford ONLINE

• edX

• Khan Academy

• Codecademy

• and many other resources: Google it & learn it, or YouTube it & watch it =)

https://www.coursera.org/

http://ocw.mit.edu/index.htm

http://online.stanford.edu/

https://www.edx.org/

https://www.khanacademy.org/

http://www.codecademy.com/

https://www.google.com/

https://www.youtube.com/
