Social Web: (Big) Data Mining | summer 2014/2015 course syllabus
JAKUB RŮŽIČKA [email protected] cz.linkedin.com/in/littlerose
summer semester 2014/2015
SOCIAL WEB: (BIG) DATA MINING
bachelor's course | ISS FSV UK | JSB454
course syllabus
[version 1.1]
Outline
• General Information
• Intended Learning Outcomes
• Syllabus
• Types of Instruction
• Requirements, Examination & Assignments
• Course Literature & Documentation
General Information
Social Web: (Big) Data Mining
The course gives a professional and academic introduction to web & social media data mining.
Emphasis is put on the intersection of data science, humanities & ICT.
guarantors
• PhDr. Mgr. Ing. Petr Soukup
• Jakub Růžička
lecturers
• Jakub Růžička
• Petr Soukup
credits
• 7 ECTS
• elective course
lectures
• 1 lecture (80 min) & 1 tutorial/seminar (80 min) per week
Intended Learning Outcomes
in which way the course should make
your life better & improve your skills
Upon completion of the course, the students will be able to
• understand the intersection of data science, humanities & ICT within the realm of web & social media (big) data mining
• ask meaningful questions, perform basic analytical operations regarding both structured & unstructured web / social media data and draw conclusions for decision making
• understand basic concepts and conduct subsequent data preprocessing, analysis & visualization related to social network analysis, web mining, social media mining & text mining
• take a positive approach towards data science & computer programming, gain confidence in basic operations and use or modify third-party (open) source code or an analytical procedure/tool
• describe advanced data mining methods & applications for further self-education (or subsequent institutional education) or professional/academic specialization
Syllabus
course outline | topics covered
Course Overview
lectures are followed by tutorials in order to put knowledge into practice
the exact dates & content of the lectures may be subject to change based on the pace & requirements of the course group
• Lecture #1: Introduction to Data Mining & Data Analysis | Data Science | Digital Humanities
• Lecture #2: Big Data | Types of Data | Data Formats | Information Retrieval | Business Intelligence | Law & Ethics of Data Mining
• Lecture #3: Introduction to Web Technologies for Non-Tech Students | Database Systems | Web Programming | Semantic Web | APIs
• Lecture #4: Graph Theory | Social Network Analysis | Statistical Procedures, Apps & Tools
• Lecture #5: Pseudocoding | Introduction to Programming in Python & Comparison of Data Mining Alternatives | Data Exploration & Preprocessing
• Lecture #6: Web Scraping | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
• Lecture #7: Social Media Mining | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
• Lecture #8: Text Mining | Natural Language Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
• Lecture #9: Data Visualization | Data Storytelling | Electronic Publishing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
• Lecture #10: Student Webinars Week | Introducing Various Free & Open Source Data Mining Software & Apps
• Lecture #11: Machine Learning, Recommender Systems & Other More Advanced Topics | Large-Scale Datasets | MapReduce, Hadoop, NoSQL
• Lecture #12: Course Review | Semestral Projects Consultation & Adjustments | The Remaining 99% of Data Science | Data Science Buzzwords
Types of Instruction
& workload
Types of Instruction & Workload
the course consists of
• lectures
• tutorials/seminars
• guest lectures (possibly webinars)
• student webinars
background, how-to, support & inspiration during lectures & tutorials/seminars and online course materials for self-directed students
workload | 150 hours
• lectures 16h
• tutorials/seminars 16h
• assignments: team project 70h, webinar 20h
• self-study 28h
Teaching Method & Related Information
storytelling
• the course topics will be tied together by obtaining real-time (& real-life) data for the decision making of a fictional political party
• teams of 2-3 students will be formed in response to the need to study a more specific area of the political campaign | teams will be differentiated by a specific topic/area of interest rather than by types of analyses
collaboration
• teamwork & knowledge sharing will be strongly encouraged & facilitated | collaboration has its downsides as well, but since there are too many 'individual work' courses & too few 'team work' courses, let's try working together for a change
BYOD Bring Your Own Device
• several software packages requiring installation & personalization will be used within the course
• BYOD is therefore recommended
quite beginner-friendly =)
• although the course might be challenging for students with no analytical or computing background (introductory-level courses or professional experience), most of the time you won't be required to write your own computer code 'from scratch' (that would require another course); instead you'll be provided with working code (explained in pseudocode) that you'll customize
• user-level knowledge of social media is assumed
Requirements, Examination & Assignments
(I.) 30% Webinar | collaborative, teams of 2-3
(II.) 70% Project/Research | collaborative, teams of 2-4
* the percentage stands for the weight of the assignment in the final grade
Grading
the grade is calculated from the WEBINAR (30%) and the PROJECT/RESEARCH defence (70%)
the course is graded A (>=85%), B (>=70%), C (>=60%), D (>=50%), or E (<50%)
A, B or C is needed to pass the course
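The grading rule above can be sketched in a few lines of Python. This is only an illustration, assuming both assignments are scored in percent and combined as a plain weighted sum; the function and variable names are hypothetical, while the weights, thresholds and passing grades are those stated on this slide.

```python
# Weights (in %) and grade thresholds as stated in the syllabus.
WEIGHTS = {"webinar": 30, "project": 70}
THRESHOLDS = [(85, "A"), (70, "B"), (60, "C"), (50, "D")]  # highest first
PASSING = {"A", "B", "C"}


def final_grade(webinar_pct, project_pct):
    """Return (total %, letter grade, passed?) for two scores given in percent.

    Assumption: the two scores are combined as a simple weighted sum.
    """
    total = (WEIGHTS["webinar"] * webinar_pct
             + WEIGHTS["project"] * project_pct) / 100
    # First threshold the total reaches wins; anything below 50 % is an E.
    letter = next((g for minimum, g in THRESHOLDS if total >= minimum), "E")
    return total, letter, letter in PASSING


print(final_grade(80, 90))  # (87.0, 'A', True)
print(final_grade(50, 60))  # (57.0, 'D', False)
```

Note that under the stated rule a D (50-59%) is a graded result but not a passing one.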
(I.) Webinar 30% collaborative, teams of 2-3 students
assignment
• 1) familiarize yourself (in brief) with an assigned data mining tool or application (you might also choose your own if approved by the lecturer) and introduce it
• 2) replicate an analysis (cite your source) using the tool and explain the procedure & background information
• 3) prepare a short (5-15min) live webinar for your classmates & answer their questions (questions regarding your particular analysis only)
• 4) let them do peer assessment of your work
motivation
• the volume of free & open source data science procedures, tools & applications grows rapidly, so you definitely won't 'be done' after passing this course
• the volume of open educational resources (text, video, interactive etc.) is huge; the tools are usually well documented & include sample analyses provided by their creators or communities
• you'll learn most through a hands-on approach and you'll get feedback from your peers
• brief description of the tool (what it is for, how one can use it, where one can get it & learn it) | 20%
• replication of an analysis, background information, clarity of the procedure | 60%
• question responses; only questions related to the particular analysis count (one doesn't become an expert on a tool by replicating one analysis =)) | 20%
(II.) Project/Research 70% collaborative, teams of 2-4 students
assignment
• 1) mine/scrape, analyze & visualize available structured & unstructured web & social media data related to your team's area of specialization within the fictional political party's campaign planning
• 2) prepare an executive summary in the form of a storyline highlighting the most important findings for decision making
• 3) defend your project/research (examination)
motivation
• preparation for conducting commercial or academic research including web & social media data mining & related analyses
• an opportunity to try everything out 'under supervision' & get feedback on your work
• practicing teamwork skills, organization & division of labour within a larger work group / institution
• executive summary, clarity & coherence of the data story and meeting all requirements on analyses used (see the next slide) | 30%
• appropriateness & correctness of mining procedures & analyses used and of your data interpretation, consideration of limitations of your outcomes (critical context) | 40%
• answers to questions regarding procedures, analyses & other 'technical' details of your project/research | 30%
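As a rough illustration of how the assessment weights of the two assignments (20/60/20 for the webinar, 30/40/30 for the project/research) could roll up into one score per assignment, here is a minimal Python sketch. The criterion labels are shorthand invented for this example, and a linear weighted sum of per-criterion percentages is an assumption; the syllabus states only the weights.

```python
# Weights (in %) come from the syllabus; the criterion names are
# hypothetical shorthand for the bullets on the assessment slides.
WEBINAR_RUBRIC = {"tool_description": 20, "replication": 60, "question_responses": 20}
PROJECT_RUBRIC = {"executive_summary": 30, "analyses": 40, "defence_answers": 30}


def rubric_score(rubric, scores):
    """Combine per-criterion scores (0-100) into one percentage.

    Assumption: criteria are combined as a plain weighted sum.
    """
    if sum(rubric.values()) != 100:
        raise ValueError("rubric weights must add up to 100")
    return sum(weight * scores[name] for name, weight in rubric.items()) / 100


project = rubric_score(
    PROJECT_RUBRIC,
    {"executive_summary": 90, "analyses": 75, "defence_answers": 80},
)
print(project)  # 81.0
```

The same function works for the webinar by passing `WEBINAR_RUBRIC` and scores for its three criteria.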
Discussed within a project defence & included in a project executive summary
• the story of your data (for decision making within your specialization): visualizations, descriptions, theoretical background, interpretations & highlights
• social network analysis | web scraping | social media mining | text mining & natural language processing
• critical review of the project & limitations of the generalizability of your research
• analytical appendix with a hyperlink to source tables & datasets
• 'technical' appendix: computations, programming code, requests, queries etc.
Course Literature & Documentation
• you are not required to read any of the following, but you might find it handy when
looking for inspiration, reference, sample analyses, sample code or when some part
of the course takes your interest so that you want to follow up with more in-depth
self-directed study
• further online/paperback study resources, tutorials, libraries, applications & tools will
be introduced within specific topics of the course
Books
GOLBECK, Jennifer. ANALYZING THE SOCIAL WEB. Amsterdam: Morgan Kaufmann, 2013. ISBN 01-240-5531-1.
TSVETOVAT, Maksim and Alexander KOUZNETSOV. SOCIAL NETWORK ANALYSIS FOR STARTUPS. O'Reilly, 2011. ISBN 978-144-9306-465.
HANSEN, Derek, Ben SHNEIDERMAN and Marc SMITH. ANALYZING SOCIAL MEDIA NETWORKS WITH NODEXL: INSIGHTS FROM A CONNECTED WORLD. Burlington, MA: Morgan Kaufmann, 2011. ISBN 01-238-2229-7.
MURRAY, Scott. INTERACTIVE DATA VISUALIZATION FOR THE WEB. Sebastopol, CA: O'Reilly Media, 2013. ISBN 14-493-6108-0.
STEELE, Julie and Noah ILIINSKY. BEAUTIFUL VISUALIZATION. Sebastopol, CA: O'Reilly, 2010. ISBN 14-493-7986-9.
FRY, Ben. VISUALIZING DATA. Sebastopol, CA: O'Reilly, 2007. ISBN 05-965-1455-7.
MCKINNEY, Wes. PYTHON FOR DATA ANALYSIS: DATA WRANGLING WITH PANDAS, NUMPY, AND IPYTHON. Beijing: O'Reilly Media. ISBN 978-1449319793.
RUSSELL, Matthew A. MINING THE SOCIAL WEB: DATA MINING FACEBOOK, TWITTER, LINKEDIN, GOOGLE+, GITHUB, AND MORE. 2nd ed. Sebastopol: O'Reilly, 2014. ISBN 978-1-449-36761-9.
JANERT, Philipp K. DATA ANALYSIS WITH OPEN SOURCE TOOLS. Sebastopol, CA: O'Reilly. ISBN 05-968-0235-8.
LUTZ, Mark. LEARNING PYTHON. 5th ed. Beijing: O'Reilly Media, 2013. ISBN 978-1449355739.
BIRD, Steven, Ewan KLEIN and Edward LOPER. NATURAL LANGUAGE PROCESSING WITH PYTHON. Beijing: O'Reilly, 2009. ISBN 978-0596516499.
PERKINS, Jacob. PYTHON TEXT PROCESSING WITH NLTK 2.0 COOKBOOK. Birmingham, UK: Packt Publishing, 2010. ISBN 978-1849513609.
O'NEIL, Cathy and Rachel SCHUTT. DOING DATA SCIENCE. Sebastopol, CA: O'Reilly, 2013. ISBN 14-493-5865-9.
RAJARAMAN, Anand and Jeffrey ULLMAN. MINING OF MASSIVE DATASETS. Cambridge: Cambridge University Press, 2012. ISBN 11-070-1535-9.
NORTH, Matthew. DATA MINING FOR THE MASSES. Global Text Project, 2012. ISBN 06-156-8437-8.
PROVOST, Foster. DATA SCIENCE FOR BUSINESS: WHAT YOU NEED TO KNOW ABOUT DATA MINING AND DATA-ANALYTIC THINKING. Sebastopol, CA: O'Reilly. ISBN 978-1-449-36132-7.
MINELLI, Michael, Michele CHAMBERS and Ambiga DHIRAJ. BIG DATA BIG ANALYTICS: EMERGING BUSINESS INTELLIGENCE AND ANALYTIC TRENDS FOR TODAY'S BUSINESSES. Wiley, 2013. ISBN 111814760X.
BOSLAUGH, Sarah. STATISTICS IN A NUTSHELL. 2nd ed. Farnham, Surrey, England: O'Reilly, 2012. ISBN 14-493-1682-4.
Documentation
https://www.python.org/doc/
http://www.w3schools.com/
https://github.com/
http://stackexchange.com/sites#
http://stackoverflow.com/
https://developers.facebook.com/docs/
https://dev.twitter.com/docs
https://developer.linkedin.com/apis
http://instagram.com/developer/
https://developers.google.com/+/
https://developers.pinterest.com/
https://developer.foursquare.com/
http://flowingdata.com/
http://www.informationisbeautiful.net/
http://www.reddit.com/
https://www.statsoft.com/textbook
http://learnpythonthehardway.org/book/
http://www.programmableweb.com/
http://www.pythonapi.com/
self-directed learners, those who prefer distance/blended learning, those who want to know more, or those who don't want to rely on one source of information only might want to complement/substitute different parts of the course on
• Coursera
• MIT OpenCourseWare
• Stanford ONLINE
• edX
• Khan Academy
• Codecademy
• and many other resources
Google it & learn it, or YouTube it & watch it =)