Mir Lec1 Print
7/24/2019 Mir Lec1 Print
Multimedia Technology
Lecture 1: Overview and Arrangement
Lecturer: Dr. Wan-Lei Zhao
Autumn Semester 2015
Contact: [email protected]
All rights are reserved by Wan-Lei zhao
Outline
1 About this Course
2 Syllabus
3 Course plan
4 Brief History about IR and Web
5 Brief History about WWW
About this Course
Major subjects
  Deal with information such as text, images and video
    Text retrieval, content-based image retrieval and video retrieval
    Focus on how to retrieve the above-mentioned information
  Popular machine learning approaches will be covered
    K-means, SVM and decision tree
  Popular model fitting approaches will be covered
    RANSAC and Hough transform
  Popular algorithms in computer vision will be covered
    SIFT, BoVW and Hamming Embedding
Objectives
  Bring you into this interesting topic
  Get you familiar with basic and popular algorithms in this field
  Be able to build a simple but workable search engine on your own
  Be able to apply algorithms to solve problems in your field
Syllabus
Text Retrieval (42 hours)
  Brief History about IR and Web
  Pre-processing on Text Information
  Three Retrieval Models
    Boolean, vector and probabilistic models
  Evaluation Measure
  Web Search
  Parallel Computing in IR
Machine Learning Approaches (22 hours)
  K-means
  Spectral clustering
  Decision Tree
  K-Nearest Neighbour
  Support Vector Machine (SVM)
Nearest Neighbour Search (12 hours)
  R-Tree
  KD-Tree
  Locality Sensitive Hashing
  Product Quantizer
Model Fitting
  RANSAC
  Hough Transform
Image & Video Retrieval (22 hours)
  Challenges & Trends
  Image Features: SIFT et al.
  BoVW Framework
  Fisher Kernel Framework
  Challenges in Video Retrieval
  Temporal Verification Approach
Image Classification and MISC (12 hours)
  Challenges & Trends
  One-against-all Framework
  Tricks in model training
  Convolutional Neural Network
Course work in the lab (32 hours)
  Three experiments
  Subjects are those that you learn in the class
  Kept secret until the lab time
  Each experiment is also a quiz
  10 marks for each experiment
  NO team work!!!
  Late submission is allowed, but with a 30% discount
Presentation of the course project (22 hours)
  Two course projects
  Implemented after class
  Team work is encouraged, but size(team) ≤ 4
  15 minutes for each team to present its project
  A hard copy of the project report is also required
Prerequisites of this course
  Data Structure
    You have to be familiar with it
    Otherwise, you are advised not to take this course
  Good at C/C++
    It will be used in the lab
    It is recommended for your course project
  Basic knowledge about the Internet
    Internet protocols
    Mechanism of WWW
    HTML and JavaScript
  Matlab is a plus
    It will be used in the lab
    Even if you do not know it, it does not matter
    You will learn its basics during this course
Teaching assistant for this course
  Mr. Zhihui Chen will be in charge of issues related to the course project
  Miss Haihui Liu helps with proofreading the course materials
Experiment lectures are held in the Laboratory building, Room 501
  Time slot: 2:30pm - 4:20pm, in the 6th, 8th and 10th weeks
  I will remind you one week ahead
Course website
  Platform of online teaching in XMU
  URL: l.xmu.edu.cn, please go there and register for the course
  Password: 007
Language in the Class
English or Chinese?
  You might be uncomfortable at the beginning
  Me too :)
Several advantages:
  Computer science is defined in English
  Get you guys used to English
Intersection of four disciplines
Related (top-ranked) Conferences:
  ACM SIGIR, WWW
  ACM MM, ACM ICMR & ACM ICME
  IEEE CVPR & ECCV
  IEEE ICCV, IEEE ACCV & BMVC
  ICML & AAAI
Related (top-ranked) Journals:
  IEEE Trans. on Knowledge and Data Engineering
  IEEE Trans. on Pattern Analysis and Machine Intelligence
  International Journal of Computer Vision
  IEEE Trans. on Multimedia
  IEEE Trans. on Image Processing
  Computer Vision and Image Understanding
Reference Books
  R. Baeza-Yates et al., Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition)
  Richard Szeliski, Computer Vision: Algorithms and Applications
  Lecture notes of Machine Learning by Dr. Andrew Ng, from Stanford University
Related papers will be suggested as reading assignments
Online Resources:
  Youku
  Wikipedia
  Baidu Baike
Question: can our brain understand how our brain works?
We are going to get a taste of how tough this question is from two aspects
  1 Computer Vision
  2 Machine Learning
Course plan
Evaluation: 3 lab experiments + 2 course projects
  S = 30% + 35% + 35%
About course projects
  Implemented in C, C++/Python, Matlab
  If you do not know Python or Matlab, learn it!!
  Sample code will be given; you only need to fill in the blanks
  Team work is encouraged for the two course projects
  The team leader will be marked 5 credits higher or lower depending on the performance
  Report (only the second one) and presentation (both) are required (in English if possible)
Failure is acceptable, but no cheating or plagiarism
  If it happens, you are OUT!!
Any questions?
Be an Active Learner
Level 1
  Catch the concept
Level 2
  Understand the idea
  Know how to use it
Level 3
  Able to re-implement the algorithms
  Know where it works
  Know where it fails
Brief History about IR and Web
Human Languages (1)
7,000 languages in the world
90% of these languages are used by fewer than 100,000 people
Based on your knowledge and imagination
  Please list the top-5 most widely used languages
  Give the rank also, do it now ...
Language   Population    Category             Region
Mandarin   1.2 billion   isolating language   China
English    508 million   inflecting language  UK, North America
Hindi      497 million   inflecting language  India & Pakistan
Spanish    392 million   inflecting language  Spain & South America
Russian    277 million   inflecting language  Russia & East Europe
We mainly talk about retrieval of English documents
We mention a little about processing of Chinese documents
Human Languages (2)
Figure: Weights of real impact to the world.
In terms of real influence, the rank changes [1]
Influence: economic, political, size of population and number of countries
[1] Conducted by Webb.
Distribution of World Languages
Note that not all languages have a written form
Evolution of Storage Media
Egyptian papyrus [2]
Babylonian clay tablet (3000 B.C.)
Chinese oracle bones (1400 B.C.)
In 105 A.D., paper was invented in China
[2] It is not paper in the real sense.
Story of Rosetta Stone
Written in both ancient Egyptian and Greek, discovered in 1799
Issued in 196 BC on behalf of King Ptolemy V
Key to the understanding of ancient Egyptian
J.-F. Champollion decoded the language
"library" comes from the Latin word "liber", meaning book
"bibliothek" comes from the Greek word "biblion", meaning a book written on papyrus
Spread of ancient civilizations
Five ancient civilizations: ancient Egypt, ancient Babylon, ancient India, ancient China, ancient Maya
The first library (as far as we know) was established in northern Syria, around 3000 BC
Later, the Assyrian Empire built the Library of Nineveh (present-day Mosul) in 612 BC
The best-known library was built by Alexander the Great about 350 BC in Egypt
In China, libraries appeared around 800 BC
Evolution of Storage Media
After the advent of the computer
IR in two different eras
             before WWW                        WWW era
Media        text document, TV, film & CD      in electronic forms
Publishing   months or years                   hours
Storage      books & papers                    disc, DVD, etc. & web
Indexing     title, author, keywords and date  and contents
Interface    library                           browser
According to IBM, 90% of the knowledge in the world has been created in the last two years
A powerful IR system is required to coordinate the distribution of information/knowledge
Brief History about WWW
The Birth of WWW
1981-1991: the invention of the Web
  In 1980, Tim Berners-Lee worked at CERN (European Organization for Nuclear Research)
  Managed information for physicists so that they could share it
  In 1984, he returned to CERN
  In 1989, he wrote a proposal for a large hypertext database
  By Christmas 1990, he had built all the necessary elements of the Web
    HTTP, HTML, web browser and httpd
The growth of World Wide Web
Early times of growth (1991-1995)
  Cello: the first web browser for Microsoft Windows
  Mosaic (from UIUC) is the first successful browser
  W3C was founded by Berners-Lee in 1994 at MIT
Commercialization (1996-1998)
  More and more dot-coms appeared
Boom and Bust (1999-2001)
  More and more dot-coms appeared
  Internet became popular in China
  Many currently well-known companies were established: Baidu, Alibaba
  Search engines were born
Early times of growth (1991-2001)
  First version of Java was released in 1995
  First version of PHP was released in 1995
  JavaScript was invented by Netscape in 1995
  Static web to dynamic web
  Strong support for multimedia
WWW is everywhere
Ubiquitous web (2002-present)
  Introduction of Web 2.0 is the milestone
  Wikipedia was born in 2001
  Flickr was born in 2004
  Facebook was born in 2004
  YouTube was born in 2005
  Twitter was born in 2006
  The iPhone was released in 2007
All technologies and media are intertwined to reshape the world
Impact on many aspects of our daily life
IR becomes the main interface to them all
Semantic Web
Web 3.0 (20??)
  Proposed by Berners-Lee [3]
  Websites are linked by semantic metadata
  Machines build the links automatically
  Requires natural language understanding technology
  Still a vague concept
Automatic documenting, e.g. books and recipes
[3] Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, in American Scientific, 2000
Statistics on WWW
Figure: Number of websites and users by year (2000-2013). Series: number of sites, number of users.
The growth rate of users is much higher than that of websites
The growth rate of clicks would be even higher
Challenges in Modern Information Retrieval
How to bridge such a semantic gap?
A word is worth a thousand pictures
A picture is worth a thousand words
Scalability in the age of BIG data (1)
A glance at big data today
  1.1 billion websites as of Nov. 2014
  >3,000 images uploaded to Flickr every minute [4]
  >200,000 videos uploaded per day to YouTube (>1,000 years)
  TV news: thousands of hours of programs broadcast each day
  >100 billion photos on Facebook as of Jun. 2011
Challenges: facilitate fast browsing and sharing
  How to store?
  How to organize?
  How to retrieve?
[4] Statistics were collected on Apr. 28th, 2010.
Scalability in the age of BIG data (2)
Given the thickness of one photo: 0.2 mm
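The rest of this slide is a figure that is not reproduced here. One plausible reading is a back-of-the-envelope stack height: if the more than 100 billion Facebook photos quoted on the previous slide were printed and piled up, how tall would the pile be? The sketch below works that out. Pairing the 0.2 mm thickness with the Facebook count is an assumption, and `stack_height_km` is a name made up for this illustration.

```python
# Back-of-the-envelope estimate: height of a pile of printed photos.
# Assumption: the photo count is the ">100 billion" Facebook figure
# from the previous slide; the slide itself only gives the thickness.
PHOTO_THICKNESS_MM = 0.2        # thickness of one photo, from the slide
NUM_PHOTOS = 100_000_000_000    # 100 billion, used as a lower bound

MM_PER_KM = 1_000_000           # 1 km = 1,000,000 mm


def stack_height_km(num_photos: int, thickness_mm: float) -> float:
    """Return the height in kilometres of num_photos stacked photos."""
    return num_photos * thickness_mm / MM_PER_KM


height_km = stack_height_km(NUM_PHOTOS, PHOTO_THICKNESS_MM)
print(f"Stack height: {height_km:,.0f} km")
```

Under these assumptions the pile is about 20,000 km tall, which gives a feeling for why storage, organization and retrieval all become scalability problems.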
Top Rank Search Engines
Google takes the lion's share of the market
Baidu is not in the rank (unfortunately) [5]
[5] Cited from: http://www.ebizmba.com/articles/search-engines
Sketch the framework of a search engine
Draw the framework of a search engine in 5 minutes
Put in all the elements you can figure out, do it now ...
Framework of a search engine
Observations
  Information is highly distributed on the Internet
  The indexer (search engine) keeps information in a centralized manner
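The second observation can be illustrated with an inverted index, a common way for an indexer to hold information from many sites in one central structure. The sketch below is a minimal illustration under stated assumptions, not the framework drawn on the slide: the document ids, the whitespace tokenizer and the function names `build_inverted_index` and `search` are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages that a crawler gathered from different sites.
docs = {
    "site-a/page1": "multimedia retrieval with image features",
    "site-b/page7": "text retrieval models",
    "site-c/page3": "image classification",
}
index = build_inverted_index(docs)
print(sorted(search(index, "image retrieval")))
```

The pages live on three different sites, but a query only touches the central index. A real engine would add tokenization rules, ranking and a compressed on-disk layout.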
Structure of a crawler
Observations
  The crawler plays a very important role
  Experiences of using Baidu and Google
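The crawler diagram on this slide is not reproduced here. A common textbook structure, assumed for this sketch and not necessarily the one drawn on the slide, is a frontier queue of URLs to fetch plus a set of URLs already visited. To stay self-contained, the sketch replaces real HTTP fetching with a hypothetical in-memory link graph; `crawl`, `fetch_links` and `toy_web` are illustrative names.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl from seed; fetch_links(url) returns outgoing links."""
    frontier = deque([seed])   # URLs waiting to be fetched
    visited = set()            # URLs already fetched
    order = []                 # fetch order, for inspection
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue           # the same URL can be queued more than once
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return order


# Hypothetical link graph standing in for real page downloads.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}
pages = crawl("a.example", lambda url: toy_web.get(url, []))
print(pages)
```

The visited set is what keeps the crawler from looping on the cycle between `a.example` and `c.example`. A production crawler would also need politeness delays, robots.txt handling and URL normalization.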
Q & A
Thanks for your attention!