Introduction to Google
Jian-hua Yeh (葉建華)
-
2
Lecture Outline
• What are Google’s services?
• Inventing Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
-
3
The Google Services
• Web search
• Images search
• Video search
• News search
• Maps
• Mail
• More?
-
10 Cool Things You Can Do With Google
-
5
1. Basic Searching
http://www.google.com/
-
6
Basic Searching Step-by-Step
Select search term(s)
Enter search term(s) into search box
Click Search or Press Enter key
Browse Results
-
7
2. Advanced Searching
Go to www.googleguide.com for more on how to use Google’s Basic and Advanced Search
Click on “Advanced Search” on main Google Page
http://www.google.com/advanced_search?hl=en
-
8
Better Searches, Better Results
Exact Phrase [“one small step for man”]
Excluded Words [bass -fishing], [virus -computer]
Similar Words [~mobile phone]
Multiple Words (or) [Maui OR Hawaii]
Multiple Words (and) [vacation Hawaii]
-----------------------------------------------------------
“I’m feeling lucky” [takes you directly to first web page returned for your query]
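The operators on this slide can be combined into a single query string. Below is a minimal Python sketch; the helper name `build_query` is hypothetical (it is not part of any Google API), and the sketch assumes the search page accepts the query text in a `q` URL parameter.

```python
from urllib.parse import urlencode

def build_query(phrase=None, exclude=(), any_of=(), all_of=()):
    """Compose a search string from the operators listed on the slide."""
    parts = list(all_of)                          # plain words are ANDed implicitly
    if phrase:
        parts.append(f'"{phrase}"')               # exact phrase
    if any_of:
        parts.append(" OR ".join(any_of))         # either word may match
    parts.extend(f"-{word}" for word in exclude)  # excluded words
    return " ".join(parts)

query = build_query(all_of=["bass"], exclude=["fishing"])
# Assumption: the endpoint path and the `q` parameter are illustrative.
url = "http://www.google.com/search?" + urlencode({"q": query})
print(query)
print(url)
```

Keeping each operator as a separate argument makes it harder to forget the quotes around a phrase or the minus sign before an excluded word.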
-
9
3. Definitions
“define ______” or “define: ____”
Definitions gathered from around the Web
-
10
Define “Blog”
-
11
4. Calculator
Addition +
Subtraction –
Multiplication *
Division /
Percentages % of
Exponents ^
-
12
“15.99 + 32.50 + 13.25”
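For comparison, the same sum can be checked locally. This is a small Python sketch, not a description of how Google's calculator works; it uses `Decimal` so that the cents are added exactly.

```python
from decimal import Decimal

# The three amounts from the example query above.
prices = ["15.99", "32.50", "13.25"]
total = sum(Decimal(p) for p in prices)
print(total)  # 61.74
```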
-
13
5. Numbers
Phone #s
Tracking #s
VIN #s
UPC codes
Area Codes
More…
-
14
Examples of Number Searches
Phone numbers
Area codes
Tracking packages by #
UPC Codes
VIN #s
-
15
6. Movies
Showtimes
“movies 91360”
Reviews
Buy Tickets Online
-
16
7. Stocks
Find reports on specific stocks
Compare stocks by entering multiple stock symbols
-
17
8. Weather
Weather forecasts for specific regions of the world
Example: “weather 91360”
-
18
9. Travel
Airport weather and delays
Airline Flight Information
Examples: “lax airport” and “United 164”
-
19
10. Pizza!
Find local businesses by typing in a keyword (like “pizza”) and your zipcode
-
More? Yes, there are more…
-
21
-
22
Lecture Outline
• What are Google’s services?
• Inventing Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
-
Inventing Google
-
24
Inventing Google
• Sergey & Larry: Ph.D. students at Stanford University
• Prototype (1998)
– http://google.stanford.edu
– 24,000,000 pages (8,058,044,651 today)
• Google
– “We chose our system name, Google, because it is a common spelling of googol, or 10^100 and fits well with our goal of building very large-scale search engines.”
• PageRank
– An objective measure of a page's citation importance that corresponds well with people’s subjective idea of importance.
http://google.stanford.edu/
-
25
Google’s Mission
“Organize the world’s information and make it universally accessible and useful.”
-
26
Google’s Goal
“To provide a much higher level of service to all those who seek information, whether they're at a desk in Boston, driving through Bonn, or strolling in Bangkok.”
-
27
Business Ethics
1. Focus on the user and all else will follow.
2. It's best to do one thing really, really well.
3. Fast is better than slow.
4. Democracy on the web works.
5. You don't need to be at your desk to need an answer.
6. You can make money without doing evil.
7. There's always more information out there.
8. The need for information crosses all borders.
9. You can be serious without a suit.
10. Great just isn't good enough.
-
28
Inventing Google: Foundation
• PageRank*:
– We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d... Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
[Diagram: pages T1 … Tn, with outgoing link counts C1 … Cn, each pointing to page A]
*) Larry Page
-
29
Inventing Google: Foundation
• PageRank formula, informally
– PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
– PageRank can be thought of as a model of user behavior.
– We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but eventually gets bored and starts on another random page.
– The probability that the random surfer visits a page is its PageRank.
• A page has a high PR if…
– there are many pages that point to it
– or there are some pages that point to it and have a high PR
– Note the recursive weight propagation through the web link structure.
– Note that the PageRanks, once normalized, form a probability distribution over web pages, so the sum of all web pages’ normalized PageRanks will be one.
– Damping factor d is the probability, at each page, that the "random surfer" keeps following links; with probability 1-d the surfer gets bored and requests another random page.
• Personalization ☺
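The recursive definition can be evaluated by repeated substitution. Below is a minimal Python sketch of that iteration on a hypothetical three-page web; the function name `pagerank`, the toy link graph, and the fixed iteration count are assumptions for illustration, not Google's implementation. With the formula exactly as written on the slide, the values sum to the number of pages when every page has at least one outgoing link, so the sketch divides by the total to obtain a probability distribution.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}            # initial guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T that links here contributes PR(T) / C(T).
            incoming = sum(pr[t] / len(links[t])
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical web: A and B link to each other, C links to A.
toy_web = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(toy_web)
total = sum(ranks.values())
probabilities = {page: rank / total for page, rank in ranks.items()}
for page in sorted(probabilities, key=probabilities.get, reverse=True):
    print(page, round(probabilities[page], 3))
```

Page C has no incoming links, so it keeps only the (1-d) baseline, while A collects weight from both B and C and ranks highest.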
-
30
Inventing Google: Foundation
• PageRank relevancy tuning
  – Page title
  – Anchor text
  – Meta
  – Font
    • Size
    • Weight
  – Capitalization
  – …
-
31
Inventing Google: Anatomy
-
32
Inventing Google: Anatomy
• URL Server
  – Provides the list of URLs to be fetched to the crawlers
• Google Crawlers (GoogleBot)
  – Multiple distributed crawlers
    • Own DNS cache
    • 300 connections open at once
  – Send fetched pages to the Store Server
  – Originally written in Python
• Store Server
  – Compresses and stores files to the repository.
  – A DOCID is created for each page.
• Repository
  – Stores fetched pages for further processing by the Indexer
-
33
Inventing Google: Anatomy
• Indexer
  – Reads pages from the Repository (uncompresses them)
  – Parses each document (Flex on top of own stack):
    • Page converted to a set of Hits (position, font, capitalization, title/anchor/meta), 2 bytes each
    • Added to the Document Index
  – Hits are distributed to Barrels (i.e. one document to multiple barrels)
  – Every link found in a page is stored to the Anchors file
• Forward and Inverted Barrels (2*64)
  – Forward Index
    • Barrel keeps a range of Hits sorted by DOCIDs
    • (DOCID, (WORDID, word’s Hit reference+)+)
  – Processed by Sorter:
    • Generates the inverted index from the forward index: sorts Hits by WORDIDs
    • Creates (WORDID, offsets) used by the Lexicon
  – Inverted Index (short/full)
    • (WORDID, (DOCID reference, Hit list reference)+)
    • Short: contains just quality Hits (word in title, anchor, ...), with DOCIDs sorted by them; optimal for single-word search
    • Full: DOCIDs sorted by DOCID; optimal for merging Hit lists, i.e. multi-word search
• Anchors file
  – Anchor (from, to, text)
• URL Resolver
  – Reads the anchors file:
    • Relative-to-absolute URL conversion + DOCID assignment
    • Creates the links file
• Links file
  – (url, target: DOCID)
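The Sorter step, turning a forward index into an inverted one, can be sketched in a few lines of Python. The data layout here (plain dictionaries, integer IDs, positions as the only Hit attribute) is a simplifying assumption for illustration; it is not the barrel format described on the slide.

```python
from collections import defaultdict

# Hypothetical forward index: DOCID -> list of (WORDID, position) hits.
forward_index = {
    1: [(10, 0), (20, 1), (10, 2)],
    2: [(20, 0), (30, 1)],
}

def invert(forward):
    """Regroup hits by WORDID, keeping DOCIDs in ascending order."""
    inverted = defaultdict(list)
    for docid in sorted(forward):
        positions_by_word = defaultdict(list)
        for wordid, position in forward[docid]:
            positions_by_word[wordid].append(position)
        for wordid, positions in positions_by_word.items():
            inverted[wordid].append((docid, positions))
    return dict(inverted)

def docs_with_all(words, inverted):
    """Multi-word search: intersect the DOCID sets of every word."""
    doc_sets = [{docid for docid, _ in inverted.get(w, [])} for w in words]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

inverted_index = invert(forward_index)
print(inverted_index[20])                      # [(1, [1]), (2, [0])]
print(docs_with_all([10, 20], inverted_index))  # [1]
```

Because each posting list is ordered by DOCID, lists for several words can be merged in one pass, which is why the slide calls the full index optimal for multi-word search.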
-
34
Inventing Google: Anatomy
• Searcher uses…
  – Lexicon
    • Keeps a map saying which Barrel to use.
    • Originally kept in memory (256MB).
      – IMHO something like a multi-level VM page table must be used now
      – It is/was of fixed size (14,000,000 words)
  – Barrels
    • Each barrel keeps a range of WORDIDs
    • WORDID-to-DOCID map
  – PageRank pool
    • Keeps the computed PageRank for each DOCID
  – Doc Index
    • DOCID-ordered information about each document
      – (DOCID, status, repository pointer, checksum, stat, URL, title)
-
Cluster Innards
http://www.google.com/intl/cs/technology/pigeonrank.html
-
36
Cluster Innards: Global Google
• Over 30 Google clusters around the world.
  – DNS-based and geolocation-driven load balancing:
    • Domain Name: GOOGLE.COM
      Registrar: ALLDOMAINS.COM INC.
      Whois Server: whois.alldomains.com
      Referral URL: http://www.alldomains.com
      Name Server: NS2.GOOGLE.COM
      Name Server: NS1.GOOGLE.COM
      Name Server: NS3.GOOGLE.COM
      Name Server: NS4.GOOGLE.COM
      Status: REGISTRAR-LOCK
      Updated Date: 03-oct-2002
      Creation Date: 15-sep-1997
      Expiration Date: 14-sep-2011
    • 2005, May 7: Google DNS hack speculations
• Total PCs
  – > 5,000 in 2000
  – > 15,000 in 2003
  – > 79,000* in 2004
*) I’m not sure about this number; it was taken from an external resource.
-
37
Cluster Innards: HW
• Basic cluster design insights
  – Reliability in SW rather than server-class HW.
    • Commodity PCs used to build a high-end computing cluster at low-end prices.
    • Example:
      – $287,000: 176x 2GHz Xeon, 176GB RAM, 7TB HDD
      – $758,000: 8x 2GHz Xeon, 64GB RAM, 8TB HDD
  – Design is tailored for best aggregate request throughput, not peak server response time: individual requests are parallelized.
• Google has inexpensively built out its computing infrastructure by using thousands of "commodity" servers.
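The price gap in the example is easier to see per unit. The following Python sketch uses only the prices and specifications quoted on the slide; the labels "commodity rack" and "high-end server" and the helper `cost_per_unit` are my own naming, added for illustration.

```python
def cost_per_unit(price_usd, units):
    """Price divided by the number of CPUs (or GB of RAM)."""
    return price_usd / units

# Figures from the slide; the configuration labels are assumptions.
configs = {
    "commodity rack": {"price_usd": 287_000, "cpus": 176, "ram_gb": 176},
    "high-end server": {"price_usd": 758_000, "cpus": 8, "ram_gb": 64},
}

for name, c in configs.items():
    per_cpu = cost_per_unit(c["price_usd"], c["cpus"])
    per_gb = cost_per_unit(c["price_usd"], c["ram_gb"])
    print(f"{name}: ${per_cpu:,.0f} per CPU, ${per_gb:,.0f} per GB of RAM")

# How many times more expensive one CPU is in the high-end configuration.
ratio = cost_per_unit(758_000, 8) / cost_per_unit(287_000, 176)
print(f"CPU price ratio: {ratio:.1f}x")
```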
-
38
Cluster Innards: HW
• Optimistically, a consumer PC might crash once in three years from a software glitch or hardware problem.
– "At Google scale...if you have thousands of PCs, you can expect one (failure) a day,…"
• 1,000,000s not 1,000,000,000s of dollars.
– “The trick is to make these racks of hardware work together and to ensure that the failure of one machine doesn't derail an operation.”
• Switched Ethernet
– Commodity networking hardware is used, typically either 100 megabits/second or 1 gigabit/second at the machine level, but averaging considerably less in overall bisection bandwidth.
– Locality optimizations (GFS)
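The "one failure a day" remark follows from simple arithmetic. Here is a short Python sketch under the slide's optimistic assumption that each PC fails once every three years, with failures treated as independent and evenly spread over time; the function name is hypothetical.

```python
def expected_failures_per_day(machines, mean_years_between_failures=3):
    """Expected daily failures if each machine fails once per given period."""
    return machines / (mean_years_between_failures * 365)

# One PC, roughly a thousand PCs, and the 2003 cluster size from the deck.
for n in (1, 1_000, 15_000):
    print(n, round(expected_failures_per_day(n), 2))
```

At about 1,100 machines the expected rate reaches one failure per day, which is why the design has to tolerate failures in software instead of relying on more reliable hardware.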
-
39
Cluster Innards: SW
• Stripped-down version of Linux, which is based on the Red Hat distribution but is really just the operating system kernel modified for Google.
• Google File System is optimized for handling large blocks of data.
– 64MB block
– The file system was designed to assume that a failure, such as a failed disk or unplugged network cable, can happen at any time.
– Data is replicated in three places, and there is a "master" machine that can locate copies of a piece of data, such as a keyword index, if the original is out of commission.
• Google has created "batch" job scheduling software, called the Global Work Queue, that acts as a sort of taskmaster for millions of operations.
• Another important engineering feat done by Google is to make writing programs that run across thousands of servers very straightforward…
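The block size and three-way replication can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration, not the Google File System design: the round-robin placement, the server names, and the functions `place_replicas`, `build_master_table` and `locate` are all hypothetical.

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB block, as on the slide
REPLICAS = 3                    # data is replicated in three places

def chunk_count(file_size_bytes):
    """Number of 64MB blocks needed to hold a file."""
    return max(1, math.ceil(file_size_bytes / CHUNK_SIZE))

def place_replicas(chunk_index, servers):
    """Hypothetical round-robin placement on three distinct servers."""
    return [servers[(chunk_index + i) % len(servers)] for i in range(REPLICAS)]

def build_master_table(file_size_bytes, servers):
    """The 'master' view: block index -> servers holding a copy."""
    return {i: place_replicas(i, servers)
            for i in range(chunk_count(file_size_bytes))}

def locate(master_table, chunk_index, failed=()):
    """Return the first replica location that is not out of commission."""
    for server in master_table[chunk_index]:
        if server not in failed:
            return server
    raise LookupError("all replicas unavailable")

servers = ["s0", "s1", "s2", "s3"]            # needs at least three servers
table = build_master_table(200 * 1024 * 1024, servers)
print(len(table), "blocks")
print(locate(table, 0))                       # original copy
print(locate(table, 0, failed={"s0"}))        # falls back to a replica
```

The lookup only fails when all three holders of a block are down at once, which is the property the slide attributes to replication.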
-
40
Lecture Outline
• What are Google’s services?
• History of Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
-
41
YEAR MONTH EVENT
1995 March Sergey Brin and Larry Page meet at a Stanford University spring gathering of Ph.D. computer science candidates.
1996 Jan-Dec Brin and Page create BackRub, the precursor to the Google search engine.
1998 September Google is incorporated and takes up residence in a Menlo Park, Calif., garage with four employees, after Brin and Page put their studies on hold and raise $1 million in funding from family, friends and "angel" investors. Google answers 10,000 search queries per day.
1999 Feb-June $25 million in funding from venture capital funds Sequoia Capital and Kleiner Perkins Caufield & Byers; eight employees; Google answers 500,000 searches per day.
2000 May-June Google, answering 18 million search queries a day, becomes the largest search engine on the Web. Internet media company Yahoo picks Google as its default search results provider.
2001 March-April Eric Schmidt, CEO of Novell and a former chief technology officer at Sun Microsystems, joins Google as chairman.
July-August Schmidt is appointed CEO while Page becomes president, products and Brin becomes president, technology.
September Google announces that it has achieved profitability.
-
42
2002 Jan – Feb Google announces the availability of the “Google Search Appliance”.
March Google launches a beta version of Google News, which provides news stories from numerous global providers.
Nov. – Dec. Web index now includes 4 billion web documents.
2003 Jan – Feb Google acquires Pyra Labs, creator of the Web self-publishing tool Blogger.
May – June Google launches AdSense, an advertising program that delivers ads based on the content of Web sites.
2004 March – April Gmail, a free web-based email service, is launched.
July Google acquires Picasa, Inc. a digital photo management company.
August IPO of “GOOG” on NASDAQ at $85 per share, raising $1.7 billion.
November Google search index is now 8 billion pages
2005 March “Google Maps” is launched.
July GOOG share price passes $300 and Google becomes the world’s largest media company by market value, at approximately $85 bn.
-
43
Strategic Analysis
• Market share in online searches: 56.03%
– Who are the competitors?
-
44
Strategic Analysis
• Market share in online searches: 56.03%
-
45
Strategic Analysis
• Number of searches a day: 4.03 billion
• Web page indexed: 25 billion
• Images indexed: 1.3 billion
-
46
Corporate Now
• Employees: 12000+
-
47
Financial Success
• Market capitalization: 166 billion USD
• Two years after going public, stock is 5-fold
• 10.06 billion in revenues in 2006, 3.077 billion profits in 2006
-
48
Comparison With Yahoo
-
49
Google Stock Growth vs. Industry vs. DJ
-
50
Google Competitors
-
51
-
52
Acquisitions and Mergers
-
53
Core Competencies
• Google “people” and environment/culture
– People have to be extremely intelligent and usually have doctorates; people come into Google with forward-thinking, innovative and “out-of-the-box” strategies.
• Search
– Quality, popularity, overwhelming awareness of the name and of what the company is and does.
• Google's brand equity
– “Google” is now a verb in Webster’s dictionary.
– 2003: most recognized brand of the year.
-
54
Corporate Culture
-
55
Googleplex
-
56
20% Time Philosophy
Employees spend 20% of their work time on projects that interest them.
Half of new product launches originated from 20% time.
Some of Google's newer services, such as Gmail, Google News, Orkut, and AdSense, originated from these independent endeavors.
-
57
So, What Is This?
-
58
The Answer Is…
-
59
Lecture Outline
• What are Google’s services?
• History of Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
• How can libraries compete with Google?
-
60
Cool things you can do with Gmail (gmail.com)
-
61
From Gmail to….
-
62
Google Calendar
-
63
Google Docs
-
64
Google Docs
Revisions
-
65
Google Docs Revision
-
66
Google Docs Revision
-
67
Photos
-
68
Groups
-
69
Picture
Your Picture
-
70
Searching Mail
-
71
Sending & Receiving Mail
Auto Save
Click here to reply
-
72
Receiving & Attaching File
-
73
Receiving PPT & MP3
-
74
Starred
-
75
Labels
-
76
Chatting
-
77
Chatting
-
78
That’s just some of the cool things you can do!
-
79
Lecture Outline
• What are Google’s services?
• History of Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
• How can libraries compete with Google?
-
80
The Web 2.0 Mergers
By          To           Date/Scale          Attribute
Yahoo!      Flickr       2005/01, 2M USD     Online photos
News Corp.  MySpace      2005/07, 0.58B USD  Blog
Yahoo!      Del.icio.us  2005/12, N/A        Social bookmark
Google      Writely      2006/03, N/A        Web-based word processing
Google      YouTube      2006/10, 1.65B USD  Video blog
-
81
Google Office – What Is Writely?
• Writely was acquired by Google in 2006/03
– A web-based word processing service provider
– Spelling checking, etc.
– MS-Word documents can be processed
– Software installation is not necessary
-
82
Writely-able Environment
Writely can be run on any online Windows or Macintosh computer with one of the following browsers:
• IE 5.5+ (available on the Windows platform only)
• Mozilla 1.4+ (available on Mac, Windows and Linux platforms)
• Firefox 1.0.6+ (available on Mac, Windows and Linux platforms)
-
83
Functions of Writely
• Upload MS-Word documents, HTML pages, or text files.
• Create new documents.
• Based on WYSIWYG editing style for document formatting and spelling checking.
• Share documents with others based on email.
• Cooperative document editing online
• File revision history, including version rollback.
• Publish document publicly, or set permission on document display.
• Download documents in MS-Word, HTML or ZIP format.
• Publish document to blog.
-
84
The “Autosave” Feature
• Writely autosaves documents automatically, once every ten seconds.
– This keeps documents quite safe in the event of a software or hardware failure.
-
85
Compare Google office and MS office
-
86
Google vs. Microsoft
Google Office           Microsoft Office
Gmail & Calendar        Outlook
Writely (Google Docs)   Word
Google Spreadsheet      Excel
Google Base             Access
Google Thumbstacks      PowerPoint
Free                    $350-$499
-
87
Google vs. Other Web Services
Google.com Yahoo.comGroups Yes Yes
YesYesYesYesYesYes
Friends No Yes YesKnowledge No Yes No
Blog Yes Yes YesMail Yes Yes Yes
Directory Yes Yes YesBid No Yes No
Shop No Yes NoYes
YesYesYesYesYesYesYes
Yes
MSN.com
Picasa YesTalk Yes
Upload Video NoMaps YesNews Yes
Upload Images No
Froogle No
-
88
Google Spreadsheet
-
89
Google Office Advantage?
• Security
• Privacy
• Physical connection quality
• Internet quality
• Free of charge
How about offline editing?
-
90
Lecture Outline
• What are Google’s services?
• History of Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
• How can libraries compete with Google?
-
91
iGoogle: the Personal Organizer Page
-
92
Considering a POP
• Why do I need a POP?
• What is its Purpose?
• What content do I want to include?
• Who do I want to view my POP?
• Where will I publish?
• How will I promote it?
• How could my learners use one?
-
93
Building a POP in Google
• Step 1: Open your browser and locate Google: www.google.com
-
94
Getting an account
• Step 2: Select the Sign In icon
• Step 3: Create a Google account
-
95
Accessing your account
• Step 4: Click the Sign In icon once more
• Step 5: Enter your email address and password
-
96
Personalising your home
• Step 6: Select the Personalised Home icon
-
97
Adding a tab
• Step 7: Select the Add a tab icon
• Step 8: Key in a title and click OK
• Note: If you leave the tick in place, Google will use a typical template for the tab
-
98
Sample template tab
-
99
Moving widgets around
1. Select the widget by its title
2. Drag to a new position in your page
-
100
Editing your bookmarks
1. Select Edit
2. Add a link to your favourite web space
3. Save
-
101
Expanding a widget
1. Select the + symbol to expand
2. Select the – symbol to contract
-
102
Adding widgets
Add more widgets to a tab by clicking on Add stuff
-
103
Adding stuff
Add a widget by clicking on the Add it now icon
-
104
Check out the new widget
-
105
Make your iGoogle your home page
-
106
Lecture Outline
• What are Google’s services?
• History of Google
• Current status of Google
• GMail service
• Google Office?
• iGoogle?
• What Google can not do
-
107
What Google Can Not Do
• Google is still a traditional search application?
– What is traditional search?
-
108
Traditional Search Principle
-
109
Traditional Search Principle
-
110
Traditional Search Principle
-
111
Google Is Trying to…
• Add shallow linguistics to traditional search
-
112
But…
-
113
Semantic Approaches to Search
• Beyond bag-of-words, use terms and concepts instead.
• Ontology can help user to:
– Formulate semantic query
– Refine previous query
– Browse concept domain
– Formulate related query
– Interoperability between search applications
– Semantic indexing of documents
-
114
Ontology in Semantic Exploration
• Use graphical ontologies for query formulation
– Semantic annotations of documents
– Construct queries graphically
– Use ontological structures to expand query
– Use ontology to visualize search results
-
115
Query Formulation
• Queries expanded from ontological structures
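Query expansion from an ontology can be sketched as a bounded walk over concept links. The Python example below is a minimal illustration; the toy ontology, the function name `expand_query`, and the `depth` parameter are assumptions and do not come from the lecture.

```python
# Hypothetical toy ontology: concept -> narrower or related terms.
ONTOLOGY = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "hatchback"],
}

def expand_query(terms, ontology, depth=1):
    """Add terms reachable within `depth` ontology links of each query term."""
    expanded = list(terms)
    frontier = list(terms)
    for _ in range(depth):
        next_frontier = []
        for term in frontier:
            for related in ontology.get(term, []):
                if related not in expanded:
                    expanded.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return expanded

print(expand_query(["vehicle"], ONTOLOGY))
# ['vehicle', 'car', 'truck']
print(expand_query(["vehicle"], ONTOLOGY, depth=2))
# ['vehicle', 'car', 'truck', 'sedan', 'hatchback']
```

Limiting the depth keeps the expanded query close to the user's original concept; a larger depth trades precision for recall.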
-
116
Query Refinement
• Use ontological structures to explore the domain
-
117
Ontology-driven Query Interpretation
-
118
Training Ontology for Search
-
119
Personalized Ontology
-
120
Semantic Search Query
-
Conclusion
Is Google good, bad, or evil?