Introduction to Google Jian-hua Yeh (葉建華) [email protected]

  • Introduction to Google

    Jian-hua Yeh (葉建華) [email protected]

  • 2

    Lecture Outline

    • What are Google’s services?

    • Inventing Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

  • 3

    The Google Services

    • Web search

    • Images search

    • Video search

    • News search

    • Maps

    • Mail

    • More?

  • 10 Cool Things You Can Do With Google

  • 5

    1. Basic Searching

    http://www.google.com/

  • 6

    Basic Searching Step-by-Step

    Select search term(s)

    Enter search term(s) into search box

    Click Search or Press Enter key

    Browse Results

    http://www.google.com/

  • 7

    2. Advanced Searching

    Go to www.googleguide.com for more on how to use Google’s Basic and Advanced Search

    Click on “Advanced Search” on main Google Page

    http://www.google.com/advanced_search?hl=en

  • 8

    Better Searches, Better Results

    Exact Phrase [“one small step for man”]

    Excluded Words [bass -fishing, virus -computer]

    Similar Words [~mobile phone]

    Multiple Words (or) [Maui OR Hawaii]

    Multiple Words (and) [vacation Hawaii]

    -----------------------------------------------------------

    “I’m feeling lucky” [takes you directly to first web page returned for your query]
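The operators on this slide can be combined mechanically. The following is a minimal sketch, not an official Google client: the helper names `build_query` and `search_url` are made up for illustration, and the `/search?q=` URL pattern is an assumption that does not appear on the slides.

```python
from urllib.parse import urlencode


def build_query(all_words=(), exact_phrase=None, any_of=(), exclude=()):
    """Assemble a query string from the operators listed on the slide."""
    parts = list(all_words)                    # plain words are ANDed implicitly
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # exact phrase goes in double quotes
    if any_of:
        parts.append(" OR ".join(any_of))      # alternatives joined with OR
    parts.extend(f"-{word}" for word in exclude)  # excluded words get a leading minus
    return " ".join(parts)


def search_url(query):
    """Hypothetical helper: URL-encode the query (assumed URL pattern)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


print(build_query(all_words=["bass"], exclude=["fishing"]))
print(search_url(build_query(any_of=["Maui", "Hawaii"])))
```

The exclusion operator is a plain hyphen-minus placed directly before the word, with no space in between.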

  • 9

    3. Definitions

    “define ______” or “define: ____”

    Definitions gathered from around the Web

  • 10

    Define “Blog”

    http://www.google.com/

  • 11

    4. Calculator

    Addition +

    Subtraction -

    Multiplication *

    Division /

    Percentages % of

    Exponents ^

    http://www.google.com/

  • 12

    “15.99 + 32.50 + 13.25”

    http://www.google.com/
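For comparison, the same kinds of calculation can be reproduced locally. This is only a sketch of the arithmetic, not of how Google evaluates the query; `percent_of` is a hypothetical helper, and `Decimal` is used so that currency-style amounts add up without binary floating-point noise.

```python
from decimal import Decimal

# The sum from the slide: "15.99 + 32.50 + 13.25"
total = Decimal("15.99") + Decimal("32.50") + Decimal("13.25")
print(total)


def percent_of(percent, value):
    """Hypothetical helper mirroring the "% of" operator."""
    return Decimal(percent) / Decimal(100) * Decimal(value)


print(percent_of("15", "200"))

# The slide's "^" exponent operator is written "**" in Python.
print(2 ** 10)
```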

  • 13

    5. Numbers

    Phone #s

    Tracking #s

    VIN #s

    UPC codes

    Area Codes

    More…

    http://www.google.com/

  • 14

    Examples of Number Searches

    Phone numbers

    Area codes

    Tracking packages by #

    UPC Codes

    VIN #s

  • 15

    6. Movies

    Showtimes

    “movies 91360”

    Reviews

    Buy Tickets Online

    http://www.google.com/

  • 16

    7. Stocks

    Find reports on specific stocks

    Compare stocks by entering multiple stock symbols

    http://www.google.com/

  • 17

    8. Weather

    Weather forecasts for specific regions of the world

    Example: “weather 91360”

    http://www.google.com/

  • 18

    9. Travel

    Airport weather and delays

    Airline Flight Information

    Examples: “lax airport” AND “United 164”

    http://www.google.com/

  • 19

    10. Pizza!

    Find local businesses by typing in a keyword (like “pizza”) and your zipcode

    http://www.google.com/

  • More? Yes, there are more…

  • 21

  • 22

    Lecture Outline

    • What are Google’s services?

    • Inventing Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

  • Inventing Google

  • 24

    Inventing Google

    • Sergey & Larry - Ph.D. students at Stanford University

    • Prototype (1998)
      – http://google.stanford.edu
      – 24,000,000 pages (8,058,044,651 today)

    • Google
      – “We chose our system name, Google, because it is a common spelling of googol, or 10^100 and fits well with our goal of building very large-scale search engines.”

    • PageRank
      – An objective measure of a page's citation importance that corresponds well with people’s subjective idea of importance.

    http://google.stanford.edu/

  • 25

    Google’s Mission

    “Organize the world’s information and make it universally accessible and useful.”

  • 26

    Google’s Goal

    “To provide a much higher level of service to all those who seek information, whether they're at a desk in Boston, driving through Bonn, or strolling in Bangkok.”

  • 27

    Business Ethics

    1. Focus on the user and all else will follow.

    2. It's best to do one thing really, really well.

    3. Fast is better than slow.

    4. Democracy on the web works.

    5. You don't need to be at your desk to need an answer.

    6. You can make money without doing evil.

    7. There's always more information out there.

    8. The need for information crosses all borders.

    9. You can be serious without a suit.

    10. Great just isn't good enough.

  • 28

    Inventing Google: Foundation

    • PageRank*:
      – We assume page A has pages T1...Tn which point to it (i.e., are citations).
      – The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d...
      – Also C(A) is defined as the number of links going out of page A.
      – The PageRank of a page A is given as follows:

        PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

    [Diagram: pages T1 ... Tn, with out-link counts C1 ... Cn, pointing to page A]

    *) Larry Page

  • 29

    Inventing Google: Foundation

    • PageRank formula, informally
      – PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
      – PageRank can be thought of as a model of user behavior.
      – We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page.
      – The probability that the random surfer visits a page is its PageRank.

    • A page has a high PR if…
      – there are many pages that point to it
      – or if there are some pages that point to it and have a high PR
      – Note recursive weight propagation through web link structure.
      – Note that the PageRanks are meant to form a probability distribution over web pages, so that the sum of all web pages’ PageRanks is one. With the formula exactly as written above, the sum equals the number of pages (when every page has at least one outgoing link); dividing the (1-d) term by the number of pages makes the sum one.
      – With the formula above, the "random surfer" follows a link with probability d and gets bored and requests another random page with probability 1-d.

    • Personalization ☺
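The formula can be evaluated by simple repeated substitution. The sketch below is a toy illustration under stated assumptions, not Google's implementation: the function name `pagerank`, the three-page `toy_web` graph, and the fixed iteration count are all made up, and every page is assumed to have at least one outgoing link to a page in the graph.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    # Collect, for each page, the pages that point to it (its citations).
    incoming = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming[page])
            for page in pages
        }
    return pr


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this toy graph, C is cited by both A and B and ends up ranked above B, which is cited only by A. With this un-normalized form of the formula the scores sum to the number of pages rather than to one.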

  • 30

    Inventing Google: Foundation

    • PageRank relevancy tuning
      – Page title
      – Anchor text
      – Meta
      – Font
        • Size
        • Weight
      – Capitalization
      – …

  • 31

    Inventing Google: Anatomy

  • 32

    Inventing Google: Anatomy

    • URL Server
      – Provides list of URLs to be fetched to crawlers

    • Google Crawlers (GoogleBot)
      – Multiple distributed crawlers
        • Own DNS cache
        • 300 connections open at once
      – Send fetched pages to Store Server
      – Originally written in Python

    • Store Server
      – Compresses and stores files to repository.
      – DOCID is created for each page.

    • Repository
      – Stores fetched pages for further processing by Indexer

  • 33

    Inventing Google: Anatomy

    • Indexer
      – Reads pages from Repository (uncompresses them)
      – Parses each document (Flex on top of own stack):
        • Page converted to set of Hits (position, font, capitalization, title/anchor/meta), 2 bytes each
        • Added to Document Index
      – Hits are distributed to Barrels (i.e. one document to multiple barrels)
      – Every link found in page is stored to Anchors file

    • Forward and Inverted Barrels (2*64)
      – Forward Index
        • Barrel keeps range of Hits sorted by DOCIDs
        • (DOCID, (WORDID, word’s Hit reference+)+)
      – Processed by Sorter:
        • Generates inverted index from forward index: sorts Hits by WORDIDs
        • Creates (WORDID, offsets) used by Lexicon
      – Inverted Index (short/full)
        • (WORDID, (DOCID reference, Hit list reference)+)
        • Short: DOCIDs sorted by, and containing just, quality Hits (word in title, anchor, ...); optimal for single-word search
        • Full: DOCIDs sorted by DOCID; optimal for merging Hit lists, i.e. multi-word search

    • Anchors file
      – Anchor (from, to, text)

    • URL Resolver
      – Reads anchors file:
        • Relative-to-absolute URL conversion + DOCID assignment
        • Creates links file

    • Links file
      – (url, target: DOCID)
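The Sorter step, turning a forward index into an inverted one, can be illustrated with in-memory dictionaries. This is a simplified sketch: the function name `invert`, the sample IDs, and the use of plain position lists as hit lists are assumptions, and the real system works on on-disk barrels rather than Python dicts.

```python
from collections import defaultdict


def invert(forward_index):
    """Build WORDID -> [(DOCID, hits), ...] from DOCID -> [(WORDID, hits), ...].

    Documents are visited in DOCID order, so each posting list ends up
    sorted by DOCID, like the "full" inverted index described above.
    """
    inverted = defaultdict(list)
    for docid in sorted(forward_index):
        for wordid, hits in forward_index[docid]:
            inverted[wordid].append((docid, hits))
    return dict(inverted)


# Hypothetical forward index: hits are just word positions here.
forward = {
    2: [(10, [0, 7]), (11, [3])],
    1: [(10, [4]), (12, [1, 2])],
}
inverted = invert(forward)
for wordid in sorted(inverted):
    print(wordid, inverted[wordid])
```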

  • 34

    Inventing Google: Anatomy

    • Searcher uses…
      – Lexicon
        • Keeps map saying which Barrel to use.
        • Originally kept in memory (256MB).
          – In my opinion, something like a multi-level VM page table must be used now
          – It is/was of fixed size (14,000,000 words)
      – Barrels
        • Each barrel keeps range of WORDIDs
        • WORDID-to-DOCID map
      – PageRank pool
        • Keeps computed PageRank for each DOCID
      – Doc Index
        • DOCID-ordered information about each document
          – (DOCID, status, repository pointer, checksum, stat, URL, title)
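How these structures cooperate on a multi-word query can be sketched as follows. This is a heavily simplified assumption-laden model: the lexicon here maps a word straight to its WORDID, all barrels are folded into a single dict, ranking uses PageRank alone, and the names `intersect_sorted` and `search` plus all sample data are hypothetical.

```python
def intersect_sorted(a, b):
    """Merge two DOCID-sorted lists, keeping DOCIDs present in both."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def search(words, lexicon, barrels, pagerank_pool):
    """Return DOCIDs containing every word, best PageRank first."""
    if not words:
        return []
    doc_lists = []
    for word in words:
        wordid = lexicon.get(word)
        if wordid is None:          # unknown word: nothing can match
            return []
        doc_lists.append(barrels[wordid])
    result = doc_lists[0]
    for docs in doc_lists[1:]:
        result = intersect_sorted(result, docs)
    return sorted(result, key=lambda docid: pagerank_pool[docid], reverse=True)


# Hypothetical sample data.
lexicon = {"vacation": 1, "hawaii": 2}
barrels = {1: [1, 3, 5, 8], 2: [3, 4, 8]}
pagerank_pool = {1: 0.2, 3: 0.4, 4: 0.1, 5: 0.3, 8: 0.9}
print(search(["vacation", "hawaii"], lexicon, barrels, pagerank_pool))
```

Keeping each posting list sorted by DOCID is what allows the intersection to run in a single linear pass, which is why the full inverted index is described as optimal for multi-word search.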

  • Cluster Innards

    http://www.google.com/intl/cs/technology/pigeonrank.html

  • 36

    Cluster Innards: Global Google

    • Over 30 Google clusters around the world.
      – DNS-based and geolocation-driven load balancing:
        • Domain Name: GOOGLE.COM
          Registrar: ALLDOMAINS.COM INC.
          Whois Server: whois.alldomains.com
          Referral URL: http://www.alldomains.com
          Name Server: NS2.GOOGLE.COM
          Name Server: NS1.GOOGLE.COM
          Name Server: NS3.GOOGLE.COM
          Name Server: NS4.GOOGLE.COM
          Status: REGISTRAR-LOCK
          Updated Date: 03-oct-2002
          Creation Date: 15-sep-1997
          Expiration Date: 14-sep-2011
        • 2005, May 7: Google DNS hack speculations

    • Total PCs
      – > 5,000 in 2000
      – > 15,000 in 2003
      – > 79,000* in 2004

    *) I’m not sure about this number, it was taken from an external resource.

  • 37

    Cluster Innards: HW

    • Basic cluster design insights
      – Reliability in SW rather than server-class HW.
        • Commodity PCs used to build a high-end computing cluster at low-end prices.
        • Example:
          – $287,000: 176x 2GHz Xeon, 176GB RAM, 7TB HDD
          – $758,000: 8x 2GHz Xeon, 64GB RAM, 8TB HDD
      – Design is tailored for best aggregate request throughput, not peak server response time (individual request parallelization).

    • Google has inexpensively built out its computing infrastructure by using thousands of "commodity" servers.
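The price gap in the example becomes clearer per unit of hardware. The short calculation below uses only the figures from the slide; the labels "commodity rack" and "high-end server" and the helper `cost_per_unit` are naming assumptions.

```python
# Figures taken from the slide; the configuration labels are assumptions.
clusters = {
    "commodity rack": {"price_usd": 287_000, "cpus": 176, "ram_gb": 176},
    "high-end server": {"price_usd": 758_000, "cpus": 8, "ram_gb": 64},
}


def cost_per_unit(cluster, unit):
    """Price divided by the number of CPUs or gigabytes of RAM."""
    return cluster["price_usd"] / cluster[unit]


for name, cluster in clusters.items():
    print(
        name,
        round(cost_per_unit(cluster, "cpus"), 2), "USD per CPU,",
        round(cost_per_unit(cluster, "ram_gb"), 2), "USD per GB of RAM",
    )
```

On these numbers the commodity configuration costs roughly 1,631 USD per CPU against 94,750 USD per CPU for the high-end one.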

  • 38

    Cluster Innards: HW

    • Optimistically, a consumer PC might crash once in three years from a software glitch or hardware problem.
      – "At Google scale...if you have thousands of PCs, you can expect one (failure) a day,…"

    • 1,000,000s not 1,000,000,000s of dollars.
      – “The trick is to make these racks of hardware work together and to ensure that the failure of one machine doesn't derail an operation.”

    • Switched Ethernet
      – Commodity networking hardware is used, typically either 100 megabits/second or 1 gigabit/second at the machine level, but averaging considerably less in overall bisection bandwidth.
      – Locality optimizations (GFS)
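The "one failure a day" remark follows from simple arithmetic. The sketch below assumes failures are independent and spread evenly over time, which is a simplification; the function name is hypothetical, and the cluster sizes are the ones quoted earlier in the deck.

```python
DAYS_PER_YEAR = 365


def expected_failures_per_day(machines, years_between_failures=3.0):
    """Average failures per day if each machine fails once per given period."""
    return machines / (years_between_failures * DAYS_PER_YEAR)


for machines in (1_000, 15_000, 79_000):
    print(machines, "PCs:", round(expected_failures_per_day(machines), 1), "failures/day")
```

With about a thousand machines the expectation is already close to one failure per day, and at 15,000 machines it is more than a dozen.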

  • 39

    Cluster Innards: SW

    • Stripped-down version of Linux, which is based on the Red Hat distribution but is really just the operating system kernel modified for Google.

    • Google File System is optimized for handling large blocks of data.
      – 64MB block
      – The file system was designed to assume that a failure, such as a failed disk or unplugged network cable, can happen at any time.
      – Data is replicated in three places, and there is a "master" machine that can locate copies of a piece of data, such as a keyword index, if the original is out of commission.

    • Google has created "batch" job scheduling software that acts as a sort of taskmaster for millions of operations called the Global Work Queue.

    • Another important engineering feat done by Google is to make writing programs that run across thousands of servers very straightforward…
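The 64MB block size and three-way replication can be illustrated with a toy "master" table. This sketch is not the Google File System: the round-robin placement policy, the function names `place_chunks` and `locate`, and the server names are invented for illustration only.

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB blocks, as on the slide
REPLICAS = 3                   # data is replicated in three places


def place_chunks(file_size, servers):
    """Toy master table: chunk index -> servers holding a copy.

    Round-robin placement is an assumption made for this sketch.
    """
    if len(servers) < REPLICAS:
        raise ValueError("need at least as many servers as replicas")
    n_chunks = math.ceil(file_size / CHUNK_SIZE)
    return {
        chunk: [servers[(chunk + r) % len(servers)] for r in range(REPLICAS)]
        for chunk in range(n_chunks)
    }


def locate(layout, chunk, failed=()):
    """Return the servers that still hold a usable copy of the chunk."""
    return [server for server in layout[chunk] if server not in failed]


layout = place_chunks(200 * 1024 * 1024, ["s1", "s2", "s3", "s4"])
print(len(layout), "chunks")
print(locate(layout, 0, failed={"s1"}))
```

If one server is out of commission, the table still yields two other copies of every chunk it held, which is the point of keeping three replicas.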

  • 40

    Lecture Outline

    • What are Google’s services?

    • History of Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

  • 41

    YEAR MONTH EVENT

    1995 March Sergey Brin and Larry Page meet at a Stanford University spring gathering of Ph.D. computer science candidates.

    1996 Jan-Dec Brin and Page create BackRub, the precursor to the Google search engine.

    1998 September Google is incorporated and takes up residence in a Menlo Park, Calif., garage with four employees, after Brin and Page put their studies on hold and raise $1 million in funding from family, friends and "angel" investors. Google answers 10,000 search queries per day.

    1999 Feb-June $25 million in funding from venture capital funds Sequoia Capital and Kleiner Perkins Caufield & Byers; eight employees; Google answers 500,000 searches per day.

    2000 May-June Google, answering 18 million search queries a day, becomes the largest search engine on the Web. Internet media company Yahoo picks Google as its default search results provider.

    2001 March-April Eric Schmidt, CEO of Novell and a former chief technology officer at Sun Microsystems, joins Google as chairman.

    July-August Schmidt is appointed CEO while Page becomes president, products and Brin becomes president, technology.

    September Google announces that it has achieved profitability.

  • 42

    2002 Jan- Feb Google announces the availability of “Google Search Appliance”.

    March Google launches a beta version of Google News, which provides news stories from numerous global providers.

    Nov. – Dec. Web index now includes 4 billion web documents.

    2003 Jan – Feb Google acquires Pyra Labs, creator of the Web self-publishing tool Blogger.

    May – June Google launches AdSense, an advertising program that delivers ads based on the content of Web sites.

    2004 March – April Gmail, a free web based email service is launched.

    July Google acquires Picasa, Inc., a digital photo management company.

    August IPO of “GOOG” on NASDAQ at $85 per share, raising $1.7 billion.

    November Google search index is now 8 billion pages

    2005 March “Google Maps” is launched.

    July GOOG share price passes $300 and Google becomes the world’s largest media company by market value, at approximately $85 billion.

  • 43

    Strategic Analysis

    • Market share in online searches: 56.03%– Who are the competitors?

  • 44

    Strategic Analysis

    • Market share in online searches: 56.03%

  • 45

    Strategic Analysis

    • Number of searches a day: 4.03 billion

    • Web pages indexed: 25 billion

    • Images indexed: 1.3 billion

  • 46

    Corporate Now

    • Employees: 12000+

  • 47

    Financial Success

    • Market capitalization: 166 billion USD

    • Two years after going public, the stock price has risen five-fold

    • 10.06 billion in revenue and 3.077 billion in profit in 2006

  • 48

    Comparison With Yahoo

  • 49

    Google Stock Growth vs. Industry vs. DJ

  • 50

    Google Competitors

  • 51

  • 52

    Acquisitions and Mergers

  • 53

    Core Competencies

    Google “people” and environment/culture

    People have to be extremely intelligent and usually have doctorates; they come into Google with forward-thinking, innovative, “out-of-the-box” strategies.

    Search

    Quality, popularity, overwhelming awareness of the name and of what the company is and does.

    Google's brand equity

    “Google” is now a verb in Webster’s dictionary.

    2003: most recognized brand of the year.

  • 54

    Corporate Culture

  • 55

    Googleplex

  • 56

    20% Time Philosophy

    Employees spend 20% of their work time on projects that interest them.

    Half of new product launches originated from 20% time.

    Some of Google's newer services, such as Gmail, Google News, Orkut, and AdSense, originated from these independent endeavors.

  • 57

    So, What Is This?

  • 58

    The Answer Is…

  • 59

    Lecture Outline

    • What are Google’s services?

    • History of Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

    • How can libraries compete with Google?

  • 60

    Cool things you can do with Gmail (gmail.com)

  • 61

    From Gmail to….

  • 62

    Google Calendar

  • 63

    Google Docs

  • 64

    Google Docs

    Revisions

  • 65

    Google Docs Revision

  • 66

    Google Docs Revision

  • 67

    Photos

  • 68

    Groups

  • 69

    Picture

    Your Picture

  • 70

    Searching Mail

  • 71

    Sending & Receiving Mail

    Auto Save

    Click here to reply

  • 72

    Receiving & Attaching File

  • 73

    Receiving PPT & MP3

  • 74

    Starred

  • 75

    Labels

  • 76

    Chatting

  • 77

    Chatting

  • 78

    That’s just some of the cool things you can do!

  • 79

    Lecture Outline

    • What are Google’s services?

    • History of Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

    • How can libraries compete with Google?

  • 80

    The Web 2.0 Mergers

    By          To            Date/Scale            Attribute
    Yahoo!      Flickr        2005/01, 2M USD       Online photos
    News Corp.  MySpace       2005/07, 0.58B USD    Blog
    Yahoo!      Del.icio.us   2005/12, N/A          Social bookmark
    Google      Writely       2006/03, N/A          Web-based word processing
    Google      YouTube       2006/10, 1.65B USD    Video blog

  • 81

    Google Office – What Is Writely?

    • Writely was acquired by Google in March 2006

    – A web-based word processing service provider

    – Spell checking, etc.

    – MS-Word documents can be processed

    – Software installation is not necessary

  • 82

    Writely-able Environment

    Writely runs on any online Windows or Macintosh computer with one of the following browsers:

    • IE 5.5+ (available on Windows only)
    • Mozilla 1.4+ (available on Mac, Windows, and Linux)
    • Firefox 1.0.6+ (available on Mac, Windows, and Linux)

  • 83

    Functions of Writely

    • Upload MS-Word documents, HTML pages, or text files.

    • Create new documents.

    • Based on WYSIWYG editing style for document formatting and spelling checking.

    • Share documents with others based on email.

    • Cooperative document editing online

    • File revision history, including version rollback.

    • Publish document publicly, or set permission on document display.

    • Download documents in MS-Word, HTML or ZIP format.

    • Publish document to blog.

  • 84

    The “Autosave” Feature

    • Writely autosaves automatically every ten seconds.

    – This protects work against software or hardware failure.
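A periodic autosave of this kind can be sketched as a small state machine. This is an illustration only, not Writely's code: the class name `Autosaver`, its methods, and the injected `save` callback are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class Autosaver:
    """Saves the current text at most once per interval, only if it changed."""

    def __init__(self, save, interval=10.0):
        self.save = save              # callback that persists the text
        self.interval = interval      # seconds between saves (ten, per the slide)
        self.text = ""
        self.dirty = False            # True when there are unsaved edits
        self.last_saved_at = 0.0

    def edit(self, text):
        self.text = text
        self.dirty = True

    def tick(self, now):
        """Call regularly with the current time; returns True if a save happened."""
        if self.dirty and now - self.last_saved_at >= self.interval:
            self.save(self.text)
            self.dirty = False
            self.last_saved_at = now
            return True
        return False


saved_versions = []
autosaver = Autosaver(saved_versions.append)
autosaver.edit("draft 1")
print(autosaver.tick(5.0))    # too early, nothing saved yet
print(autosaver.tick(10.0))   # ten seconds elapsed, draft is saved
print(saved_versions)
```

With this design, at most the last ten seconds of edits are at risk if the browser or machine fails.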

  • 85

    Compare Google office and MS office

  • 86

    Google vs. Microsoft

    Google Office            Microsoft Office
    Gmail & Calendar         Outlook
    Writely (Google Docs)    Word
    Google Spreadsheet       Excel
    Google Base              Access
    Google Thumbstacks       PowerPoint
    Free                     $350-$499

  • 87

    Google vs. Other Web Services

    Google.com Yahoo.comGroups Yes Yes

    YesYesYesYesYesYes

    Friends No Yes YesKnowledge No Yes No

    Blog Yes Yes YesMail Yes Yes Yes

    Directory Yes Yes YesBid No Yes No

    Shop No Yes NoYes

    YesYesYesYesYesYesYes

    Yes

    MSN.com

    Picasa YesTalk Yes

    Upload Video NoMaps YesNews Yes

    Upload Images No

    Froogle No

  • 88

    Google Spreadsheet

  • 89

    Google Office Advantage?

    • Security

    • Privacy

    • Physical connection quality

    • Internet quality

    • Free of charge

    How about offline editing?

  • 90

    Lecture Outline

    • What are Google’s services?

    • History of Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

    • How can libraries compete with Google?

  • 91

    iGoogle: the Personal Organizer Page

  • 92

    Considering a POP

    • Why do I need a POP?

    • What is its Purpose?

    • What content do I want to include?

    • Who do I want to view my POP?

    • Where will I publish?

    • How will I promote it?

    • How could my learners use one?

  • 93

    Building a POP in Google

    • Step 1: Open your browser and go to Google: www.google.com

  • 94

    Getting an account

    • Step 2: Select the Sign In icon

    • Step 3: Create a Google account

  • 95

    Accessing your account

    • Step 4: Click the Sign In icon once more

    • Step 5: Enter your email address and password

  • 96

    Personalising your home

    • Step 6: Select the Personalised Home icon

  • 97

    Adding a tab

    • Step 7: Select the Add a tab icon

    • Step 8: Key in a title and click OK

    • Note: If you leave the tick in place, Google will use a typical template for the tab

  • 98

    Sample template tab

  • 99

    Moving widgets around

    1. Select the widget by its title

    2. Drag to a new position in your page

  • 100

    Editing your bookmarks

    1. Select Edit

    2. Add a link to your favourite web space

    3. Save

  • 101

    Expanding a widget

    1. Select the + symbol to expand

    2. Select the – symbol to contract

  • 102

    Adding widgets

    Add more widgets to a tab by clicking on Add stuff

  • 103

    Adding stuff

    Add a widget by clicking on the Add it now icon

  • 104

    Check out the new widget

  • 105

    Make your iGoogle your home page

  • 106

    Lecture Outline

    • What are Google’s services?

    • History of Google

    • Current status of Google

    • GMail service

    • Google Office?

    • iGoogle?

    • What Google can not do

  • 107

    What Google Can Not Do

    • Google is still a traditional search application?

    – What is traditional search?

  • 108

    Traditional Search Principle

  • 109

    Traditional Search Principle

  • 110

    Traditional Search Principle

  • 111

    Google Is Trying to…

    • Add shallow linguistics to traditional search

  • 112

    But…

  • 113

    Semantic Approaches to Search

    • Beyond bag-of-words: use terms and concepts instead.

    • An ontology can help the user to:

    – Formulate semantic query

    – Refine previous query

    – Browse concept domain

    – Formulate related query

    – Interoperability between search applications

    – Semantic indexing of documents

  • 114

    Ontology in Semantic Exploration

    • Use graphical ontologies for query formulation

    – Semantic annotations of documents

    – Construct queries graphically

    – Use ontological structures to expand query

    – Use ontology to visualize search results

  • 115

    Query Formulation

    • Queries expanded from ontological structures
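Query expansion from an ontology can be sketched with a tiny concept hierarchy. Everything in this example is hypothetical: the `ONTOLOGY` dictionary, its terms, and the functions `expand_query` and `to_or_query` are made up for illustration, and a real ontology would carry richer relations than "narrower term".

```python
# Hypothetical ontology: each concept maps to its narrower concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "convertible"],
}


def expand_query(term, ontology, depth=1):
    """Return the term plus narrower concepts up to `depth` levels down."""
    expanded = [term]
    frontier = [term]
    for _ in range(depth):
        next_frontier = []
        for concept in frontier:
            for narrower in ontology.get(concept, []):
                if narrower not in expanded:
                    expanded.append(narrower)
                    next_frontier.append(narrower)
        frontier = next_frontier
    return expanded


def to_or_query(terms):
    """Join the expanded terms with OR so a keyword engine can run them."""
    return " OR ".join(terms)


print(to_or_query(expand_query("vehicle", ONTOLOGY, depth=1)))
print(to_or_query(expand_query("vehicle", ONTOLOGY, depth=2)))
```

The expanded terms are joined with OR, so a traditional keyword search can still execute the semantically broadened query.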

  • 116

    Query Refinement

    • Use ontological structures to explore the domain

  • 117

    Ontology-driven Query Interpretation

  • 118

    Training Ontology for Search

  • 119

    Personalized Ontology

  • 120

    Semantic Search Query

  • Conclusion

    Is Google good, bad, or evil?
