Programming for WWW (ICE 1338)
Lecture #3
June 30, 2004
In-Young Ko (iko .AT. icu.ac.kr)
Information and Communications University (ICU)


Announcements

Please send the instructor your team information
Please send the instructor your information for creating a Unix account on the Vega server if you haven’t already done so
Submit your homework #1 by sending the instructor either the URL of your homepage or the HTML source


Review of the Previous Lecture

RFC (Request for Comments)
HTTP (Hypertext Transfer Protocol)
HTML (Hypertext Markup Language)


Contents of Today’s Lecture

Cascading Style Sheets
Web-based Information Integration


Cascading Style Sheets (CSS)

Provide a method of imposing consistency on the style of Web pages
  cf. styles in MS Word or PowerPoint documents
Not part of HTML, but can be embedded in HTML documents
Can impose a standard style on a whole document, or even a whole collection of documents
Most of the style attributes (e.g., color, align, size, ...) in HTML are deprecated as of HTML 4.0
http://www.w3.org/Style/CSS/


CSS Example

    <html>
      <head>
        <title>ICE1338</title>
        <style type = "text/css">
          <!--
          p { font-size: 16pt; color: blue; background-color: yellow }
          h2, h3 { font-size: 16pt; color: red }
          -->
        </style>
      </head>
      <body>
        <br/>
        <h2>Programming for WWW</h2>
        …
        <p>In this course, students will learn the core <i>concepts</i> and
        <i>technologies</i> behind the <b>World Wide Web (Web)</b>, and
        <i>practice</i> the languages and tools to build <u>Web-based contents
        and services</u>. </p>
        <h3 id="info">Related Information</h3>
        <h4>W3C</h4>
        <blockquote>The <a href="http://www.w3.org">World Wide Web
        Consortium (W3C)</a> develops interoperable technologies
        (specifications, guidelines, software, and tools) to lead the Web to its full
        potential.</blockquote>
        <h3 id="students">Students</h3>
        Here are the pictures of the students:<br/>
        …
        <hr/>
        <tt>Information and Communications University</tt>
      </body>
    </html>


CSS Levels

CSS specifications: CSS1 (1996, W3C), CSS2 (1998), CSS3 (under development)
Three levels of style sheets, independent of the CSS version:
  Inline styles – specified for a specific occurrence of a tag and apply only to that tag
  Document-level styles – applied to the whole document in which they appear (in the HTML head)
  External styles – can be applied to any number of documents
When more than one style sheet applies to a specific tag in a document, the lowest level style sheet has precedence (inline over document-level, document-level over external)


Inline Style Sheets

Style sheet appears as the value of the style attribute
General form:
    style = "property_1: value_1; property_2: value_2; … property_n: value_n"
Scope of an inline style sheet is the content of the tag
e.g.,
    <p style = "font-size: 12pt; color: blue; background-color: white"> … </p>


Document-Level Style Sheets

Style sheet appears as a list of rules that are the content of a <style> tag
The <style> tag must include the type attribute, set to "text/css"
The list of rules is placed in an HTML comment, because it is not HTML and browsers that do not recognize the <style> tag would otherwise display it as text
Comments in the rule list must have a different form – use C comments (/*…*/)
General form:
    <style type = "text/css">
      <!--
        rule list
      -->
    </style>


Document-Level Style Sheets (cont.)

Form of the rules:
    selector { list of property/values }
The selector is a tag name or a list of tag names, separated by commas (e.g., h1, h3, p)
Each property/value pair has the form property: value, and pairs are separated by semicolons
e.g.,
    <style type = "text/css">
      <!--
        p { font-size: 12pt; background-color: white }
        h2, h3 { font-size: 16pt; color: blue }
      -->
    </style>


External Style Sheets

Form is a list of style rules, as in the content of a <style> tag for document-level style sheets
Written as text files with the MIME type text/css
A <link> tag is used to specify that the browser is to fetch and use an external style sheet file
    <link rel = "stylesheet" type = "text/css"
          href = "http://www.wherever.org/termpaper.css">
    </link>
External style sheets can be validated at
  http://jigsaw.w3.org/css-validator/validator-upload.html


Style Classes

Used to allow different occurrences of the same tag to use different style specifications
A style class has a name, which is attached to a tag name
  e.g.,
    p.narrow { property/value list }
    p.wide { property/value list }
The class you want on a particular occurrence of a tag is specified with the class attribute of the tag
  e.g., <p class = "narrow"> ... </p>


Generic Style Classes

A generic class can be defined if you want a style to apply to more than one kind of tag
A generic class must be named, and the name must begin with a period
  e.g., .really-big { font-size: 36pt; font-style: italic }
Use it as if it were a normal style class
  e.g.,
    <h1 class = "really-big"> … </h1>
    ...
    <p class = "really-big"> … </p>


CSS Properties

There are 56 different properties in 6 categories:
  Fonts, Colors and backgrounds, Text, Boxes and layouts, Lists, Tags
Property Value Forms
  Keywords – left, small, … (not case sensitive)
  Length – numbers, maybe with decimal points
    Units: px (pixels), in (inches), cm (centimeters), mm (millimeters), pt (points), pc (picas = 12 points), em (the font size of the element), ex (x-height, the height of the letter ‘x’)
    No space is allowed between the number and the unit specification (e.g., 1.5 in is illegal!)
  Percentage – a number followed immediately by "%"
  URL values – url(protocol://server/pathname)
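The length rules above can be illustrated with a small Java sketch. It checks that a number is followed immediately by its unit and converts the absolute units to points. The class and method names are hypothetical, and the conversion factors (1in = 72pt = 2.54cm) are the standard CSS definitions rather than values stated on the slide; only "1pc = 12pt" comes from the slide itself. The unit list uses ex, the CSS unit name for the x-height.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class CssLengthSketch {
    // A number (optionally signed, optionally with a decimal point)
    // followed immediately by a unit: no whitespace is allowed in between.
    private static final Pattern LENGTH =
        Pattern.compile("([+-]?(?:\\d+|\\d*\\.\\d+))(px|in|cm|mm|pt|pc|em|ex)");

    static boolean isValidLength(String value) {
        return LENGTH.matcher(value).matches();
    }

    // Converts absolute units to points. Relative units (px, em, ex) depend on
    // the device or the font, so this sketch rejects them.
    static double toPoints(String value) {
        Matcher m = LENGTH.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a CSS length: " + value);
        }
        double n = Double.parseDouble(m.group(1));
        switch (m.group(2)) {
            case "pt": return n;
            case "pc": return n * 12.0;        // 1 pica = 12 points
            case "in": return n * 72.0;        // 1 inch = 72 points
            case "cm": return n * 72.0 / 2.54; // 1 inch = 2.54 cm
            case "mm": return n * 72.0 / 25.4;
            default:
                throw new IllegalArgumentException("Relative unit: " + m.group(2));
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidLength("1.5in"));  // number and unit are adjacent
        System.out.println(isValidLength("1.5 in")); // the space makes this illegal
        System.out.println(toPoints("1pc"));
    }
}
```

Rejecting the relative units keeps the sketch honest: their size in points is not fixed and can only be resolved against a concrete device or font.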


CSS Properties (cont.)

Property Value Forms (cont.)
  Colors
    Color name – e.g., white, blue
    rgb(n1, n2, n3) – e.g., rgb(255, 255, 255)
      Numbers can be decimal or percentages
    Hex form – e.g., #FFFFFF
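To show how the three color forms relate, here is a minimal Java sketch that turns decimal or percentage components into the hex form. The class and method names are hypothetical, and rounding a percentage to the nearest 0-255 value is an assumption made for this illustration.

```java
// Hypothetical helper, for illustration only.
class CssColorForms {
    // Formats decimal components (0-255) as the hex form, two digits each.
    static String toHex(int r, int g, int b) {
        return String.format("#%02X%02X%02X", r, g, b);
    }

    // Maps a percentage (0-100) to the nearest 0-255 component value.
    static int fromPercent(double percent) {
        return (int) Math.round(percent * 255.0 / 100.0);
    }

    public static void main(String[] args) {
        // rgb(255, 255, 255) and rgb(100%, 100%, 100%) both print #FFFFFF.
        System.out.println(toHex(255, 255, 255));
        System.out.println(toHex(fromPercent(100), fromPercent(100), fromPercent(100)));
    }
}
```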

Many property values (e.g., font and color properties) are inherited by nested tags, unless overridden


Font Properties

font-family
  e.g., font-family: Arial, 굴림, Helvetica, Courier
font-size
  Values: medium, smaller, 120%, 12pt, …
font-style
  Values: italic, oblique, normal
font-weight
  Values: bolder, lighter, bold, normal, 100, 200, …, 900
font – For specifying a list of font properties
  e.g., font: bolder 14pt Arial, Helvetica
  Order must be: style, weight, size, name(s)
text-decoration
  Values: line-through, overline, underline, none


List Properties

list-style-type
  Unordered lists – Bullet can be a disc (default), a square, or a circle; set it on either the <ul> or <li> tag
    e.g., <li style = "list-style-type: square"> … </li>
  Could use an image for the bullets in an unordered list
    e.g., <li style = "list-style-image: url(bird.jpg)">
  Ordered lists – list-style-type can be used to change the sequence values (decimal, upper-alpha, lower-alpha, upper-roman, lower-roman)


Text Alignments

text-indent – allows indentation of text
  Values: a length or a % value
text-align – allows alignment of text
  Values: left (the default), center, right, or justify
float – makes text flow around another element
  Values: left, right, and none (the default)
  e.g., <img src = "c210.jpg" style = "float: right" />


Other Properties

http://www.w3.org/TR/CSS21/propidx.html


CSS Profiles

CSS Mobile Profile 1.0 is for devices such as mobile phones and PDAs
CSS Print Profile is still a draft. It is aimed at low-cost printers
CSS TV Profile 1.0 is for browsers that run on television sets


Web-based Information Integration

Retrieve document collections from various Web resources (e.g., search engines, news video archives)
Analyze the document collections using various document analysis services (characterize, sort, partition, filter, etc.)
Visualize analysis results using various visualization services to help users make sense of them
Impose structure on the resulting document collection to define a customized, task-oriented information space

[Figure: information integration pipeline with the components Web, Information Gathering, Document Collection, Document Analysis, Information Visualization, Information Organization, and Information Spaces]


Daily News Analysis

ISI’s GeoTopics: http://www.isi.edu/info-agents/demos.html


Large-scale Web-based Information Integration Example

GeoTopics: Daily News Analysis Portal Generator (www.isi.edu/geoworlds/geotopics/)

[Figure panels: News Sources, Extracted Articles, Document Analyses, News Compilation Results]

Document analyses:
  Document filtering
  Topic and place name extractions
  Topic and place-based document classifications
  Topic ranking and sorting
  Cross-product between topics and places
  Geographical mapping of the articles
Requires 92 component services
Needs to generate portals customized for different news sources and regions


Travel Planner

ISI’s Heracles Project: http://www.isi.edu/info-agents/Heracles/examples/TravelPlanner/


Intelligent WorldInfo Assistant

ISI’s Heracles Project: http://www.isi.edu/info-agents/Heracles/examples/TIP/


Heracles’ Information Sources

Schedule: Outlook Calendar
Address Info: Outlook Contact
Weather: Yahoo Weather (weather.yahoo.com)
Geocodes: MapBlast (www.mapblast.com)
Driving Map: MapQuest (www.mapquest.com)
Map (airports): YahooMap (maps.yahoo.com)
Flight Info: ITA Software (www.itasoftware.com)
Airport Info: Travelocity (www.travelocity.com)
Airport Parking Info: www.airwise.com
Taxi Fare Info: Washington Post (www.washingtonpost.com)
Hotel: ITN Hotel (www.itn.com)
Car Rental: ITN Car (www.itn.com)
Flight Tracking: ITN flight tracking system (www.itn.com)


CIA – The World Factbook

http://www.cia.gov/cia/publications/factbook/


Search Engines

Open Directory: http://dmoz.org/


Online Bookstores


Information Mediators

Provides an intermediate layer between information sources and users/applications
Queries to a mediator are in a uniform language
Determines which data sources to use, how to obtain the desired information, and how to manipulate the information
e.g., ISI’s SIMS, Stanford’s TSIMMIS

Knoblock & Minton, IEEE Intelligent Systems, Sep/Oct 1998

[Figure: users and applications send queries to the Mediator, which reaches each Source through its own Wrapper]


Mediation

Transformation and subsetting of databases to reorganize base data into new configurations appropriate to specific users and applications
Gathering an appropriate amount of data by specializing or generalizing the search
Accessing and merging data from multiple databases
Abstraction of data to bring them to a higher level
Maintaining derived data for efficiency

Gio Wiederhold, Mediators in the Architecture of Future Information Systems, Computer, March 1992.


Information Wrappers

Accept queries from the mediator
Translate the query into the appropriate query for the individual source
Perform any additional processing if necessary
Return the results to the mediator
Web Wrappers: make Web sources look like databases that can be queried through the mediator

Ashish & Knoblock, COOPIS, 1997
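The division of labor between a mediator and its wrappers can be sketched in Java as follows. This is only an illustration under simplifying assumptions: every type and method name is hypothetical, the "uniform language" is reduced to a keyword string, and each source is an in-memory list standing in for a Web site or database. It is not the design of SIMS, TSIMMIS, or any other real mediator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mediator/wrapper sketch, for illustration only.
class MediatorSketch {
    // One wrapper per information source.
    interface Wrapper {
        // Accepts a query in the mediator's uniform language and returns results.
        List<String> query(String uniformQuery);
    }

    // In-memory stand-in for a real source.
    static class InMemoryWrapper implements Wrapper {
        private final String sourceName;
        private final List<String> documents;

        InMemoryWrapper(String sourceName, List<String> documents) {
            this.sourceName = sourceName;
            this.documents = documents;
        }

        @Override
        public List<String> query(String uniformQuery) {
            // "Translation" into the source's own query: here, a
            // case-insensitive substring match over the stored documents.
            String needle = uniformQuery.toLowerCase();
            List<String> results = new ArrayList<>();
            for (String doc : documents) {
                if (doc.toLowerCase().contains(needle)) {
                    results.add(sourceName + ": " + doc);
                }
            }
            return results;
        }
    }

    static class Mediator {
        private final List<Wrapper> wrappers = new ArrayList<>();

        void register(Wrapper wrapper) {
            wrappers.add(wrapper);
        }

        // Sends the same uniform query to every wrapper and merges the answers.
        List<String> ask(String uniformQuery) {
            List<String> merged = new ArrayList<>();
            for (Wrapper w : wrappers) {
                merged.addAll(w.query(uniformQuery));
            }
            return merged;
        }
    }

    public static void main(String[] args) {
        Mediator mediator = new Mediator();
        mediator.register(new InMemoryWrapper("news",
            Arrays.asList("Seoul weather report", "Stock update")));
        mediator.register(new InMemoryWrapper("travel",
            Arrays.asList("Weather in Daejeon")));
        for (String hit : mediator.ask("weather")) {
            System.out.println(hit);
        }
    }
}
```

A real mediator would also choose which sources to consult for a given query; this sketch simply asks all of them.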


Web Wrapper Generation Steps

1. Analyze the local querying mechanism
   e.g., A search query to Naver.com
   http://websearch.naver.com/search.naver?where=webkr&query=www&xc=&qt=df&f=all&r=&st=s&fd=1&start=101&display=10&domain=&dftf=&qf=1&qvt=0

   Host address: websearch.naver.com
   Local path: /search.naver
   Information category: where=webkr
   Search query: query=www
   Result start index: start=101
   Result page size: display=10
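The analysis in step 1 can be reproduced mechanically. The following Java sketch splits the example URL into its host, path, and query parameters using the standard java.net.URI class; the class name QueryUrlAnalysis and the helper parseQuery are hypothetical names introduced for this illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class QueryUrlAnalysis {
    // Splits "a=1&b=2" into an ordered name -> value map; empty values are kept.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://websearch.naver.com/search.naver"
            + "?where=webkr&query=www&xc=&qt=df&f=all&r=&st=s&fd=1"
            + "&start=101&display=10&domain=&dftf=&qf=1&qvt=0");
        Map<String, String> params = parseQuery(uri.getRawQuery());

        System.out.println("Host address:         " + uri.getHost());
        System.out.println("Local path:           " + uri.getPath());
        System.out.println("Information category: " + params.get("where"));
        System.out.println("Search query:         " + params.get("query"));
        System.out.println("Result start index:   " + params.get("start"));
        System.out.println("Result page size:     " + params.get("display"));
    }
}
```

Parameters whose meaning is not identified on the slide (xc, qt, f, and so on) are still parsed and kept in the map, so they can be passed through unchanged when a new query is built.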


Web Wrapper Generation Steps

2. Analyze result page structure
   [Figure: result page with the Title, Summary, and URL fields marked]


Web Wrapper Generation Steps

3. Develop a mechanism to translate a user query into a local query
4. Develop a result parser to extract information blocks from result pages
5. Integrate the information blocks retrieved from the result pages
6. Convert the integrated information into the format that a mediator or a client can accept
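Steps 4 to 6 might look like the following Java sketch. It assumes a made-up result markup of the form <li><a href="URL">TITLE</a><p>SUMMARY</p></li>, which is not the structure of any real search engine's result page, and it uses tab-separated lines as a stand-in for "the format that a mediator or a client can accept". All names are hypothetical. A regular expression is enough for this fixed toy markup; real result pages usually call for an HTML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical result parser, for illustration only.
class ResultParserSketch {
    // Assumed markup: <li><a href="URL">TITLE</a><p>SUMMARY</p></li>
    private static final Pattern RESULT = Pattern.compile(
        "<li><a href=\"([^\"]*)\">([^<]*)</a><p>([^<]*)</p></li>");

    // Step 4: extract one {title, url, summary} block per result.
    static List<String[]> parse(String html) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = RESULT.matcher(html);
        while (m.find()) {
            blocks.add(new String[] { m.group(2), m.group(1), m.group(3) });
        }
        return blocks;
    }

    // Steps 5 and 6: merge the blocks from several result pages and
    // convert them into tab-separated lines (title, URL, summary).
    static String integrate(List<String> pages) {
        StringBuilder out = new StringBuilder();
        for (String page : pages) {
            for (String[] b : parse(page)) {
                out.append(b[0]).append('\t')
                   .append(b[1]).append('\t')
                   .append(b[2]).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page1 = "<ul><li><a href=\"http://www.w3.org\">W3C</a>"
            + "<p>Web standards</p></li></ul>";
        String page2 = "<ul><li><a href=\"http://dmoz.org/\">Open Directory</a>"
            + "<p>Web directory</p></li></ul>";
        System.out.print(integrate(Arrays.asList(page1, page2)));
    }
}
```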


Java Implementation of Web Wrapper

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    // Method of a wrapper class; the enclosing class is omitted, as on the slide.
    public void WebWrapper(String host, String path, String query, int startIndex, int pageSize) {
        try {
            // Query translation: build the local query URL for the search engine.
            // Appending "1" to startIndex turns a value such as 10 into start=101.
            String address = "http://" + host + path + "?where=webkr" + "&query=" + query +
                "&start=" + startIndex + "1" + "&display=" + pageSize;
            URL url = new URL(address);
            URLConnection urlc = url.openConnection();
            urlc.setRequestProperty("Accept", "*/*");
            urlc.setRequestProperty("User-Agent", "Mozilla/4.0");
            InputStream is = urlc.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            // Parsing results: read the result page line by line.
            while ((line = in.readLine()) != null) {
                // System.out.println(line);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


Project Proposal Assignment

Due Date: July 9, 2004
Develop a Web-based information integration scenario that includes the following:
  Which Web sources to access
  Which information will be collected from the sources
  How the information will be integrated and presented
Submit a short (less than 5 pages) proposal document that includes the following contents:
  Objectives of the project
  Web-based information integration scenario
  Development schedule
Present the proposal on July 9th
  5 min presentation for each team