11A Programming 1 HTML-Web
Transcript of 11A Programming 1 HTML-Web
-
7/30/2019 11A Programming 1 HTML-Web
1/36
IST 195 Programming 1: HTML-Web
The Semantic Web, XML, and English on the Web
Prof. Randy Wenner
Adapted from Professor Jeff Stanton
-
Learning Map
1. The Semantic Web
Giving web page contents more meaning for people and computers
2. XML
One of the most important tools for creating the semantic web
3. English and the Web
Challenges of many cultures, many pages, in many languages, and how XML and the semantic web may help
-
Semantic Web
Semantic:
Part of the structure of language relating to meaning, especially of words
The Semantic Web: Web 3.0?
An idea for the future of the WWW in which information is tagged with information about its meaning rather than about its format
-
The Web is not Semantic now
Currently the web is a large collection of HTML documents and a bit of other stuff.
We know that HTML is a formatting language: it says where elements on the page should go and what they should look like.
Example: tagging Zap Mama as a second-level heading makes it left-justified, bold, and in a larger font.
HTML does not say what anything actually is, i.e., what kind of information it is.
You can't tell from the tag what Zap Mama signifies. Is it a command? A label? A name?
-
Where we are Today: the Syntactic Web
[Hendler & Miller 02]
-
The Syntactic Web is:
A hypermedia, a digital library: a library of documents (called web pages) interconnected by a hypermedia of links
A database, an application platform: a common portal to applications accessible through web pages, presenting their results as web pages
A platform for multimedia: BBC Radio 4 anywhere in the world! Terminator trailers!
A naming scheme: unique identity for those documents
A place where computers do the presentation (easy) and people do the linking and interpreting (hard).
Why not get computers to do more of the hard work?
[Goble 03]
-
Hard Work using the Syntactic Web
Find an image of Buzz Shaw (former SU chancellor)
http://www.buzzbutt.com/html/shaw_party.html
-
What is the Problem?
Consider a typical web page:
Markup consists of:
rendering information (e.g., font size and color)
hyperlinks to related content
Semantic content is accessible to humans but not (easily) to computers
-
What information we see
WWW 2002
The eleventh international world wide web conference
Sheraton Waikiki hotel
Honolulu, Hawaii, USA
7-11 may 2002
1 location 5 days learn interact
Registered participants coming from
australia, canada, chile denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
Register now
On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
Speakers confirmed
Tim Berners-Lee
Tim is the well known inventor of the Web,
Ian Foster
Ian is the pioneer of the Grid, the next generation internet
-
What information a machine seesWWW2002The eleventh inteqnational woqld wide webconfeqenceSheqaton waikiki hotelHonolulu, hawaii, USA7-11 may 20021 location 5 days leaqn inteqactRegisteqed paqticipants coming fqomaustqalia, canada, chile denmaqk, fqance,geqmany, ghana, hong kong, india,iqeland, italy, japan, malta, new zealand,the netheqlands, noqway, singapoqe,switzeqland, the united kingdom, the unitedstates, vietnam, zaiqe
Registeq nowOn the 7th May Honolulu will pqovide thebackdqop of the eleventh inteqnational woqldwide web confeqence. This pqestigious event Speakeqs confiqmedTim beqneqs-leeTim is the well known inventoq of the Web, Ian FosteqIan is the pioneeq of the Gqid, the nextgeneqation inteqnet
-
So, if a machine sees garble
You can't ask it:
Who's speaking at the conference?
What countries will be represented?
On what dates is the conference being held?
etc.
-
The Semantic Web aims to solve this
Rather than describing formatting, tags would designate what kind of information a piece of information was
Rather than discarding the internal organization of the
data when placing it on web pages, authors would
keep the natural structure of the data
Tags around Zap Mama that say what it is would replace the current HTML strategy of tagging the format of the information
Artist | Title | Courtesy
Beastie Boys | Now Get Busy | Beastie Boys appear courtesy of Beastie Boys and Capitol Records.
David Byrne | My Fair Lady | David Byrne appears courtesy of Nonesuch Records.
Zap Mama | Wadidyusay? | Zap Mama appears courtesy of Luaka Bop Records.
-
The Semantic Web was always the goal
The Web was invented by Tim Berners-Lee (amongst others), a physicist working at CERN
TBL's original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web:
TBL (and others) have since been working towards realizing this vision, which has become known as the Semantic Web
(see the article in the May 2001 issue of Scientific American)
... a goal of the Web was that, if the interaction between person and hypertext
could be so intuitive that the machine-readable information space gave an
accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management
tool, seeing patterns in our work and facilitating our working together through the
typical problems which beset the management of large organizations.
http://www.w3.org/People/Berners-Lee/ -
More on the Semantic Web
-
Oh Happy Day!
The Semantic Web is under development
Three major components:
XML: Extensible Markup Language
for tagging the structure of the data
RDF: Resource Description Framework
a way to break knowledge down into small pieces, with some rules about the meaning of those pieces
Goal: to have a method so simple that it can express any fact, and yet so structured that computer applications can do useful things with knowledge expressed in RDF
OWL: Web Ontology Language
for describing the big picture about how data elements on one or more pages all fit together and relate to one another
-
-
Of course, we can't be drawing our way through the Semantic Web, so instead, how about a table-style representation for the graph?
Each row represents an arrow (an edge) in the figure. The first column has the name of the node at the start of the edge. The second column has the label of the edge itself (the kind of edge). The third column has the name of the node at the end of the arrow.
Start Node | Edge Label | End Node
vincent_donofrio | starred_in | law_&_order_ci
law_&_order_ci | is_a | tv_show
the_thirteenth_floor | similar_plot_as | the_matrix
...
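Because each row is just (start node, edge label, end node), the table maps directly onto a list of triples that a program can search. A minimal Python sketch, assuming a plain in-memory list rather than a real RDF store; the `query` helper is hypothetical and written only to show how a computer could follow edges to answer a question:

```python
# The rows of the table above, stored as (start node, edge label, end node) triples.
triples = [
    ("vincent_donofrio", "starred_in", "law_&_order_ci"),
    ("law_&_order_ci", "is_a", "tv_show"),
    ("the_thirteenth_floor", "similar_plot_as", "the_matrix"),
]

def query(triples, start=None, label=None, end=None):
    """Hypothetical helper: return every triple matching the pattern (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (start is None or s == start)
        and (label is None or p == label)
        and (end is None or o == end)
    ]

# "What did vincent_donofrio star in?"
shows = [o for (_, _, o) in query(triples, start="vincent_donofrio", label="starred_in")]
# "Which of those are TV shows?" is answered by following a second edge.
tv_shows = [s for s in shows if query(triples, start=s, label="is_a", end="tv_show")]
print(tv_shows)  # ['law_&_order_ci']
```

Chaining two lookups is the kind of "linking and interpreting" work that, on the syntactic Web, is left to people.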
-
Example XML
The following text may look identical in a
browser
-
Example XML
But it's quite different under the hood.
See how the XML differs from the HTML?
HTML
XML
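The difference can be sketched with Python's standard `xml.etree.ElementTree` module. The element names below (`<soundtrack>`, `<track>`, `<artist>`, `<title>`, `<label>`) are hypothetical, invented for illustration and not taken from the slide; the data comes from the Artist/Title/Courtesy table earlier in this lecture. With formatting markup, a program only learns that "Zap Mama" is a heading; with meaningful tags, it can ask which artists appear:

```python
import xml.etree.ElementTree as ET

# Formatting-style markup: says how "Zap Mama" should look, not what it is.
html_version = """<div>
  <h2>Zap Mama</h2>
  <p>Wadidyusay?</p>
</div>"""

# Meaning-style markup (hypothetical tag names): says what each piece of text is.
xml_version = """<soundtrack>
  <track>
    <artist>Zap Mama</artist>
    <title>Wadidyusay?</title>
    <label>Luaka Bop Records</label>
  </track>
  <track>
    <artist>David Byrne</artist>
    <title>My Fair Lady</title>
    <label>Nonesuch Records</label>
  </track>
</soundtrack>"""

# From the HTML, a program can only tell that "Zap Mama" is a heading.
html_root = ET.fromstring(html_version)
print(html_root.find("h2").text)  # Zap Mama

# From the XML, the tag itself answers "which artists are on this soundtrack?"
xml_root = ET.fromstring(xml_version)
artists = [a.text for a in xml_root.iter("artist")]
print(artists)  # ['Zap Mama', 'David Byrne']
```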
-
Example XML
You can use Internet Explorer to view XML in its raw form (VIEW>SOURCE).
Note the meaningful tags, like
-
Example RDF
RDF information can be expressed in XML
This example describes the prior example: it gives the title, author, creation date, and subject
These pieces of information are called metadata because they are data about data
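A minimal sketch of reading such metadata with Python's standard library. Two assumptions, both labeled: the title/author/date/subject fields are written with the Dublin Core element set (`dc:title`, `dc:creator`, `dc:date`, `dc:subject`), which may not be the vocabulary on the original slide, and the document URL and values are hypothetical placeholders. The sketch parses RDF/XML as plain XML; a dedicated RDF parser would instead turn it into triples:

```python
import xml.etree.ElementTree as ET

# Real namespace URIs: the RDF syntax namespace and the Dublin Core element set.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical resource URL and metadata values, for illustration only.
rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/soundtrack.xml">
    <dc:title>Soundtrack Credits</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2010-06-01</dc:date>
    <dc:subject>Film music</dc:subject>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(rdf_xml)
desc = root.find("rdf:Description", NS)
# rdf:about names the resource that this metadata is about.
about = desc.get("{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about")
# Collect the data about the data: title, author (creator), date, and subject.
metadata = {field: desc.find(f"dc:{field}", NS).text
            for field in ("title", "creator", "date", "subject")}
print(about)
print(metadata)
```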
-
OWL Example
OWL is also expressed in a form similar to XML
Things to note from the example:
A wine is a potable liquid produced by at least one maker of type winery
A wine is made from at least one type of grape (such grapes are restricted to wine grapes elsewhere in the ontology)
[Slide figure: OWL code defining the class Wine]
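An ontology lets software draw conclusions nobody stated directly. The sketch below shows only one small piece of that, subclass inference: because every wine is a potable liquid, anything stated to be a wine can be inferred to be a potable liquid too. It is a simplification and not an OWL reasoner: it ignores restrictions such as "at least one maker of type winery", and the `Liquid` class and the `superclasses` helper are hypothetical:

```python
# Hypothetical, simplified class hierarchy inspired by the wine example.
# Each entry reads: "every member of the key class is also a member of these classes".
subclass_of = {
    "Wine": ["PotableLiquid"],
    "PotableLiquid": ["Liquid"],  # "Liquid" is an invented parent class
}

def superclasses(cls, hierarchy):
    """Return every class that cls is a direct or indirect subclass of."""
    found = []
    stack = list(hierarchy.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            # Follow the chain upward: a parent's parents are ancestors too.
            stack.extend(hierarchy.get(parent, []))
    return found

# Anything stated to be a Wine is inferred to be a PotableLiquid and a Liquid.
print(sorted(superclasses("Wine", subclass_of)))  # ['Liquid', 'PotableLiquid']
```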
-
Learning Map - Pause
1. The Semantic Web
As it stands, HTML documents have lots of formatting but little meaningful structure. For example, it is impossible to search the web for a name and restrict the search just to names of authors
Adding semantics to web pages would simplify both
searching and organization of information on the web;
automatic research tools could eliminate lots of the
drudgery
The semantic web is under development by W3C and
others and consists of XML, RDF, and OWL Next: More on XML
Last: English and the Web
-
XML: The Universal Markup
A human- and computer-readable coding system for tagging web documents
In order to be human readable, both markup and data are represented in character form
Can be used for a variety of purposes, but for the semantic web it is used for giving structure to raw data
XML is a restricted form of SGML, the Standard Generalized Markup Language (ISO Standard 8879)
-
XML Smackdown
XML is bigger and more powerful than HTML
Anything HTML can do, XML can do better: XML can be used to create specialized tools (such as XML Signatures for storing and managing digital signatures)
There is no easy or consistent way to do this in HTML
XHTML reformulates HTML as XML: it expresses all of the HTML tags as proper XML tags
At the time, the expectation was that HTML would become obsolete and that all pages would have to be XHTML compliant to be displayed properly; in practice, HTML5 rather than XHTML became the dominant standard
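The practical difference between classic HTML and XHTML is that XHTML must be well-formed XML: every element closed, every element properly nested. A minimal sketch using Python's standard XML parser as the judge; note it checks well-formedness only, not validity against the XHTML specification, and `is_well_formed_xml` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Classic HTML tolerates unclosed tags such as <br> and <p>; an XML parser does not.
html_style = "<p>Line one<br>Line two"
# XHTML style writes the same content with every element closed.
xhtml_style = "<p>Line one<br/>Line two</p>"

print(is_well_formed_xml(html_style))   # False
print(is_well_formed_xml(xhtml_style))  # True
```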
-
XML Make it up as you go
IST195
Summer 2010
Randy Wenner
Adjunct Professor
Hinds 010
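"Make it up as you go" means the author invents element names that describe the data. A minimal sketch that builds the course record above with Python's `xml.etree.ElementTree`; the tag names (`course`, `number`, `term`, `instructor`, `name`, `title`, `office`) are hypothetical choices, since the slide's actual tags are not shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names: XML lets each author invent names that describe the data.
course = ET.Element("course")
ET.SubElement(course, "number").text = "IST195"
ET.SubElement(course, "term").text = "Summer 2010"
instructor = ET.SubElement(course, "instructor")
ET.SubElement(instructor, "name").text = "Randy Wenner"
ET.SubElement(instructor, "title").text = "Adjunct Professor"
ET.SubElement(instructor, "office").text = "Hinds 010"

# Serialize the tree back to XML text.
print(ET.tostring(course, encoding="unicode"))
```

Nesting `name`, `title`, and `office` under `instructor` keeps the natural structure of the data, so a program can later ask for the instructor's office by path (`course.find("instructor/office")`).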
-
Learning Map - Pause
1. The Semantic Web
2. XML
The universal markup language, poised to take over pretty much everything having to do with markup on the web
XML is the heart of the semantic web
3. English and the Web
-
English Language Country Categories
Countries with English as a native language:
Examples: UK, USA, Australia, New Zealand, Canada
Countries with English as a second language:
Everyday or official usage; e.g., India, Singapore, Ghana
Countries with English as a frequently taught foreign language:
Examples: France, Germany, Netherlands
Countries with English as a foreign language:
English not generally spoken within the country; examples: Japan, Thailand, China
Where is the major growth coming from in terms of new web content?
-
North America? Nope, try again
N.A. does not have the most Internet users, and
does not have the fastest growth in Internet use
WORLD INTERNET USAGE AND POPULATION STATISTICS
World Regions | Population (2009 Est.) | Internet Users Dec. 31, 2000 | Internet Users Latest Data | Penetration (% Population) | Growth 2000-2009 | Users % of Table
Africa | 991,002,342 | 4,514,400 | 86,217,900 | 8.7 % | 1,809.8 % | 4.8 %
Asia | 3,808,070,503 | 114,304,000 | 764,435,900 | 20.1 % | 568.8 % | 42.4 %
Europe | 803,850,858 | 105,096,093 | 425,773,571 | 53.0 % | 305.1 % | 23.6 %
Middle East | 202,687,005 | 3,284,800 | 58,309,546 | 28.8 % | 1,675.1 % | 3.2 %
North America | 340,831,831 | 108,096,800 | 259,561,000 | 76.2 % | 140.1 % | 14.4 %
Latin America/Caribbean | 586,662,468 | 18,068,919 | 186,922,050 | 31.9 % | 934.5 % | 10.4 %
Oceania / Australia | 34,700,201 | 7,620,480 | 21,110,490 | 60.8 % | 177.0 % | 1.2 %
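The Penetration and Growth columns are simple ratios of the other columns. The sketch below assumes penetration = latest users / population and growth = (latest users - 2000 users) / 2000 users; with those definitions it reproduces the table's figures for the Africa and North America rows, which shows why a region with few users in 2000 can post an enormous growth percentage:

```python
# Figures copied from the table above.
regions = {
    # region: (population 2009 est., users Dec. 31 2000, users latest data)
    "Africa": (991_002_342, 4_514_400, 86_217_900),
    "North America": (340_831_831, 108_096_800, 259_561_000),
}

def penetration_pct(population, users_latest):
    """Share of the population using the Internet, in percent."""
    return 100 * users_latest / population

def growth_pct(users_2000, users_latest):
    """Percentage increase in users from 2000 to the latest data."""
    return 100 * (users_latest - users_2000) / users_2000

for name, (pop, u2000, latest) in regions.items():
    print(f"{name}: penetration {penetration_pct(pop, latest):.1f} %, "
          f"growth {growth_pct(u2000, latest):.1f} %")
# Africa: penetration 8.7 %, growth 1809.8 %
# North America: penetration 76.2 %, growth 140.1 %
```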
Sources: http://www.internetworldstats.com/stats1.htm, http://www.internetworldstats.com/stats3.htm, http://www.internetworldstats.com/stats4.htm, http://www.internetworldstats.com/stats5.htm, http://www.internetworldstats.com/stats6.htm, http://www.internetworldstats.com/stats10.htm, http://www.internetworldstats.com/stats14.htm
-
But English is the Language of the Web
Are you pleased? Better not get too happy!
Because the Internet was developed in the U.S. and had its earliest rapid growth there, originally most pages were in English
As the Internet has grown and adoption elsewhere has occurred more rapidly, the overall number of English language pages has grown, but as a proportion of the total content on the web, English has dropped
http://www.internetworldstats.com/ -
Top 10 Languages on the Web (http://www.internetworldstats.com)
Top Ten Languages in the Internet | Internet Users by Language | Internet Penetration by Language | Growth in Internet (2000-2009) | Internet Users % of Total | World Population for this Language (2009 Estimate)
English | 499,213,462 | 39.5 % | 251.7 % | 27.7 % | 1,263,830,976
Chinese | 407,650,713 | 29.7 % | 1,162.0 % | 22.6 % | 1,373,859,774
Spanish | 139,849,651 | 34.0 % | 669.2 % | 7.8 % | 411,631,985
Japanese | 95,979,000 | 75.5 % | 103.9 % | 5.3 % | 127,078,679
Portuguese | 77,569,900 | 31.4 % | 923.9 % | 4.3 % | 247,223,493
German | 72,337,310 | 75.0 % | 161.1 % | 4.0 % | 96,389,702
Arabic | 60,252,100 | 17.5 % | 2,297.7 % | 3.3 % | 344,139,242
French | 57,017,099 | 16.9 % | 375.2 % | 3.2 % | 337,046,097
Russian | 45,250,000 | 32.3 % | 1,359.7 % | 2.5 % | 140,041,247
Korean | 37,475,800 | 52.7 % | 96.8 % | 2.1 % | 71,174,317
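The Growth column also lets us work backwards to roughly how many users each language had in 2000, by inverting growth = (latest - start) / start. A minimal sketch for the top two rows; because the growth percentages in the table are rounded, the recovered 2000 figures are approximate:

```python
# From the language table: (Internet users latest, growth 2000-2009 in percent).
languages = {
    "English": (499_213_462, 251.7),
    "Chinese": (407_650_713, 1_162.0),
}

def users_in_2000(users_latest, growth_pct):
    """Invert growth = (latest - start) / start to estimate the starting count."""
    return users_latest / (1 + growth_pct / 100)

for name, (latest, growth) in languages.items():
    start = users_in_2000(latest, growth)
    print(f"{name}: about {start / 1e6:.1f} M users in 2000, "
          f"{latest / 1e6:.1f} M in 2009")
# English: about 141.9 M users in 2000, 499.2 M in 2009
# Chinese: about 32.3 M users in 2000, 407.7 M in 2009
```

English-language users grew about 3.5-fold while Chinese-language users grew more than twelvefold over the same period, which is why English's share of the total is shrinking even as its absolute numbers rise.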
Sources: http://www.internetworldstats.com/, http://www.internetworldstats.com/languages.htm, http://www.internetworldstats.com/stats13.htm, http://www.internetworldstats.com/stats17.htm, http://www.internetworldstats.com/stats18.htm, http://www.internetworldstats.com/stats19.htm, http://en.wikipedia.org/wiki/Japanese_language, http://en.wikipedia.org/wiki/Portugese_language, http://en.wikipedia.org/wiki/French_language, http://en.wikipedia.org/wiki/Russian_language, http://en.wikipedia.org/wiki/Corean_language
-
Likely Conclusions
Although the total number of English language web pages will continue to grow, their proportion of total pages will continue to drop
The proportional growth of pages in European languages will also slow down
The proportion of pages in Chinese will grow at an accelerating pace
The use for, or necessity of, automated and semi-automated page translation will increase markedly over the coming ten years
-
Machine Translation
Refers to the process of using a computer program to translate
from one language to another
The state of the art is still not as accurate or sophisticated as one
might like
Back-translation example from Babelfish:
Original text: In all of Syracuse University, there is not a finer instructor of
Information Technology than the highly-accomplished, intellectual overachiever, and all-around good guy who is known as Randy Wenner.
English to Chinese:Wenner
Chinese back to English: West the grand total forces the doveSi university, compared to is called blue Wenner the high
success, the intelligence high achievement and the versatile
goodness does not have an information technology better
instructor.
-
Machine Translation and XML
Machine translation can be improved by the use of XML and XML standards
XML documents are much easier to translate than other electronic documents because they separate form from content, and they conform to a rigorous standard and defined syntax.
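Because XML keeps content in text nodes and structure in tags, a translation tool can walk the document and translate only the text, leaving the markup untouched. A minimal sketch using Python's `xml.etree.ElementTree`; the `GLOSSARY` lookup is a hypothetical stand-in for a real translation engine, the tag names are invented, and the sketch ignores text that follows a child element (`element.tail`) for brevity:

```python
import xml.etree.ElementTree as ET

# Hypothetical phrase glossary standing in for a real machine-translation engine.
GLOSSARY = {
    "Register now": "Inscríbase ahora",
    "Speakers confirmed": "Ponentes confirmados",
}

def translate_text(text):
    """Hypothetical translator: look up the whole phrase, else leave it unchanged."""
    return GLOSSARY.get(text, text)

def translate_document(xml_string):
    """Translate each element's text; tags and attributes pass through untouched.
    (Text after a child element, element.tail, is not handled in this sketch.)"""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        if element.text and element.text.strip():
            element.text = translate_text(element.text.strip())
    return ET.tostring(root, encoding="unicode")

page = "<page><button>Register now</button><heading>Speakers confirmed</heading></page>"
print(translate_document(page))
```

The translator never has to guess which characters are formatting and which are words to translate; the XML structure already tells it.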
-
Web-Based Translation Tools
Babelfish (http://babelfish.yahoo.com/)
Translates short text passages
Also try http://www.freetranslation.com/
Google
Google Translate (http://translate.google.com/): translates text or URLs
Google Language Tools (http://www.google.com/language_tools?hl=en)
Search for pages in any of several dozen languages
Google Chrome (browser) will offer to translate for you
-
Learning Map - Pause
1. Semantic Web
2. XML
3. English and the Web
English is the first language of a relatively small share of the world population and of a declining proportion of Internet users
The need for automated translation of web content, to facilitate use of foreign language web pages, will grow over the coming 5-10 years
XML can be used to develop tools and standards that will assist with the development of better machine translation