11A Programming 1 HTML-Web

  • 7/30/2019 11A Programming 1 HTML-Web

    1/36

    IST 195 Programming 1: HTML-Web

    The Semantic Web, XML, and English on the Web

    Prof. Randy Wenner

    Adapted from Professor Jeff Stanton

    Learning Map

    1. The Semantic Web

    Giving web page contents more meaning for people and computers

    2. XML

    One of the most important tools for creating the semantic web

    3. English and the Web

    Challenges of many cultures, many pages, in many languages, and how XML and the semantic web may help

    Semantic Web

    Semantic:

    Part of the structure of language relating to meaning, especially of words

    The Semantic Web: Web 3.0?

    An idea for the future of the WWW in which information is tagged with information about its meaning rather than about its format

    The Web is not Semantic now

    Currently the web is a large collection of HTML documents and a bit of other stuff

    We know that HTML is a formatting language: it says where elements on the page should go and what they should look like

    Example: <h2>Zap Mama</h2> makes a second-level heading, left justified, bold, larger font

    HTML does not actually say what anything is, or what kind of information it is

    You can't tell from the <h2> tag what "Zap Mama" signifies. Is it a command? A label? A name?

    Where we are Today: the Syntactic Web

    [Hendler & Miller 02]

    The Syntactic Web is

    A hypermedia, a digital library: a library of documents (called web pages) interconnected by a hypermedia of links

    A database, an application platform: a common portal to applications accessible through web pages, and presenting their results as web pages

    A platform for multimedia: BBC Radio 4 anywhere in the world! Terminator trailers!

    A naming scheme: unique identity for those documents

    A place where computers do the presentation (easy) and people do the linking and interpreting (hard).

    Why not get computers to do more of the hard work?

    [Goble 03]

    Hard Work using the Syntactic Web

    Find an image of Buzz Shaw (former SU chancellor)

    http://www.buzzbutt.com/html/shaw_party.html

    What is the Problem?

    Consider a typical web page:

    Markup consists of:

    Rendering information (e.g., font size and color)

    Hyperlinks to related content

    Semantic content is accessible to humans but not (easily) to computers

    What information we see

    WWW 2002

    The eleventh international world wide web conference

    Sheraton Waikiki hotel

    Honolulu, Hawaii, USA

    7-11 may 2002

    1 location 5 days learn interact

    Registered participants coming from

    australia, canada, chile, denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire

    Register now

    On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event

    Speakers confirmed

    Tim Berners-Lee

    Tim is the well known inventor of the Web,

    Ian Foster

    Ian is the pioneer of the Grid, the next generation internet

    What information a machine sees

    WWW2002

    The eleventh inteqnational woqld wide web confeqence

    Sheqaton waikiki hotel

    Honolulu, hawaii, USA

    7-11 may 2002

    1 location 5 days leaqn inteqact

    Registeqed paqticipants coming fqom

    austqalia, canada, chile denmaqk, fqance, geqmany, ghana, hong kong, india, iqeland, italy, japan, malta, new zealand, the netheqlands, noqway, singapoqe, switzeqland, the united kingdom, the united states, vietnam, zaiqe

    Registeq now

    On the 7th May Honolulu will pqovide the backdqop of the eleventh inteqnational woqld wide web confeqence. This pqestigious event

    Speakeqs confiqmed

    Tim beqneqs-lee

    Tim is the well known inventoq of the Web,

    Ian Fosteq

    Ian is the pioneeq of the Gqid, the next geneqation inteqnet

    So, if a machine sees garble

    You can't ask it:

    Who's speaking at the conference?

    What countries will be represented?

    What dates is the conference being held?

    etc.

    The Semantic Web aims to solve this

    Rather than describing formatting, tags would designate what kind of information a piece of information is

    Rather than discarding the internal organization of the data when placing it on web pages, authors would keep the natural structure of the data

    Tags like <artist>Zap Mama</artist> would replace the current HTML strategy of tagging the format of the information

    Artist        | Title         | Courtesy
    Beastie Boys  | Now Get Busy  | Beastie Boys appear courtesy of Beastie Boys and Capitol Records.
    David Byrne   | My Fair Lady  | David Byrne appears courtesy of Nonesuch Records.
    Zap Mama      | Wadidyusay?   | Zap Mama appears courtesy of Luaka Bop Records.
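
    A minimal sketch of why meaning-based tags help, using the three rows of the table above. The tag names (compilation, track, artist, title, courtesy) are assumptions made up for illustration, not a published vocabulary. Once the data carries tags that say what each piece is, a program can answer a question like "which titles are by Zap Mama?" without guessing from fonts or headings.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup for the table rows; every tag name here
# is an illustrative assumption, not part of any standard.
TRACKS_XML = """
<compilation>
  <track>
    <artist>Beastie Boys</artist>
    <title>Now Get Busy</title>
    <courtesy>Beastie Boys and Capitol Records</courtesy>
  </track>
  <track>
    <artist>David Byrne</artist>
    <title>My Fair Lady</title>
    <courtesy>Nonesuch Records</courtesy>
  </track>
  <track>
    <artist>Zap Mama</artist>
    <title>Wadidyusay?</title>
    <courtesy>Luaka Bop Records</courtesy>
  </track>
</compilation>
"""


def titles_by_artist(xml_text, artist_name):
    """Return the titles of all tracks whose <artist> equals artist_name."""
    root = ET.fromstring(xml_text.strip())
    return [
        track.findtext("title")
        for track in root.iter("track")
        if track.findtext("artist") == artist_name
    ]


def all_artists(xml_text):
    """Return every artist name in document order."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.iter("artist")]


if __name__ == "__main__":
    print(titles_by_artist(TRACKS_XML, "Zap Mama"))
    print(all_artists(TRACKS_XML))
```

    The same question cannot be asked reliably of a heading tag, because a heading only says how the text should look.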

    The Semantic Web was always the goal

    The Web was invented by Tim Berners-Lee (amongst others), a physicist working at CERN

    TBL's original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web:

    "... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations."

    TBL (and others) have since been working towards realizing this vision, which has become known as the Semantic Web

    Article in the May 2001 issue of Scientific American

    http://www.w3.org/People/Berners-Lee/

    More on the Semantic Web

    Oh Happy Day!

    The Semantic Web is under development

    Three major components:

    XML: Extensible Markup Language

    for tagging the structure of the data

    RDF: Resource Description Framework

    a way to break knowledge down into small pieces, with some rules about the meaning of those pieces

    Goal: to have a method so simple that it can express any fact, and yet so structured that computer applications can do useful things with knowledge expressed in RDF

    OWL: Web Ontology Language

    for describing the big picture about how data elements on one or more pages all fit together and relate to one another

    Of course, we can't be drawing our way through the Semantic Web, so instead how about a table-style representation for the graph?

    Each row represents an arrow (an edge) in the figure. The first column has the name of the node at the start of the edge. The second column has the label of the edge itself (the kind of edge). The third column has the name of the node at the end of the arrow.

    Start Node            | Edge Label      | End Node
    vincent_donofrio      | starred_in      | law_&_order_ci
    law_&_order_ci        | is_a            | tv_show
    the_thirteenth_floor  | similar_plot_as | the_matrix
    ...
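
    The table rows translate directly into three-part tuples. The sketch below stores only the three rows shown and adds two helper functions (match and starred_in_kind are hypothetical names chosen for this example) to show how a program might follow edges to answer a question such as "which TV shows did this person star in?".

```python
# Each triple is (start node, edge label, end node), copied from the table rows.
TRIPLES = [
    ("vincent_donofrio", "starred_in", "law_&_order_ci"),
    ("law_&_order_ci", "is_a", "tv_show"),
    ("the_thirteenth_floor", "similar_plot_as", "the_matrix"),
]


def match(triples, start=None, edge=None, end=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, e, o)
        for (s, e, o) in triples
        if (start is None or s == start)
        and (edge is None or e == edge)
        and (end is None or o == end)
    ]


def starred_in_kind(triples, actor, kind):
    """Follow two edges: actor -starred_in-> X, then X -is_a-> kind."""
    works = [o for (_, _, o) in match(triples, start=actor, edge="starred_in")]
    return [w for w in works if match(triples, start=w, edge="is_a", end=kind)]


if __name__ == "__main__":
    print(starred_in_kind(TRIPLES, "vincent_donofrio", "tv_show"))
```

    Answering the question takes two hops through the graph, which is the kind of linking work the earlier slides say people currently do by hand.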

    Example XML

    The following text may look identical in a browser

    Example XML

    But it's quite different under the hood.

    See how the XML differs from the HTML?

    HTML

    XML

    Example XML

    You can use Internet Explorer to view XML in its raw form (VIEW > SOURCE)

    Note the meaningful tags, like

    Example RDF

    RDF information is expressed in XML

    This example describes the prior example

    Gives the title, author, creation date, and subject

    These pieces of information are called metadata because they are data about data
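
    A hedged sketch of what such metadata could look like and how a program might read it. The RDF and Dublin Core namespace URIs are the standard ones; the document URL and all field values are placeholders invented for this example, and the slide's own RDF listing may have used different properties.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Placeholder URL and values: assumptions for illustration only.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/course.xml">
    <dc:title>Example Course Page</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2010-06-01</dc:date>
    <dc:subject>Example subject</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def read_metadata(rdf_text):
    """Collect title, creator, date, and subject for the described resource."""
    root = ET.fromstring(rdf_text.strip())
    description = root.find(f"{{{RDF_NS}}}Description")
    metadata = {"about": description.get(f"{{{RDF_NS}}}about")}
    for field in ("title", "creator", "date", "subject"):
        metadata[field] = description.findtext(f"{{{DC_NS}}}{field}")
    return metadata


if __name__ == "__main__":
    for key, value in read_metadata(RDF_XML).items():
        print(f"{key}: {value}")
```

    The metadata says nothing about how the page looks; it only describes the document that the rdf:about attribute points to.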

    OWL Example

    OWL is also expressed in a form similar to XML

    Things to note from the example:

    A wine is a potable liquid produced by at least one maker of type winery

    A wine is made from at least one type of grape (such grapes are restricted to wine grapes elsewhere in the ontology)

    [OWL example: definition of the class Wine]

    Learning Map - Pause

    1. The Semantic Web

    As it stands, HTML documents have lots of formatting but little meaningful structure. Example: it is impossible to search the web for a name and restrict the search just to names of authors

    Adding semantics to web pages would simplify both searching and organization of information on the web; automatic research tools could eliminate lots of the drudgery

    The semantic web is under development by W3C and others and consists of XML, RDF, and OWL

    Next: More on XML

    Last: English and the Web

    XML: The Universal Markup

    A human and computer readable coding system for tagging web documents

    In order to be human readable, both markup and data are represented in character form

    Can be used for a variety of purposes, but for the semantic web is used for giving structure to raw data

    XML is a restricted form of SGML, the Standard Generalized Markup Language (ISO Standard #8879)

    XML Smackdown

    XML is bigger and more powerful than HTML

    Anything HTML can do, XML can do better

    XML can be used to create specialized tools (such as XML Signatures for storing and managing digital signatures)

    There is no easy or consistent way to do this in HTML

    HTML has recently been eaten up by XHTML, which expresses all of the HTML tags as proper XML tags

    At the time, HTML was expected to become obsolete, with all pages having to be XHTML compliant to display properly; in practice, the W3C ended work on XHTML 2 in 2009 and HTML5 kept its own non-XML syntax

    XML: Make it up as you go

    IST195

    Summer 2010

    Randy Wenner

    Adjunct Professor

    Hinds 010
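
    The tags that originally wrapped these values were lost from this transcript, so the sketch below invents its own. That is the point of the slide: in XML the author picks the tag names. The names course, number, term, instructor, name, rank, and location are assumptions for illustration, including the guess that "Hinds 010" is a location.

```python
import xml.etree.ElementTree as ET


def build_course():
    """Wrap the slide's values in made-up tags that say what each value is."""
    course = ET.Element("course")
    ET.SubElement(course, "number").text = "IST195"
    ET.SubElement(course, "term").text = "Summer 2010"
    instructor = ET.SubElement(course, "instructor")
    ET.SubElement(instructor, "name").text = "Randy Wenner"
    ET.SubElement(instructor, "rank").text = "Adjunct Professor"
    ET.SubElement(course, "location").text = "Hinds 010"
    return course


# Serialize to text, then parse it back to show the structure survives.
xml_text = ET.tostring(build_course(), encoding="unicode")
parsed = ET.fromstring(xml_text)

if __name__ == "__main__":
    print(xml_text)
    print(parsed.findtext("instructor/name"))
```

    Because both the markup and the data are plain characters, the result can be read by a person in a text editor and by any XML parser.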

    Learning Map - Pause

    1. The Semantic Web

    2. XML

    The universal markup language, poised to take over pretty much everything having to do with markup on the web

    XML is the heart of the semantic web

    3. English and the Web

    English Language Country Categories

    Countries with English as a native language:

    Examples: UK, USA, Australia, New Zealand, Canada

    Countries with English as a second language:

    Everyday or official usage; e.g., India, Singapore, Ghana

    Countries with English as a frequently taught foreign language:

    Examples: France, Germany, Netherlands

    Countries with English as a foreign language:

    English not generally spoken within the country

    Examples: Japan, Thailand, China

    Where is the major growth coming from in terms of new web content?

    North America? Nope, try again

    N.A. does not have the most Internet users, and does not have the fastest growth in Internet use

    WORLD INTERNET USAGE AND POPULATION STATISTICS

    World Regions           | Population (2009 Est.) | Internet Users, Dec. 31, 2000 | Internet Users, Latest Data | Penetration (% Population) | Growth 2000-2009 | Users % of Table
    Africa                  | 991,002,342            | 4,514,400                     | 86,217,900                  | 8.7 %                      | 1,809.8 %        | 4.8 %
    Asia                    | 3,808,070,503          | 114,304,000                   | 764,435,900                 | 20.1 %                     | 568.8 %          | 42.4 %
    Europe                  | 803,850,858            | 105,096,093                   | 425,773,571                 | 53.0 %                     | 305.1 %          | 23.6 %
    Middle East             | 202,687,005            | 3,284,800                     | 58,309,546                  | 28.8 %                     | 1,675.1 %        | 3.2 %
    North America           | 340,831,831            | 108,096,800                   | 259,561,000                 | 76.2 %                     | 140.1 %          | 14.4 %
    Latin America/Caribbean | 586,662,468            | 18,068,919                    | 186,922,050                 | 31.9 %                     | 934.5 %          | 10.4 %
    Oceania / Australia     | 34,700,201             | 7,620,480                     | 21,110,490                  | 60.8 %                     | 177.0 %          | 1.2 %
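
    The percentage columns can be recomputed from the raw counts. The sketch below assumes penetration is latest users divided by population, and growth is the increase since 2000 divided by the 2000 figure; both assumptions reproduce the published values for the three rows included. The function names are chosen for this example only.

```python
# (population 2009 est., users on Dec. 31 2000, latest users), from the table.
REGIONS = {
    "Africa": (991_002_342, 4_514_400, 86_217_900),
    "Asia": (3_808_070_503, 114_304_000, 764_435_900),
    "North America": (340_831_831, 108_096_800, 259_561_000),
}


def penetration_pct(population, latest_users):
    """Share of the population that uses the Internet, in percent."""
    return 100.0 * latest_users / population


def growth_pct(users_2000, latest_users):
    """Increase in users since 2000, as a percentage of the 2000 figure."""
    return 100.0 * (latest_users - users_2000) / users_2000


def fastest_growing(regions):
    """Name of the region with the highest growth percentage."""
    return max(regions, key=lambda name: growth_pct(regions[name][1], regions[name][2]))


if __name__ == "__main__":
    for name, (population, users_2000, latest) in REGIONS.items():
        print(name,
              round(penetration_pct(population, latest), 1),
              round(growth_pct(users_2000, latest), 1))
    print("Fastest growing of these:", fastest_growing(REGIONS))
```

    North America has by far the highest penetration of these three regions but the lowest growth, which is the contrast the slide is drawing.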

    Source: http://www.internetworldstats.com/

    But English is the Language of the Web

    Are you pleased? Better not get too happy!

    Because the Internet was developed in the U.S. and had its earliest rapid growth there, originally most pages were in English

    As the Internet has grown and adoption has occurred more rapidly, the overall number of English language pages has grown, but as a proportion of the total content on the web, English has dropped

    Top 10 Languages on the Web (http://www.internetworldstats.com)

    TOP TEN LANGUAGES IN THE INTERNET

    Language   | Internet Users by Language | Internet Penetration by Language | Growth in Internet (2000 - 2009) | Internet Users % of Total | World Population for this Language (2009 Estimate)
    English    | 499,213,462                | 39.5 %                           | 251.7 %                          | 27.7 %                    | 1,263,830,976
    Chinese    | 407,650,713                | 29.7 %                           | 1,162.0 %                        | 22.6 %                    | 1,373,859,774
    Spanish    | 139,849,651                | 34.0 %                           | 669.2 %                          | 7.8 %                     | 411,631,985
    Japanese   | 95,979,000                 | 75.5 %                           | 103.9 %                          | 5.3 %                     | 127,078,679
    Portuguese | 77,569,900                 | 31.4 %                           | 923.9 %                          | 4.3 %                     | 247,223,493
    German     | 72,337,310                 | 75.0 %                           | 161.1 %                          | 4.0 %                     | 96,389,702
    Arabic     | 60,252,100                 | 17.5 %                           | 2,297.7 %                        | 3.3 %                     | 344,139,242
    French     | 57,017,099                 | 16.9 %                           | 375.2 %                          | 3.2 %                     | 337,046,097
    Russian    | 45,250,000                 | 32.3 %                           | 1,359.7 %                        | 2.5 %                     | 140,041,247
    Korean     | 37,475,800                 | 52.7 %                           | 96.8 %                           | 2.1 %                     | 71,174,317

    Likely Conclusions

    Although the total number of English language web pages will continue to grow, the proportion versus total pages will continue to drop

    The proportional growth of pages in European languages will also slow down

    The proportion of pages in Chinese will grow at an accelerating pace

    The use for, or necessity of, automated and semi-automated page translation will increase markedly over the coming ten years

    Machine Translation

    Refers to the process of using a computer program to translate from one language to another

    The state of the art is still not as accurate or sophisticated as one might like

    Back-translation example from Babelfish:

    Original text: In all of Syracuse University, there is not a finer instructor of Information Technology than the highly-accomplished, intellectual overachiever, and all-around good guy who is known as Randy Wenner.

    English to Chinese:Wenner

    Chinese back to English: West the grand total forces the doveSi university, compared to is called blue Wenner the high success, the intelligence high achievement and the versatile goodness does not have an information technology better instructor.

    Machine Translation and XML

    Machine translation can be improved by the use of XML and XML standards

    XML documents are much easier to translate than other electronic documents because they separate out form from content, and they conform to a rigorous standard and defined syntax.
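
    A rough sketch of why well-defined markup helps a translation tool: the program can walk the parsed tree, hand only the text content to a translator, and leave tags and attributes untouched. The tag names and the tiny English-to-French lookup table are hypothetical stand-ins; a real system would call an actual translation engine in place of toy_translate.

```python
import xml.etree.ElementTree as ET

# Toy lookup table standing in for a real translation engine (assumption).
TOY_DICTIONARY = {
    "Welcome": "Bienvenue",
    "Register now": "Inscrivez-vous maintenant",
}


def toy_translate(text):
    """Return the dictionary entry for text, or the text unchanged."""
    return TOY_DICTIONARY.get(text, text)


def translate_xml(xml_text, translate):
    """Translate each element's leading text; markup and attributes stay as-is.

    Text that follows a child element (ElementTree's .tail) is ignored
    to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.text and element.text.strip():
            element.text = translate(element.text.strip())
    return ET.tostring(root, encoding="unicode")


# Hypothetical page fragment with made-up tag names.
SOURCE = '<page><heading>Welcome</heading><link href="/register">Register now</link></page>'

if __name__ == "__main__":
    print(translate_xml(SOURCE, toy_translate))
```

    The href attribute and the element structure come out unchanged, so the translated document can be displayed with the same formatting as the original.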

    Web-Based Translation Tools

    Babelfish (http://babelfish.yahoo.com/)

    Translates short text passages

    Also try http://www.freetranslation.com/

    Google

    Google Translate (http://translate.google.com/): translates text or URLs

    Google Language Tools (http://www.google.com/language_tools?hl=en): search for pages in any of several dozen languages

    Google Chrome (browser) will offer to translate for you

    Learning Map - Pause

    1. Semantic Web

    2. XML

    3. English and the Web

    English is the first language of relatively little of the world population and a declining proportion of Internet users

    The need will grow over the coming 5-10 years for automated translation of web content to facilitate use of foreign language web pages

    XML can be used to develop tools and standards that will assist with the development of better machine translation