Traditional Electronic Printing On The Internet
MHE - Consultants for Document and Datament Technologies
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal, MHE
Xplor 21st Global Conference and Exhibit
Miami Beach, Florida
October 30, 2000
Printing Versus The Internet
• Electronic printing is a $125,000,000,000 (US) industry worldwide (www.xplor.org)
• There are now an estimated 98,685,000 host computers on the Internet (www.mids.org)
• Xplor International estimates that the production of paper documents and electronic documents is still increasing
• So, for a while yet, we’re living in a hybrid world
Printing Versus The Internet
• Customer service needs identical look and feel in paper and electronic documents
• Regulatory agencies continue to have an interest in document presentation
• Customers need a re-education process as documents change media
• Hence, there are good reasons in the short run to be concerned about presentation
The Nature Of Print Streams
EBCDIC Versus ASCII
• BCD - Binary Coded Decimal
• BCDIC - Binary Coded Decimal Interchange Code
• EBCDIC - IBM Extended Binary Coded Decimal Interchange Code
• ASCII - American Standard Code for Information Interchange
EBCDIC Line Data
• EBCDIC encoded - 8 bit
• Record-oriented because of IBM OS’s
• Carriage controls
– Machine carriage controls
– ANSI carriage controls
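As a rough illustration of ANSI carriage controls, the sketch below converts records whose first character is a control code into ASCII text with print controls. The mapping is the conventional one (blank = single space, 0 = double space, - = triple space, + = overprint, 1 = skip to a new page). The function name ansi_to_ascii and the sample records are made up for this example, and channel skips other than 1 are not handled.

```python
# Conventional ANSI carriage controls, applied BEFORE the line is printed.
# Only the most common codes are handled; this is a simplification.
ANSI_ACTIONS = {
    " ": "\n",      # space one line, then print
    "0": "\n\n",    # space two lines
    "-": "\n\n\n",  # space three lines
    "+": "\r",      # no line advance: overprint the previous line
    "1": "\f",      # skip to channel 1, i.e. top of a new page
}


def ansi_to_ascii(records):
    """Turn records with a leading ANSI carriage control into ASCII text."""
    out = []
    for record in records:
        control, text = record[:1], record[1:]
        # Unknown controls fall back to single spacing (an assumption).
        out.append(ANSI_ACTIONS.get(control, "\n") + text)
    return "".join(out)


# Hypothetical records, each starting with its carriage control character.
sample = ["1This is text", " second line", "0after a blank line", "+overprint"]
converted = ansi_to_ascii(sample)
print(repr(converted))
```

Machine carriage controls are binary opcodes that differ by printer family, so they are left out of this sketch.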
ASCII Line Data
• ASCII encoded - 7 bit
• ‘Record’ orientation is not intrinsic to OS
• Text files use print controls to delimit records
• Common print controls
– x’0d’ carriage return
– x’0a’ line feed
– x’0c’ form feed
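Because records are not intrinsic to the file, a reader has to rebuild them from the print controls. The sketch below is one possible way to do that, assuming form feed starts a new page and that CR LF, a bare LF, or a bare CR each end a record. The function name and the sample bytes are illustrative only.

```python
import re

FF = b"\x0c"  # form feed: start of a new page


def split_pages_and_lines(data: bytes):
    """Split an ASCII print file into pages (on FF) and records (on CR/LF)."""
    # A leading form feed only positions to the first page; skip it.
    if data.startswith(FF):
        data = data[1:]
    pages = []
    for page in data.split(FF):
        # Accept CR LF, a bare LF, or a bare CR as the end of a record.
        lines = re.split(rb"\r\n|\n|\r", page)
        # A trailing delimiter leaves one empty record at the end; drop it.
        if lines and lines[-1] == b"":
            lines.pop()
        pages.append([line.decode("ascii") for line in lines])
    return pages


stream = b"\x0c     This is text\x0d\x0aSecond line\x0d\x0a\x0cPage two\x0d\x0a"
pages = split_pages_and_lines(stream)
print(pages)
```

Treating a bare carriage return as a record end is a simplification: some print files use it to overprint the same line instead.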
The EBCDIC Family Tree
• EBCDIC text
• 1403 data - EBCDIC records with a carriage control
• LCDS - ‘Line conditioned’ data stream
– 3800 Mod I
– 3211 data with Xerox DJDEs
– Others
• AFP, MO:DCA, and IPDS
The ASCII Family Tree
• ASCII text
• ASCII text with print controls
• ASCII text with escape sequences
– Epson MX-80
– Xerox UDK (XES)
– QMS QUIC
– IBM PPDS
– HP PCL
– Xerox Metacode
• Print programming languages using ASCII
– Interpress
– PostScript
Line Data And Conditioned Line Data
• 1403, 3211, other EBCDIC line data streams, including Xerox DJDE
• 3800 Mod I and other IBM data streams
• ASCII text files of all sorts
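EBCDIC line data (characters, then the high and low hex nibbles of each byte):
1     This is text
F44444E88A48A4A8AA
100000389209203573
ASCII line data with form feed (FF), carriage return (CR), and line feed (LF), shown the same way:
F                 CL
F     This is textRF
02222256672672767700
C00000489309304584DA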
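The dumps above print each byte vertically, with the high nibble on one row and the low nibble on the next. As a small check, the sketch below pairs the two rows back into bytes and decodes them. The helper name from_vertical_hex is made up, and cp037 (Python's codec for EBCDIC code page 037) is an assumption, since the slide does not say which EBCDIC code page was used.

```python
def from_vertical_hex(high: str, low: str) -> bytes:
    """Rebuild bytes from a vertical hex dump (high-nibble row, low-nibble row)."""
    if len(high) != len(low):
        raise ValueError("nibble rows must have the same length")
    return bytes.fromhex("".join(h + l for h, l in zip(high, low)))


ebcdic = from_vertical_hex("F44444E88A48A4A8AA", "100000389209203573")
ascii_ = from_vertical_hex("02222256672672767700", "C00000489309304584DA")

# Assumption: code page 037. Other EBCDIC code pages agree on these letters.
ebcdic_text = ebcdic.decode("cp037")
ascii_text = ascii_.decode("ascii")

print(repr(ebcdic_text))  # '1     This is text'
print(repr(ascii_text))   # '\x0c     This is text\r\n'
```

The same twelve characters of text are carried by entirely different byte values in the two encodings. In the EBCDIC record the page control is the leading character 1, while the ASCII file uses the form feed byte.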
Print Data With Escape Sequences
• Epson and many other impact printers
• Xerox UDK (XES)
• QMS QUIC
• IBM PPDS
• HP PCL
• Xerox Metacode
• AFP, MO:DCA, and IPDS
X’01060001040002000154686973206973207465787401’
AMB 100 AMI 300 STO 0,90 SCFL 3 SVI 14 TRN “This is text”
Print Programming Languages
• Interpress
• PostScript (and PDF)
%!PS-Adobe-2.0
%%Title: Blue Book Program 7, on page 157
%%EndComments
/Times-Roman findfont 18 scalefont setfont
72 500 moveto
(This is text) show
...
The Nature Of Internet Formats
Common Internet Formats
• The most commonly used data format on the Internet is HTML - HyperText Markup Language
• The next expected wave on the Internet is XML (eXtensible Markup Language) and its related standards such as XSL, SVG, etc.
• As a secondary standard, PDF is widely used to present static documents
HTML
• HTML is an application of SGML
• HTML has a set of 40 to 50 tags, which are “grammar” based
• HTML tags have default presentation characteristics, but these can be overridden with CSS (Cascading Style Sheets)
Sample HTML
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<h1>Poison Ivy Vineyards</h1>
<p>Poison Ivy Vineyards is an experiment in growing wine-quality grapes in a backyard in a residential neighborhood in Richardson, Texas. This website serves as a running diary of the steps I took to create the vineyard and - eventually - to make wine.</p>
</html>
XML
• XML is eXtensible Markup Language, which means that you can make up the tags
• Since a browser can’t know how to format the tags, default formatting is in outline form
• Normally, you would use XSL (or CSS) to describe how each tag is to be formatted
Sample XML
<NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
<JOBTITLE>Principal</JOBTITLE>
<AFFILIATION>MHE</AFFILIATION>
<ADDRESS>
<STREET>1400 Cheyenne Dr.</STREET>
<CITY>Richardson</CITY>
<STATE>Texas</STATE>
<ZIPCODE>75080</ZIPCODE>
<EMAIL>[email protected]</EMAIL>
</ADDRESS>
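To show what made-up tags look like to software, here is a minimal sketch that parses the sample with Python's standard xml.etree.ElementTree. Two things are assumptions of the sketch: the sample has no single enclosing element, so a hypothetical CONTACT root is added, and the EMAIL element is left out. The parser attaches no meaning to the tags; it only reports the tree, which is roughly the outline view a browser falls back on.

```python
import xml.etree.ElementTree as ET

# <CONTACT> is a made-up root element added so the document is well-formed.
document = """<CONTACT>
<NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
<JOBTITLE>Principal</JOBTITLE>
<AFFILIATION>MHE</AFFILIATION>
<ADDRESS>
<STREET>1400 Cheyenne Dr.</STREET>
<CITY>Richardson</CITY>
<STATE>Texas</STATE>
<ZIPCODE>75080</ZIPCODE>
</ADDRESS>
</CONTACT>"""

root = ET.fromstring(document)


def outline(element, depth=0):
    """Return the tag names as an indented outline, one line per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tags = outline(root)
city = root.findtext("ADDRESS/CITY")

print("\n".join(tags))
print(city)  # Richardson
```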
Sample XSL
This is an <emph>important</emph> point.
<xsl:template match="emph">
<fo:sequence font-weight="bold">
<xsl:process-children/>
</fo:sequence>
</xsl:template>
PDF
• PDF is Adobe’s Portable Document Format
• PDF is a print stream, not an SGML instance
• PDF is similar to PostScript, but more portable, because it carries its own resources
• PDF provides good fidelity, at a price
Sample PDF
%PDF-1.1
...
2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj
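The object in the sample is a PDF document information dictionary: name and string pairs between << and >>. The sketch below pulls those pairs out with a regular expression and parses the creation date. It is only an illustration and assumes simple strings without nested or escaped parentheses and a date without a time zone suffix; a real PDF library would be the safer choice for production files.

```python
import re
from datetime import datetime

# The Info dictionary from the sample, one entry per line.
info_object = """2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj"""

# Matches "/Name (literal string)". Nested or escaped parentheses,
# which real PDF strings may contain, are not handled.
ENTRY = re.compile(r"/(\w+)\s*\(([^()]*)\)")

info = dict(ENTRY.findall(info_object))

# PDF dates look like D:YYYYMMDDHHmmSS.
created = datetime.strptime(info["CreationDate"], "D:%Y%m%d%H%M%S")

print(info["Producer"])     # Acrobat Distiller 2.1 for Windows
print(created.isoformat())  # 1996-08-09T19:10:47
```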
Limits Of Browsers
A Normal HTML Page
Default Font Increased
Using Ghouly Solid
Adjusting The Fonts
Methods Of Moving Traditional Electronic Print To The Internet
Five Methods
• Conversion to PDF
• Rasterization to gif or jpeg
• Recomposition into HTML/XML
• “Conversion” to normal HTML/XML
• Translation to highly formatted HTML/XML
Conversion to PDF
• This is a print stream to print stream conversion
• The output in PDF usually looks very similar to the original printed document
• Many tools which create the PDF also add value, such as hypertext links, bookmarking, et cetera, to the PDF document
Pros And Cons Of PDF
• Pros:
– High fidelity to original document
– Reader is widespread and free
– Reasonably transportable
– Widely used in some circles (e.g., IRS)
• Cons:
– PDF files tend to be large
– PDF documents are paper-size centric
– Browser requires a “plug-in”*
PDF Sample
%PDF-1.1
...
2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj
Xploration Guidelines
Sources For * To PDF
• Composition Tools - create new PDF documents from source code
• Transforms - translate existing formatted print streams into PDF
• Larger Systems - composition or translation capabilities inserted transparently into document systems
• See Xplor Products and Services Reference Guide
Rasterization to gif or jpeg
• The print stream is “rasterized”, that is, converted to a bit map format
– GIF (Graphics Interchange Format): invented by CompuServe for graphics. Supports only 256 colors, or 8 bits.
– JPEG (Joint Photographic Experts Group): specifically for more than 256 colors, with better compression, but is “lossy”
– Excellent discussion of each at http://www.efuse.com/Design/web_graphics_basics.html
Pros And Cons Of Rasterization
• Pros:
– Image is exact copy of original document
– Image can be viewed on any browser which takes gifs and jpegs
• Cons:
– Resolution is hardcoded at one size
– There’s no text to search
– Download is longer
– No correspondence of printed pages and “HTML” pages
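To make the resolution and download trade-off concrete, the arithmetic below estimates the uncompressed size of one rasterized page. The page size (US Letter), the two resolutions, and the helper name raster_size are assumptions chosen for illustration. GIF and JPEG both compress, so real files are smaller, but the pixel counts are fixed once the image is made.

```python
def raster_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, uncompressed_bytes) for one page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return width_px, height_px, (total_bits + 7) // 8


# US Letter at a print-like 300 dpi, 8 bits per pixel (256 colors, as in GIF).
print_res = raster_size(8.5, 11, 300, 8)
# The same page at 72 dpi, a common assumption for screen display.
screen_res = raster_size(8.5, 11, 72, 8)

print(print_res)   # (2550, 3300, 8415000)
print(screen_res)  # (612, 792, 484704)
```

An image made at the higher resolution is far larger than a browser window, while one made at the lower resolution cannot be enlarged without losing detail.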
Sample Rasterization
• This page was originally created in PDF, then rasterized, and converted to a jpeg
Recomposition into HTML/XML
• Data is extracted from a print stream
• Templates have been created in advance
• The extracted data is merged into the templates
• There may be fewer or more output pages in HTML than were in the print stream
• Templates are built to be the most effective in the browser window
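The extract-and-merge steps above can be sketched in a few lines. Everything specific here is hypothetical: the layout of the print lines, the field names, and the template are invented for the example, and a real bill would need extraction rules matched to its own format.

```python
import re
from html import escape
from string import Template

# Hypothetical print-stream lines; column 1 is a carriage control.
print_lines = [
    "1ACCOUNT 12345   WILLIAM SMITH",
    " BALANCE FORWARD        5,000.00",
]

# Step 1: extract data from the print stream.
fields = {}
for line in print_lines:
    text = line[1:]  # drop the carriage control
    if m := re.match(r"ACCOUNT\s+(\d+)\s+(.+)", text):
        fields["account"], fields["name"] = m.group(1), m.group(2).strip()
    elif m := re.match(r"BALANCE FORWARD\s+([\d,]+\.\d\d)", text):
        fields["balance"] = m.group(1)

# Step 2: a template created in advance, designed for the browser window.
summary_page = Template(
    "<html><h1>Account $account</h1>"
    "<p>$name, your balance forward is $balance.</p></html>"
)

# Step 3: merge the extracted data into the template.
html_page = summary_page.substitute(
    {key: escape(value) for key, value in fields.items()}
)
print(html_page)
```

Because the template, not the print stream, decides the layout, the number of HTML pages need not match the number of printed pages.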
Pros And Cons Of Recomposition
• Pros:
– HTML/XML pages are well-suited for the browser
– HTML/XML is considered by some to be simpler than PDF
• Cons:
– HTML/XML pages don’t necessarily match the printed pages
– All pages (templates) must be pre-composed
Sample Recomposition
• This document is a sample telephone bill which has been divided into 11 HTML pages
• Note how the HTML pages are divided by subject, not by page overflow
“Conversion” to normal HTML/XML
• Both data and formatting information are extracted from the print file
• Some formats easily correspond to an HTML tag, e.g., a heading to <h1>
• More complex formatting can be approximated by the use of table tags
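One way to picture this kind of conversion is as a mapping from extracted formatting to ordinary tags. In the sketch below, the input list and its "kind" labels (heading, text, columns) are made up to stand in for whatever a real extractor would recover from the print file. Headings map to <h1>, and column-aligned data is approximated with a table.

```python
from html import escape

# Hypothetical result of extracting data plus formatting from a print file:
# each item is (kind, content). The kind names are invented for this sketch.
extracted = [
    ("heading", "Statement Summary"),
    ("text", "Charges & credits for this period"),
    ("columns", [["Balance", "Forward", "5,000.00"]]),
]


def to_html(items):
    """Map extracted formatting onto ordinary HTML tags."""
    parts = []
    for kind, content in items:
        if kind == "heading":
            parts.append(f"<h1>{escape(content)}</h1>")
        elif kind == "columns":
            # Column-aligned print data is approximated with a table.
            rows = "".join(
                "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
                for row in content
            )
            parts.append(f"<table>{rows}</table>")
        else:
            parts.append(f"<p>{escape(content)}</p>")
    return "\n".join(parts)


html_doc = to_html(extracted)
print(html_doc)
```

Nothing in the output fixes fonts or positions, which is why the fidelity of this approach is only approximate.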
Pros And Cons of “Conversion”
• Pros:
– HTML/XML pages look similar to printed pages
– Pages are in HTML/XML, not PDF or raster
• Cons:
– Fidelity is approximate
– Reader can substantially alter the presentation
– Graphics may not be supported
Sample “Conversion”
Print File
HTML Document #1
Translation to highly formatted HTML/XML
• This method uses particular CSS commands to do “exact” placement of text in the window of the browser
• This is as close as XML gets (today) to being a print stream
• Fonts are still subject to user override
Pros And Cons Of Translation
• Pros:
– Author has very good control over the presentation of text
• Cons:
– Much of the value of a tagged language is lost
– Portrait print pages still don’t fit on landscape browser windows
– May not work with all browsers
– Fonts can still be overridden
Sample Translation
<HTML>
<HEAD>
.ps9{position:absolute;top:676px;left:454px;width:65px;}
.ps10{position:absolute;top:676px;left:535px;width:66px;}
.ps11{position:absolute;top:676px;left:1102px;width:70px;}
<SPAN CLASS="ps9"><NOBR>Balance</NOBR></SPAN>
<SPAN CLASS="ps10"><NOBR>Forward</NOBR></SPAN>
<SPAN CLASS="ps11"><NOBR>5,000.00</NOBR></SPAN>
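Output like the sample can be produced mechanically once each piece of text and its page position are known. The sketch below assumes the positions have already been recovered from the print stream and converted to pixels. The placements list, the function name, and the starting class number are illustrative, and the rest of the page (style element, body) is left out, as it is in the sample.

```python
from html import escape

# Hypothetical text placements recovered from a print stream:
# (top, left, width, text), all in pixels as in the sample.
placements = [
    (676, 454, 65, "Balance"),
    (676, 535, 66, "Forward"),
    (676, 1102, 70, "5,000.00"),
]


def positioned_html(items, first_class=9):
    """Emit one CSS rule and one SPAN per piece of text."""
    rules, spans = [], []
    for number, (top, left, width, text) in enumerate(items, start=first_class):
        name = f"ps{number}"
        rules.append(
            f".{name}{{position:absolute;"
            f"top:{top}px;left:{left}px;width:{width}px;}}"
        )
        spans.append(f'<SPAN CLASS="{name}"><NOBR>{escape(text)}</NOBR></SPAN>')
    return rules, spans


rules, spans = positioned_html(placements)
print("\n".join(rules + spans))
```

Every word gets its own class and coordinates, which shows why much of the value of a tagged language is lost with this method.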
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921
972-231-3660 (v) 972-690-4521 (f)