Traditional Electronic Printing On The Internet

Traditional Electronic Printing On The Internet. William J. “Bill” McCalpin, EDPP, CDIA, MIT, LIT, Principal, MHE. Xplor 21st Global Conference and Exhibit, Miami Beach, Florida, October 30, 2000.

Transcript of Traditional Electronic Printing On The Internet

Page 1: Traditional Electronic Printing On The Internet

MHE - Consultants for Document and Datament Technologies

Traditional Electronic Printing On The Internet

William J. “Bill” McCalpin

EDPP, CDIA, MIT, LIT

Principal, MHE

Page 2: Traditional Electronic Printing On The Internet

Xplor 21st Global Conference and Exhibit

Miami Beach, Florida

October 30, 2000

Page 3: Traditional Electronic Printing On The Internet

Printing Versus The Internet

Page 4: Traditional Electronic Printing On The Internet

Printing Versus The Internet

• Electronic printing is a $125,000,000,000 (US) industry worldwide (www.xplor.org)

• There are now an estimated 98,685,000 host computers on the Internet (www.mids.org)

• Xplor International estimates that the production of paper documents and electronic documents is still increasing

• So, for a while yet, we’re living in a hybrid world

Page 5: Traditional Electronic Printing On The Internet

Printing Versus The Internet

• Customer service needs identical look and feel in paper and electronic documents

• Regulatory agencies continue to have an interest in document presentation

• Customers need a re-education process as documents change media

• Hence, there are good reasons in the short run to be concerned about presentation

Page 6: Traditional Electronic Printing On The Internet

The Nature Of Print Streams

Page 7: Traditional Electronic Printing On The Internet

EBCDIC Versus ASCII

• BCD - Binary Coded Decimal

• BCDIC - Binary Coded Decimal Interchange Code

• EBCDIC - IBM Extended Binary Coded Decimal Interchange Code

• ASCII - American Standard Code for Information Interchange

Page 8: Traditional Electronic Printing On The Internet

EBCDIC Line Data

• EBCDIC encoded - 8 bit

• Record-oriented because of IBM OS’s

• Carriage controls
– Machine carriage controls
– ANSI carriage controls

Page 9: Traditional Electronic Printing On The Internet

ASCII Line Data

• ASCII encoded - 7 bit

• ‘Record’ orientation is not intrinsic to OS

• Text files use print controls to delimit records

• Common print controls
– x’0d’ carriage return
– x’0a’ line feed
– x’0c’ form feed
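Because ASCII text has no intrinsic records, a reader has to rebuild pages and lines from the print controls. A minimal sketch, assuming form feed ends a page and that a bare carriage return can be treated as a line end (real streams may use it for overprinting); `split_ascii_print_stream` is a hypothetical helper name.

```python
def split_ascii_print_stream(data: bytes):
    """Return a list of pages, each a list of text lines, from an ASCII print stream."""
    # A leading form feed just starts the first page, so drop it.
    if data.startswith(b"\x0c"):
        data = data[1:]
    pages = []
    for chunk in data.split(b"\x0c"):  # x'0c' form feed delimits pages
        text = chunk.decode("ascii", errors="replace")
        # Normalize x'0d0a' and bare x'0d' to x'0a', then split into lines.
        lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # a trailing line end does not start another line
        pages.append(lines)
    return pages


if __name__ == "__main__":
    stream = b"\x0c     This is text\r\nsecond line\r\n\x0cpage two\r\n"
    for number, page in enumerate(split_ascii_print_stream(stream), start=1):
        print(number, page)
```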

Page 10: Traditional Electronic Printing On The Internet

The EBCDIC Family Tree

• EBCDIC text

• 1403 data - EBCDIC records with a carriage control

• LCDS - ‘Line conditioned’ data stream
– 3800 Mod I
– 3211 data with Xerox DJDEs
– Others

• AFP, MO:DCA, and IPDS

Page 11: Traditional Electronic Printing On The Internet

The ASCII Family Tree

• ASCII text

• ASCII text with print controls

• ASCII text with escape sequences
– Epson MX-80
– Xerox UDK (XES)
– QMS QUIC
– IBM PPDS
– HP PCL
– Xerox Metacode

• Print programming languages using ASCII
– Interpress
– PostScript

Page 12: Traditional Electronic Printing On The Internet

Line Data And Conditioned Line Data

• 1403, 3211, other EBCDIC line data streams, including Xerox DJDE

• 3800 Mod I and other IBM data streams

• ASCII text files of all sorts

1     This is text
F44444E88A48A4A8AA
100000389209203573

F                 CL
F     This is textRF
02222256672672767700
C00000489309304584DA
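The two dumps above are vertical hex: the upper row holds the high nibble of each byte and the lower row the low nibble. The sketch below pairs the rows back into bytes and decodes them. It assumes Python's `cp037` codec as a stand-in for the EBCDIC code page (the slides do not name one), and `from_vertical_hex` is a hypothetical helper name.

```python
def from_vertical_hex(zones: str, digits: str) -> bytes:
    """Pair the high-nibble row with the low-nibble row of a vertical hex dump."""
    return bytes(int(hi + lo, 16) for hi, lo in zip(zones, digits))


# EBCDIC record: ANSI carriage control '1', then the text.
ebcdic = from_vertical_hex("F44444E88A48A4A8AA", "100000389209203573")
# ASCII record: form feed, the text, then carriage return and line feed.
ascii_ = from_vertical_hex("02222256672672767700", "C00000489309304584DA")

if __name__ == "__main__":
    print(repr(ebcdic.decode("cp037")))   # '1     This is text'
    print(repr(ascii_.decode("ascii")))   # '\x0c     This is text\r\n'
```

The same printed line is carried as 18 bytes in the EBCDIC record and 20 bytes in the ASCII stream, because the ASCII version needs explicit print controls where the EBCDIC version relies on the record boundary.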

Page 13: Traditional Electronic Printing On The Internet

Print Data With Escape Sequences

• Epson and many other impact printers

• Xerox UDK (XES)

• QMS QUIC

• IBM PPDS

• HP PCL

• Xerox Metacode

• AFP, MO:DCA, and IPDS

X’01060001040002000154686973206973207465787401’

AMB 100 AMI 300 STO 0,90 SCFL 3 SVI 14 TRN “This is text”

Page 14: Traditional Electronic Printing On The Internet

Print Programming Languages

• Interpress

• PostScript (and PDF)

%!PS-Adobe-2.0

%%Title: Blue Book Program 7, on page 157

%%EndComments

/Times-Roman findfont 18 scalefont setfont

72 500 moveto

(This is text) show

...
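To show how plain text lines could be turned into a print program like the one above, here is a minimal sketch that emits PostScript from Python. The font, point size, start position, and leading default to the values in the sample or to assumed values, and `lines_to_postscript` and `ps_escape` are hypothetical names; a real transform would also handle page breaks and fonts.

```python
def ps_escape(text: str) -> str:
    """Escape the characters that are special inside a PostScript (string)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def lines_to_postscript(lines, font="Times-Roman", size=18, left=72, top=500, leading=22):
    """Emit a one-page PostScript program that shows each line of text."""
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = top
    for line in lines:
        out.append(f"{left} {y} moveto")          # position in points from lower left
        out.append(f"({ps_escape(line)}) show")   # paint the string
        y -= leading                               # move down one line (assumed leading)
    out.append("showpage")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(lines_to_postscript(["This is text", "Balance (forward) 5,000.00"]))
```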

Page 15: Traditional Electronic Printing On The Internet

The Nature Of Internet Formats

Page 16: Traditional Electronic Printing On The Internet

Common Internet Formats

• The most commonly used data format on the Internet is HTML - HyperText Markup Language

• The next expected wave on the Internet is XML (eXtensible Markup Language) and its related standards such as XSL, SVG, etc.

• As a secondary standard, PDF is widely used to present static documents

Page 17: Traditional Electronic Printing On The Internet

HTML

• HTML is an application of SGML

• HTML has a set of 40 to 50 tags, which are “grammar” based

• HTML tags have default presentation characteristics, but these can be overridden with CSS (Cascading Style Sheets)

Page 18: Traditional Electronic Printing On The Internet

Sample HTML

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<h1>Poison Ivy Vineyards</h1>
<p>Poison Ivy Vineyards is an experiment in growing wine-quality grapes in a backyard in a residential neighborhood in Richardson, Texas. This website serves as a running diary of the steps I took to create the vineyard and - eventually - to make wine.</p>
</html>

Page 19: Traditional Electronic Printing On The Internet

XML

• XML is eXtensible Markup Language, which means that you can make up the tags

• Since a browser can’t know how to format the tags, default formatting is in outline form

• Normally, you would use XSL (or CSS) to describe how each tag is to be formatted

Page 20: Traditional Electronic Printing On The Internet

Sample XML

<NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
<JOBTITLE>Principal</JOBTITLE>
<AFFILIATION>MHE</AFFILIATION>
<ADDRESS>
  <STREET>1400 Cheyenne Dr.</STREET>
  <CITY>Richardson</CITY>
  <STATE>Texas</STATE>
  <ZIPCODE>75080</ZIPCODE>
  <EMAIL>[email protected]</EMAIL>
</ADDRESS>

Page 21: Traditional Electronic Printing On The Internet

Sample XSL

This is an <emph>important</emph> point.

<xsl:template match="emph">
  <fo:sequence font-weight="bold">
    <xsl:process-children/>
  </fo:sequence>
</xsl:template>

Page 22: Traditional Electronic Printing On The Internet

PDF

• PDF is Adobe’s Portable Document Format

• PDF is a print stream, not an SGML instance

• PDF is similar to PostScript, but more portable, because it carries its own resources

• PDF provides good fidelity, at a price

Page 23: Traditional Electronic Printing On The Internet

Sample PDF

%PDF-1.1
...
2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj
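The object in the sample is a PDF document information dictionary, and its key/value structure can be read back with very little code. The sketch below is deliberately naive: it assumes literal strings with no nested parentheses or escapes, which holds for this sample but not for PDF in general, and `parse_info` and `parse_pdf_date` are hypothetical helper names.

```python
import re
from datetime import datetime

SAMPLE = (
    "<</CreationDate (D:19960809191047)"
    "/Producer (Acrobat Distiller 2.1 for Windows)"
    "/Creator (Adobe PageMaker 6.0)/Author (Doc)"
    "/Keywords ()/Title (bills)/Subject ()>>"
)

# /Name followed by a literal string in parentheses (no nesting, no escapes).
PAIR = re.compile(r"/(\w+)\s*\(([^()]*)\)")


def parse_info(dictionary: str) -> dict:
    """Return the /Key (value) pairs of a simple information dictionary."""
    return dict(PAIR.findall(dictionary))


def parse_pdf_date(value: str) -> datetime:
    """Parse the D:YYYYMMDDHHmmSS prefix of a PDF date string."""
    return datetime.strptime(value[2:16], "%Y%m%d%H%M%S")


if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["Producer"], "|", info["Creator"])
    print(parse_pdf_date(info["CreationDate"]))
```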

Page 24: Traditional Electronic Printing On The Internet

Limits Of Browsers

Page 25: Traditional Electronic Printing On The Internet

A Normal HTML Page

Page 26: Traditional Electronic Printing On The Internet

Default Font Increased

Page 27: Traditional Electronic Printing On The Internet

Using Ghouly Solid

Page 28: Traditional Electronic Printing On The Internet

Adjusting The Fonts

Page 29: Traditional Electronic Printing On The Internet

Methods Of Moving Traditional Electronic Print To The Internet

Page 30: Traditional Electronic Printing On The Internet

Five Methods

• Conversion to PDF

• Rasterization to gif or jpeg

• Recomposition into HTML/XML

• “Conversion” to normal HTML/XML

• Translation to highly formatted HTML/XML

Page 31: Traditional Electronic Printing On The Internet

Conversion to PDF

• This is a print stream to print stream conversion

• The output in PDF usually looks very similar to the original printed document

• Many tools which create the PDF also add value, such as hypertext links, bookmarking, et cetera, to the PDF document

Page 32: Traditional Electronic Printing On The Internet

Pros And Cons Of PDF

• Pros:
– High fidelity to original document
– Reader is widespread and free
– Reasonably transportable
– Widely used in some circles (e.g., IRS)

• Cons:
– PDF files tend to be large
– PDF documents are paper-size centric
– Browser requires a “plug-in”*

Page 33: Traditional Electronic Printing On The Internet

PDF Sample

%PDF-1.1
...
2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj

Xploration Guidelines

Page 34: Traditional Electronic Printing On The Internet

Sources For * To PDF

• Composition Tools - create new PDF documents from source code

• Transforms - translate existing formatted print streams into PDF

• Larger Systems - composition or translation capabilities inserted transparently into document systems

• See Xplor Products and Services Reference Guide

Page 35: Traditional Electronic Printing On The Internet

Rasterization to gif or jpeg

• The print stream is “rasterized”, that is, converted to a bit map format
– GIF (Graphics Interchange Format) - Invented by CompuServe for graphics. Supports only 256 colors, or 8 bits per pixel.
– JPEG (Joint Photographic Experts Group) - Designed for images with more than 256 colors, with better compression, but is “lossy”
– Excellent discussion of each at http://www.efuse.com/Design/web_graphics_basics.html

Page 36: Traditional Electronic Printing On The Internet

Pros And Cons Of Rasterization

• Pros:
– Image is exact copy of original document
– Image can be viewed on any browser which takes gifs and jpegs

• Cons:
– Resolution is hardcoded at one size
– There’s no text to search
– Download is longer
– No correspondence of printed pages and “HTML” pages
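The "download is longer" point can be made concrete with rough arithmetic. The sketch assumes a US Letter page (8.5 by 11 inches) rasterized at 300 dpi and counts uncompressed bytes; neither the page size nor the resolution comes from the slides, and GIF or JPEG compression would shrink the real files considerably.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a page rasterized at a fixed resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # 8 bits per pixel gives 2**8 = 256 colors, the GIF limit.
    print(2 ** 8)
    for label, bits in (("bilevel", 1), ("256 colors", 8), ("true color", 24)):
        print(label, raster_bytes(8.5, 11, 300, bits), "bytes")
```

For comparison, the text of a full page of line data is typically only a few thousand bytes, and because the resolution is fixed at rasterization time, zooming in the browser cannot recover detail that was never stored.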

Page 37: Traditional Electronic Printing On The Internet

Sample Rasterization

• This page was originally created in PDF, then rasterized, and converted to a jpeg

Page 38: Traditional Electronic Printing On The Internet

Recomposition into HTML/XML

• Data is extracted from a print stream

• Templates have been created in advance

• The extracted data is merged into the templates

• There may be fewer or more output pages in HTML than were in the print stream

• Templates are built to be the most effective in the browser window
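The extract-then-merge steps above can be sketched in a few lines. Everything specific here is hypothetical: the column positions in `extract_fields`, the field names, and the wording of the template are invented for illustration, not taken from any real bill layout.

```python
from html import escape
from string import Template

# A template created in advance and designed for the browser, not the page.
PAGE_TEMPLATE = Template("""<html>
<h1>$title</h1>
<p>Account $account: balance forward $balance</p>
</html>""")


def extract_fields(print_lines):
    """Pull named values out of fixed columns of a line-data page (hypothetical layout)."""
    return {
        "title": print_lines[0][1:].strip(),       # skip the carriage control byte
        "account": print_lines[1][1:11].strip(),   # columns 2-11
        "balance": print_lines[1][11:].strip(),    # rest of the record
    }


def recompose(print_lines):
    """Merge the extracted data into the pre-built template."""
    fields = {name: escape(value) for name, value in extract_fields(print_lines).items()}
    return PAGE_TEMPLATE.substitute(fields)


if __name__ == "__main__":
    page = ["1Telephone Bill", " " + "12345".ljust(10) + "5,000.00"]
    print(recompose(page))
```

Because only the data crosses over, the print layout plays no part in the result, which is why the number of HTML pages need not match the number of printed pages.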

Page 39: Traditional Electronic Printing On The Internet

Pros And Cons Of Recomposition

• Pros:
– HTML/XML pages are well-suited for the browser
– HTML/XML is considered by some to be simpler than PDF

• Cons:
– HTML/XML pages don’t necessarily match the printed pages
– All pages (templates) must be pre-composed

Page 40: Traditional Electronic Printing On The Internet

Sample Recomposition

• This document is a sample telephone bill which has been divided into 11 HTML pages

• Note how the HTML pages are divided by subject, not by page overflow

Page 41: Traditional Electronic Printing On The Internet

“Conversion” to normal HTML/XML

• Both data and formatting information are extracted from the print file

• Some formats easily correspond to an HTML tag, e.g., a heading to <h1>

• More complex formatting can be approximated by the use of table tags
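A minimal sketch of this kind of "conversion", assuming the print file has already been parsed into (style, text) pairs. The style names `heading`, `columns`, and `body` are hypothetical; a real tool would infer them from fonts and positions in the print stream.

```python
from html import escape


def convert_page(records):
    """Map (style, text) pairs from a print file onto ordinary HTML tags."""
    html, rows = [], []

    def flush_rows():
        # Consecutive columnar lines are approximated by a single table.
        if rows:
            html.append("<table>")
            html.extend(rows)
            html.append("</table>")
            rows.clear()

    for style, text in records:
        if style == "columns":
            cells = "".join(f"<td>{escape(cell)}</td>" for cell in text.split())
            rows.append(f"<tr>{cells}</tr>")
            continue
        flush_rows()
        tag = "h1" if style == "heading" else "p"  # a heading maps directly to <h1>
        html.append(f"<{tag}>{escape(text)}</{tag}>")
    flush_rows()
    return "\n".join(html)


if __name__ == "__main__":
    print(convert_page([
        ("heading", "Statement"),
        ("columns", "Balance Forward 5,000.00"),
        ("body", "Thank you for your business"),
    ]))
```

The output keeps the structure of the printed page but leaves fonts, spacing, and column widths to the browser, which is why fidelity is only approximate.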

Page 42: Traditional Electronic Printing On The Internet

Pros And Cons of “Conversion”

• Pros:
– HTML/XML pages look similar to printed pages
– Pages are in HTML/XML, not PDF or raster

• Cons:
– Fidelity is approximate
– Reader can substantially alter the presentation
– Graphics may not be supported

Page 43: Traditional Electronic Printing On The Internet

Sample “Conversion”

Print File

HTML Document #1

Page 44: Traditional Electronic Printing On The Internet

Translation to highly formatted HTML/XML

• This method uses particular CSS commands to do “exact” placement of text in the window of the browser

• This is as close as XML gets (today) to being a print stream

• Fonts are still subject to user override
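The "exact" placement described above amounts to turning each positioned text item from the print stream into a CSS rule with absolute coordinates. The sketch assumes positions arrive in printer pels at a known resolution and are scaled to CSS pixels at 96 per inch; both the input units and the `place_text` name are assumptions for this example.

```python
from html import escape


def place_text(items, dpi=300, css_px_per_inch=96):
    """items: (x, y, width, text) tuples in printer pels at `dpi` (assumed units)."""
    scale = css_px_per_inch / dpi
    rules, spans = [], []
    for n, (x, y, width, text) in enumerate(items, start=1):
        # One absolutely positioned class per text item.
        rules.append(
            f".ps{n}{{position:absolute;top:{round(y * scale)}px;"
            f"left:{round(x * scale)}px;width:{round(width * scale)}px;}}"
        )
        spans.append(f'<span class="ps{n}">{escape(text)}</span>')
    return "<style>\n" + "\n".join(rules) + "\n</style>\n" + "\n".join(spans)


if __name__ == "__main__":
    print(place_text([(1419, 2113, 203, "Balance"), (1672, 2113, 206, "Forward")]))
```

Note that the rules fix where each string starts but not which font the browser draws it in, so a user font override can still make text overlap or fall short of its box.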

Page 45: Traditional Electronic Printing On The Internet

Pros And Cons Of Translation

• Pros:
– Author has very good control over the presentation of text

• Cons:
– Much of the value of a tagged language is lost
– Portrait print pages still don’t fit in landscape browser windows
– May not work with all browsers
– Fonts can still be overridden

Page 46: Traditional Electronic Printing On The Internet

Sample Translation

<HTML>
<HEAD>
<STYLE>
.ps9{position:absolute;top:676px;left:454px;width:65px;}
.ps10{position:absolute;top:676px;left:535px;width:66px;}
.ps11{position:absolute;top:676px;left:1102px;width:70px;}
</STYLE>
</HEAD>
<BODY>
<SPAN CLASS="ps9"><NOBR>Balance</NOBR></SPAN>
<SPAN CLASS="ps10"><NOBR>Forward</NOBR></SPAN>
<SPAN CLASS="ps11"><NOBR>5,000.00</NOBR></SPAN>
</BODY>
</HTML>

Page 47: Traditional Electronic Printing On The Internet

William J. “Bill” McCalpin

EDPP, CDIA, MIT, LIT

Principal, MHE

1400 Cheyenne Dr.

Richardson, Texas 75080-3921

972-231-3660 (v) 972-690-4521 (f)

[email protected]