PDF.JS at SwissJeese 2012
Julian Viereck
@jviereck, +julian.viereck
Overview
• What is PDF.JS about (5 min)
• How PDF is structured & processing in PDF.JS (10 min)
• “Why are you doing this?” (15 min)
• Firefox Integration (5 min)
• What’s next? (5 min)
• Demo (15 min)
• Q & A (5 min)
About me
• Bespin / Skywriter
• Ace
• Firefox DevTools
• ETH Zurich (Physics)
• PDF.JS
• ?
PDF Viewer using Open Web Standards
What is PDF.JS
• building faithful & efficient PDF viewer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source (Github)
• Not Firefox-Specific - all modern browsers
• 1.3 MB uncompressed JS
• ~33,000 lines of code
• viewer in different languages
• async API
How PDF is structured
PDF file:
• Header: PDF version
• Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef Table: mapping objID ⇔ byte offset
• Trailer: root objID, xRef byte offset (root obj = ref to pages catalog)
Let’s look at it
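To make the four parts concrete, here is a minimal sketch that builds a tiny PDF-like file and then reads it back the way the structure suggests: start at the trailer, find the xRef byte offset, read the objID ⇔ byte offset table, and pick up the root object. The helper names (buildTinyPdf, readStructure) are made up for this illustration and are not PDF.JS functions. It assumes a single uncompressed xRef section and ASCII-only content.

```javascript
// Build a tiny PDF-like file so that all byte offsets are known to be correct.
// (Illustrative skeleton only: it has a catalog and an empty pages tree.)
function buildTinyPdf() {
  const objects = [
    '1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n',
    '2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n',
  ];
  let file = '%PDF-1.4\n'; // Header: PDF version
  const offsets = [];
  for (const obj of objects) {
    offsets.push(file.length); // ASCII only, so string length == byte length
    file += obj;               // Body: sequence of objects
  }
  const xrefOffset = file.length;
  file += 'xref\n0 ' + (objects.length + 1) + '\n';
  file += '0000000000 65535 f \n'; // entry 0 is always the free-list head
  for (const off of offsets) {
    file += String(off).padStart(10, '0') + ' 00000 n \n';
  }
  file += 'trailer\n<< /Size ' + (objects.length + 1) + ' /Root 1 0 R >>\n';
  file += 'startxref\n' + xrefOffset + '\n%%EOF\n';
  return file;
}

// Read the structure back: trailer -> xRef offset -> xRef table -> root objID.
function readStructure(file) {
  const version = /^%PDF-(\d\.\d)/.exec(file)[1];
  const startxrefPos = file.lastIndexOf('startxref');
  const xrefOffset = parseInt(file.slice(startxrefPos + 'startxref'.length), 10);

  const lines = file.slice(xrefOffset).split('\n');
  // lines[0] is 'xref', lines[1] is '<first objID> <entry count>'
  const [first, count] = lines[1].split(' ').map(Number);
  const table = {};
  for (let i = 0; i < count; i++) {
    const [offset, , kind] = lines[2 + i].trim().split(' ');
    if (kind === 'n') table[first + i] = Number(offset); // 'n' = in use
  }

  const root = /\/Root (\d+) (\d+) R/.exec(file.slice(xrefOffset));
  return { version, xrefOffset, table, rootObjId: Number(root[1]) };
}

const pdf = buildTinyPdf();
const structure = readStructure(pdf);
console.log(structure.version, structure.rootObjId, structure.table);
```

Reading from the end of the file is what allows a viewer to jump straight to a given object without scanning the whole body.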
Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics), with graphics = CanvasGraphics
• PartialEvaluator: read & convert all PDF cmds ➟ OL (OperationList)
• load required objects (fonts, images)
• graphics.executeOperatorList(OL)
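The last step, running an operator list against a graphics object, can be sketched roughly as follows. This is an illustration under assumptions, not the PDF.JS source: the list layout (parallel fnArray / argsArray) and the RecordingGraphics class are invented for the example, and the graphics object only records calls instead of drawing on a canvas.

```javascript
// Hypothetical operator list: parallel arrays of method names and arguments.
const operatorList = {
  fnArray: ['beginText', 'setFont', 'moveText', 'showText', 'endText'],
  argsArray: [[], ['font0', 12], [100, 700], ['Hello World!'], []],
};

// Stand-in for a canvas-backed graphics object: it only records what it is asked to do.
class RecordingGraphics {
  constructor() {
    this.log = [];
  }
  beginText() { this.log.push('beginText'); }
  setFont(name, size) { this.log.push('setFont ' + name + ' ' + size); }
  moveText(x, y) { this.log.push('moveText ' + x + ' ' + y); }
  showText(text) { this.log.push('showText ' + text); }
  endText() { this.log.push('endText'); }

  // Walk the list and dispatch each entry to the method of the same name.
  executeOperatorList(list) {
    for (let i = 0; i < list.fnArray.length; i++) {
      const name = list.fnArray[i];
      const fn = this[name];
      if (typeof fn !== 'function') {
        throw new Error('Unknown operator: ' + name);
      }
      fn.apply(this, list.argsArray[i]);
    }
  }
}

const graphics = new RecordingGraphics();
graphics.executeOperatorList(operatorList);
console.log(graphics.log.join('\n'));
```

Keeping the list as plain names and arguments means the same list could be replayed against a different backend than a canvas.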
Execution Example
• “get page 2” ➟ Data ➟ drawing cmds
• PartialEvaluator builds: draw(obj#3, dict.x, dict.y)
• Graphics asks Data: obj#3? dict.x, .y?
• Data answers: obj#3 = ”foo”, x = 20, y = 30
• Graphics draws on canvas
Problem Processing
• Extracting data slow (compressed)
• Transform data (images) slow
• Sometimes a lot of objects on page
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, only postMessage
Web Worker
• “get page 2” ➟ Data
• PartialEvaluator builds: draw(obj#3, dict.x, dict.y)
• references resolved with data: draw(“foo”, 20, 30)
• sends OpList (Operation List + Data) to the main thread
Main Thread
• Graphics draws on canvas
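Because the main thread cannot reach into the worker's memory, every reference has to be replaced by its value before the list is posted. The following sketch shows that idea with the draw(obj#3, dict.x, dict.y) example. All names here (workerData, resolveOperatorList, simulatePostMessage) are hypothetical, and a JSON round trip stands in for the copy that postMessage makes, which in a real browser uses the structured clone algorithm instead.

```javascript
// Hypothetical worker-side data: PDF objects by id and one dictionary.
const workerData = { objects: { 3: 'foo' }, dict: { x: 20, y: 30 } };

// A command whose arguments are still references into workerData.
const unresolved = [
  { fn: 'draw', args: [{ obj: 3 }, { dict: 'x' }, { dict: 'y' }] },
];

// Replace a reference by the value it points to; pass plain values through.
function resolveArg(arg, data) {
  if (arg !== null && typeof arg === 'object') {
    if ('obj' in arg) return data.objects[arg.obj];
    if ('dict' in arg) return data.dict[arg.dict];
  }
  return arg;
}

// Produce a self-contained list: operator names plus plain data only.
function resolveOperatorList(list, data) {
  return list.map((op) => ({
    fn: op.fn,
    args: op.args.map((arg) => resolveArg(arg, data)),
  }));
}

// Stand-in for worker.postMessage: the receiver gets a copy, never a shared object.
function simulatePostMessage(message) {
  return JSON.parse(JSON.stringify(message));
}

const resolved = resolveOperatorList(unresolved, workerData);
const received = simulatePostMessage(resolved);
console.log(received[0].fn, received[0].args);
```

The main thread then only needs the received list to draw, which is why the operator list travels together with its data.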
PDF content stream:
  5 0 obj
  << /Length 8 0 R >>
  stream
  /GS1 gs
  /F0 12 Tf
  BT
  100 700 Td
  (Hello World!) Tj
  ET
  50 600 m
  400 600 l
  S
  endstream
  endobj
➟ PartialEvaluator (+ xRef, catalog, resources) ➟ OL:
  setGState: [ LW: 10 ]
  dependency: [ font0 ]
  setFont: font0, 12
  beginText
  moveText: 100, 700
  showText: “Hello World!”
  endText
  moveTo: 50, 600
  lineTo: 400, 600
  stroke
➟ Graphics
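The conversion from the “Hello World!” content stream to an operator list can be approximated with a small tokenizer. PDF content streams use postfix notation, so operands are collected until an operator keyword appears. This is a simplified sketch, not the PartialEvaluator: the function names are invented, strings are assumed to have no nested parentheses or escapes, and resource names such as /GS1 and /F0 are passed through as-is instead of being resolved into [ LW: 10 ] or font0 (so no dependency entry is produced).

```javascript
// Mapping from PDF operator keywords to readable names (only those used on the slide).
const OPERATOR_NAMES = new Map([
  ['gs', 'setGState'], ['Tf', 'setFont'], ['BT', 'beginText'],
  ['Td', 'moveText'], ['Tj', 'showText'], ['ET', 'endText'],
  ['m', 'moveTo'], ['l', 'lineTo'], ['S', 'stroke'],
]);

// Split a content stream into literal strings, names, numbers and operators.
function tokenize(stream) {
  const re = /\(([^)]*)\)|\/(\S+)|(-?\d+(?:\.\d+)?)|([A-Za-z*'"]+)/g;
  const tokens = [];
  let m;
  while ((m = re.exec(stream)) !== null) {
    if (m[1] !== undefined) tokens.push({ type: 'string', value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: 'name', value: m[2] });
    else if (m[3] !== undefined) tokens.push({ type: 'number', value: Number(m[3]) });
    else tokens.push({ type: 'operator', value: m[4] });
  }
  return tokens;
}

// Operands come first, the operator last: collect operands until an operator shows up.
function toOperatorList(stream) {
  const ops = [];
  let args = [];
  for (const token of tokenize(stream)) {
    if (token.type !== 'operator') {
      args.push(token.value);
      continue;
    }
    const fn = OPERATOR_NAMES.get(token.value);
    if (fn === undefined) throw new Error('Unsupported operator: ' + token.value);
    ops.push({ fn, args });
    args = [];
  }
  return ops;
}

const contentStream =
  '/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S';
const helloOps = toOperatorList(contentStream);
for (const op of helloOps) console.log(op.fn, op.args);
```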
Images
• JPEG streams:
• DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
• If not JPEG stream:
• read bytes, convert to colorspace
• imgData = canvas.getImageData()
• fillWithPixelData(bytes, imgData)
• canvas.putImageData(imgData)
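For the non-JPEG path, the pixel-filling step might look like the sketch below. It assumes the bytes have already been converted to tightly packed 8-bit RGB and copies them into an RGBA buffer with full opacity. The function name fillWithRgbData is made up for this example, and a plain object stands in for the ImageData that a browser's 2D context would return from getImageData.

```javascript
// Copy tightly packed RGB bytes into an RGBA buffer (ImageData layout).
function fillWithRgbData(rgbBytes, imgData) {
  const pixelCount = imgData.width * imgData.height;
  if (rgbBytes.length !== pixelCount * 3) {
    throw new Error('Expected ' + pixelCount * 3 + ' bytes, got ' + rgbBytes.length);
  }
  const out = imgData.data;
  let src = 0;
  let dst = 0;
  for (let i = 0; i < pixelCount; i++) {
    out[dst++] = rgbBytes[src++]; // R
    out[dst++] = rgbBytes[src++]; // G
    out[dst++] = rgbBytes[src++]; // B
    out[dst++] = 255;             // A: fully opaque
  }
}

// Stand-in for ImageData: 2 x 1 pixels, 4 bytes per pixel.
const imgData = { width: 2, height: 1, data: new Uint8ClampedArray(2 * 1 * 4) };

// One red pixel followed by one blue pixel.
fillWithRgbData(new Uint8Array([255, 0, 0, 0, 0, 255]), imgData);
console.log(Array.from(imgData.data));
```

A per-pixel loop like this over a large image is the kind of work that blocks the UI when it runs on the main thread.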
JPEG, but...
• no native support for JPEG 2000, CMYK
➡ use JS implementation
‣ works, not that performant but good enough
Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS for loading: @font-face { font-family: 'font0'; src: url(data:font/opentype;base64, ...); }
• Fonts are sanitized by browser
• Need to rebuild malformed fonts :/
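Building such a @font-face rule from converted font bytes could be sketched as follows. The helper names are invented for this example, the four bytes used below are just the ASCII string “OTTO” and not a usable font, and the global btoa is assumed to be available (browsers, recent Node.js).

```javascript
// Turn raw bytes into a base64 string via a binary string and btoa.
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Wrap the font data in a data: URL and a @font-face rule for the given family name.
function buildFontFaceRule(fontName, fontBytes) {
  const url = 'data:font/opentype;base64,' + bytesToBase64(fontBytes);
  return "@font-face { font-family: '" + fontName + "'; src: url(" + url + '); }';
}

// Placeholder bytes only ("OTTO"); a real call would pass the converted OpenType font.
const rule = buildFontFaceRule('font0', new Uint8Array([0x4f, 0x54, 0x54, 0x4f]));
console.log(rule);
```

In a page, the resulting rule text would be added to a style sheet, after which the browser sanitizes the font before making it available under the given family name.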
“Why are you doing this?”
aka. ∃ C/C++ libraries: isn’t that faster?
“Performance is not the only measure”
1. Security
Most vulnerable programs
Source: http://www.csis.dk/en/csis/news/3321
~25% of crashes in Firefox are plugin-related
2. Web-Specific Viewer
3. Drive Innovation
4. Speed
• Rendering slower than C/C++
• BUT
• Partial downloading
• Render page in background
• Make the slow parts faster
• Mostly: Good enough
5. Can do better
6. Push Web Platform
B2G aka. Boot2Gecko
New API: Printing
• Printing very limited on the web right now
• no way to achieve native printing experience
• NEED: New API for printing
• mozPrintCallback
• define canvas content during printing
• send drawing commands directly to printer
WebPage ➟ Print ➟ Single Pages (Page 1, Page 2)
• Find print canvas on page
• Execute printCallback (canvas.mozPrintCallback)
• All canvas done ➠ print page
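The three steps (find the canvases, run each print callback, print once all are done) can be sketched with mock objects. mozPrintCallback is a non-standard, Firefox-only property. The shape of the callback argument used here (a context to draw into and a done() function) is an assumption for illustration, as are the helper names makeMockCanvas and printPage. No real canvas or printer is involved.

```javascript
// Mock canvas: in Firefox this would be a <canvas> element with mozPrintCallback set.
function makeMockCanvas(label) {
  return {
    label,
    drawn: [],
    mozPrintCallback(obj) {
      // Define the canvas content at print time, then signal completion.
      obj.context.fillText('content of ' + label, 10, 10);
      obj.done();
    },
  };
}

// Run every print callback; hand the page to the printer once all have called done().
function printPage(canvases, sendToPrinter) {
  const printable = canvases.filter((c) => typeof c.mozPrintCallback === 'function');
  let pending = printable.length;
  if (pending === 0) {
    sendToPrinter();
    return;
  }
  for (const canvas of printable) {
    // Recording stand-in for the print-time drawing context.
    const context = {
      fillText: (text, x, y) => canvas.drawn.push({ text, x, y }),
    };
    canvas.mozPrintCallback({
      context,
      done() {
        pending -= 1;
        if (pending === 0) sendToPrinter();
      },
    });
  }
}

const pageCanvases = [makeMockCanvas('Page 1'), makeMockCanvas('Page 2')];
let printedCount = 0;
printPage(pageCanvases, () => { printedCount += 1; });
console.log(printedCount, pageCanvases.map((c) => c.drawn.length));
```

Counting outstanding callbacks lets each canvas finish asynchronously while the page is still sent to the printer exactly once.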
Firefox Integration
• PDF.JS as bundled Addon in Firefox Nightly
• Getting in Release Channel is hard
• 400M users have expectations
• more testing coverage
• accessibility
• match UX expectation
• fallback if something is not working
• Try to make it by the Aurora merge (6/5)
• Firefox Specific, BUT
• quality improvements are browser-independent
• only small parts Firefox specific
What’s next
• Fix broken PDFs
• Improve performance
• Improve Text selection
• Text search
• Form support
• Printing support
Demo
Contributing
• Lots of areas
• Translation
• Writing Code (embeddable viewer?)
• Testing (Firefox Auto-Update Addon)
Github: https://github.com/mozilla/pdf.js
Twitter: @pdfjs
Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC: irc.mozilla.org #pdfjs
Engineering Weekly Call:
Thursday - 10:00am PDT
Readme, Issues, Wiki
Q & A