Session 803: Processing PDF Files
Gaeir Dietrich
Director, High Tech Center Training Unit
www.htctu.net
Overview
Explanation of PDF
Programs that work with PDF files
  – Adobe Reader
  – Acrobat Pro
Processing with Acrobat Pro
Processing with OCR programs
Clean-up in Word
PDF
Great starting point
  – Contains all text and graphics
  – Easy to generate Word files once you learn how
  – Reduces retyping
Excellent format for creating large print
What is a PDF?
Portable Document Format (PDF)
Reads the same on any computer
Looks like the book
Contains all the text
Easy for publishers
Types of PDF Documents
Text-based PDF
  – Searchable
Graphical PDF
  – Picture of text (i.e., a graphic)
Use the text-selection (I-beam) tool to tell the difference
  – Text can be selected; graphics cannot
PDFs and Publishers
Easy for publishers
  – Even small publishers can create a PDF
Most accurate format
  – Looks like the book
  – Includes page numbers and all text
Will be complete
  – BUT watch out for teacher’s editions
Security Issues
PDF files can be locked in various ways
Some files can be read, but no text can be extracted from them
If you receive a locked PDF, go back to the publisher
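Before spending time on a file, a quick screen can flag one that is probably locked. The sketch below is a heuristic, not a PDF parser, and the helper name `looks_locked` is hypothetical. It relies on the fact that encrypted PDFs, including ones that open without a password but restrict copying, carry an /Encrypt entry in the trailer (or cross-reference stream) dictionary, which is stored uncompressed. A hit means "check this file", and a locked file still has to go back to the publisher.

```python
import re

# Matches the /Encrypt key but not longer names such as /EncryptMetadata.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z])")


def looks_locked(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes appear to contain an /Encrypt entry.

    Encrypted (locked or permission-restricted) PDFs declare /Encrypt in the
    trailer or cross-reference stream dictionary, which is not compressed.
    This is a quick screen only; it does not parse the PDF structure.
    """
    return _ENCRYPT_KEY.search(pdf_bytes) is not None
```

Usage: `looks_locked(open("book.pdf", "rb").read())`, where `book.pdf` is a placeholder file name.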
Working with PDF Files
Native utilities from Adobe
  – Adobe Reader
  – Acrobat Pro
Optical character recognition (OCR)
Free extraction tool: Balabolka
Which PDF Software?
Adobe Reader
  – Free
  – Open, view, and read aloud (text-to-speech, TTS)
  – www.adobe.com/products/reader/
Acrobat Pro
  – Discounted educational pricing
  – Crop pages, delete/combine pages, renumber pages, extract text
  – Highly recommended for alternate format producers
Reading Features in Adobe Reader
Accesses text-based PDFs within Reader
Reads aloud
  – But does not highlight or track
Enlarges text
  – Nice reflow feature
Changes text/background colors
Text highlighting, sticky notes, and comments
Production Features in Reader
Really designed for reading, not reformatting
Export PDF
  – Subscription service (about $20/year)
  – Upload the PDF file; the service automatically converts it to Word for you to download
Processing with Acrobat Pro
Cropping
Enlargement for printing
Tiling
Extracting/deleting pages
Combining/inserting pages
Text extraction
  – Works best with text-based PDF
Customize Quick Tools
Click on the “gear”
View > Show/Hide > Toolbar Items > Quick Tools
Quick Tools Menu
Customize
Please Note
To enable single-key shortcuts
  – Open the Preferences dialog box (Ctrl + K)
  – Under General, select Use Single-Key Accelerators To Access Tools (first checkbox under Basic Tools)
Cropping
Tools > Pages > Crop
Shortcut: C (Please note: This shortcut brings up the mouse-driven cropping tool; you must double-click to open the dialog box!)
Crop Tool
Crop Toolbox
Enlarging
Choose paper size/printer
File > Print > Size… to Fit
Shortcut: Ctrl + P (tab through)
Tip: Crop document before enlarging
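The cropping tip comes down to arithmetic: "fit" scales the page by the smaller of the two paper-to-page ratios, so trimming blank margins lets the text grow larger. The sketch below assumes the whole sheet is printable (real printers keep a small unprintable margin) and uses illustrative sizes in inches. The helper names are hypothetical.

```python
def fit_scale(page_w: float, page_h: float, paper_w: float, paper_h: float) -> float:
    """Largest uniform scale at which a page_w x page_h page fits on the paper.

    Assumes the whole sheet is printable; real printers reserve a margin.
    """
    return min(paper_w / page_w, paper_h / page_h)


def crop(page_w: float, page_h: float, margin: float) -> tuple:
    """Page size after trimming the same margin from every edge."""
    return page_w - 2 * margin, page_h - 2 * margin
```

For example, a 6 x 9 inch book page printed to fit 11 x 17 inch paper scales by min(11/6, 17/9), about 183%. After cropping 0.75 inch from every edge (4.5 x 7.5 inches), the same paper allows min(11/4.5, 17/7.5), about 227%.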
Print to Fit
Tiling
Choose paper size/printer
File > Print > Poster > Tile Scale and Overlap
Shortcut: Ctrl + P (tab through)
Tip: Crop document before tiling
Enlarge with Tiling
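Tile Scale and Overlap together determine how many sheets a poster printout uses. With n sheets of usable length s overlapping by o, the sheets cover n·s − (n − 1)·o, so covering a scaled page of length P needs n ≥ (P − o)/(s − o). The sketch below is an estimate under that simple geometry. Acrobat's own layout also accounts for printer margins and any labels or cut marks, so its count may differ, and `tiles_needed` is a hypothetical helper.

```python
import math


def tiles_needed(length: float, tile_scale: float, sheet: float, overlap: float) -> int:
    """Estimate sheets needed along one direction of a tiled (poster) printout.

    length: page dimension before scaling (inches)
    tile_scale: 2.0 means 200%
    sheet: usable sheet dimension in the same direction (inches)
    overlap: overlap between adjacent tiles (inches)
    """
    if overlap >= sheet:
        raise ValueError("overlap must be smaller than the sheet")
    poster = length * tile_scale
    if poster <= sheet:
        return 1
    # n sheets cover n*sheet - (n-1)*overlap; solve for the smallest n.
    return math.ceil((poster - overlap) / (sheet - overlap))
```

At 200% with a 0.5 inch overlap on 8.5 x 11 inch paper, an uncropped 8.5 x 11 inch page needs 3 x 3 = 9 sheets. Cropped to 7 x 9.5 inches, it needs 2 x 2 = 4, which is why cropping comes first.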
Extracting Pages
Tools > Pages > Extract
Delete shortcut: Ctrl + Shift + D
Extract Pages shortcut: Alt V + T + P (opens the Pages pane; press F6 to move focus into the pane, then use the arrow keys to move down)
Extraction Tool
Tips for Extracting Chapters
Crop the complete file before extracting
Work on a copy!!!!!
Extract from the end toward the front!
Use the table of contents to help
Place focus on the first page of the chapter to extract (beginning with the last chapter)
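Why work from the back: deleting a chapter's pages renumbers every page after it, so the table-of-contents numbers stop matching for later chapters. Pages before the deletion keep their numbers, so working from the last chapter forward keeps the earlier ones valid. The simulation below makes two assumptions. First, the chapter start pages are PDF page positions (printed page numbers may be offset by front matter). Second, each chapter is deleted from the working copy once extracted. Both helpers are hypothetical.

```python
def chapter_ranges(starts: list, last_page: int) -> list:
    """Turn chapter start pages into (first, last) ranges, last chapter first."""
    starts = sorted(starts)
    ends = [s - 1 for s in starts[1:]] + [last_page]
    return list(reversed(list(zip(starts, ends))))


def extract_all(pages: list, starts: list) -> tuple:
    """Simulate extract-then-delete on a working copy, back to front.

    Returns the chapters in reading order plus whatever precedes chapter 1.
    """
    remaining = list(pages)
    chapters = []
    for first, last in chapter_ranges(starts, len(remaining)):
        chapters.append(remaining[first - 1:last])  # 1-based, inclusive range
        del remaining[first - 1:last]               # later pages are already gone
    return list(reversed(chapters)), remaining
```

With ten pages and chapters starting on pages 3, 6, and 9, the ranges come out as (9, 10), (6, 8), (3, 5), and each range still points at the right pages when its turn comes.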
Starting from the Back
Combining
File > Pages > Insert
OR
Create > Combine files
Inserting Pages
Combining Pages
Auto Extracting Text
File > Save As > MS Word
  – Retains styles and paragraphs
File > Save As > More options…
  – Text (Accessible)
      Loses styles; places a hard return at the end of every line
  – Text (Plain)
      Loses styles; keeps paragraphs
Shortcut: Alt F + A
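If a file was saved as Text (Accessible), the hard return at the end of every line has to be removed before the text flows as paragraphs. The sketch below assumes that paragraphs are separated by at least one blank line, so single line breaks are treated as wraps. That assumption may not hold for every export, and Text (Plain) avoids the problem entirely. It also leaves words hyphenated across line ends untouched. `unwrap` is a hypothetical helper.

```python
import re


def unwrap(text: str) -> str:
    """Rejoin lines broken at the end of each printed line.

    Assumes blank lines separate paragraphs; single line breaks inside a
    paragraph are replaced with spaces. Hyphenated line ends are not handled.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    return "\n\n".join(joined)
```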
Save As Options
More Control over Text
For graphical PDFs
Or
To maintain more control over extracting text from text-based PDFs
Use an OCR program!
Better Text Extraction
Use an optical character recognition (OCR) program
OCR programs analyze text and structure
  – Acrobat Pro has built-in OCR, but other programs provide more control
OCR Programs
ABBYY FineReader Pro
  – Easier to learn
  – Somewhat better with structure
  – About $75
Nuance OmniPage
  – A bit more accessible
  – A bit better with STEM materials
  – About $100
Note for Kurzweil Users
If students are using Kurzweil, then use Kurzweil for the OCR
  – Do not OCR and then load into Kurzweil unless you do not care about the page structure
Use the KESI virtual printer
  – Print from Acrobat or Adobe Reader
  – Creates KESI files
  – Will not work with locked files
OCR Programs
Treat all graphics files the same
  – PDFs, TIFFs, JPEGs
Load image file
  – Create templates
Zone (analyze structure)
Run OCR
OCR Process Details
Crop before loading into the OCR program
Turn on multiple languages as needed
  – If doing math, turn on Greek
  – Only turn on the languages you need
Edit in the OCR program
  – Some OCR programs have font-matching features
Save to Word
Once in Word
Learn to use “show hidden”
  – Ctrl + Shift + 8
Beware of the optional hyphen
  – Search and replace to delete
  – Search for ^- and replace with nothing
  – Run spell check
Use styles to structure files for the braille program
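When OCR or PDF text is cleaned up outside Word, the same optional-hyphen cleanup can be scripted. The sketch below assumes the optional hyphens come through as Unicode soft hyphens (U+00AD), which is how they commonly appear in plain text. Regular hyphens are left alone, just as with Word's ^- search. The helper name is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN, assumed plain-text form of Word's ^-


def remove_optional_hyphens(text: str) -> tuple:
    """Delete every soft hyphen and report how many were removed.

    Ordinary hyphens (U+002D) are not touched.
    """
    count = text.count(SOFT_HYPHEN)
    return text.replace(SOFT_HYPHEN, ""), count
```

As in Word, run a spell check afterward to catch anything the OCR split in other ways.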
More information
Gaeir (rhymes with “fire”) Dietrich
[email protected]
408-996-6047
www.htctu.net