OpenDocument Scripting

ODF Scripting: how ODF makes it easy to tell your computer to do your office work for you ODF Scripting Marco Fioretti OOOCon 2010 http://mfioretti.com 2010/9/2 Budapest 1

description

One of the main advantages of the OpenDocument format for office documents is that it is very, very easy to generate or process automatically. Anybody who needs to generate many similar texts, spreadsheets and presentations, and is willing to run some scripts, can save countless hours of work with ODF. In this talk I explain how to do this by applying the general method I call ODF Scripting. Background and more info at http://mfioretti.com/2010/09/budapest-openoffice-org-conference-2010-odf-scripting-and-odf-future/

Transcript of OpenDocument Scripting

Page 1: OpenDocument Scripting

ODF Scripting: how ODF makes it easy to tell your computer to do your office work for you

ODF Scripting

Marco Fioretti OOOCon 2010 http://mfioretti.com 2010/9/2 Budapest 1

Page 2: OpenDocument Scripting

● Marco Fioretti

● Freelance writer, trainer, activist

● Linux Journal Contributing Editor, contributor to PC Professionale, Linux Format and other magazines

● Author of the Family Guide to Digital Freedom

(http://digifreedom.net)

● Co-author of the O'Reilly book on Open Government, 2010

● Member of the ODF Fellowship and Digistan.org

● Advanced ODF user, but not a programmer!

Author intro


Page 3: OpenDocument Scripting

• a simple and effective way to quickly write scripts to:

● generate, filter or process ODF texts, presentations and spreadsheets

● particularly productive for low-volume, but boring and repetitive, tasks

● made possible by the openness and simplicity of the OpenDocument Format

● ...but still unknown to most (potential!) users of ODF and OO.o

What is ODF Scripting?


Page 4: OpenDocument Scripting

What I call ODF Scripting is based on two facts:

• any ODF file is just a ZIP archive containing plain text files and other objects (e.g. images), usually in standard formats

• there are lots of FOSS utilities and scripting tools made precisely to process plain text, and they work on every platform

How does ODF Scripting work?


Page 5: OpenDocument Scripting

What's inside an ODF file?


Every OpenDocument file is simply a compressed Zip folder containing various elements. Some of them are listed here:

● content.xml: the actual textual content of the document. Complex XML markup, but still readable by humans

● meta.xml: metadata such as author name, word count, language, date of last modification, etc.

● styles.xml: style information such as font size, colour and page width, for pages, characters, paragraphs...

● Separate folders for binary objects: images, macros, ...
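The point that an ODF file is an ordinary ZIP archive can be checked with a few lines in any scripting language. The following is a minimal sketch in Python, chosen here only as one possible language; the function names are illustrative and not part of the talk.

```python
import zipfile


def list_odf_members(path):
    """Return the names of all files stored inside an ODF package."""
    with zipfile.ZipFile(path) as odf:
        return odf.namelist()


def read_odf_member(path, member="content.xml"):
    """Return one member of the package (content.xml by default) as text."""
    with zipfile.ZipFile(path) as odf:
        return odf.read(member).decode("utf-8")
```

Calling `list_odf_members("letter.odt")` on a real document (the file name is hypothetical) should show entries such as content.xml, meta.xml and styles.xml.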

Page 6: OpenDocument Scripting

● create (ONCE per project!) an empty document with placeholder strings

● write short, simple, ad-hoc shell/Perl/whatever scripts that:

● unzip that empty document

● replace the placeholder strings in content.xml with actual values from text files or database queries

● zip everything back together and give the new file the right extension

ODF Scripting document generation
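The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's script: it is written in Python rather than shell or Perl, the function name `fill_template` and the placeholder names are hypothetical, and it does plain string substitution, so a placeholder must be a string that appears nowhere else in content.xml.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_path, output_path, values):
    """Copy an ODF template, replacing placeholder strings in content.xml.

    `values` maps placeholder strings (hypothetical names such as
    "INVOICE_DATE") to the text that should replace them.
    """
    with zipfile.ZipFile(template_path) as src, \
         zipfile.ZipFile(output_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in values.items():
                    # Escape &, < and > so the result is still valid XML.
                    text = text.replace(placeholder, escape(str(value)))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps the member order and the
            # compression method of each entry (the "mimetype" entry is
            # normally first and stored uncompressed).
            dst.writestr(item, data)
```

Because the output path is chosen by the caller, giving the new file the right extension is just a matter of naming it, for example `fill_template("invoice_template.odt", "invoice_150.odt", {...})` (file names hypothetical).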


Page 7: OpenDocument Scripting

● study the structure of the ODF file(s) to process, to find out which XML fields and/or text values are interesting

● write ad-hoc shell/Perl/whatever scripts that:

● unzip the ODF file

● extract relevant strings via Perl/grep/awk/whatever

● process them as needed

Document analysis and processing with ODF scripting
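As an illustration of the extraction step, here is a minimal sketch that pulls the text of every paragraph and heading out of content.xml. It uses Python's XML parser instead of grep or awk, which is a swapped-in technique rather than the approach shown in the talk, and the function name is hypothetical. The namespace URI is the one ODF defines for text elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ODF for text elements such as <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(path):
    """Return the text of every paragraph and heading in content.xml."""
    with zipfile.ZipFile(path) as odf:
        root = ET.fromstring(odf.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    # itertext() also collects text inside nested elements such as <text:span>.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]
```

The resulting list of strings can then be filtered or counted with ordinary string operations.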


Page 8: OpenDocument Scripting

Examples (1): Invoice generation


marco => cat my_data.sh
INVOICE_DATE='2010/05/15'
VENDOR_CODE='007'
PO_NUMBER='Purchase Order #1'
TOTAL=10
ISSUE=150
DESCRIPTION='Here is your invoice'

1 ASCII data file

+ 35 lines shell script
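The 35-line shell script itself is not reproduced in the slides. As a rough sketch of its first step only, a data file in the NAME='value' format shown above could be read into a dictionary like this; the function name is hypothetical and Python stands in for the shell.

```python
import shlex


def read_shell_vars(text):
    """Parse NAME='value' assignments, as found in my_data.sh, into a dict."""
    values = {}
    # shlex.split applies shell-style quoting rules, so quoted values may
    # contain spaces and characters such as '#'.
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not assignments
            values[name] = value
    return values
```

The resulting dictionary can then drive whatever placeholder substitution the script performs on the invoice template.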

Page 9: OpenDocument Scripting

Time-of-day   BW1    BW2
Midnight      4.5    6.4
              6.3    6.3
              3.1    6.1
              1.85   5.87

Example (2): Spreadsheets with graphs and formulas from log files


+

Still editable formulas!!!

Page 10: OpenDocument Scripting

How were the spreadsheets generated?

A 27-line shell script calls a 66-line Perl script, which replaces placeholders like this one in the XML files:

</table:table-row>MY_DATA_GO_HERE</table:table>

with snippets of XML code (copied from those same files) that describe table cells, but whose numeric contents were loaded from the ASCII input file
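The Perl script is not shown in the slides, so the following is only a sketch of the idea in Python. The talk copies the cell snippets from an existing file; here, as an assumption, the cell markup is written out by hand in the minimal form ODF spreadsheets use (`office:value-type`, `office:value` and a `text:p` child), and all function names are hypothetical.

```python
from xml.sax.saxutils import escape


def cell_xml(value):
    """Return a <table:table-cell> element for one number or string."""
    if isinstance(value, (int, float)):
        return (f'<table:table-cell office:value-type="float" '
                f'office:value="{value}"><text:p>{value}</text:p>'
                f'</table:table-cell>')
    return (f'<table:table-cell office:value-type="string">'
            f'<text:p>{escape(str(value))}</text:p></table:table-cell>')


def rows_xml(rows):
    """Return one <table:table-row> element per input row."""
    return "".join(
        "<table:table-row>" + "".join(cell_xml(v) for v in row)
        + "</table:table-row>"
        for row in rows
    )


def insert_rows(content_xml, rows, placeholder="MY_DATA_GO_HERE"):
    """Replace the placeholder in content.xml with the generated rows."""
    return content_xml.replace(placeholder, rows_xml(rows))
```

Since only data rows are inserted, formulas and charts already present in the template are left as they are.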

Page 11: OpenDocument Scripting

Examples (3): Slideshow drafts from plain text outlines

Q: how did I generate the first version of this slideshow?

A: with the same trick used in the spreadsheet example, and scripts of the same complexity

Page 12: OpenDocument Scripting

Examples (3): Slideshow drafts from plain text outlines

Page 13: OpenDocument Scripting

● Image processing with ImageMagick:

● unzip the ODF file

● process every file in the image folder in a shell loop, using composite, convert or similar ImageMagick utilities

● zip everything back together, assign the proper extension

● Practical uses:

● add a watermark or caption to each image in a collection of ODF files

● reduce resolution, to save disk space

● replace company logos or other clipart

Other ODF scripting recipes I plan to write
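A possible skeleton for this recipe is sketched below, in Python rather than a shell loop. It assumes that embedded images live under a folder named Pictures/ inside the package, as in files written by OpenOffice.org, and it leaves the actual image manipulation to a caller-supplied function; `process_images` and `transform` are hypothetical names.

```python
import zipfile


def process_images(src_path, dst_path, transform, folder="Pictures/"):
    """Copy an ODF package, passing each file under `folder` through `transform`.

    `transform` is any function that takes the image bytes and returns the
    new image bytes; everything else is copied unchanged.
    """
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.startswith(folder) and not item.is_dir():
                data = transform(data)
            # Reusing the original ZipInfo keeps member order and compression.
            dst.writestr(item, data)
```

The `transform` function could, for instance, pipe the bytes through an ImageMagick utility with `subprocess`, or simply return the bytes of a new company logo.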


Page 14: OpenDocument Scripting

● Metadata processing:

● Unzip the ODF file

● Use grep/Perl/sed/whatever to replace or update the current values of:

● phone numbers, addresses, names

● author name or any other metadata

● zip everything together, assign the proper extension

Other ODF scripting recipes I plan to write (2)
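For the author-name case, a sketch of this recipe might look like the following. It assumes that meta.xml stores the name in a `<dc:creator>` element written with the usual `dc:` prefix, and it uses a regular expression in the spirit of the sed/Perl one-liners the slide mentions; the function name is hypothetical.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Matches the element as OpenOffice.org normally writes it; a file using a
# different namespace prefix would need a different pattern.
CREATOR = re.compile(r"(<dc:creator>).*?(</dc:creator>)", re.S)


def set_creator(src_path, dst_path, new_name):
    """Copy an ODF package, rewriting the <dc:creator> value in meta.xml."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "meta.xml":
                text = data.decode("utf-8")
                text = CREATOR.sub(
                    lambda m: m.group(1) + escape(new_name) + m.group(2),
                    text,
                )
                data = text.encode("utf-8")
            dst.writestr(item, data)
```

Phone numbers, addresses or names in content.xml could be updated the same way, with a pattern that matches the old value.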


Page 15: OpenDocument Scripting

● Add data from ODF files to databases or generate graphs:

● unzip the ODF file

● grep interesting strings from meta.xml or content.xml

● add them to a database, generate graphs with gnuplot...

● Example: extract the answers to multiple-choice tests from .odt files received via email, to calculate student grades, averages...

Other ODF scripting recipes I plan to write (3)
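One way to feed a database or gnuplot is to collect a few values from many files into CSV. The sketch below reads the author and the word count from meta.xml; the choice of fields, the column names and the function names are assumptions made for illustration.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_meta(path):
    """Return file name, creator and word count taken from meta.xml."""
    with zipfile.ZipFile(path) as odf:
        root = ET.fromstring(odf.read("meta.xml"))
    creator = root.findtext(".//dc:creator", default="", namespaces=NS)
    stats = root.find(".//meta:document-statistic", NS)
    words = ""
    if stats is not None:
        words = stats.get(f"{{{NS['meta']}}}word-count", "")
    return {"file": path, "creator": creator, "words": words}


def meta_to_csv(paths):
    """Return one CSV line per ODF file, preceded by a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["file", "creator", "words"])
    writer.writeheader()
    for path in paths:
        writer.writerow(read_meta(path))
    return out.getvalue()
```

The CSV text can be saved to a file and loaded into a database or plotted.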


Page 16: OpenDocument Scripting

● Courseware!

● Generate educational DVDs from the same notes used to generate ODP slideshows

● Generate multiple choice tests from sources written in GIFT, the same markup language Moodle uses to run the same tests online (http://docs.moodle.org/en/GIFT)

Other ODF scripting recipes I plan to write (4)


ASCII source with GIFT markup:

Thanksgiving is celebrated on the {
~second
~third
=fourth
} Thursday of November.
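To turn such a source into text for a test sheet, a script first has to split the question into its parts. The sketch below handles only the simple form shown above, where "~" marks a wrong answer and "=" the right one; it ignores the rest of the GIFT syntax (feedback, weights, escaped characters), and the function name is hypothetical.

```python
import re


def parse_gift_choice(question):
    """Split one GIFT multiple-choice question into stem, options and answer."""
    match = re.search(r"\{(.*?)\}", question, re.S)
    if match is None:
        raise ValueError("no {...} answer block found")
    options, correct = [], None
    # Each option starts with "~" (wrong answer) or "=" (right answer).
    for marker, text in re.findall(r"([~=])\s*([^~=]+)", match.group(1)):
        text = text.strip()
        options.append(text)
        if marker == "=":
            correct = text
    # Put a blank where the answer block was and normalise the whitespace.
    stem = question[:match.start()] + "_____" + question[match.end():]
    return " ".join(stem.split()), options, correct
```

The stem and options can then be inserted into an .odt template, while the correct answer goes into a separate answer key.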

Page 17: OpenDocument Scripting

● Courseware again

● Use the same approach to generate math exercises automatically

● import into ODF files formulas created with Mathematica and saved as MathML

Another ODF scripting example, from Rob Weir


Page 18: OpenDocument Scripting

● Very simple method, simple to learn and use whenever one needs to save time

● Flexible: everybody can use his or her preferred scripting or source markup language: it's a way of working, not a program!!!

● Very portable, with the smallest possible number of dependencies

Pros of ODF Scripting (1)


Page 19: OpenDocument Scripting

● Huge time saver in the many cases when an industrial-strength ODF/XML processor cannot be installed or would be overkill (to learn, at least)

● SIMPLE and perfectly adequate to the real needs of many home, school or SME users: save time on boring, simple, repetitive, never-ending modifications or analyses of files that have the same structure

● useful for webmasters who need to assemble and serve ODF stuff on the fly

Pros of ODF Scripting (2)


Page 20: OpenDocument Scripting

● can mix, reuse or generate on the fly the "source code", that is the strings from plain text files, databases and what not that must be inserted in the ODF files

● Does not need OpenOffice.org or any other office suite! Perfect for servers or very limited systems

● integrates well with any other command line data processing tool (including OpenOffice.org, when necessary)

Pros of ODF Scripting (3)


Page 21: OpenDocument Scripting

● Most important advantage:

● ODF Scripting is much easier and much more acceptable from the psychological point of view!

● easier than LaTeX and, unlike LaTeX:

● accepts standard office documents as "input"

● produces stuff DIRECTLY readable and EDITABLE with "normal" office suites

Pros of ODF Scripting (4)


Page 22: OpenDocument Scripting

● The ODF Scripting way of working is "real-world-office-ready", that is:

• compatible with secretaries and existing material (e.g. corporate templates in ODF format)

• results can even be converted (even running OO.o in a script) to MS Office formats if really, really necessary (but why??? It's already ODF!)

Pros of ODF Scripting (5)


Page 23: OpenDocument Scripting

● not really scalable to flexible processing of complex ODF documents (it's a way to quickly write single-purpose, throw-away tools)

● Lower performance than other solutions

● but... who cares??? No, really!

● those users for whom these are serious issues already have real XML parsers and other similar tools

● ODF scripting is for all the others

Cons of ODF Scripting


Page 24: OpenDocument Scripting

● the biggest obstacles are the psychological barriers:

● generic fear or hate of the command line (you need to learn simple shell scripting to do ODF scripting)

● fear of messing with/inside objects (office files) that aren't believed to be touchable by mere humans:

● "if it were really that simple, why on Earth would we need very expensive/complex software just to write a letter with justified paragraphs?"

Cons of ODF Scripting (2)


Page 25: OpenDocument Scripting

A most important advantage of ODF Scripting is proving that those taboos are wrong and that office files are something that normal people CAN handle by themselves, and over which they can have complete control without asking "permission" from anybody

Cons or opportunity?


Page 26: OpenDocument Scripting

Feedback from users


Wow! This opens windows of opportunities!

Somehow I've totally missed out on the fact that odt documents are just zip files...

Use OpenOffice as a mark-up language... that is just frigging awesome!

Page 27: OpenDocument Scripting

● flood the world with ODF files!!!

● stimulate cultural change:

● prove the openness, freedom and robustness of ODF and its ecosystem

● encourage migrations: it proves that converting a legacy collection of old corporate templates, reports and such isn't NECESSARILY something that requires expensive consulting

How can ODF Scripting help to promote OpenOffice?


Page 28: OpenDocument Scripting

● ODF Scripting is not an application, it's an attitude:

● it's just being aware that, thanks to ODF, there is a SIMPLE way to write quick and dirty scripts to save lots of time

● it is nothing new, really. It's just that too few people know yet how cool and easy it is

● can make more people love ODF and OO.o

Summary


Page 29: OpenDocument Scripting

● make it easy to use for Windows users: bundle example scripts with Cygwin live/virtual environments

● optimize existing scripts? Probably not worth it.

● Write simple GUIs for them? Hmm... what do you think?

● Prove its potential for schools

● (for me) figure out how to mix whole (groups of) ODP slides from existing ODP slideshows

What's next?


Page 30: OpenDocument Scripting

● Resources

● the "OASIS OpenDocument Essentials" book http://books.evc-cit.info/

● My ODF Scripting pages (which will gladly host also 3rd party recipes!) http://freesoftware.zona-m.net/odf-scripting

● Questions?

● Contact info: [email protected] or http://mfioretti.com

Thanks!

Conclusion
