Dev8d 2011: pipe2py

Post on 06-May-2015


An introduction to pipe2py, a Yahoo Pipes to Python compiler.


Introducing Pipe2Py: Converting Yahoo Pipes to Python Code

Original code: Greg Gaughan

Additional development: Tuukka Hastrup

Based on an original idea by: Tony Hirst, Dept of Communication and Systems, The Open University

pipes.yahoo.com

But what happens if Yahoo Pipes dies?

Pipe2Py: github.com/ggaughan/pipe2py

Yahoo pipelines are translated into pipelines of Python generators* to give a close match to the original data flow.

* based on ideas by David Beazley http://www.dabeaz.com/generators-uk

Each Yahoo module is coded as a separate Python module.
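As a rough illustration of the generator-pipeline idea (not pipe2py's actual module code), the sketch below chains three stages, each a plain Python generator that consumes the previous one. The stage names and the sample items are invented for the example, and it is written in Python 3 syntax, whereas the code on these slides is Python 2.

```python
def source(items):
    """Hypothetical source stage: yields items one at a time."""
    for item in items:
        yield item


def filter_title(stream, keyword):
    """Hypothetical filter stage: passes on items whose title contains keyword."""
    for item in stream:
        if keyword in item["title"]:
            yield item


def truncate(stream, count):
    """Hypothetical truncate stage: stops after `count` items."""
    for index, item in enumerate(stream):
        if index >= count:
            break
        yield item


# Made-up sample data standing in for a fetched feed.
feed = [
    {"title": "Yahoo Pipes intro"},
    {"title": "Unrelated post"},
    {"title": "Pipes to Python"},
    {"title": "More Pipes"},
]

# Wire the stages together in the same order as the boxes in a pipe diagram.
pipeline = truncate(filter_title(source(feed), "Pipes"), 2)
titles = [item["title"] for item in pipeline]
print(titles)
```

Nothing runs until the final list comprehension starts pulling items, which is what lets each stage stay small and independent.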

So you can use Yahoo Pipes as a graphical rapid prototyping application, and then generate a Python code equivalent that you can host yourself.

So what?

download code: http://github.com/ggaughan/pipe2py to dev8d/pipes/pipe2py

set path: export PYTHONPATH=dev8d/pipes

installation

simplejson*: sudo easy_install simplejson

dependencies

* only needed for Python versions before 2.6

test directory: python testbasics.py

unit tests

python compile.py -p pipelineid

compilation - direct from Yahoo Pipes

generates pipe_pipelineid.py

python compile.py pipelinefile.json

compilation - from a file

generates pipelinefile.py

python pipe_pipelineid.py

command line execution

runs pipe_pipelineid.py

from pipe2py import Context
from pipe2py.modules import *

def pipe_404411a8d22104920f3fc1f428f33642(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}

    forever = pipeforever.pipe_forever(context, None, conf=None)

    sw_502 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://blog.ouseful.info/feed'}})
    _OUTPUT = pipeoutput.pipe_output(context, sw_502, conf={})
    return _OUTPUT

compiled code of the form...

Each call to the final generator will ripple through the pipeline, issuing .next() calls onto the previous generator until the source is exhausted.
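The ripple effect can be made visible with a small trace. This is a minimal sketch with two invented stages, not pipe2py code. It uses Python 3, where the built-in next(gen) replaces the Python 2 gen.next() call mentioned on the slide.

```python
trace = []  # records the order in which the stages touch each item


def numbers(limit):
    """Hypothetical source stage."""
    for n in range(limit):
        trace.append("source yields %d" % n)
        yield n


def double(stream):
    """Hypothetical processing stage."""
    for n in stream:
        trace.append("double sees %d" % n)
        yield n * 2


pipeline = double(numbers(3))

# Asking the last stage for one item pulls exactly one item from the source.
first = next(pipeline)
after_first = list(trace)

# Draining the rest pulls the remaining items, again one at a time.
rest = list(pipeline)

print(after_first)
print(rest)
```

After the first next() call the trace holds only the two entries for item 0: the source has not been asked for anything further.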

Each item is typically passed through the whole pipeline one at a time, so:

- memory usage is kept to a minimum
- no module is waiting on an earlier module to finish processing the whole data set
- by adding queues between the modules, they could easily be made to run in parallel, each on a different CPU, to give great scalability
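One way the "queues between the modules" idea might look is sketched below, assuming a worker thread per upstream stage and a bounded queue as the link. The helper names (pump, drain) and the two stages are hypothetical, and pipe2py itself is not shown doing this. Note that in CPython, threads mainly help when stages wait on I/O such as feed fetches; spreading work across CPUs would need processes instead (for example the multiprocessing module).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def pump(stream, out_queue):
    """Run in a worker thread: push every item of a generator into a queue."""
    for item in stream:
        out_queue.put(item)
    out_queue.put(_DONE)


def drain(in_queue):
    """Generator that yields queued items until the sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is _DONE:
            return
        yield item


def square(stream):
    """Hypothetical upstream stage."""
    for n in stream:
        yield n * n


def add_one(stream):
    """Hypothetical downstream stage."""
    for n in stream:
        yield n + 1


# The bounded queue stops the upstream stage from running far ahead.
link = queue.Queue(maxsize=4)
worker = threading.Thread(target=pump, args=(square(range(5)), link))
worker.start()

# The downstream stage runs in the main thread, reading from the queue.
results = list(add_one(drain(link)))
worker.join()
print(results)
```

The stages themselves are unchanged generators; only the link between them differs, which is why the slide describes the change as easy.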

from pipe2py import Context
import pipe_9dc8014dcfd34c834a960321afde68d9 as p

C=Context()

r = p.pipe_9dc8014dcfd34c834a960321afde68d9(C,None)

for i in r:
    print i
    print i['title']

usage - compiled pipe

from pipe2py.compile import parse_and_build_pipe
from pipe2py import Context

pipe_def = """json representation of the pipe"""

p = parse_and_build_pipe(Context(), pipe_def)

for i in p:
    print i

usage - interpreted pipe

context = Context(describe_input=True)

p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(context, None)

need_inputs = p
print need_inputs

>>> [(u'0', u'username', u'Twitter username', u'text', u''),
...  (u'1', u'statustitle', u'Status title [string] or [logo] means twitter icon', u'text', u'logo')]

'''That is, tuples of the form:
   (position, name, prompt, type, default)
'''
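Given tuples of that shape, a caller could build a dictionary of input values from the defaults and override only what it needs. The sketch below uses the two tuples shown on the slide as literal data; the helper default_inputs is hypothetical and not part of pipe2py, and it is written in Python 3 syntax. The resulting dict has the same shape as the inputs argument passed to Context in the "avoiding console prompts" slide.

```python
# The described inputs, copied from the slide output above.
need_inputs = [
    (u'0', u'username', u'Twitter username', u'text', u''),
    (u'1', u'statustitle',
     u'Status title [string] or [logo] means twitter icon', u'text', u'logo'),
]


def default_inputs(described):
    """Hypothetical helper: map each input name to its default value."""
    return {name: default
            for (_position, name, _prompt, _type, default) in described}


inputs = default_inputs(need_inputs)
inputs['username'] = 'greg'  # override one default with a real value
print(inputs)
```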

usage - user inputs #1: identifying console prompts

C = Context(inputs={'username':'greg', 'statustitle':'logo'}, console=False)
p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(C, None)

for i in p:
    print i

usage - user inputs #2: avoiding console prompts

Yahoo Pipes modules: Pipe2Py implementation progress


;-)

One more thing...

pipes-engine.appspot.com

pipe2py hosting on Google App Engine

- generate working test pipes of increasing complexity

- generate test pipes that don't work

- commit pipe2py patches for test pipes that don't work

How can you help?

- simplify installation (easy_install?)

- identify a good convention for integrating pipe2py compiled pipes in arbitrary code

- identify a good convention for inserting arbitrary Python functions into, or in between, compiled pipe2py pipelines

How else can you help?
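As one possible starting point for that last convention (a suggestion only, not something pipe2py provides), an ordinary function can be slotted between stages with a small generator adapter. In the sketch below, apply_between, shout and fake_pipe are all hypothetical names; fake_pipe merely imitates the (context, _INPUT, conf=None, **kwargs) signature of the compiled pipes shown earlier so that the example runs without pipe2py installed. Python 3 syntax.

```python
def apply_between(stream, func):
    """Hypothetical adapter: apply func to every item flowing past."""
    for item in stream:
        yield func(item)


def fake_pipe(context, _INPUT, conf=None, **kwargs):
    """Stand-in for a compiled pipe; yields dict items as a feed would."""
    for title in ("first post", "second post"):
        yield {"title": title}


def shout(item):
    """Arbitrary user function: return a copy with an upper-cased title."""
    item = dict(item)
    item["title"] = item["title"].upper()
    return item


# The adapter's output is itself a generator, so it could feed another stage.
stream = apply_between(fake_pipe(None, None), shout)
titles = [item["title"] for item in stream]
print(titles)
```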

the next step: produce an open source front end visual editor?

wireit? pypes?

Anything else?

generate a ready-to-run instance of a Google App Engine configuration based around a compiled pipe?

Anything more else?

Pipe2Py: github.com/ggaughan/pipe2py