Celery with Python
Celery
Òscar Vilaplana
February 28 2012
Outline
self.__dict__
Use task queues
Celery and RabbitMQ
Getting started with RabbitMQ
Getting started with Celery
Periodic tasks
Examples
self.__dict__
{
    'name': 'Òscar Vilaplana',
    'origin': 'Catalonia',
    'company': 'Paylogic',
    'tags': ['developer', 'architect', 'geek'],
    'email': '[email protected]',
}
Proposal
- Take a slow task.
- Decouple it from your system
- Call it asynchronously
Separate projects
Separate projects allow us to:
- Divide the system into sections
  - e.g. frontend, backend, mailing, report generator...
- Tackle them individually
- Conquer them, i.e. declare them Done:
  - Clean code
  - Clean interface
  - Unit tested
  - Maintainable
(but this is not only for Celery tasks)
Coupled Tasks
In some cases it may not be possible to decouple a task. Then we can:
- Have some workers in your system's network
  - with access to the code of your system
  - with access to the system's database
- They handle messages from certain queues, e.g. internal.#
Candidates
Processes that:
- Need a lot of memory.
- Are slow.
- Depend on external systems.
- Need a limited amount of data to work (easy to decouple).
- Need to be scalable.
Examples:
- Render complex reports.
- Import big files
- Send e-mails
Example: sending complex emails
Create an independent project: yourappmail
- Generator of complex e-mails.
  - It needs the templates, images...
  - It doesn't need access to your system's database.
- Deploy it in servers of our own, or in Amazon servers
  - We can add/remove servers as we need them
  - On startup:
    - Join the RabbitMQ cluster
    - Start celeryd
- Normal operation: 1 server is enough
- On high load: start as many servers as needed (tps_peak / tps_server)
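The "as many servers as needed" figure can be sketched as a rounded-up ratio. This assumes the two quantities on the slide are the peak throughput to handle and the throughput one server sustains; the function name and the numbers are made up for illustration.

```python
import math


def servers_needed(tps_peak, tps_server):
    """Smallest whole number of servers whose combined throughput covers the peak."""
    # float() keeps the division exact on Python 2 as well as Python 3.
    return max(1, int(math.ceil(float(tps_peak) / tps_server)))


# Hypothetical figures: a peak of 450 per second, 60 per second per server.
print(servers_needed(450, 60))
```

Rounding up matters here: 7.5 servers' worth of load needs 8 machines, and the `max(1, ...)` keeps the one server that normal operation requires.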
yourappmail
A decoupled email generator:
- Has a clean API
- Decoupled from your system's db: it needs to receive all information
  - Customer information
  - Custom data
  - Contents of the email
- Can be deployed to as many servers as we need
  - Scalable
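A minimal sketch of what "receives all information" could look like: the function below builds the message from its arguments alone and never touches a database. The function name, argument names, and template text are hypothetical; in yourappmail a function like this would presumably be registered as a Celery task.

```python
from string import Template


def render_email(customer_name, custom_data, body_template):
    """Build the e-mail text from the arguments alone; no database access."""
    # Everything the template needs arrives in the call itself.
    values = dict(custom_data, name=customer_name)
    return Template(body_template).substitute(values)


message = render_email(
    customer_name="Ada",
    custom_data={"event": "PyGrunn"},
    body_template="Hello $name, here is your ticket for $event.",
)
print(message)
```

Because the caller passes plain, serializable values, the same call can travel through the message broker to whichever server happens to pick it up.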
Not for everything
- Task queues are not a magic wand to make things faster
  - They can be used as such (like a cache).
  - But that hides the real problem.
Celery
- Asynchronous distributed task queue
- Based on distributed message passing.
- Mostly for real-time queuing
- Can do scheduling too.
- REST: you can query status and results via URLs.
- Written in Python
- Celery: Message Brokers and Result Storage
Celery's tasks
- Tasks can be async or sync
- Low latency
- Rate limiting
- Retries
- Each task has a UUID: you can ask for the result back if you know the task UUID.
- RabbitMQ
  - Messaging system
  - Protocol: AMQP
    - Open standard for messaging middleware
  - Written in Erlang
  - Easy to cluster!
Install the packages from the RabbitMQ website
- RabbitMQ Server
- Management Plugin (nice HTML interface)
  - rabbitmq-plugins enable rabbitmq_management
- Go to http://localhost:55672/cli/ and download the cli.
- HTML interface at http://localhost:55672/
Set up a cluster
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl cluster rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.
Notes
- Automatic configuration
  - Use a .config file to describe the cluster.
- Change the type of the node
  - RAM node
  - Disk node
Install Celery
- Just pip install celery
Define a task
Example tasks.py
from celery.task import task

@task
def add(x, y):
    print "I received the task to add {} and {}".format(x, y)
    return x + y
Configure username, vhost, permissions
$ rabbitmqctl add_user myuser mypassword
$ rabbitmqctl add_vhost myvhost
$ rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
Configuration file
Write celeryconfig.py
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"  # the user created with rabbitmqctl add_user
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"
CELERY_RESULT_BACKEND = "amqp"
CELERY_IMPORTS = ("tasks", )
Launch daemon
celeryd -I tasks # import the tasks module
Schedule tasks
from tasks import add
# Schedule the task
result = add.delay(1, 2)
value = result.get() # value == 3
Schedule tasks by name
Sometimes the tasks module is not available on the clients
from celery.execute import send_task

# Schedule the task by its registered name; the tasks module is not imported
result = send_task("tasks.add", [1, 2])
value = result.get() # value == 3
print value
Schedule the tasks better: apply_async
task.apply_async has more options:
- countdown=n: the task will run at least n seconds in the future.
- eta=datetime: the task will run no earlier than datetime.
- expires=n or expires=datetime: the task will be revoked in n seconds or at datetime
  - It will be marked as REVOKED
  - result.get will raise a TaskRevokedError
- serializer
  - pickle: default, unless CELERY_TASK_SERIALIZER says otherwise.
  - alternatives: json, yaml, msgpack
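The relation between countdown and eta can be sketched with plain datetime arithmetic: a countdown of n seconds amounts to an eta of "now plus n seconds". The helper name and the fixed "now" below are illustrative assumptions, not part of Celery.

```python
from datetime import datetime, timedelta


def eta_from_countdown(countdown, now):
    """Translate a countdown in seconds into the earliest run time."""
    return now + timedelta(seconds=countdown)


# A fixed "now" keeps the example reproducible.
now = datetime(2012, 2, 28, 12, 0, 0)
eta = eta_from_countdown(90, now)
print(eta)

# With the add task from tasks.py, these two calls would ask for roughly
# the same thing (not executed here, since they need a running broker):
#   add.apply_async(args=(1, 2), countdown=90)
#   add.apply_async(args=(1, 2), eta=eta)
```

Either way the time is a lower bound: the task runs when a worker is free after that moment, not exactly at it.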
Result
A result has some useful operations:
- successful: True if the task succeeded
- ready: True if the result is ready
- revoke: cancel the task.
- result: if the task has been executed, this contains the result; if it raised an exception, it contains the exception instance
- state:
  - PENDING
  - STARTED
  - RETRY
  - FAILURE
  - SUCCESS
TaskSet
Run several tasks at once. The result keeps the order.
from celery.task.sets import TaskSet
from tasks import add

job = TaskSet(tasks=[
    add.subtask((4, 4)),
    add.subtask((8, 8)),
    add.subtask((16, 16)),
    add.subtask((32, 32)),
])
result = job.apply_async()
result.ready() # True once all subtasks have completed
result.successful() # True if all subtasks were successful
values = result.join() # [8, 16, 32, 64]
print values
TaskSetResult
The TaskSetResult has some interesting properties:
- successful: if all of the subtasks finished successfully (no exception)
- failed: if any of the subtasks failed.
- waiting: if any of the subtasks is not ready yet.
- ready: if all of the subtasks are ready.
- completed_count: number of completed subtasks.
- revoke: revoke all subtasks.
- iterate: iterate over the return values of the subtasks once they finish (sorted by finish order).
- join: gather the results of the subtasks and return them in a list (sorted by the order in which they were called).
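The difference between the two orderings (finish order for iterate, call order for join) can be tried locally without a broker. The sketch below is a stand-in that uses the standard library's concurrent.futures rather than Celery; slow_square and the delays are invented for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n, delay):
    """Pretend to be a subtask that takes `delay` seconds."""
    time.sleep(delay)
    return n * n


# (argument, delay) pairs: the first job submitted is the slowest one.
jobs = [(2, 0.3), (3, 0.1), (4, 0.2)]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(slow_square, n, d) for n, d in jobs]
    # Like iterate: values arrive as each job finishes.
    finish_order = [f.result() for f in as_completed(futures)]
    # Like join: values are listed in the order the jobs were submitted.
    call_order = [f.result() for f in futures]

print(finish_order)
print(call_order)
```

The call-order list always lines up with the submitted jobs, so it is the one to use when each result has to be matched back to its input.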
Retrying tasks
If the task fails, you can retry it by calling retry()
@task
def send_twitter_status(oauth, tweet):
    try:
        twitter = Twitter(oauth)
        twitter.update_status(tweet)
    except (Twitter.FailWhaleError, Twitter.LoginError) as exc:
        send_twitter_status.retry(exc=exc)
To limit the number of retries set task.max_retries.
Routing
apply_async accepts the parameter routing_key.
- Create some RabbitMQ queues
pdf: ticket.#
import_files: import.#
- Schedule the task to the appropriate queue
import_vouchers.apply_async(args=[filename], routing_key="import.vouchers")
generate_ticket.apply_async(args=barcodes, routing_key="ticket.generate")
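How a routing key ends up in a queue follows the AMQP topic rules: keys are dot-separated words, `*` stands for exactly one word and `#` for zero or more. The matcher below is a small illustrative sketch of those rules, not code from RabbitMQ or Celery; the queue table repeats the two bindings from the slide.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP topic binding pattern matches a routing key."""
    def match(p, k):
        if not p:
            return not k  # pattern used up: the key must be used up too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))


# Queue name -> binding pattern, as on the slide.
QUEUES = {"pdf": "ticket.#", "import_files": "import.#"}


def queues_for(routing_key):
    """Names of the queues whose binding matches the routing key."""
    return sorted(q for q, pattern in QUEUES.items()
                  if topic_matches(pattern, routing_key))


print(queues_for("import.vouchers"))
print(queues_for("ticket.generate"))
```

In a real deployment the broker does this matching; the sketch only helps when choosing binding patterns and routing keys that will meet.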
celerybeat
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    # Executes every Monday morning at 7:30 A.M.
    "every-monday-morning": {
        "task": "tasks.add",
        "schedule": crontab(hour=7, minute=30, day_of_week=1),
        "args": (16, 16),
    },
}
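Besides crontab expressions, a celerybeat schedule entry can also be a plain interval given as a timedelta. A minimal sketch, reusing the tasks.add task from earlier; the entry name is arbitrary.

```python
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    # Runs tasks.add(16, 16) every 30 seconds.
    "add-every-30-seconds": {
        "task": "tasks.add",
        "schedule": timedelta(seconds=30),
        "args": (16, 16),
    },
}
```

Intervals suit "every so often" jobs, while crontab suits jobs tied to the calendar.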
There can be only one celerybeat running
- But we can have two machines that check on each other.
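Import a big file:
tasks.py
from celery.task import task

# create_temp_file, fetch_bigfile, do_import and report_result are helpers
# defined elsewhere; do_import is a placeholder name for the actual import.
@task
def import_bigfile(server, filename):
    with create_temp_file() as tmp:
        fetch_bigfile(tmp, server, filename)
        do_import(tmp)
        report_result(...) # e.g. send confirmation e-mail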
Import big file: Admin interface, server-side
import tasks

def import_bigfile(filename):
    result = tasks.import_bigfile.delay(filename)
    return result.task_id

class ImportBigfile(View):
    def post_ajax(self, request):
        filename = request.get('big_file')
        task_id = import_bigfile(filename)
        return task_id
Import big file: Admin interface, client-side
- Post the file asynchronously
- Get the task_id back
- Show a "working..." message.
- Periodically ask Celery if the task is ready and change "working..." into "done!"
- No need to call Paylogic code: just ask Celery directly
- Improvements:
  - Send the username to the task.
  - Have the task call back the Admin interface when it's done.
  - The Backoffice can send an e-mail to the user when the task is done.
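The "periodically ask if the task is ready" step can be sketched as a small polling loop. The real client would be browser code; this Python version only illustrates the logic. The helper name, interval, and timeout are assumptions, and the readiness check is passed in as a function so the sketch runs without a broker.

```python
import time


def wait_until_ready(is_ready, interval=1.0, timeout=30.0):
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True   # time to turn "working..." into "done!"
        if time.monotonic() >= deadline:
            return False  # gave up; the task may still be running
        time.sleep(interval)


# Stand-in for the real check: ready on the third poll.
answers = iter([False, False, True])
done = wait_until_ready(lambda: next(answers), interval=0.01, timeout=1.0)
print(done)

# Against Celery the check could be, for example:
#   from celery.result import AsyncResult
#   wait_until_ready(lambda: AsyncResult(task_id).ready())
```

A timeout keeps the page from polling forever when a worker is down; the callback improvement listed above would remove the polling altogether.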
Do a time-consuming task.
from tasks import do_difficult_thing

# ... stuff ...
# I have all data necessary to do the difficult thing
difficult_result = do_difficult_thing.delay(some, values)
# I don't need the result just yet, I can keep myself busy
# ... stuff ...
# Now I really need the result
difficult_value = difficult_result.get()