Python – What are the fastest ways to do this?

My application is at a crossroads: I'm using Python/Django, MySQL, and Ubuntu 12.04.

My app will access other applications online, index their path structures, and submit forms. If you imagine this happening across 10 or 100 accounts, each with one or more domains, performance can quickly get out of hand.

My initial idea was to set up an EC2 environment that distributes the load of accessing all these paths on each domain across multiple EC2 instances, with Celery/RabbitMQ spreading the processing work across those instances.

The problem is that I want to store the result of each form submission. I have read that I may need to use a NoSQL database (e.g. Hadoop, Redis, etc.) for this.
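
For reference, the kind of work I want to distribute looks roughly like this; just a sketch, where the app name, broker URL, URLs, and form fields are placeholders:

    # tasks.py - rough sketch of the work to distribute (names/URLs are placeholders)
    import requests
    from celery import Celery

    app = Celery('crawler', broker='amqp://guest@localhost//')

    @app.task
    def submit_form(domain, path, form_data):
        """Visit one path on a domain, submit a form, and return the outcome."""
        url = 'http://{0}{1}'.format(domain, path)
        response = requests.post(url, data=form_data, timeout=30)
        # This return value is what I need to store for every submission.
        return {'url': url, 'status': response.status_code, 'length': len(response.text)}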

My questions are:

  • Is there a way to use Celery/RabbitMQ with a SQL database, and what are the advantages/disadvantages?
    One problem I can see with having to use NoSQL is the learning curve.
  • Second: is there another way to distribute the (processing) load of multiple Python scripts running simultaneously across multiple EC2 environments?

Thank you.

Solution

Is there a different way to use Celery/RabbitMQ with a SQL DB, and what are the advantages/disadvantages? I can see one problem with having to use NoSQL: the learning curve.

Yes.

  1. If you’re talking about storing your Django application/model data, you can use any SQL-type database, as long as it has a Python binding. Most popular SQL databases have Python bindings.

  2. If you are referring to storing task results in a result backend, multiple SQL and NoSQL databases/protocols are supported. I don’t see any specific advantage or disadvantage to storing results in SQL (MySQL, Postgres) versus NoSQL (Mongo, CouchDB); this is just my personal opinion, and it depends on the type of application you are running. These are some examples you can use with a SQL database (from their documentation):

    # sqlite (filename)
    CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'

    # mysql
    CELERY_RESULT_BACKEND = 'db+mysql://scott:tiger@localhost/foo'

    # postgresql
    CELERY_RESULT_BACKEND = 'db+postgresql://scott:tiger@localhost/mydatabase'

    # oracle
    CELERY_RESULT_BACKEND = 'db+oracle://scott:tiger@127.0.0.1:1521/sidname'
    
  3. If you’re referring to brokers (the queuing mechanism), Celery fully supports only RabbitMQ and Redis. A minimal sketch that combines a broker with a SQL result backend follows this list.
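
Putting points 2 and 3 together, a minimal sketch of a Celery app that uses RabbitMQ as the broker and MySQL as the result backend could look like this (the db+ backends use SQLAlchemy under the hood; the credentials, host names, and database name here are placeholders):

    from celery import Celery

    app = Celery(
        'myapp',
        # RabbitMQ as the broker (queuing mechanism)
        broker='amqp://myuser:mypassword@rabbit-host:5672//',
        # MySQL as the result backend, via SQLAlchemy (the 'db+' prefix)
        backend='db+mysql://scott:tiger@mysql-host/celery_results',
    )

    @app.task
    def add(x, y):
        return x + y

    # add.delay(2, 2) queued from any machine pointing at the same broker
    # will be executed by a worker, and its return value stored in MySQL.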

Secondly: is there some other way to distribute the (processing) load of several Python scripts being run at the same time on multiple EC2 environments?

That’s exactly what Celery does: you can set up your workers on multiple machines, which can be different EC2 instances. Then all you have to do is point each Celery installation at the same queue/broker in its configuration. If you want redundancy in your broker (RabbitMQ and/or Redis), you should consider setting it up in a cluster configuration.
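
As an illustration, a minimal sketch of such a setup, assuming a shared celeryconfig.py deployed to every EC2 instance (the broker host, credentials, and database are placeholders):

    # celeryconfig.py - identical on every EC2 instance (values are placeholders)
    BROKER_URL = 'amqp://myuser:mypassword@broker-host:5672//'
    CELERY_RESULT_BACKEND = 'db+mysql://scott:tiger@db-host/celery_results'

    # On each instance, start one or more workers against that same broker, e.g.:
    #   celery -A myapp worker --loglevel=info --concurrency=8
    #
    # Any instance (or your Django app) can then dispatch tasks with
    # some_task.delay(...), and whichever worker is free picks them up.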
