Python – Where did the Luigi task go?

This is my first foray into Luigi (and Python!), and I'm having some problems. The relevant code is:

from Database import Database
import luigi

class bbSanityCheck(luigi.Task):

    conn = luigi.Parameter()
    date = luigi.Parameter()

    def __init__(self, *args, **kwargs):
        super(bbSanityCheck, self).__init__(*args, **kwargs)
        self.has_run = False

    def run(self):
        print "Entering run of bb sanity check"
        # DB STUFF HERE THAT DOESN'T MATTER
        print "Are we in la-la land?"

    def complete(self):
        print "BB Sanity check being asked for completeness: ", self.has_run
        return self.has_run

class Pipeline(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        db = Database('cbs')
        self.conn = db.connect()
        print "I'm about to yield!"
        return bbSanityCheck(conn=self.conn, date=self.date)

    def run(self):
        print "Hello World"
        self.conn.query("""SELECT *
                           FROM log_blackbook""")
        result = self.conn.store_result()
        print result.fetch_row()

    def complete(self):
        return False

if __name__ == '__main__':
    luigi.run()

Here is the output (with the database-related output removed):

DEBUG: Checking if Pipeline(date=2013-03-03) is complete
I'm about to yield!
INFO: Scheduled Pipeline(date=2013-03-03)
I'm about to yield!
DEBUG: Checking if bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03) is complete
BB Sanity check being asked for completeness:  False
INFO: Scheduled bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 5150] Running   bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
Entering run of bb sanity check
Are we in la-la land?
INFO: [pid 5150] Done      bbSanityCheck(conn=<_mysql.connection open to 'sas1.rad.wc.truecarcorp.com' at 223f050>, date=2013-03-03)
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: There are 1 pending tasks possibly being run by other workers
INFO: Worker was stopped. Shutting down Keep-Alive thread

So my questions are:

1.) Why is “I’m about to yield!” printed twice?

2.) Why is “Hello World” never printed?

3.) What does “There are 1 pending tasks possibly being run by other workers” mean?

I like really clean output because it’s easier to maintain, so I’d like to get rid of these warning-like messages.

I also noticed that requires() can use either “yield” or “return item, item2, item3”. I have read about yield and understand it, but I don’t know which convention is preferred here, or whether there are nuances I’m missing as a newcomer to the language.

Solution

I think you have misunderstood how Luigi works in general.

(1) Hmm, not sure. To me it looks more like the same thing being printed at both the INFO and DEBUG levels.

(2) You are trying to run a Pipeline that depends on bbSanityCheck having run. bbSanityCheck.complete() never returns True, because you never set has_run to True in bbSanityCheck. So the Pipeline task can never run and print “Hello World”, because its dependency never completes.
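To make the rule in (2) concrete, here is a toy, pure-Python model of it: a task is run only once every task returned by its requires() reports complete() == True. It is a simplified sketch, not Luigi's actual scheduler or worker; ToyTask, toy_build, ToySanityCheck and ToyPipeline are hypothetical names, and the sets_flag switch exists only to compare the question's behaviour with the variant that sets has_run.

```python
class ToyTask(object):
    """Hypothetical stand-in for luigi.Task: only requires/run/complete."""

    def requires(self):
        return []

    def run(self):
        pass

    def complete(self):
        return False


def toy_build(task, log):
    """Simplified model (NOT Luigi's real worker): build the dependencies,
    then run `task` only if every dependency reports complete() == True.
    Returns True if `task` was run."""
    deps = task.requires()
    if not isinstance(deps, (list, tuple)):
        deps = [deps]  # a single task, like the question's requires()
    for dep in deps:
        if not dep.complete():
            toy_build(dep, log)
    if all(dep.complete() for dep in deps):
        task.run()
        log.append(type(task).__name__)
        return True
    log.append("blocked: " + type(task).__name__)
    return False


class ToySanityCheck(ToyTask):
    def __init__(self, sets_flag):
        self.sets_flag = sets_flag  # hypothetical switch for the comparison
        self.has_run = False

    def run(self):
        # ... the real sanity checks would go here ...
        if self.sets_flag:
            self.has_run = True  # the assignment missing from bbSanityCheck

    def complete(self):
        return self.has_run


class ToyPipeline(ToyTask):
    def __init__(self, check):
        self.check = check

    def requires(self):
        return self.check

    def run(self):
        print("Hello World")


log = []
toy_build(ToyPipeline(ToySanityCheck(sets_flag=False)), log)
print(log)  # ['ToySanityCheck', 'blocked: ToyPipeline']

log = []
toy_build(ToyPipeline(ToySanityCheck(sets_flag=True)), log)  # prints Hello World
print(log)  # ['ToySanityCheck', 'ToyPipeline']
```

With sets_flag=False the sanity check runs but the pipeline stays blocked, which has the same shape as the log above: bbSanityCheck is reported “Done” while Pipeline is left pending. In this model, setting has_run = True at the end of run() is what unblocks the pipeline.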

(3) This is probably because the Pipeline task is still pending (that is the “1 pending task”). Luigi knows it can’t run, so the worker shuts down.

Personally, I wouldn’t use has_run to check whether a task has run; instead I would check whether the result of the job exists. That is, if the job writes something to the database, complete() should check whether the expected content is there.
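A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for the MySQL database in the question: complete() asks the database whether the job's output is there instead of consulting an in-memory flag. It is written as a plain class so it runs without Luigi (in Luigi, run() and complete() would be methods on a luigi.Task subclass); the class name SanityCheckSketch and the sanity_check_log table are hypothetical.

```python
import sqlite3


class SanityCheckSketch(object):
    """Plain-Python sketch of an output-based complete().

    The table `sanity_check_log` and its columns are hypothetical.
    """

    def __init__(self, conn, date):
        self.conn = conn
        self.date = date

    def run(self):
        # ... the real sanity checks would go here ...
        # Record the outcome; this row is the task's "output".
        self.conn.execute(
            "INSERT INTO sanity_check_log (run_date, status) VALUES (?, ?)",
            (self.date, "ok"),
        )
        self.conn.commit()

    def complete(self):
        # Done if and only if the expected row exists in the database.
        row = self.conn.execute(
            "SELECT 1 FROM sanity_check_log WHERE run_date = ? LIMIT 1",
            (self.date,),
        ).fetchone()
        return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sanity_check_log (run_date TEXT, status TEXT)")

task = SanityCheckSketch(conn, "2013-03-03")
print(task.complete())  # False
task.run()
print(task.complete())  # True
# A brand-new instance gets the same answer, unlike an in-memory has_run flag.
print(SanityCheckSketch(conn, "2013-03-03").complete())  # True
```

Because the answer lives in the database, it also survives a restart of the scheduler, which a has_run attribute on one Python object cannot.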
