Python – Django: How to stream incoming POST data as a file-like object

I'm using Python + Django to handle incoming web requests that can post a large amount of JSON as one of the POST data fields (e.g. var1=abc&json_var=lots_of_data&other_var=xxx). I want to process the JSON as a stream with my own streaming JSON parser, which takes a file-like handle as its input argument. According to https://docs.djangoproject.com/en/1.11/ref/request-response/ this appears to be possible using HttpRequest.__iter__(), but I can't find any examples of how to do it with my own code (i.e. not just by importing a library such as xml.etree.ElementTree).

Basically, I want to do the following:

POST request with large JSON => Django/Python => create a file-like handle to read the POST data => streaming URL decoder => streaming JSON processor

I can use ijson for the streaming JSON processor. How do I fill in the two gaps: creating a file-like handle to the POST data and passing it to a streaming URL decoder? I'd rather not roll my own, but I suppose I can if necessary.

Solution

I was only able to solve this by rolling my own generators and iterators. There were several keys to solving the problem:

  • Find out how to get a file handle to the POST data so that it can be read in chunks. I found it at request.META.get('wsgi.input'), which I located by dumping all the request attributes as described in this post
  • Roll my own generator that reads the file-like handle and yields (varname, data_chunk) pairs
  • Roll my own adaptor, based on a modified version of this post, that turns the generator into a file-like handle with a normal read() operation plus three additional features:
    • f.varname returns the name of the variable that is currently being read
    • The data is URL-decoded before being returned from read()
    • f.next_pair() advances the handle to the next variable. So, call f.read() until the first variable is exhausted; then, if there is another variable, f.next_pair() returns True and f.read() can be called again until that variable is exhausted
  • Further stream processing can be implemented in the main loop
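
The answer does not show the generator itself, so the following is only a minimal sketch of what qs_from_file_to_generator could look like in Python 3. It assumes the body is application/x-www-form-urlencoded, that every field has the form name=value, and that names are UTF-8. The empty-chunk "start of field" marker and the chunk_size parameter are conventions of this sketch, not something the answer specifies.

```python
from urllib.parse import unquote_to_bytes


def _decode(raw):
    # Form encoding writes a space as '+' and other bytes as %XX escapes.
    return unquote_to_bytes(raw.replace(b"+", b" "))


def qs_from_file_to_generator(stream, chunk_size=8192):
    """Yield (varname, chunk) pairs; an empty chunk marks the start of a field."""
    name = None   # None while the field name is still being read
    buf = b""     # bytes read from the stream but not yet emitted
    while True:
        block = stream.read(chunk_size)
        eof = not block
        buf += block
        while buf:
            if name is None:
                eq = buf.find(b"=")
                if eq < 0:
                    break                  # name incomplete, read more input
                name = _decode(buf[:eq]).decode("utf-8")
                buf = buf[eq + 1:]
                yield name, b""            # start-of-field marker
                continue
            amp = buf.find(b"&")
            if amp >= 0:
                # The value ends inside the buffer: emit it and reset.
                value, buf = buf[:amp], buf[amp + 1:]
                if value:
                    yield name, _decode(value)
                name = None
                continue
            # No '&' yet: emit what we have, but hold back a trailing
            # partial %XX escape unless the stream has ended.
            cut = len(buf)
            pct = buf.rfind(b"%", max(0, cut - 2))
            if pct >= 0 and not eof:
                cut = pct
            if cut:
                yield name, _decode(buf[:cut])
            buf = buf[cut:]
            break
        if eof:
            return
```

Holding back an incomplete escape matters because a read boundary can fall between the % and its two hex digits; decoding each block independently would corrupt those bytes.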

Putting it all together looks like this:

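# The raw WSGI input stream holds the request body
f = request.META.get('wsgi.input')
ff = some_magic_adaptor(qs_from_file_to_generator(f))

while ff.next_pair():
    print('varname:' + ff.varname)
    if ff.varname == 'stream_parse_this':
        # Hand the file-like handle to the streaming parser
        parser = stream_parser(ff)
        for event_results in parser:
            do_something(event_results)

    # Read whatever remains of the current variable in chunks
    while True:
        data = ff.read(buffer_size)
        if not data:
            break
        do_something_with_data_chunk(data)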
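
The adaptor is not shown in the answer either. As a rough sketch, a class playing the role of some_magic_adaptor might look like the following. The class name PairStream is hypothetical, and the sketch assumes the underlying generator yields an empty chunk to mark the start of each field (a convention of this sketch) and that the chunks it yields are already URL-decoded.

```python
class PairStream:
    """File-like view over an iterator of (varname, chunk) pairs."""

    def __init__(self, pairs):
        self._pairs = iter(pairs)
        self._pending = b""      # bytes of the current field not yet read
        self._next_name = None   # name taken from a marker seen ahead of time
        self._done = False       # True once the current field has ended
        self.varname = None      # name of the field currently being read

    def _fill(self):
        """Pull one chunk of the current field; return False at its end."""
        if self._done:
            return False
        item = next(self._pairs, None)
        if item is None:         # underlying iterator is exhausted
            self._done = True
            return False
        name, chunk = item
        if not chunk:            # marker: the next field begins here
            self._next_name = name
            self._done = True
            return False
        self._pending += chunk
        return True

    def read(self, size=-1):
        """Return up to size bytes of the current field, b'' at its end."""
        read_all = size is None or size < 0
        while (read_all or len(self._pending) < size) and self._fill():
            pass
        if read_all:
            size = len(self._pending)
        data, self._pending = self._pending[:size], self._pending[size:]
        return data

    def next_pair(self):
        """Skip the rest of the current field and move to the next one."""
        self._pending = b""
        while self._fill():
            self._pending = b""  # discard unread data
        if self._next_name is None:
            return False         # no more fields
        self.varname, self._next_name = self._next_name, None
        self._done = False
        return True
```

Because read() returns an empty result at the end of each field, a parser that expects a file-like object (ijson, for instance) sees a single variable as a complete file, and next_pair() then moves on without buffering the skipped data.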
