django: How to stream incoming POST data as a file-like object
I'm using Python + Django to handle incoming web requests that can POST a large amount of JSON as one of the POST data fields (e.g. var1=abc&json_var=lots_of_data&other_var=xxx). I want to stream the JSON through my own streaming JSON parser, which takes a file-like handle as its input parameter. From https://docs.djangoproject.com/en/1.11/ref/request-response/ it appears this is possible using HttpRequest.__iter__(), but I can't find any examples of how to hook my own code into it (i.e. not just handing the request off to a library such as xml.etree.ElementTree).
Basically, I want the following pipeline:

POST request with large JSON => Django/python => file-like handle for reading the raw POST body => streaming URL decoder => streaming JSON processor

I can use ijson for the streaming JSON processor. How do I fill in the two gaps before it: getting a file-like handle to the POST data, and passing it through a streaming URL decoder? I'd rather not write those myself, but I can if necessary.
Solution
I was only able to solve this by rolling my own generators and adaptors. There were several keys to solving it:
- Finding a file-like handle to the POST data when the data is sent in chunks. It can be reached as request.META.get('wsgi.input'); I found it by dumping all the request attributes, following this post.
- Writing my own generator that reads that file-like handle and yields (varname, data_chunk) pairs.
- Based on this post, a modified version of my own generator that wraps those pairs in a file-like handle with a normal read() operation plus three extra features:
  - f.varname returns the name of the variable currently being read
  - the data is url-decoded before being returned from read()
  - f.next_pair() advances the handle to the next variable: call f.read() until the first variable is exhausted; then, if there is another variable, f.next_pair() returns True and f.read() can be called again until that variable is finished
- Further stream processing can then be implemented in the main loop
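The generator from the second bullet could be sketched like this. This is my own illustration, not the original code: it assumes application/x-www-form-urlencoded input where every field has the name=value form, and it handles percent-escapes split across chunk boundaries by holding back up to two trailing bytes until the next read:

```python
from urllib.parse import unquote_plus

def qs_from_file_to_generator(f, buffer_size=8192):
    # Read url-encoded POST data from a file-like handle in chunks and
    # yield (varname, decoded_chunk) pairs, never holding a whole value
    # in memory.
    name = None       # decoded name of the variable currently being read
    pending = b''     # bytes carried over between reads
    while True:
        chunk = f.read(buffer_size)
        data = pending + chunk
        pending = b''
        if not chunk:
            # end of stream: flush whatever is left
            if name is not None:
                if data:
                    yield name, unquote_plus(data.decode('ascii'))
            elif data:
                n, _, v = data.partition(b'=')
                yield unquote_plus(n.decode('ascii')), unquote_plus(v.decode('ascii'))
            return
        while data:
            if name is None:
                if b'=' in data:
                    n, data = data.split(b'=', 1)
                    name = unquote_plus(n.decode('ascii'))
                else:
                    # name not complete yet, wait for more bytes
                    pending, data = data, b''
            elif b'&' in data:
                v, data = data.split(b'&', 1)
                if v:
                    yield name, unquote_plus(v.decode('ascii'))
                name = None
            else:
                # hold back bytes that might be the start of a split %-escape
                safe, hold = data, b''
                pct = data.rfind(b'%')
                if pct != -1 and pct >= len(data) - 2:
                    safe, hold = data[:pct], data[pct:]
                if safe:
                    yield name, unquote_plus(safe.decode('ascii'))
                pending, data = hold, b''
```

A deliberately small buffer_size in testing exercises the boundary handling; in production a few KB per read is more sensible.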
Putting it all together looks like this:
f = request.META.get('wsgi.input')
ff = some_magic_adaptor(qs_from_file_to_generator(f))
while ff.next_pair():
    print('varname: ' + ff.varname)
    if ff.varname == 'stream_parse_this':
        # hand the file-like handle straight to the streaming JSON parser
        parser = stream_parser(ff)
        for event_results in parser:
            do_something(event_results)
    else:
        # otherwise consume this variable chunk by chunk
        while True:
            data = ff.read(buffer_size)
            if not data:
                break
            do_something_with_data_chunk(data)
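For completeness, here is a sketch of what the some_magic_adaptor wrapper could look like. Again this is my own illustration of the adaptor described above, not the original code, and it assumes consecutive variables have distinct names:

```python
class some_magic_adaptor:
    # File-like wrapper over a (varname, chunk) generator.  read() returns
    # decoded chunks of the current variable only; next_pair() advances to
    # the next variable.
    def __init__(self, gen):
        self._gen = gen
        self._buf = ''          # unread data of the current variable
        self._pushed = None     # first (varname, chunk) of the next variable
        self._exhausted = False
        self.varname = None     # name of the variable currently being read

    def next_pair(self):
        # Discard the rest of the current variable; return True if another
        # variable follows.
        while True:
            if self._pushed is not None:
                name, chunk = self._pushed
                self._pushed = None
            else:
                try:
                    name, chunk = next(self._gen)
                except StopIteration:
                    self._exhausted = True
                    return False
            if name != self.varname:
                self.varname, self._buf = name, chunk
                return True
            # chunk still belongs to the old variable: keep scanning

    def read(self, size=-1):
        # Fill the buffer until we have enough data or the current
        # variable ends; size=-1 reads the rest of the variable.
        while size < 0 or len(self._buf) < size:
            if self._pushed is not None or self._exhausted:
                break
            try:
                name, chunk = next(self._gen)
            except StopIteration:
                self._exhausted = True
                break
            if name != self.varname:
                self._pushed = (name, chunk)   # save for next_pair()
                break
            self._buf += chunk
        if size < 0:
            out, self._buf = self._buf, ''
        else:
            out, self._buf = self._buf[:size], self._buf[size:]
        return out
```

The push-back slot is what lets read() stop cleanly at a variable boundary without losing the next variable's first chunk, so read() returning '' reliably signals "this variable is done" to the main loop.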