Python – Tornado request handlers map to international characters

I want to match URL requests that contain internationalized characters, such as /Comisión. Here is my setup:

import tornado.web

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            # ... some handlers, and then this:
            (r"/([\w\:\,]+)", InternationalizedHandler),
        ]
        # `settings` is a dict of application settings defined elsewhere
        tornado.web.Application.__init__(self, handlers, **settings)

Setting the locale in Tornado doesn't seem like the right solution. How do I set up a regular expression that captures characters like é, å, μ, etc.? Is it fine to change the re flags in Python?

Solution

TL;DR: This is not possible with Tornado's built-in router.

Tornado buries the regular expression compilation of handler patterns quite deep, so @stema's suggestion to use the re.UNICODE flag is hard to apply: it is not obvious where the flag should be passed in. There are two ways to tackle that particular problem: subclass URLSpec and override its __init__ function, or put a flag prefix in the pattern.

The first option is a lot of work. The second option takes advantage of a feature of Python's re module: a pattern can begin with (?u) instead of having the re.UNICODE flag passed as a parameter.
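As a rough illustration of the inline flag, the sketch below compiles the pattern from the question with a (?u) prefix and checks it against an already decoded path. The trailing $ and the names UNICODE_PATTERN and ASCII_PATTERN are assumptions added for the demonstration. On Python 3, str patterns are Unicode-aware by default, so the prefix mainly matters on Python 2; re.ASCII is used here only to show the contrast.

```python
import re

# Pattern from the question with the inline Unicode flag prefix.
# The trailing "$" is added here so the whole path must match.
UNICODE_PATTERN = re.compile(r"(?u)/([\w\:\,]+)$")

# Same pattern restricted to ASCII-only \w, for contrast.
ASCII_PATTERN = re.compile(r"/([\w\:\,]+)$", re.ASCII)

decoded_path = "/Comisión"

unicode_match = UNICODE_PATTERN.match(decoded_path)
ascii_match = ASCII_PATTERN.match(decoded_path)

# The inline prefix sets the same flag as passing re.UNICODE explicitly.
has_unicode_flag = bool(UNICODE_PATTERN.flags & re.UNICODE)

if unicode_match:
    print("captured:", unicode_match.group(1))
print("ASCII-only pattern matched:", ascii_match is not None)
```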

Unfortunately, neither option works, because Tornado matches the pattern against the request URL before it has been percent-decoded into a unicode string. Compiling the pattern with the Unicode flag therefore has no effect: you are matching against a percent-encoded ASCII URL, not a unicode string.
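The mismatch can be reproduced outside Tornado with the standard library alone. This is a minimal sketch, not Tornado's actual routing code: it assumes the router sees the path in the percent-encoded form that urllib.parse.quote produces, and the helper name matches is hypothetical.

```python
import re
from urllib.parse import quote, unquote

# Pattern from the question, anchored at the end for the demonstration.
PATTERN = re.compile(r"(?u)/([\w\:\,]+)$")

def matches(path: str) -> bool:
    """Return True if the whole path matches the handler pattern."""
    return PATTERN.match(path) is not None

# What arrives on the wire: ó becomes the UTF-8 bytes C3 B3, percent-encoded.
encoded_path = quote("/Comisión")

# "%" is not a word character, so the encoded form does not match.
encoded_ok = matches(encoded_path)

# Only after percent-decoding does the pattern match.
decoded_ok = matches(unquote(encoded_path))

print(encoded_path, encoded_ok, decoded_ok)
```

The Unicode flag only changes what \w means for non-ASCII characters, and the encoded path contains none, which is why the flag makes no difference here.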
