Python – Non-strictly by name parameters in Python?

Non-strictly by name parameters in Python?… here is a solution to the problem.

Non-strictly by name parameters in Python?

Question

Is there a way to declare function arguments as non-strict (via by-name)?

If this can’t be implemented directly: are there any helper functions or decorators that can help me achieve something similar?


Specific examples

This is an example of a small toy that can be used for experimentation.

Let’s say I want to build a small parser combiner library that can handle the classic syntax of the following parenthetical arithmetic expression (numbers are replaced by a single literal value 1 for simplicity):

num    = "1"

factor = num 
       | "(" + expr + ")"

term   = factor + "*" + term 
       | factor

expr   = term + "+" + expr
       | term

Suppose I define a parser combiner as an object that has a method parse that can get a list of tokens, the current position, and throw a parsing error, or return the result and the new location. I can define a ParserCombinator base class very well, which provides + (concatenation) and | (alternative). Then I can define a parser combiner that accepts a constant string and implement + and |:

# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass

# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
  def __add__(self, next):
    return AddCombinator(self, next)
  def __or__(self, other):
    return OrCombinator(self, other)

# Literally taken string constants
class Lit(ParserCombinator):
  def __init__(self, string):
    self.string = string

def parse(self, tokens, pos):
    if pos < len(tokens):
      t = tokens[pos]
      if t == self.string:
        return t, (pos + 1)
      else:
        raise ParseError
    else:
      raise UnexpectedEndOfInput

def lit(str): 
  return Lit(str)

# Concatenation
class AddCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    x, p1 = self.first.parse(tokens, pos)
    y, p2 = self.second.parse(tokens, p1)
    return (x, y), p2

# Alternative
class OrCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    try:
      return self.first.parse(tokens, pos)
    except:
      return self.second.parse(tokens, pos)

So far, so good. However, because the non-terminating symbols of the syntax are defined in a mutually recursive way, I can’t eagerly expand all possible parser composition trees, I have to use a factory of parser combiners, and wrap them into something like this:

# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
  def __init__(self, parserFactory):
    self.parserFactory = parserFactory
  def parse(self, tokens, pos):
    return self.parserFactory().parse(tokens, pos)

def p(parserFactory):
  return LazyParserCombinator(parserFactory)

This does allow me to write down the syntax in a way that is very close to EBNF:

num    = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term   = p(lambda: (factor + lit("*") + term) | factor)
expr   = p(lambda: (term + lit("+") + expr) | term)

It does work :

tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))

However, p(lambda: ...) in each row is a bit annoying. Are there some idiomatic ways to get rid of it? It would be nice if the entire RHS of the rule could somehow be passed “by name” without triggering an eager evaluation of infinite mutual recursion.


I tried

I looked at what’s available in the core language: it seems that only if, and, and or can be “short-circuited”, correct if there is a mistake.

I’ve tried to see how other non-toy sample libraries do this.

  • For example
    funcparserlib
    Use explicit forward declarations to avoid mutual recursion
    (See forward_decl and value.define.)
    GitHub README.md part of the sample code).

  • parsec.py use some special @generate decorators
    And seems to use coroutines to do something like monad parsing.
    That’s all well and good, but my goal is to understand which options
    I have information about the basic evaluation strategies available
    In Python.

I also found something like lazy_object_proxy. Proxy stuff, but it doesn’t seem to help instantiate such objects in a more concise way.

So, is there a better way to pass parameters by name and avoid exploding of values defined recursively against each other?

Solution

This is a good idea, but it’s not allowed by Python syntax: Python expressions are always strictly evaluated (except for if block and and and or short-circuit expressions).

In particular, the problem is in expressions like this:

num = p(lit("1"))

p function arguments are always received with a new name bound (bind) to the same object. The object evaluating lit("1") does not name anything (until the name is created with the form parameter of p), so there is no name there to bind (bind). Instead, there must be an object there, otherwise p cannot receive the value at all.

What you can do is add a new object in place of the lambda to delay the evaluation of the name. For example, something like this:

class DeferredNamespace(object):
    def __init__(self, namespace):
        self.__namespace = namespace
    def __getattr__(self, name):
        return DeferredLookup(self.__namespace, name)

class DeferredLookup(object):
    def __init__(self, namespace, name):
        self.__namespace = namespace
        self.__name = name
    def __getattr__(self, name):
        return getattr(getattr(self.__namespace, self.__name), name)

d = DeferredNamespace(locals())

num = p(d.lit("1"))

In this case, d.lit doesn’t actually return lit, it returns a DeferredLookup object (locals(), 'lit') that will use getattr to resolve its members in actual use. Note that this will eagerly capture locals(), which you probably don’t want to do; You can adapt it to use lambdas, or better yet, just create all entities in other namespaces anyway.

You will still encounter the drawbacks of d. in the syntax, which may or may not break the deal, depending on your goals using this API.

Related Problems and Solutions