Non-strictly by name parameters in Python?
Question
Is there a way to declare function arguments as non-strict (via by-name)?
If this can’t be implemented directly: are there any helper functions or decorators that can help me achieve something similar?
Specific examples
This is an example of a small toy that can be used for experimentation.
Let’s say I want to build a small parser combiner library that can handle the classic syntax of the following parenthetical arithmetic expression (numbers are replaced by a single literal value 1
for simplicity):
num = "1"
factor = num
| "(" + expr + ")"
term = factor + "*" + term
| factor
expr = term + "+" + expr
| term
Suppose I define a parser combiner as an object that has a method parse
that can get a list of tokens, the current position, and throw a parsing error, or return the result and the new location. I can define a ParserCombinator
base class very well, which provides +
(concatenation) and |
(alternative). Then I can define a parser combiner that accepts a constant string and implement +
and |
:
# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass
# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
def __add__(self, next):
return AddCombinator(self, next)
def __or__(self, other):
return OrCombinator(self, other)
# Literally taken string constants
class Lit(ParserCombinator):
def __init__(self, string):
self.string = string
def parse(self, tokens, pos):
if pos < len(tokens):
t = tokens[pos]
if t == self.string:
return t, (pos + 1)
else:
raise ParseError
else:
raise UnexpectedEndOfInput
def lit(str):
return Lit(str)
# Concatenation
class AddCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
x, p1 = self.first.parse(tokens, pos)
y, p2 = self.second.parse(tokens, p1)
return (x, y), p2
# Alternative
class OrCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
try:
return self.first.parse(tokens, pos)
except:
return self.second.parse(tokens, pos)
So far, so good. However, because the non-terminating symbols of the syntax are defined in a mutually recursive way, I can’t eagerly expand all possible parser composition trees, I have to use a factory of parser combiners, and wrap them into something like this:
# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
def __init__(self, parserFactory):
self.parserFactory = parserFactory
def parse(self, tokens, pos):
return self.parserFactory().parse(tokens, pos)
def p(parserFactory):
return LazyParserCombinator(parserFactory)
This does allow me to write down the syntax in a way that is very close to EBNF:
num = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term = p(lambda: (factor + lit("*") + term) | factor)
expr = p(lambda: (term + lit("+") + expr) | term)
It does work :
tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))
However, p(lambda: ...)
in each row is a bit annoying. Are there some idiomatic ways to get rid of it? It would be nice if the entire RHS of the rule could somehow be passed “by name” without triggering an eager evaluation of infinite mutual recursion.
I tried
I looked at what’s available in the core language: it seems that only if, and
, and or
can be “short-circuited”, correct if
there is a mistake.
I’ve tried to see how other non-toy sample libraries do this.
For example
funcparserlib
Use explicit forward declarations to avoid mutual recursion
(Seeforward_decl
andvalue.define
.)
GitHub README.md part of the sample code).parsec.py
use some special@generate
decorators
And seems to use coroutines to do something like monad parsing.
That’s all well and good, but my goal is to understand which options
I have information about the basic evaluation strategies available
In Python.
I also found something like lazy_object_proxy. Proxy
stuff, but it doesn’t seem to help instantiate such objects in a more concise way.
So, is there a better way to pass parameters by name and avoid exploding of values defined recursively against each other?
Solution
This is a good idea, but it’s not allowed by Python syntax: Python expressions are always strictly evaluated (except for if
block and and
and or
short-circuit expressions).
In particular, the problem is in expressions like this:
num = p(lit("1"))
p
function arguments are always received with a new name bound (bind) to the same object. The object evaluating lit("1")
does not name anything (until the name is created with the form parameter of p
), so there is no name there to bind (bind). Instead, there must be an object there, otherwise p
cannot receive the value at all.
What you can do is add a new object in place of the lambda to delay the evaluation of the name. For example, something like this:
class DeferredNamespace(object):
def __init__(self, namespace):
self.__namespace = namespace
def __getattr__(self, name):
return DeferredLookup(self.__namespace, name)
class DeferredLookup(object):
def __init__(self, namespace, name):
self.__namespace = namespace
self.__name = name
def __getattr__(self, name):
return getattr(getattr(self.__namespace, self.__name), name)
d = DeferredNamespace(locals())
num = p(d.lit("1"))
In this case, d.lit doesn’t actually return lit, it returns a
to resolve its members in actual useDeferredLookup
object (locals(), 'lit') that will use getattr.
Note that this will eagerly capture locals(),
which you probably don’t want to do; You can adapt it to use lambdas, or better yet, just create all entities in other namespaces anyway.
You will still encounter the drawbacks of d. in the syntax, which may or may not break the deal, depending on your goals using this API.