Python – Apache Pig – Jython UDF memory error

I’m writing a Python UDF for Pig using Jython, but I run into memory errors when the UDF’s input is large (i.e. larger than the memory allocated to my JVM). According to the Pig documentation, built-in functions such as COUNT and MAX avoid this problem by implementing the Algebraic and, more importantly, the Accumulator interface. The Accumulator interface lets Pig pass data to a UDF in chunks, which would suit my problem well. Does anyone have an example of doing this with Jython? Any help would be appreciated, as would any ideas for streaming input to Python! 🙂

Solution

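Python (Jython) UDFs do not support these optimized interfaces. According to the Pig 0.11.1 UDF documentation, the Algebraic and Accumulator interfaces are available to Java UDFs only:
http://pig.apache.org/docs/r0.11.1/udf.html#udfs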
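
Since the question also asks about streaming input to Python, here is a minimal sketch of that idea rather than an Accumulator implementation. It assumes the data reaches an external Python script as tab-separated `key<TAB>value` lines (for example through Pig's STREAM ... THROUGH operator) and that rows with the same key arrive next to each other. The function name `running_sum_by_key` and the sum aggregation are hypothetical, chosen only for illustration.

```python
import sys


def running_sum_by_key(lines):
    """Yield (key, total) pairs from 'key<TAB>value' lines.

    Assumption: rows with the same key are adjacent in the input,
    so only one running total is held in memory at any time.
    """
    current_key = None
    total = 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        if key != current_key:
            # A new key starts: emit the finished group, then reset.
            if current_key is not None:
                yield current_key, total
            current_key = key
            total = 0.0
        total += float(value)
    # Emit the last group, if any input was seen.
    if current_key is not None:
        yield current_key, total


def main():
    # Read rows one at a time from stdin instead of loading them all.
    for key, total in running_sum_by_key(sys.stdin):
        sys.stdout.write("%s\t%s\n" % (key, total))

# When saved as a standalone script, end the file with a call to main().
```

Because the generator consumes one line at a time, memory use depends on the size of a single row rather than on the size of the whole input. If rows for a key are not adjacent, this sketch emits several partial totals for that key, so the relation would need to be ordered by key before it is streamed.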
