Python – Access the distributed cache from MrJob

I'm writing Hadoop applications with mrjob, and I need the distributed cache to access some files.
I know Hadoop Streaming has a -files option, but I don't know how to access those files from my program.

Thanks for your help.

Solution

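I think you have to check

mrjob.compat.supports_new_distributed_cache_options(version)

which returns true if the given Hadoop version supports the newer distributed cache options. If it does, use -files and -archives instead of -cacheFile and -cacheArchive.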
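As a rough illustration of what that version check decides, here is a sketch of a helper that builds the distributed-cache arguments for a hadoop-streaming command line. The names version_tuple, uses_generic_cache_options and cache_args are hypothetical (they are not part of mrjob), and the sketch assumes that Hadoop 0.20 is the first version that accepts -files and -archives; check the mrjob.compat source for the exact rule your mrjob version applies.

```python
import re
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    # Drop vendor suffixes such as "-cdh3u0", then compare the numeric parts.
    return tuple(int(n) for n in re.findall(r"\d+", version.split("-")[0]))


def uses_generic_cache_options(version: str) -> bool:
    # Assumption: 0.20+ accepts the generic -files/-archives options;
    # older streaming jars only understand -cacheFile/-cacheArchive.
    return version_tuple(version) >= (0, 20)


def cache_args(version: str, files: List[str], archives: List[str]) -> List[str]:
    """Build distributed-cache arguments for a hadoop-streaming command.

    Each entry is a URI, optionally ending in '#name' to choose the name
    the task sees in its working directory.
    """
    args: List[str] = []
    if uses_generic_cache_options(version):
        # Generic options take a single comma-separated list each.
        if files:
            args += ["-files", ",".join(files)]
        if archives:
            args += ["-archives", ",".join(archives)]
    else:
        # The old streaming options are repeated once per file.
        for uri in files:
            args += ["-cacheFile", uri]
        for uri in archives:
            args += ["-cacheArchive", uri]
    return args


print(cache_args("0.20.203", ["hdfs:///user/me/lookup.txt#lookup.txt"], []))
# ['-files', 'hdfs:///user/me/lookup.txt#lookup.txt']
print(cache_args("0.18.3", ["hdfs:///user/me/lookup.txt#lookup.txt"], []))
# ['-cacheFile', 'hdfs:///user/me/lookup.txt#lookup.txt']
```

Keep in mind that Hadoop expects generic options such as -files and -archives to appear before the streaming-specific options (-input, -mapper and so on), whereas -cacheFile and -cacheArchive are streaming options and go with the rest.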

You can find more details in the mrjob documentation: http://pythonhosted.org/mrjob/utils-compat.html?highlight=distributed%20cache#mrjob.compat.supports_new_distributed_cache_options
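For the part of the question about reading the file from inside the program: Hadoop places each file shipped through the distributed cache in the task's current working directory (under its base name, or under the name after '#' if one was given), so the job can open it with a plain relative path. The sketch below assumes a hypothetical tab-separated lookup file called lookup.txt; load_lookup and map_line are illustrative names, and the mapper logic is kept free of mrjob so it can run on its own.

```python
from typing import Dict, Iterator, Tuple


def load_lookup(name: str = "lookup.txt") -> Dict[str, str]:
    """Read a tab-separated key/value file from the current working directory.

    A relative path is enough because the distributed cache puts shipped
    files in the task's working directory.
    """
    table: Dict[str, str] = {}
    with open(name, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            table[key] = value
    return table


def map_line(line: str, lookup: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical mapper body: pair each word with its lookup value."""
    for word in line.split():
        yield word, lookup.get(word, "UNKNOWN")
```

In an mrjob job you would typically call load_lookup once per task from mapper_init, store the table on self, and use it from mapper, so the file is read once per task rather than once per input line. When launching the job, ship the file with mrjob's file-upload switch (spelled --file in older releases; run your job with --help to confirm the name in your version).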