Access the distributed cache from MrJob
I’m writing Hadoop applications using MrJob, and I need to use the distributed cache to access some files.
I know Hadoop Streaming has a -files option, but I don’t know how to access those files from my program.
Thanks for your help.
Solution
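I think you have to use
mrjob.compat.supports_new_distributed_cache_options(version)
which, as far as I can tell, reports whether the given Hadoop version supports the newer distributed cache options.
If it does, use -files and -archives instead of -cacheFile and -cacheArchive.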
You may find more details here: http://pythonhosted.org/mrjob/utils-compat.html?highlight=distributed%20cache#mrjob.compat.supports_new_distributed_cache_options
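
On the "how do I access it in my program" part: files shipped with -files normally show up in each task's current working directory, so the task can open them by their bare name. Below is a minimal sketch of that idea using only the standard library, not mrjob's own API. The file name lookup.txt, its tab-separated layout, and the helper names load_lookup and map_line are all hypothetical, chosen just for illustration.

```python
def load_lookup(path="lookup.txt"):
    """Read tab-separated key/value pairs from a file.

    The default path is a bare file name (hypothetical), on the assumption
    that the file was shipped with -files and therefore sits in the task's
    current working directory.
    """
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only; anything after it is the value.
            key, _, value = line.partition("\t")
            table[key] = value
    return table


def map_line(line, table):
    """Yield (word, value) for each word of the line found in the table."""
    for word in line.split():
        if word in table:
            yield word, table[word]
```

In an MRJob subclass you would probably call something like load_lookup() once from mapper_init and keep the result on self, so the file is read once per task rather than once per input line.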