Java – Stream Hadoop jar files in the Hortonworks sandbox without a contrib directory



I'm trying out the Hadoop virtual machine in the Hortonworks sandbox.

I previously set up a simple Elastic MapReduce streaming job on EC2, following examples like this one or this one.

However, in the sandbox I don't seem to have the streaming jars installed. In fact, I seem to be missing many of the basic directories I expected, such as:

$HADOOP_HOME/mapred/contrib/

Running ls -lah in my home directory shows only this:

[root@sandbox ~]# ls -lah
total 60K
dr-xr-x---.  5 root root 4.0K Apr 10 18:52 .
dr-xr-xr-x. 24 root root 4.0K Apr 10 18:31 ..
-rw-------   1 root root  126 Oct 28 08:35 .bash_history
-rw-r--r--.  1 root root   18 May 20  2009 .bash_logout
-rw-r--r--.  1 root root  176 May 20  2009 .bash_profile
-rw-r--r--   1 root root  262 Oct 28 08:29 .bashrc
-rw-r--r--.  1 root root  100 Sep 22  2004 .cshrc
-rw-r--r--   1 root root    0 Oct 28 08:34 .hdfs_prepared
drwxr-xr-x   2 root root 4.0K Apr 10 18:22 .pip
drwxr-----   3 root root 4.0K Oct 20 16:21 .pki
-rw-------   1 root root 1.0K Oct 20 14:04 .rnd
drwx------   2 root root 4.0K Oct 20 09:21 .ssh
lrwxrwxrwx   1 root root   48 Oct 28 08:28 start_ambari.sh -> /usr/lib/hue/tools/start_scripts/start_ambari.sh
lrwxrwxrwx   1 root root   47 Oct 28 08:28 start_hbase.sh -> /usr/lib/hue/tools/start_scripts/start_hbase.sh
-rw-r--r--.  1 root root  129 Dec  3  2004 .tcshrc
-rw-------   1 root root 4.8K Oct 28 08:30 .viminfo
-rw-r--r--   1 root root  218 Oct 20 08:55 zero_machine.sh

Using the hadoop command, I can see that /mapred exists in HDFS, but it does not contain a contrib directory.

[root@sandbox ~]# hadoop fs -ls /
Found 6 items
drwxrwxrwt   - yarn   hadoop          0 2014-04-10 19:14 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2013-10-20 15:08 /apps
drwxr-xr-x   - mapred hdfs            0 2013-10-20 15:10 /mapred
drwxr-xr-x   - hdfs   hdfs            0 2013-10-20 15:10 /mr-history
drwxrwxrwx   - hdfs   hdfs            0 2013-10-28 08:34 /tmp
drwxr-xr-x   - hdfs   hdfs            0 2013-10-28 08:34 /user
[root@sandbox ~]# hadoop fs -ls /mapred/
Found 1 items
drwxr-xr-x   - mapred hdfs          0 2013-10-20 15:10 /mapred/system

Is there a dedicated download page for the streaming .jar files? The streaming link at the bottom of this page is dead.

Solution

The default location of the Hadoop streaming jar is /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar.
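Since the file name contains a version number, a glob is the easiest way to check that location. The following is a minimal sketch (the class name FindStreamingJar is hypothetical) that lists hadoop-streaming-*.jar files in the default directory from this answer, or in any directory passed as the first argument. It only searches that one directory, not subdirectories, and other Hadoop layouts may put the jar elsewhere.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FindStreamingJar {

    // Default location quoted in the answer; other installs may differ.
    static final Path DEFAULT_DIR = Paths.get("/usr/lib/hadoop/contrib/streaming");

    /** Returns the hadoop-streaming-*.jar files directly inside dir, sorted by path. */
    static List<Path> findStreamingJars(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return jars; // directory missing, e.g. no contrib/ on this machine
        }
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dir, "hadoop-streaming-*.jar")) {
            for (Path p : stream) {
                jars.add(p);
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Optionally pass another directory to search instead of the default.
        Path dir = args.length > 0 ? Paths.get(args[0]) : DEFAULT_DIR;
        List<Path> jars = findStreamingJars(dir);
        if (jars.isEmpty()) {
            System.out.println("No hadoop-streaming-*.jar found in " + dir);
        } else {
            jars.forEach(System.out::println);
        }
    }
}
```

An empty result means the sandbox is missing the jar in that directory, which is the situation described in the question. A path that is found can then be passed to hadoop jar together with the usual streaming options such as -input, -output, -mapper and -reducer.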

If you can't find the jar in /usr/lib/hadoop/contrib/streaming/, you can download hadoop-streaming-*.jar from the following Hortonworks repository:

http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-streaming/
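That URL is a Maven-style repository directory, so each version normally sits in its own subdirectory as hadoop-streaming-VERSION.jar. The sketch below (class and method names are hypothetical) builds the jar URL from that standard Maven layout and downloads it into the current directory. The version string is an assumption: copy it from the repository's directory listing so that it matches your sandbox's Hadoop release. The sketch also assumes the repository is still reachable at this plain-HTTP address; Java's URL connection does not follow a redirect from HTTP to HTTPS, so you may need to change BASE.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DownloadStreamingJar {

    // Repository directory quoted in the answer.
    static final String BASE =
        "http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-streaming/";

    /** Standard Maven layout: BASE + version + "/hadoop-streaming-" + version + ".jar". */
    static String jarUrl(String version) {
        return BASE + version + "/hadoop-streaming-" + version + ".jar";
    }

    /** Downloads the jar for the given version into targetDir and returns its path. */
    static Path download(String version, Path targetDir) throws IOException {
        Path target = targetDir.resolve("hadoop-streaming-" + version + ".jar");
        try (InputStream in = URI.create(jarUrl(version)).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DownloadStreamingJar <version-from-repo-listing>");
            System.exit(1);
        }
        Path jar = download(args[0], Paths.get("."));
        System.out.println("Saved " + jar.toAbsolutePath());
    }
}
```

Keeping jarUrl separate from the network call makes it easy to print or check the exact URL before downloading.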
