Hadoop streaming jar files in the Hortonworks sandbox with no contrib directory
I'm trying out the Hadoop virtual machine in the Hortonworks sandbox.
I previously set up a simple Elastic MapReduce streaming job on EC2, following patterns like this, or this.
However, I don't seem to have the streaming jars installed. In fact, I seem to be missing many of the basic required directories, such as:
$HADOOP_HOME/mapred/contrib/
The ls -lah output for my home directory actually looks like this:
[root@sandbox ~]# ls -lah
total 60K
dr-xr-x---. 5 root root 4.0K Apr 10 18:52 .
dr-xr-xr-x. 24 root root 4.0K Apr 10 18:31 ..
-rw------- 1 root root 126 Oct 28 08:35 .bash_history
-rw-r--r--. 1 root root 18 May 20 2009 .bash_logout
-rw-r--r--. 1 root root 176 May 20 2009 .bash_profile
-rw-r--r-- 1 root root 262 Oct 28 08:29 .bashrc
-rw-r--r--. 1 root root 100 Sep 22 2004 .cshrc
-rw-r--r-- 1 root root 0 Oct 28 08:34 .hdfs_prepared
drwxr-xr-x 2 root root 4.0K Apr 10 18:22 .pip
drwxr----- 3 root root 4.0K Oct 20 16:21 .pki
-rw------- 1 root root 1.0K Oct 20 14:04 .rnd
drwx------ 2 root root 4.0K Oct 20 09:21 .ssh
lrwxrwxrwx 1 root root 48 Oct 28 08:28 start_ambari.sh -> /usr/lib/hue/tools/start_scripts/start_ambari.sh
lrwxrwxrwx 1 root root 47 Oct 28 08:28 start_hbase.sh -> /usr/lib/hue/tools/start_scripts/start_hbase.sh
-rw-r--r--. 1 root root 129 Dec 3 2004 .tcshrc
-rw------- 1 root root 4.8K Oct 28 08:30 .viminfo
-rw-r--r-- 1 root root 218 Oct 20 08:55 zero_machine.sh
Using the hadoop command, however, I can see that /mapred exists in HDFS, but it does not contain a contrib directory:
[root@sandbox ~]# hadoop fs -ls /
Found 6 items
drwxrwxrwt - yarn hadoop 0 2014-04-10 19:14 /app-logs
drwxr-xr-x - hdfs hdfs 0 2013-10-20 15:08 /apps
drwxr-xr-x - mapred hdfs 0 2013-10-20 15:10 /mapred
drwxr-xr-x - hdfs hdfs 0 2013-10-20 15:10 /mr-history
drwxrwxrwx - hdfs hdfs 0 2013-10-28 08:34 /tmp
drwxr-xr-x - hdfs hdfs 0 2013-10-28 08:34 /user
[root@sandbox ~]# hadoop fs -ls /mapred/
Found 1 items
drwxr-xr-x - mapred hdfs 0 2013-10-20 15:10 /mapred/system
Is there a dedicated download page for the streaming .jar files? When I visit the link at the bottom of this page, the link to streaming is dead.
Solution
The default location of the Hadoop streaming jar is /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar.
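Note that the jar lives on the sandbox's local filesystem, not in HDFS, so hadoop fs -ls will not show it. If the exact directory differs on your release, a local search may help. The following is a minimal sketch: the function name find_streaming_jar is hypothetical, and the default search root /usr/lib is an assumption based on the path above.

```shell
#!/bin/sh
# Hypothetical helper: list every hadoop-streaming jar under the given
# local directories. With no arguments it searches /usr/lib, which is an
# assumed install root taken from the default path mentioned above.
find_streaming_jar() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib
    fi
    # -type f skips symlinks and directories; errors from unreadable
    # directories are discarded.
    find "$@" -type f -name 'hadoop-streaming*.jar' 2>/dev/null
}
```

Calling find_streaming_jar with no arguments searches /usr/lib; pass one or more directories (for example find_streaming_jar /usr/lib/hadoop) to narrow or widen the search.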
If you can't find the jar in the above location, you can download hadoop-streaming-*.jar from the following Hortonworks repository:
http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/hadoop-streaming/
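Once you have a jar, whether found locally or downloaded, a quick smoke test can confirm that streaming works. This is only a sketch under assumptions: the function name run_streaming_job and the HADOOP_BIN override are hypothetical, and the jar path and HDFS input/output paths are whatever you supply. The /bin/cat mapper and /usr/bin/wc reducer follow the basic example in the Hadoop streaming documentation.

```shell
#!/bin/sh
# Hypothetical helper: run a trivial streaming job with the given jar.
#   $1 = local path to the hadoop-streaming jar
#   $2 = HDFS input path (must already exist)
#   $3 = HDFS output path (must not exist yet)
# HADOOP_BIN is an assumed override for the hadoop executable; it
# defaults to "hadoop" on the PATH.
run_streaming_job() {
    jar=$1
    input=$2
    output=$3
    "${HADOOP_BIN:-hadoop}" jar "$jar" \
        -input "$input" \
        -output "$output" \
        -mapper /bin/cat \
        -reducer /usr/bin/wc
}
```

For example, run_streaming_job followed by the jar path, an HDFS input directory, and a fresh HDFS output directory passes the input through cat and counts lines, words, and bytes with wc.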