Java – Hadoop MapReduce job started but can’t find the Map class?

Hadoop MapReduce job started but can’t find the Map class? Here is a solution to the problem.

Hadoop MapReduce job started but can’t find the Map class?

My MapReduce app calculates the usage of field values in Hive tables. After including all the jars from the /usr/lib/hadoop, /usr/lib/hive, and /usr/lib/hcatalog directories, I managed to build and run it from Eclipse. It works.

After a few setbacks, I also managed to compile it and run it as a Maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bigdata.hadoop</groupId>
  <artifactId>FieldCounts</artifactId>
  <packaging>jar</packaging>
  <name>FieldCounts</name>
  <version>0.0.1-SNAPSHOT</version>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hcatalog</groupId>
      <artifactId>hcatalog-core</artifactId>
      <version>0.11.0</version>
    </dependency>
  </dependencies>
</project>
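
With this pom in place, the job jar is built the usual Maven way. A quick sketch (assuming the standard Maven layout, so the artifact ends up under target/):

# Build the job jar; Maven writes it to target/
mvn clean package

# Sanity check that the expected artifact is there
ls -l target/FieldCounts-0.0.1-SNAPSHOT.jar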

To run the job from the command line, I use the following script:

#!/bin/sh
export LIBJARS=/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar,/usr/lib/hive/lib/hive-exec-0.12.0.2.0.6.1-101.jar,/usr/lib/hive/lib/hive-metastore-0.12.0.2.0.6.1-101.jar,/usr/lib/hive/lib/libfb303-0.9.0.jar,/usr/lib/hive/lib/jdo-api-3.0.1.jar,/usr/lib/hive/lib/antlr-runtime-3.4.jar,/usr/lib/hive/lib/datanucleus-api-jdo-3.2.1.jar,/usr/lib/hive/lib/datanucleus-core-3.2.2.jar
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:.:/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar:/usr/lib/hive/lib/hive-exec-0.12.0.2.0.6.1-101.jar:/usr/lib/hive/lib/hive-metastore-0.12.0.2.0.6.1-101.jar:/usr/lib/hive/lib/libfb303-0.9.0.jar:/usr/lib/hive/lib/jdo-api-3.0.1.jar:/usr/lib/hive/lib/antlr-runtime-3.4.jar:/usr/lib/hive/lib/datanucleus-api-jdo-3.2.1.jar:/usr/lib/hive/lib/datanucleus-core-3.2.2.jar
hadoop jar FieldCounts-0.0.1-SNAPSHOT.jar com.bigdata.hadoop.FieldCounts -libjars ${LIBJARS} simple simpout

Now Hadoop creates and starts the job, but the job fails because Hadoop can’t find the Map class:

14/03/26 16:25:58 INFO mapreduce.Job: Running job: job_1395407010870_0007
14/03/26 16:26:07 INFO mapreduce.Job: Job job_1395407010870_0007 running in uber mode : false
14/03/26 16:26:07 INFO mapreduce.Job:  map 0% reduce 0%
14/03/26 16:26:13 INFO mapreduce.Job: Task Id : attempt_1395407010870_0007_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.FieldCounts$Map not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.FieldCounts$Map not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 8 more

Why is this? The Job jar contains all classes, including Map:

 jar tvf FieldCounts-0.0.1-SNAPSHOT.jar 
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/
   121 Wed Mar 26 15:51:04 MSK 2014 META-INF/MANIFEST.MF
     0 Wed Mar 26 14:29:58 MSK 2014 com/
     0 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/
     0 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/
  3992 Fri Mar 21 17:29:22 MSK 2014 hive-site.xml
  4093 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts.class
  2961 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts$Reduce.class
  1621 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/TableFieldValueKey.class
  4186 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts$Map.class
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/com.bigdata.hadoop/
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/
  1030 Wed Mar 26 14:28:22 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/pom.xml
   123 Wed Mar 26 14:30:02 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/pom.properties

What’s wrong? Should I put the Map and Reduce classes in separate files?

MapReduce code:

package com.bigdata.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hcatalog.mapreduce.*;
import org.apache.hcatalog.data.*;
import org.apache.hcatalog.data.schema.*;
import org.apache.log4j.Logger;

public class FieldCounts extends Configured implements Tool {

    public static class Map extends Mapper<WritableComparable, HCatRecord, TableFieldValueKey, IntWritable> {

        static Logger logger = Logger.getLogger("com.foo.Bar");

        static boolean firstMapRun = true;
        static List<String> fieldNameList = new LinkedList<String>();
        /**
         * Return a list of field names, excluding the `id` field.
         * @param schema
         * @return
         */
        static List<String> getFieldNames(HCatSchema schema) {
            // Filter out the `id` field name just once
            if (firstMapRun) {
                firstMapRun = false;
                List<String> fieldNames = schema.getFieldNames();
                for (String fieldName : fieldNames) {
                    if (!fieldName.equals("id")) {
                        fieldNameList.add(fieldName);
                    }
                }
            } // if (firstMapRun)
            return fieldNameList;
        }

        @Override
        protected void map(WritableComparable key,
                           HCatRecord hcatRecord,
                           Context context)
                throws IOException, InterruptedException {

            HCatSchema schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());

            String schemaTypeStr = schema.getSchemaAsTypeString();
            logger.info("******** schemaTypeStr ********** : " + schemaTypeStr);

            // List<String> fieldNames = schema.getFieldNames();
            List<String> fieldNames = getFieldNames(schema);
            for (String fieldName : fieldNames) {
                Object value = hcatRecord.get(fieldName, schema);
                String fieldValue = null;
                if (null == value) {
                    fieldValue = "<NULL>";
                } else {
                    fieldValue = value.toString();
                }
                // The map output key is TableFieldValueKey, so a plain Text key is not written here
                // String fieldNameValue = fieldName + "." + fieldValue;
                // context.write(new Text(fieldNameValue), new IntWritable(1));
                TableFieldValueKey fieldKey = new TableFieldValueKey();
                fieldKey.fieldName = fieldName;
                fieldKey.fieldValue = fieldValue;
                context.write(fieldKey, new IntWritable(1));
            }

        }
    }

    public static class Reduce extends Reducer<TableFieldValueKey, IntWritable,
                                               WritableComparable, HCatRecord> {

        @Override
        protected void reduce(TableFieldValueKey key,
                              Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            Iterator<IntWritable> iter = values.iterator();
            int sum = 0;
            // Sum up occurrences of the given key
            while (iter.hasNext()) {
                IntWritable iw = iter.next();
                sum = sum + iw.get();
            }

            HCatRecord record = new DefaultHCatRecord(3);
            record.set(0, key.fieldName);
            record.set(1, key.fieldValue);
            record.set(2, sum);

            context.write(null, record);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();

        // To fix the Hadoop "No FileSystem for scheme: file" problem
        // (http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file)
        conf.set("fs.hdfs.impl",
                org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl",
                org.apache.hadoop.fs.LocalFileSystem.class.getName());

        // Get the input and output table names as arguments
        String inputTableName = args[0];
        String outputTableName = args[1];
        // Assume the default database
        String dbName = null;

        Job job = new Job(conf, "FieldCounts");

        HCatInputFormat.setInput(job,
                InputJobInfo.create(dbName, inputTableName, null));
        job.setJarByClass(FieldCounts.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // An HCatalog record as input
        job.setInputFormatClass(HCatInputFormat.class);

        // Mapper emits TableFieldValueKey as key and an integer as value
        job.setMapOutputKeyClass(TableFieldValueKey.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Ignore the key for the reducer output; emit an HCatalog record as value
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setOutputFormatClass(HCatOutputFormat.class);

        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create(dbName, outputTableName, null));
        HCatSchema s = HCatOutputFormat.getTableSchema(job);
        System.err.println("INFO: output schema explicitly set for writing:"
                + s);
        HCatOutputFormat.setSchema(job, s);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        String classpath = System.getProperty("java.class.path");
        System.out.println("*** CLASSPATH: "+classpath);         
        int exitCode = ToolRunner.run(new FieldCounts(), args);
        System.exit(exitCode);
    }
}

Solution

I found that the problem is in the permissions of the directory where the MapReduce jar is located. The jar is built in the home directory of a normal user, not the hdfs user. Because this MapReduce job writes its results directly to a Hive table, it has to run under the hdfs user; if it runs under a normal user, it does not have permission to write data to the Hive table.
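
In practice that means launching the job as the hdfs user, for example via sudo. A minimal sketch, assuming the launcher script shown above is saved as run_fieldcounts.sh (a placeholder name) in the directory that holds the jar (a hypothetical path):

# Run the launcher script shown above as the hdfs user
# (run_fieldcounts.sh and the path are placeholders)
cd /path/to/jar/dir
sudo -u hdfs ./run_fieldcounts.sh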

On the other hand, the home directory of a normal user on CentOS has 700 permissions. So when you run the hadoop jar ... command as a different user than the one who owns that home directory, access to the MapReduce jar is denied when Hadoop tries to load the class. This is why running the job under the hdfs user results in java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.FieldCounts$Map not found.

Recursively changing the permissions on the home directory where the MapReduce jar is built from 700 to 755 solved the problem.
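
A minimal sketch of that fix, assuming the jar is built somewhere under /home/someuser (substitute the real user name and path):

# Home directories on CentOS are typically drwx------ (700)
ls -ld /home/someuser

# Open the tree up so other users (such as hdfs) can traverse it and read the jar
chmod -R 755 /home/someuser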

There is also a more important question: how do you run the job under a normal user so that it still has permission to write data to the Hive table?
