Java – HBase MapReduce job: all column values are null

I’m trying to create a MapReduce job in Java on a table in an HBase database. Using an example and other material from the internet, I managed to write a simple row counter. However, a program that actually does something with the data in a column fails, because the bytes I receive for that column are always null.

Part of my Driver job goes something like this:

/* Set main, map and reduce classes */
job.setJarByClass(Driver.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);

/* Get data only from the last 24h */
long now = System.currentTimeMillis();
try {
    scan.setTimeRange(now - 24L * 60 * 60 * 1000, now);
} catch (IOException e) {
    e.printStackTrace();
}

/* Set up the table mapper job */
TableMapReduceUtil.initTableMapperJob(
        "dnsr",
        scan,
        Map.class,
        Text.class,
        Text.class,
        job);

/* Set output parameters */
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);

As you can see, the table is named dnsr. My mapper looks like this:

@Override
public void map(ImmutableBytesWritable row, Result value, Context context)
        throws InterruptedException, IOException {
    byte[] columnValue = value.getValue("d".getBytes(), "fqdn".getBytes());
    if (columnValue == null)
        return;

    byte[] firstSeen = value.getValue("d".getBytes(), "fs".getBytes());
    if (firstSeen == null)
        return;

    String fqdn = new String(columnValue).toLowerCase();
    String fs = (firstSeen == null) ? "empty" : new String(firstSeen);

    context.write(new Text(fqdn), new Text(fs));
}
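A side note on the byte conversions in the mapper: `String.getBytes()` and `new String(byte[])` without arguments use the platform default charset, which can differ between the machine that wrote the data and the cluster nodes. The following is a minimal sketch in plain Java, with no HBase dependency, of doing both conversions with an explicit charset. The class and method names are hypothetical; in HBase code the `Bytes.toBytes(String)` utility serves the same purpose for the encoding step.

```java
import java.nio.charset.StandardCharsets;

public class ByteConversionSketch {

    /** Encodes a column family or qualifier name with an explicit charset. */
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes a cell value, mapping a missing cell (null) to a fallback string. */
    static String toStringOrDefault(byte[] value, String fallback) {
        return (value == null) ? fallback : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A missing cell decodes to the fallback instead of throwing.
        System.out.println(toStringOrDefault(null, "empty"));
        // A present cell round-trips through UTF-8.
        System.out.println(toStringOrDefault(toBytes("example.com"), "empty"));
    }
}
```

This does not by itself explain a null value, since both column names here are plain ASCII, but it removes one variable when comparing results across machines.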

Some notes:

  • The column family in the dnsr table is just d. It has multiple columns, among them fqdn and fs (firstSeen);
  • Even though the fqdn value is displayed correctly, fs is always the string “empty” (I added this check after getting errors saying that null cannot be converted to a String);
  • If I replace the fs column name with something else, e.g. ls (lastSeen), it works;
  • The reducer does nothing but output everything it receives.
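The difference between fs and ls is consistent with one possibility worth ruling out: `Scan.setTimeRange(minStamp, maxStamp)` selects individual cells whose write timestamp lies in `[minStamp, maxStamp)`, not whole rows. A cell written once, long ago, would then be missing from the `Result` even if other cells in the same row are recent. The sketch below simulates that behaviour in plain Java. The `Cell` class, the timestamps, and the assumption that fqdn and ls are rewritten on every update while fs is written only once are all hypothetical, not taken from the actual table.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeSketch {

    /** Hypothetical stand-in for an HBase cell: qualifier plus write timestamp. */
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Keeps only the qualifiers of cells whose timestamp lies in [minStamp, maxStamp). */
    static List<String> visibleQualifiers(List<Cell> row, long minStamp, long maxStamp) {
        List<String> visible = new ArrayList<>();
        for (Cell c : row) {
            if (c.timestamp >= minStamp && c.timestamp < maxStamp) {
                visible.add(c.qualifier);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long dayMillis = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        // Assumed write pattern: fqdn and ls updated recently, fs written 30 days ago.
        List<Cell> row = List.of(
                new Cell("fqdn", now - 1_000),
                new Cell("fs", now - 30 * dayMillis),
                new Cell("ls", now - 1_000));
        // Only cells written in the last 24 hours pass the filter.
        System.out.println(visibleQualifiers(row, now - dayMillis, now));
    }
}
```

Under these assumptions the old fs cell is filtered out while fqdn and ls remain. Running the scan once without the time range would confirm or rule this out.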

I created a simple table scanner in JavaScript that queries the exact same table and columns, and I can clearly see the values there. Running the query manually from the command line, I can also see that the fs values are not empty: they are bytes that can later be converted to a string (representing a date).

What could be causing me to always get null?

Thanks!

Update:
If I list all the columns of the d column family from within the mapper, fs is not among them. However, a simple scanner implemented in JavaScript does return fs as a column of the dnsr table.

@Override
public void map(ImmutableBytesWritable row, Result value, Context context)
        throws InterruptedException, IOException {
    byte[] columnValue = value.getValue(columnFamily, fqdnColumnName);
    if (columnValue == null)
        return;
    String fqdn = new String(columnValue).toLowerCase();

    /* Getting all the columns */
    String[] cns = getColumnsInColumnFamily(value, "d");
    StringBuilder sb = new StringBuilder();
    for (String s : cns) {
        sb.append(s).append(";");
    }

    context.write(new Text(fqdn), new Text(sb.toString()));
}

I used a snippet from another answer to get all the column names.
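The `getColumnsInColumnFamily` helper itself is not shown above. A minimal sketch of what such a helper might look like, written against a plain `NavigableMap<byte[], byte[]>` so that it runs without HBase: in a real mapper the map would come from `Result.getFamilyMap(byte[] family)`, which returns the qualifier-to-value map for one column family. The class and method names below are hypothetical and this is not necessarily the code from the referenced answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ColumnNamesSketch {

    /** Returns the qualifier names of a family map as strings, in map order. */
    static String[] columnNames(NavigableMap<byte[], byte[]> familyMap) {
        if (familyMap == null) {
            return new String[0]; // family absent from this row
        }
        String[] names = new String[familyMap.size()];
        int i = 0;
        for (byte[] qualifier : familyMap.keySet()) {
            names[i++] = new String(qualifier, StandardCharsets.UTF_8);
        }
        return names;
    }

    public static void main(String[] args) {
        // byte[] has no natural ordering, so the map needs an explicit comparator.
        Comparator<byte[]> byBytes = (a, b) -> Arrays.compareUnsigned(a, b);
        NavigableMap<byte[], byte[]> familyMap = new TreeMap<>(byBytes);
        familyMap.put("ls".getBytes(StandardCharsets.UTF_8), new byte[] {1});
        familyMap.put("fqdn".getBytes(StandardCharsets.UTF_8), new byte[] {2});
        System.out.println(String.join(";", columnNames(familyMap)));
    }
}
```

A helper like this only reports the cells that are actually present in the `Result` handed to the mapper, so any column excluded by the scan configuration will not appear in its output.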