Java – Custom data writes unknown data in map output

Can someone help me understand why I get this weird behavior from my custom data type? This is my mapper code:

public class customDataMapper extends Mapper<LongWritable, Text,Text,customText > {

Text url = new Text();
Text date = new Text();
Text ip = new Text();
customText ctext = new customText();

public void map (LongWritable key , Text value , Context context) throws IOException , InterruptedException{

String words[] = value.toString().split("|");
    url.set(words[1]);
    date.set(words[2]);
    ip.set(words[4]);
    ctext.set(date,ip);
    context.write(url, ctext);
}   
}

And the customText data type code is:

public class customText implements WritableComparable<customText>{

private Text url , ip;

public customText(){
    this.url=new Text();
    this.ip=new Text();

}

public customText(Text URL , Text IP){
    this.url=URL;
    this.ip=IP;

}

public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;

}

public void readFields(DataInput in) throws IOException{
    url.readFields(in);
    ip.readFields(in);

}

public void write(DataOutput out ) throws IOException{
    url.write(out);
    ip.write(out);

}

public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){

return (ip.compareTo(o.ip));

}
    else return (url.compareTo(o.ip));
}

public boolean equals(Object o){

if (o instanceof customText){
    customText other = (customText)o;   
    return (url.equals(other.ip)) && ip.equals(other.ip);
    }
    return false;
}

public int hashCode(){
    return url.hashCode();
}
}

This is the output I received:

hduser@pradeep-VirtualBox:~/builds$ hadoop fs -cat /user/hadoop/dir8_customData/output/part-m-00000
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51

My input file is

127248|/rr.html|2014-03-10|12:32:08|42.416.153.181
12|/rr12.html|2014-03-11|12:00:00|42.416.153.182
127241|/rr3232.html|2014-03-12|13:32:00|42.416.153.183
1272|/rrw33232.html|2014-03-15|14:32:08|42.416.153.184
121|/rr21212.html|2015-12-10|16:32:08|42.416.153.185

Can someone help me understand why I'm getting this output?
Secondly, I'm not sure how compareTo works, in particular when a new group is created in the reducer. I am new to Hadoop and Java programming.

Thanks

Solution

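You are using split("|") to split on the pipe character. String.split() takes a regular expression, and an unescaped | is the regex alternation operator, so the line gets split between every character. That is why your key is the single character 1 instead of the URL. This should be split("\\|"). See the Stack Overflow answer on why the pipe needs to be escaped.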
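To make the difference concrete, here is a small sketch that splits the first line of the input file on a literal pipe. The class and method names (SplitDemo, splitFields, splitFieldsQuoted) are made up for illustration; Pattern.quote is shown as an alternative to escaping by hand.

```java
import java.util.regex.Pattern;

class SplitDemo {
    // Splits one log line on a literal pipe by escaping it in the regex.
    static String[] splitFields(String line) {
        return line.split("\\|");
    }

    // Same result, but lets Pattern.quote() do the escaping.
    static String[] splitFieldsQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "127248|/rr.html|2014-03-10|12:32:08|42.416.153.181";
        String[] fields = splitFields(line);
        System.out.println(fields.length); // 5
        System.out.println(fields[1]);     // /rr.html
        System.out.println(fields[4]);     // 42.416.153.181

        // Unescaped, the pattern matches between characters, so there are
        // far more than 5 pieces and each one is a single character.
        System.out.println(line.split("|").length);
    }
}
```

With the escaped pattern, words[1], words[2] and words[4] in the mapper line up with the URL, date and IP columns of the input.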

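Your customText class needs to override toString(). When the value is written out as text (the default TextOutputFormat does this), Hadoop calls toString() on it. Without an override you get the default Object.toString() result, which is the class name, an @, and the hash code in hex. That is the customText@51 you see in your output. For example: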

@Override
public String toString() {
    return url + "," + ip;
}
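The effect can be reproduced without Hadoop. The sketch below is only an illustration: Plain and Printable are hypothetical stand-ins that use String fields instead of Text, and the only difference between them is the toString() override.

```java
class ToStringDemo {
    // No toString() override: falls back to Object.toString().
    static class Plain {
        final String url, ip;
        Plain(String url, String ip) { this.url = url; this.ip = ip; }
    }

    // With an override, the text form carries the field values.
    static class Printable {
        final String url, ip;
        Printable(String url, String ip) { this.url = url; this.ip = ip; }

        @Override
        public String toString() {
            return url + "," + ip;
        }
    }

    public static void main(String[] args) {
        Plain plain = new Plain("/rr.html", "42.416.153.181");
        Printable printable = new Printable("/rr.html", "42.416.153.181");
        // Object.toString() is getClass().getName() + "@" + hex hash code,
        // so the exact digits after @ vary.
        System.out.println(plain);
        System.out.println(printable); // /rr.html,42.416.153.181
    }
}
```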

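You also set the Text objects incorrectly. Your set() method replaces the object's own fields with references to the caller's Text instances instead of copying their contents: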

public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;
}

This should be:

public void set(Text URL , Text IP){
    this.url.set(URL);
    this.ip.set(IP);
}
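Why copying matters is easier to see in isolation. This is a Hadoop-free sketch that uses StringBuilder as a stand-in for the mutable Text class (an assumption made so it runs on a plain JDK); AliasingHolder and CopyingHolder are hypothetical names mirroring the two versions of set() above.

```java
class AliasDemo {
    // Mirrors the original set(): keeps a reference to the caller's object.
    static class AliasingHolder {
        StringBuilder url = new StringBuilder();
        void set(StringBuilder newUrl) { this.url = newUrl; }
    }

    // Mirrors the corrected set(): copies the contents into its own buffer.
    static class CopyingHolder {
        final StringBuilder url = new StringBuilder();
        void set(StringBuilder newUrl) {
            url.setLength(0);
            url.append(newUrl);
        }
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("/rr.html");
        AliasingHolder aliasing = new AliasingHolder();
        CopyingHolder copying = new CopyingHolder();
        aliasing.set(shared);
        copying.set(shared);

        // The caller reuses its buffer for the next record.
        shared.setLength(0);
        shared.append("/rr12.html");

        System.out.println(aliasing.url); // /rr12.html (changed behind its back)
        System.out.println(copying.url);  // /rr.html
    }
}
```

The mapper in the question reuses its Text fields for every record, which is the same pattern as the shared buffer here.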

If your custom Writable object is only used as a value, it only needs to implement the Writable interface rather than WritableComparable. WritableComparable is only required for keys, because keys are what Hadoop sorts and groups.
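As a rough sketch of what a value-only type has to provide: Hadoop's Writable interface declares write(DataOutput) and readFields(DataInput), and both parameter types come from java.io. The class below (UrlIpValue, a hypothetical name) has those two methods but deliberately does not implement the Hadoop interface and uses String instead of Text, so that it runs without Hadoop on the classpath. The roundTrip helper is also only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class UrlIpValue {
    private String url = "";
    private String ip = "";

    UrlIpValue() { }
    UrlIpValue(String url, String ip) { this.url = url; this.ip = ip; }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeUTF(ip);
    }

    // Same signature as Writable.readFields: read them back in the same order.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        ip = in.readUTF();
    }

    @Override
    public String toString() { return url + "," + ip; }

    // Serializes a value to bytes and reads it into a fresh instance.
    static UrlIpValue roundTrip(UrlIpValue original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            UrlIpValue copy = new UrlIpValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new UrlIpValue("/rr.html", "42.416.153.181")));
        // prints /rr.html,42.416.153.181
    }
}
```

Note that no compareTo() is needed here; a value is only serialized and deserialized, never sorted.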

Your compareTo() method doesn’t make sense (you’re comparing URL to IP):

public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){
        return (ip.compareTo(o.ip));
    }
    else return (url.compareTo(o.ip));
}

Should look like:

@Override
public int compareTo(customText o) {
    int result = url.compareTo(o.url);
    if (result != 0) {
        return result;
    }
    return ip.compareTo(o.ip);
}
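On the question of when a new group is created: Hadoop sorts the map output keys, and the reducer gets one reduce() call for each run of keys that compare as equal. The following plain-Java sketch imitates that with a sorted list; UrlIpKey and countGroups are hypothetical names, and String replaces Text so it runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GroupingDemo {
    static class UrlIpKey implements Comparable<UrlIpKey> {
        final String url, ip;
        UrlIpKey(String url, String ip) { this.url = url; this.ip = ip; }

        // Order by url first, then by ip, as in the corrected compareTo().
        @Override
        public int compareTo(UrlIpKey o) {
            int result = url.compareTo(o.url);
            if (result != 0) {
                return result;
            }
            return ip.compareTo(o.ip);
        }
    }

    // Sorts the keys, then starts a new group whenever two neighbours
    // do not compare as equal.
    static int countGroups(List<UrlIpKey> keys) {
        List<UrlIpKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        int groups = 0;
        UrlIpKey previous = null;
        for (UrlIpKey key : sorted) {
            if (previous == null || previous.compareTo(key) != 0) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<UrlIpKey> keys = new ArrayList<>();
        keys.add(new UrlIpKey("/rr.html", "42.416.153.181"));
        keys.add(new UrlIpKey("/rr12.html", "42.416.153.182"));
        keys.add(new UrlIpKey("/rr.html", "42.416.153.181")); // duplicate of the first
        keys.add(new UrlIpKey("/rr.html", "42.416.153.183")); // same url, different ip
        System.out.println(countGroups(keys)); // 3
    }
}
```

In the job from the question the key type is Text and customText is only the value, so the groups there would be decided by the Text key, not by customText.compareTo().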

Your hash code should look like this:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((ip == null) ? 0 : ip.hashCode());
    result = prime * result + ((url == null) ? 0 : url.hashCode());
    return result;
}

Currently it only uses url and ignores ip.
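The equals() method in the question has the same mix-up as compareTo() (it compares url with other.ip), and equals() and hashCode() should agree on which fields they use. Below is a minimal sketch of a consistent pair. It is not the code from the answer: it uses java.util.Objects helpers and String fields instead of Text, and UrlIpPair is a hypothetical name.

```java
import java.util.Objects;

class UrlIpPair {
    private final String url;
    private final String ip;

    UrlIpPair(String url, String ip) { this.url = url; this.ip = ip; }

    // Compares url with url and ip with ip.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UrlIpPair)) return false;
        UrlIpPair other = (UrlIpPair) o;
        return Objects.equals(url, other.url) && Objects.equals(ip, other.ip);
    }

    // Uses the same two fields as equals().
    @Override
    public int hashCode() {
        return Objects.hash(url, ip);
    }

    public static void main(String[] args) {
        UrlIpPair a = new UrlIpPair("/rr.html", "42.416.153.181");
        UrlIpPair b = new UrlIpPair("/rr.html", "42.416.153.181");
        UrlIpPair c = new UrlIpPair("/rr.html", "42.416.153.182");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(c));                  // false
    }
}
```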

You also pass date to ctext.set(date, ip), but the custom object stores that first argument in a field called url. Either pass the URL or rename the field.

Stylistically, variable and parameter names should start with a lowercase letter (URL becomes url) and class names should start with an uppercase letter (customText becomes CustomText).