Custom data writes unknown data in map output
Can someone help me understand why my custom data type behaves so strangely? This is my mapper code:
public class customDataMapper extends Mapper<LongWritable, Text,Text,customText > {
Text url = new Text();
Text date = new Text();
Text ip = new Text();
customText ctext = new customText();
public void map (LongWritable key , Text value , Context context) throws IOException , InterruptedException{
String words[] = value.toString().split("|");
url.set(words[1]);
date.set(words[2]);
ip.set(words[4]);
ctext.set(date,ip);
context.write(url, ctext);
}
}
The customText data type code is:
public class customText implements WritableComparable<customText>{
private Text url , ip;
public customText(){
this.url=new Text();
this.ip=new Text();
}
public customText(Text URL , Text IP){
this.url=URL;
this.ip=IP;
}
public void set (Text URL , Text IP){
this.url=URL;
this.ip=IP;
}
public void readFields(DataInput in) throws IOException{
url.readFields(in);
ip.readFields(in);
}
public void write(DataOutput out ) throws IOException{
url.write(out);
ip.write(out);
}
public int compareTo(customText o){
if(url.compareTo(o.ip)==0){
return (ip.compareTo(o.ip));
}
else return (url.compareTo(o.ip));
}
public boolean equals(Object o){
if (o instanceof customText){
customText other = (customText)o;
return (url.equals(other.ip)) && ip.equals(other.ip);
}
return false;
}
public int hashCode(){
return url.hashCode();
}
}
This is the output I get:
hduser@pradeep-VirtualBox:~/builds$ hadoop fs -cat /user/hadoop/dir8_customData/output/part-m-00000
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
My input file is
127248|/rr.html|2014-03-10|12:32:08|42.416.153.181
12|/rr12.html|2014-03-11|12:00:00|42.416.153.182
127241|/rr3232.html|2014-03-12|13:32:00|42.416.153.183
1272|/rrw33232.html|2014-03-15|14:32:08|42.416.153.184
121|/rr21212.html|2015-12-10|16:32:08|42.416.153.185
Can someone help me understand why I am getting this output?
Secondly, I am not sure how compareTo works, specifically when a new group is created in the reducer. I am new to Hadoop and Java programming.
Thanks
Solution
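You are using split("|") to split on |. String.split() takes a regular expression, and an unescaped | is the alternation operator, so the pattern matches the empty string and the line is split between individual characters rather than at the pipes. This should be split("\\|").
See the SO answer on why the pipe needs to be escaped.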
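To make the difference concrete, here is a minimal sketch that runs both patterns against the first line of the input file. The class and method names (PipeSplitDemo, splitEscaped, splitQuoted) are made up for illustration; Pattern.quote is shown only as an alternative way to get a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Splits on a literal '|' by escaping the regex metacharacter.
    static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    // Alternative: let Pattern.quote build the literal pattern.
    static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "127248|/rr.html|2014-03-10|12:32:08|42.416.153.181";

        // Unescaped: the pattern matches the empty string, so the line is
        // cut between individual characters instead of at the pipes.
        System.out.println(Arrays.toString(line.split("|")));

        // Escaped: five fields, with the URL at index 1 and the IP at index 4.
        System.out.println(Arrays.toString(splitEscaped(line)));
    }
}
```

With the escaped pattern, words[1], words[2] and words[4] in the mapper hold the URL, the date and the IP as intended.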
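Your customText class needs to override toString() so that the output format can write the contents of the object as text. Without it, Java falls back to Object.toString(), which prints the class name followed by @ and the hash code in hex; that is the customText@51 in your output. For example: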
@Override
public String toString() {
return url + "," + ip;
}
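The effect can be reproduced without Hadoop. The sketch below uses two hypothetical plain Java classes (WithoutToString and WithToString, with String fields standing in for Text) and prints each one the way any text-based writer would, by calling toString().

```java
class WithoutToString {
    final String url = "/rr.html";
}

class WithToString {
    final String url = "/rr.html";
    final String ip = "42.416.153.181";

    @Override
    public String toString() {
        return url + "," + ip;
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        // Falls back to Object.toString(): class name, '@', hash code in hex.
        System.out.println(new WithoutToString());
        // Uses the override and prints the field contents.
        System.out.println(new WithToString());
    }
}
```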
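You also set the Text objects incorrectly. Assigning the references makes the custom object share the caller's Text instances instead of copying their contents: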
public void set (Text URL , Text IP){
this.url=URL;
this.ip=IP;
}
This should be:
public void set(Text URL , Text IP){
this.url.set(URL);
this.ip.set(IP);
}
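Why the copy matters can be shown with any mutable type. The sketch below is an illustration only: it uses StringBuilder as a stand-in for Hadoop's Text so that it runs without Hadoop, and AliasingHolder, setByReference and setByCopy are hypothetical names.

```java
class AliasingHolder {
    private StringBuilder url = new StringBuilder();

    // Mirrors the original set(): keeps a reference to the caller's object.
    void setByReference(StringBuilder newUrl) {
        this.url = newUrl;
    }

    // Mirrors the corrected set(): copies the contents into the holder's own object.
    void setByCopy(CharSequence newUrl) {
        this.url.setLength(0);
        this.url.append(newUrl);
    }

    String get() {
        return url.toString();
    }
}

class AliasingDemo {
    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("2014-03-10");

        AliasingHolder aliased = new AliasingHolder();
        aliased.setByReference(shared);
        AliasingHolder copied = new AliasingHolder();
        copied.setByCopy(shared);

        // The caller reuses its buffer for the next record.
        shared.setLength(0);
        shared.append("2014-03-11");

        System.out.println(aliased.get()); // changed along with the caller's buffer
        System.out.println(copied.get());  // still holds the first value
    }
}
```

Because the mapper reuses its Text fields for every record, the copying version keeps the custom object independent of whatever the mapper does with them afterwards.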
If your custom Writable object is only used as a value, it only needs to implement the Writable interface instead of WritableComparable.
The WritableComparable interface is only required for keys, because Hadoop needs to sort and group them.
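As a rough sketch of what a value-only type has to provide, the example below serializes an object and reads it back. To keep it runnable without Hadoop, ValueWritable is a hypothetical stand-in that declares the same two methods as Hadoop's Writable (write and readFields), and DateIpValue uses String fields instead of Text; in a real job you would implement org.apache.hadoop.io.Writable directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in with the same two methods as Hadoop's Writable.
interface ValueWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A value-only type: serialization methods, but no compareTo().
class DateIpValue implements ValueWritable {
    private String date = "";
    private String ip = "";

    void set(String date, String ip) {
        this.date = date;
        this.ip = ip;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(date);
        out.writeUTF(ip);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields are read in the same order they were written.
        date = in.readUTF();
        ip = in.readUTF();
    }

    @Override
    public String toString() {
        return date + "," + ip;
    }
}

class ValueWritableDemo {
    // Serializes a value to bytes and reads it back into a fresh instance.
    static DateIpValue roundTrip(DateIpValue original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            DateIpValue copy = new DateIpValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DateIpValue value = new DateIpValue();
        value.set("2014-03-10", "42.416.153.181");
        System.out.println(roundTrip(value));
    }
}
```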
Your compareTo() method doesn’t make sense (you’re comparing URL to IP):
public int compareTo(customText o){
if(url.compareTo(o.ip)==0){
return (ip.compareTo(o.ip));
}
else return (url.compareTo(o.ip));
}
It should look like:
@Override
public int compareTo(customText o) {
int result = url.compareTo(o.url);
if (result != 0) {
return result;
}
return ip.compareTo(o.ip);
}
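To see what this ordering does, here is a small sketch that sorts a few keys with the same two-level comparison. UrlIpKey is a hypothetical plain Java class with String fields standing in for Text. As far as I know, Hadoop's default behaviour is to sort map output keys with this comparison and to start a new reduce group whenever the comparison with the previous key is non-zero, but treat that as an assumption to check against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UrlIpKey implements Comparable<UrlIpKey> {
    final String url;
    final String ip;

    UrlIpKey(String url, String ip) {
        this.url = url;
        this.ip = ip;
    }

    @Override
    public int compareTo(UrlIpKey o) {
        // Primary order: url. Ties are broken by ip.
        int result = url.compareTo(o.url);
        if (result != 0) {
            return result;
        }
        return ip.compareTo(o.ip);
    }

    @Override
    public String toString() {
        return url + "," + ip;
    }
}

class CompareToDemo {
    // Returns a sorted copy, leaving the input list untouched.
    static List<UrlIpKey> sorted(List<UrlIpKey> keys) {
        List<UrlIpKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<UrlIpKey> keys = Arrays.asList(
                new UrlIpKey("/rr12.html", "42.416.153.182"),
                new UrlIpKey("/rr.html", "42.416.153.183"),
                new UrlIpKey("/rr.html", "42.416.153.181"));
        // Keys with the same url end up next to each other, ordered by ip.
        System.out.println(sorted(keys));
    }
}
```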
Your hash code should look like this:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((ip == null) ? 0 : ip.hashCode());
result = prime * result + ((url == null) ? 0 : url.hashCode());
return result;
}
Currently it only uses the URL and ignores the IP.
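Since hashCode() and equals() have to agree, it is worth fixing both together; the original equals() has the same url-versus-ip mix-up as compareTo(). The sketch below is one possible shape, using a hypothetical UrlIpPair class with String fields in place of Text, and java.util.Objects helpers instead of the hand-written prime arithmetic.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class UrlIpPair {
    private final String url;
    private final String ip;

    UrlIpPair(String url, String ip) {
        this.url = url;
        this.ip = ip;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UrlIpPair)) {
            return false;
        }
        UrlIpPair other = (UrlIpPair) o;
        // Compare like with like: url to url, ip to ip.
        return Objects.equals(url, other.url) && Objects.equals(ip, other.ip);
    }

    @Override
    public int hashCode() {
        // Uses both fields, so it stays consistent with equals().
        return Objects.hash(url, ip);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        Set<UrlIpPair> seen = new HashSet<>();
        seen.add(new UrlIpPair("/rr.html", "42.416.153.181"));
        seen.add(new UrlIpPair("/rr.html", "42.416.153.181")); // duplicate, not added again
        seen.add(new UrlIpPair("/rr.html", "42.416.153.182")); // same url, different ip
        System.out.println(seen.size());
    }
}
```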
You also pass date to ctext.set(date, ip), but the field that receives it in the custom object is called url, so the field name is misleading.
Stylistically, your variable names should be lowercase (URL should be url) and class names should start with an uppercase letter (customText should be CustomText).