Java – Enumeration values implement Hadoop’s Writable interface

Let’s say I have an enum:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public enum SomeEnumType implements Writable {
    A(0), B(1);

    private int value;

    private SomeEnumType(int value) {
        this.value = value;
    }

    @Override
    public void write(final DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(this.value);
    }

    @Override
    public void readFields(final DataInput dataInput) throws IOException {
        this.value = dataInput.readInt();
    }
}

I want to pass an instance of it as part of another class instance.

Using equals() doesn't work, because it doesn't take the enum's internal variable into account, and all enum instances are fixed at compile time and cannot be created anywhere else.

Does this mean I can't send enums over the wire in Hadoop, or is there a solution?
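To see the problem concretely, here is a minimal sketch that runs without Hadoop. The interface SimpleWritable is a hypothetical stand-in for org.apache.hadoop.io.Writable with the same two method signatures, and EnumMutationDemo is an illustrative name. readFields() needs an existing object to fill in, and for an enum the only objects available are its constants. Reading B's data into A therefore does not produce B. It silently overwrites the state of the shared constant A.

```java
import java.io.*;

public class EnumMutationDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable (same signatures),
    // so the sketch compiles without Hadoop on the classpath.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Same shape as the enum in the question.
    enum SomeEnumType implements SimpleWritable {
        A(0), B(1);

        private int value;

        SomeEnumType(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Serialize B: this writes the int 1.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        SomeEnumType.B.write(new DataOutputStream(bytes));

        // Deserializing needs an existing instance; the only candidates are
        // the enum constants themselves, so pick A.
        SomeEnumType target = SomeEnumType.A;
        target.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(target);                    // A: the identity cannot change
        System.out.println(SomeEnumType.A.getValue()); // 1: the shared constant was mutated
    }
}
```

Because every reference to SomeEnumType.A in the JVM points at the same object, this mutation is visible everywhere. That is why the answers below serialize an identifier for the constant instead of the constant's internal state.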

Solution

For enumerations in Hadoop, my usual and preferred solution is to serialize the enums through their ordinal values.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private int enumOrdinal;

    // never forget your default constructor in Hadoop Writables:
    // Hadoop creates instances reflectively and then calls readFields()
    public EnumWritable() {
    }

    public EnumWritable(Enum<?> arbitraryEnum) {
        this.enumOrdinal = arbitraryEnum.ordinal();
    }

    public int getEnumOrdinal() {
        return enumOrdinal;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumOrdinal = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(enumOrdinal);
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        // map the ordinal back to the constant
        EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
    }

}

The obvious drawback is that ordinals are not stable: if you swap ENUM_2 and ENUM_3 in the enum declaration and then read a previously serialized file, you get the wrong constant back.
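Here is a small sketch of that failure mode, using only java.io. EnumNameV1 and EnumNameV2 are hypothetical "before" and "after" versions of the same enum, with ENUM_2 and ENUM_3 swapped. DataOutputStream and DataInputStream stand in for the DataOutput and DataInput that Hadoop passes to write() and readFields().

```java
import java.io.*;

public class OrdinalDriftDemo {
    // Version of the enum in use when the data was written.
    enum EnumNameV1 { ENUM_1, ENUM_2, ENUM_3 }

    // Later version: ENUM_2 and ENUM_3 have been swapped in the source.
    enum EnumNameV2 { ENUM_1, ENUM_3, ENUM_2 }

    // Serialize a constant by ordinal, as the ordinal-based EnumWritable does.
    static byte[] writeOrdinal(Enum<?> e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(e.ordinal());
        return bytes.toByteArray();
    }

    // Read the stored ordinal back using the *new* enum declaration.
    static EnumNameV2 readOrdinalAsV2(byte[] data) throws IOException {
        int ordinal = new DataInputStream(new ByteArrayInputStream(data)).readInt();
        return EnumNameV2.values()[ordinal];
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = writeOrdinal(EnumNameV1.ENUM_2); // ordinal 1 on disk
        System.out.println(readOrdinalAsV2(stored));     // ENUM_3, not ENUM_2
    }
}
```

Nothing fails loudly here. The old data simply decodes to a different constant, which makes this kind of bug hard to notice.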

If you know the enum class beforehand, you can write the constant's name instead and read it back like this:

enumInstance = EnumName.valueOf(in.readUTF());

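This uses more space, but it is robust to reordering the enum constants. It is not robust to renaming them: valueOf() throws an IllegalArgumentException for a name the current enum doesn't define.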
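The trade-off can be checked with a short java.io sketch. As before, EnumNameV1 and EnumNameV2 are hypothetical versions of one enum with ENUM_2 and ENUM_3 swapped, and "OLD_NAME" is an illustrative name for a constant that no longer exists. DataOutput.writeUTF() stores a 2-byte length prefix followed by the modified UTF-8 bytes of the string. For "ENUM_2" that is 8 bytes, compared with 4 bytes for writeInt().

```java
import java.io.*;

public class NameVsOrdinalDemo {
    enum EnumNameV1 { ENUM_1, ENUM_2, ENUM_3 }
    enum EnumNameV2 { ENUM_1, ENUM_3, ENUM_2 } // ENUM_2 and ENUM_3 swapped

    // Serialize a constant by name, as the name-based EnumWritable does.
    static byte[] writeName(Enum<?> e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(e.name());
        return bytes.toByteArray();
    }

    // Resolve the stored name against the *new* enum declaration.
    static EnumNameV2 readNameAsV2(byte[] data) throws IOException {
        return EnumNameV2.valueOf(new DataInputStream(new ByteArrayInputStream(data)).readUTF());
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = writeName(EnumNameV1.ENUM_2);
        System.out.println(readNameAsV2(stored)); // ENUM_2: reordering does not matter
        System.out.println(stored.length);        // 8: 2-byte length prefix + 6 ASCII bytes

        // Renaming is different: valueOf rejects names it does not know.
        try {
            EnumNameV2.valueOf("OLD_NAME");
        } catch (IllegalArgumentException ex) {
            System.out.println("unknown name rejected");
        }
    }
}
```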

The complete example is as follows:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private EnumName enumInstance;

    // never forget your default constructor in Hadoop Writables:
    // Hadoop creates instances reflectively and then calls readFields()
    public EnumWritable() {
    }

    public EnumWritable(EnumName e) {
        this.enumInstance = e;
    }

    public EnumName getEnum() {
        return enumInstance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // store the constant's name rather than its ordinal
        out.writeUTF(enumInstance.name());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // map the stored name back to the constant
        enumInstance = EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = enumWritable.getEnum();
    }

}
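To try the name-based class without a Hadoop job, you can simulate what the framework does. First serialize one object. Then create an empty instance with the no-arg constructor and fill it via readFields(). This is a sketch under stated assumptions: SimpleWritable is a hypothetical stand-in for org.apache.hadoop.io.Writable, and NameEnumWritableRoundTrip and roundTrip() are illustrative names.

```java
import java.io.*;

public class NameEnumWritableRoundTrip {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    enum EnumName { ENUM_1, ENUM_2, ENUM_3 }

    // Same logic as the name-based EnumWritable above.
    static class EnumWritable implements SimpleWritable {
        private EnumName enumInstance;

        public EnumWritable() { } // used when deserializing

        public EnumWritable(EnumName e) { this.enumInstance = e; }

        public EnumName getEnum() { return enumInstance; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(enumInstance.name());
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            enumInstance = EnumName.valueOf(in.readUTF());
        }
    }

    // Serialize, then deserialize into a fresh instance built with the
    // no-arg constructor, mimicking how the framework reuses Writables.
    static EnumWritable roundTrip(EnumWritable original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        EnumWritable copy = new EnumWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        EnumWritable copy = roundTrip(new EnumWritable(EnumName.ENUM_3));
        System.out.println(copy.getEnum()); // ENUM_3
    }
}
```

Note that write() as written throws a NullPointerException if enumInstance was never set. If null is a legitimate value in your data, you would need to encode it explicitly, for example with a leading boolean.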