Java – InvalidProtocolBufferException when attempting to write to HDFS


Here is my code:

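        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Load the cluster configuration from the local CDH config directory.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:8020"), conf);
        Path path = new Path(hdfsDestination);
        // The exception below is thrown by this call.
        FSDataOutputStream outputStream = fs.create(path);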

hdfsDestination is /user/msknapp/insurance. This is the output:

java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy9.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:227)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1389)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1382)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1307)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:384)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:380)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:380)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:324)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:783)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:772)
    at knapp.hdfs.importer.insurance.InsuranceIO.writeToHdfsAsAvro(InsuranceIO.java:68)
    at knapp.hdfs.importer.insurance.InsuranceIO.writeFileToHdfsAsAvro(InsuranceIO.java:51)
    at knapp.hdfs.importer.ImporterCommandLine.run(ImporterCommandLine.java:72)
    at knapp.hdfs.importer.ImporterCommandLine.main(ImporterCommandLine.java:61)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
    at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
    at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1398)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1362)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1492)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1487)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

I have CDH4.8 running in pseudo-distributed mode:

[msknapp@localhost conf]$ hadoop fs -ls /user/msknapp
Found 2 items
drwxr-xr-x   - msknapp supergroup          0 2014-01-01 10:22 /user/msknapp/input
drwxr-xr-x   - msknapp supergroup          0 2014-01-01 10:35 /user/msknapp/output23
[msknapp@localhost conf]$ hadoop fs -ls /user
Found 1 items
drwxrwxrwx   - msknapp supergroup          0 2014-01-01 10:35 /user/msknapp
[msknapp@localhost conf]$ whereis hadoop
hadoop: /usr/bin/hadoop /etc/hadoop /usr/lib/hadoop /usr/share/man/man1/hadoop.1.gz

In case it matters, I'm using CentOS 6.4. I can put data into HDFS from the command line, but for some reason I can't do it from code. Can someone tell me why I can't write to HDFS from my code?

Solution

I got the same exception and found that some dependencies were missing: I only had hadoop-core, but more are needed. It worked once I added these:

            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
                <version>2.0.0-mr1-cdh4.3.0</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.0.0-cdh4.3.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.0.0-cdh4.3.0</version>
            </dependency>
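
To check whether the client really has these artifacts available at runtime, it can help to print where the relevant classes are loaded from. The following is a minimal sketch that uses only the JDK. The class name `ClasspathCheck`, the method `locate`, and the list of classes to probe are illustrative choices, not part of the original post.

```java
import java.security.CodeSource;

class ClasspathCheck {
    // Returns the jar or directory a class was loaded from, or a short marker
    // when the class is missing or comes from the JDK itself.
    public static String locate(String className) {
        try {
            // 'false' avoids running static initializers; we only need the location.
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, likely a JDK class)"
                               : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Classes the question's code and stack trace rely on (assumed probe list).
        String[] names = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem",
            "com.google.protobuf.Message"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this with the same classpath as the failing program should show whether the HDFS client classes are present and which jar versions supply them, which can then be compared with the version installed on the cluster.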
