Java – Compare values in TSV files and Hbase tables in Java

I have an HBase table with a unique row key, one column family, and one column. I also have a TSV file with more than 300 columns. The row key for each line of the file is the combination of two of its columns. I need to compare the row keys in the table with those in the file and, where they match, insert the table's column value as the last column of the corresponding line in the TSV file. I wrote the code below, but it always executes the else branch.

package mapReduce;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class Tsv_read{

private static Configuration conf = null;

static {
        conf = HBaseConfiguration.create();
    }

@SuppressWarnings("resource")
    public static void main(String[] arg) throws Exception {

BufferedReader TSVFile = 
                new BufferedReader(new FileReader("Path/to/file/.tsv"));

String dataRow = TSVFile.readLine();
        List<String> list = new ArrayList<String>();

while (dataRow != null){
            list.clear();
            String[] dataArray = dataRow.split("\t");

for (String item:dataArray) { 

HTable table = new HTable(conf, "Table name"); // Hbase table name
                Scan s = new Scan();
                ResultScanner ss = table.getScanner(s);
                for(Result r:ss){
                    for(KeyValue kv : r.raw()){
                        System.out.println("Rowkey :" +dataArray[12]+"-"+dataArray[13]);
                        System.out.print(new String(kv.getRow()) + " ");
                        if((dataArray[12]+"-"+dataArray[13]).equals(new String(kv.getRow()))){  //Comparing the rowkeys from file and table  (doesn't work)
                            System.out.println("File Rowkey :"+dataArray[12]+"-"+dataArray[13]);
                            System.out.println("Table Row key"+new String(kv.getRow()));
                            dataArray[392]=new String(kv.getValue());
                            FileWriter fstream = new FileWriter("/path/to/the/file/*.tsv",true);
                            BufferedWriter fbw = new BufferedWriter(fstream);
                            fbw.write(new String(kv.getValue())); // inserting the value to the tsv file
                            fbw.newLine();
                            fbw.close();
                            System.out.println("Column value written successfully");
                        }
                        else //always executes this part
                        {
                            System.out.println("RowKey not found :" +new String(kv.getRow()));
                        }
                        /*System.out.print(new String(kv.getFamily()) + ":");
                       System.out.print(new String(kv.getQualifier()) + " ");
                       System.out.print(kv.getTimestamp() + " "); */
                        System.out.println(new String(kv.getValue()));

list.add(item);
                    }
                }
            } 
            Iterator<String> it = list.iterator();
            while (it.hasNext()) {
                String txt = it.next();
                System.out.print(txt+" ");
            } 
            System.out.println(); // Print the data line.
            dataRow = TSVFile.readLine(); 
        }

TSVFile.close();

System.out.println();

} //main()
} 

Example Record:

dataArray[12]+"-"+dataArray[13] = 3049620139673452544-5172983457411783096

In the HBase table, the row key has a value in the same format.

I can’t share the entire record because it has more than 300 columns.

TSV file size: approximately 10 GB

HBase table: approximately 10254950 rows.

Thanks for any help. Thanks in advance.

Solution

Instead of writing the comparison like this:

if((dataArray[12]+"-"+dataArray[13]).equals(new String(kv.getRow()))){ // Comparing the row keys from the file and the table (does not work)

Try this:

if((dataArray[12]+"-"+dataArray[13]).equals(Bytes.toString(kv.getRow()))){

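You were not decoding the row key correctly. new String(byte[]) uses the platform default charset, whereas Bytes.toString always decodes the bytes as UTF-8.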
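To make the decoding point concrete, here is a minimal sketch that uses only the JDK. It assumes the row key was stored as UTF-8 text, which is what Bytes.toString expects. The class and method names (RowKeyDecoding, decodeUtf8, sameKey) are made up for illustration and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowKeyDecoding {

    // Decode row-key bytes as UTF-8, independent of the platform default charset.
    static String decodeUtf8(byte[] rowKey) {
        return new String(rowKey, StandardCharsets.UTF_8);
    }

    // Compare the key built from the TSV columns with the raw row-key bytes.
    // Comparing bytes avoids any charset ambiguity on the table side.
    static boolean sameKey(String fileKey, byte[] tableRowKey) {
        return Arrays.equals(fileKey.getBytes(StandardCharsets.UTF_8), tableRowKey);
    }

    public static void main(String[] args) {
        // Stand-in for kv.getRow(); the value is the example key from the question.
        byte[] tableRowKey =
                "3049620139673452544-5172983457411783096".getBytes(StandardCharsets.UTF_8);
        String fileKey = "3049620139673452544" + "-" + "5172983457411783096";

        System.out.println(sameKey(fileKey, tableRowKey));
        System.out.println(decodeUtf8(tableRowKey).equals(fileKey));
    }
}
```

If the comparison still fails with an explicit charset, printing Arrays.toString of both byte arrays may reveal stray whitespace or other invisible characters in one of the keys.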

The updated code below uses a Get for each key instead of scanning the whole HBase table, so it takes less time to run:

    // Needs two extra imports:
    //   org.apache.hadoop.hbase.client.Get
    //   org.apache.hadoop.hbase.util.Bytes

    // Open the table once, not once per column of every line.
    HTable table = new HTable(conf, "Table name"); // Hbase table name

    while (dataRow != null) {
        list.clear();
        String[] dataArray = dataRow.split("\t");
        String key = dataArray[12] + "-" + dataArray[13];

        // Fetch only the row with this key instead of scanning the table.
        Get get = new Get(Bytes.toBytes(key));
        Result r = table.get(get);
        if (r != null && r.size() > 0) {
            for (KeyValue kv : r.raw()) {
                System.out.println("File Rowkey :" + key);
                System.out.println("Table Row key"
                        + Bytes.toString(kv.getRow()));
                FileWriter fstream = new FileWriter(
                        "/path/to/the/file/*.tsv", true);
                BufferedWriter fbw = new BufferedWriter(fstream);
                fbw.write(Bytes.toString(kv.getValue())); // inserting the value to the tsv file
                fbw.newLine();
                fbw.close();
                System.out.println("Column value written successfully");
            }
        } else {
            System.out.println("RowKey not found :" + key);
        }
        for (String item : dataArray) {
            list.add(item);
        }
        dataRow = TSVFile.readLine(); // advance to the next line, otherwise the loop never ends
    }
    table.close();
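The question asks for the table value to become the last column of the matching TSV line, which appending to the end of a file does not achieve. One way to do that is to stream the input into a new output file and append the value to each line as it is copied. The sketch below is only an illustration under stated assumptions: the class TsvColumnAppender, the method appendColumn, and the lookup function are hypothetical, and the lookup stands in for the HBase Get so that the example runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class TsvColumnAppender {

    // Copies every TSV line from in to out. When lookup returns a non-null value
    // for the row key built from columns keyCol1 and keyCol2, that value is
    // appended as an extra last column. Returns the number of matched lines.
    static int appendColumn(BufferedReader in, BufferedWriter out,
                            int keyCol1, int keyCol2,
                            Function<String, String> lookup) throws IOException {
        int matched = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
            String value = null;
            if (cols.length > Math.max(keyCol1, keyCol2)) {
                value = lookup.apply(cols[keyCol1] + "-" + cols[keyCol2]);
            }
            out.write(line);
            if (value != null) {
                out.write('\t');
                out.write(value);
                matched++;
            }
            out.newLine();
        }
        out.flush();
        return matched;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HBase table: row key -> column value.
        Map<String, String> fakeTable = new HashMap<>();
        fakeTable.put("k1-k2", "v");

        StringWriter result = new StringWriter();
        int matched = appendColumn(
                new BufferedReader(new StringReader("k1\tk2\tx\nk3\tk4\ty\n")),
                new BufferedWriter(result), 0, 1, fakeTable::get);

        System.out.println(matched + " row(s) matched");
        System.out.print(result);
    }
}
```

With the real data, the reader and writer would wrap the input and output files, the key columns would be 12 and 13, and the lookup would perform the Get shown above and return the decoded value, or null when the row is missing. Writing to a separate file in a single pass avoids rewriting a 10 GB file in place.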
