Java: Read file, first line is missing

I have a HashMap<String, String> that maps each image’s filename to its display name. I read the file line by line and add each key and value to the HashMap.

BufferedReader reader;
String line;
String[] lineSplit;

HashMap<String, String> imenaZnaki = new HashMap<String, String>();

try {
    reader = new BufferedReader(new InputStreamReader(am.open("znaki_imena.txt"), "UTF-8"));
    line = reader.readLine();
    while (line != null) {
        lineSplit = line.split("->");
        imenaZnaki.put(lineSplit[0], lineSplit[1]);
        line = reader.readLine();
    }
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}

Everything works as expected except for the first entry added: for that key, .get(key) returns null and .containsKey(key) returns false. All other keys/values are valid and stored correctly in the HashMap.

Edit:

It doesn’t make sense at all… After adding the content to the HashMap, I ran the code provided by MagicMan to check whether all entries were in the HashMap:

for(String key: imenaZnaki.keySet()) {
    System.out.println("KEY: " + key + "  VALUE: " + imenaZnaki.get(key));
}

If I use CTRL+F to search the output for “nevar_andrej”, it finds 4 matches, which is correct. However, if I search for “ nevar_andrej” (with a leading space), it finds only 3, which is wrong: the first one is missing. So my guess is that something at the start of the first line of the file is causing the confusion. As a test I added a dummy first line (bla_bla-> Bla bla) and then it works, but that is a nasty workaround.

This is my full text file, encoded in UTF-8: http://pastebin.com/6A4r4jm6

Solution

Try running the code below to see if all your entries are there:

for(String key: imenaZnaki.keySet()) {
    System.out.println("KEY: " + key + "  VALUE: " + imenaZnaki.get(key));
}

I was able to run your code with your example file, and all 7 entries showed up. If that doesn’t work for you, see Antiduh’s comment: the string you pass to get() may have an encoding issue.

This was run on OS X, with UTF-8 specified for the console output:

KEY: nevar_andrejev_kriz_zelezniska_proga_je_dvo_ali_vectirna  VALUE: Andrejev kri? (?elezni?ka proga je dvo ali ve?tirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_dvo_ali_vectirna_2  VALUE: Andrejev kri? (?elezni?ka proga je dvo ali ve?tirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_enotirna  VALUE: Andrejev kri? (?elezni?ka proga je enotirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_enotirna_2  VALUE: Andrejev kri? (?elezni?ka proga je enotirna)
KEY: nevar_blizina_letaliske_steze  VALUE: Bli?ina letali?ke steze
KEY: nevar_blizina_obale  VALUE: Bli?ina obale
KEY: nevar_blizina_svetlobnih_prometnih_znakov  VALUE: Bli?ina svetlobnih prometnih znakov
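One way to check for the kind of encoding issue mentioned above is to print each key as a list of UTF-16 code units, so that characters the console does not display become visible. This is only a sketch: the class name KeyInspector, the helper dumpChars and the sample keys are made up for illustration and are not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyInspector {

    // Renders every UTF-16 code unit of s as U+XXXX, separated by spaces,
    // so that invisible characters show up in the output.
    static String dumpChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: the first key starts with an invisible character.
        Map<String, String> imenaZnaki = new LinkedHashMap<>();
        imenaZnaki.put("\uFEFFab", "first");
        imenaZnaki.put("ab", "second");

        for (String key : imenaZnaki.keySet()) {
            System.out.println(key.length() + " chars: " + dumpChars(key));
        }
    }
}
```

If a key that looks identical to the one passed to get() reports a different length or an unexpected leading code unit, the two strings are not equal and the lookup will miss.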

Update

So this appears to be a problem with reading the BOM (byte order mark, http://en.wikipedia.org/wiki/Byte_order_mark). I originally resaved the file as UTF-8 and it produced the output above. Then I resaved the file as “UTF-8 with BOM” and it produced this:

KEY: ???nevar_andrejev_kriz_zelezniska_proga_je_dvo_ali_vectirna  VALUE: Andrejev kri?? (??elezni??ka proga je dvo ali ve??tirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_dvo_ali_vectirna_2  VALUE: Andrejev kri?? (??elezni??ka proga je dvo ali ve??tirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_enotirna  VALUE: Andrejev kri?? (??elezni??ka proga je enotirna)
KEY: nevar_andrejev_kriz_zelezniska_proga_je_enotirna_2  VALUE: Andrejev kri?? (??elezni??ka proga je enotirna)
KEY: nevar_blizina_letaliske_steze  VALUE: Bli??ina letali??ke steze
KEY: nevar_blizina_obale  VALUE: Bli??ina obale
KEY: nevar_blizina_svetlobnih_prometnih_znakov  VALUE: Bli??ina svetlobnih prometnih znakov

Notice the three ? characters in front of the first key? They are the three bytes of the UTF-8 BOM: EF BB BF in hexadecimal. Because they end up as part of the first key, a lookup with the plain key string does not match, which is probably the cause of the problem with your first line. Try resaving the file without a BOM using a text editor such as Notepad++ or SublimeText.
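If resaving the file is not an option, the BOM can also be dropped in code. The following is a minimal sketch, not the code from the question: it assumes the UTF-8 decoder hands the BOM through as the single character U+FEFF at the start of the first line (which is how the standard Java decoder behaves), and the names BomSafeReader, stripBom and readEntries are invented for illustration. It reads from any InputStream, so the stream returned by am.open("znaki_imena.txt") could be passed in; on Android, StandardCharsets and try-with-resources need API level 19 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class BomSafeReader {

    // Removes a leading U+FEFF (the decoded BOM) if present; null is passed through.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    // Reads "key->value" lines into a map, ignoring a BOM on the first line.
    static Map<String, String> readEntries(InputStream in) throws IOException {
        Map<String, String> entries = new HashMap<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // Only the first line can carry the BOM.
            String line = stripBom(reader.readLine());
            while (line != null) {
                String[] parts = line.split("->", 2);
                if (parts.length == 2) { // skip blank or malformed lines
                    entries.put(parts[0], parts[1]);
                }
                line = reader.readLine();
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file content saved as "UTF-8 with BOM".
        byte[] data = "\uFEFFnevar_blizina_obale->Blizina obale\nbla_bla-> Bla bla\n"
                .getBytes(StandardCharsets.UTF_8);
        Map<String, String> imenaZnaki = readEntries(new ByteArrayInputStream(data));
        System.out.println(imenaZnaki.containsKey("nevar_blizina_obale"));
    }
}
```

Stripping only the first line keeps the check cheap; a U+FEFF appearing later in the file would be left untouched, which seems the safer default here.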
