Java – URLConnection does not read the entire page

In my application, I need to download some web pages. This is what I do:

URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000);    // 5 seconds to download (value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect (value is in milliseconds)
conn.setRequestMethod("GET");
conn.setDoInput(true);

conn.connect();
int response = conn.getResponseCode();
InputStream is = conn.getInputStream();

String s = readIt(is);
System.out.println("got: " + s);

My readIt function is:

public String readIt(InputStream stream) throws IOException {
    int len = 10000;
    Reader reader;
    reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len];
    reader.read(buffer);
    return new String(buffer);
}

The problem is that it doesn't download the entire page. For example, if myUrl is "https://wikipedia.org", only part of the page appears in the output (screenshot omitted).

How do I download the entire page?

Update
The second answer to "Read/convert an InputStream to a String" solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:

static String convertStreamToString(java.io.InputStream is) {
   java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
   return s.hasNext() ? s.next() : "";
}
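As a hedged variation on the Scanner approach above: the one-argument Scanner constructor decodes with the platform default charset, so passing the charset explicitly may be safer for web pages. The class name ScannerRead and the sample text are made up for illustration. Note that Scanner swallows IOExceptions (they can be inspected through ioException()), and closing the Scanner also closes the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerRead {
    // "\\A" matches only at the very beginning of the input, so the
    // delimiter never occurs again and next() returns the whole stream.
    static String convertStreamToString(InputStream is) {
        // Explicit charset instead of the platform default (assumes UTF-8 content).
        try (Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for conn.getInputStream()
        byte[] page = "h\u00e9llo\nworld".getBytes(StandardCharsets.UTF_8);
        System.out.println(convertStreamToString(new ByteArrayInputStream(page)));
    }
}
```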

Solution

There are several bugs in your code:

  1. You are reading into a fixed-size character buffer.

  2. You ignored the result of the read(char[]) method. It returns the number of characters actually read… You need to use it.

  3. You assume that read(char[]) will read all the data. In fact, it is only guaranteed to return at least one character, or -1 to indicate that you have reached the end of the stream. When you read from a network connection, you may get only the data that the other end has already sent and that has been buffered locally.

  4. When you create a String from the char[], you assume that every position in the array contains a character from the stream. Positions that were never filled still hold '\u0000'.
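To make points 2 and 4 concrete, here is a small sketch using an in-memory stream instead of a network connection. The class and method names (ReadPitfall, naive, counted, sample) are invented for this illustration. Even the counted variant does a single read, so it is not a fix for point 3.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReadPitfall {
    // Mirrors the readIt from the question: the count returned by read() is ignored.
    static String naive(Reader reader, int size) throws IOException {
        char[] buffer = new char[size];
        reader.read(buffer);
        return new String(buffer); // includes every unfilled '\u0000' slot
    }

    // Uses the count from one read() call; still a single read, not a full download.
    static String counted(Reader reader, int size) throws IOException {
        char[] buffer = new char[size];
        int n = reader.read(buffer);
        return n <= 0 ? "" : new String(buffer, 0, n);
    }

    // Five-character sample input, decoded as UTF-8.
    static Reader sample() {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        return new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(naive(sample(), 10).length());   // 10: "hello" plus five NUL chars
        System.out.println(counted(sample(), 10).length()); // 5
    }
}
```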

There are multiple ways to do this correctly, and this is one way:

public String readIt(InputStream stream) throws IOException {
    Reader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[4096];
    StringBuilder builder = new StringBuilder();
    int len;
    // read() returns the number of chars read, or -1 at end of stream
    while ((len = reader.read(buffer)) != -1) {
        // append only the chars that were actually read
        builder.append(buffer, 0, len);
    }
    return builder.toString();
}
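One way to convince yourself that a read loop copes with data arriving in pieces is to feed it a stream that hands out only a few bytes per call. The sketch below does that; trickle is a hypothetical stand-in for a slow network stream, and ChunkedReadDemo is just a name chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ChunkedReadDemo {
    // Hypothetical slow source: returns at most 3 bytes per bulk read.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    // Loop until end of stream, appending only what each read() delivered.
    static String readIt(InputStream stream) throws IOException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        StringBuilder builder = new StringBuilder();
        int len;
        while ((len = reader.read(buffer)) != -1) {
            builder.append(buffer, 0, len);
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>h\u00e9llo, w\u00f6rld</body></html>";
        String result = readIt(trickle(page.getBytes(StandardCharsets.UTF_8)));
        System.out.println(result.equals(page)); // true
    }
}
```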

Another approach is to look for an existing third-party library that provides a readFully(Reader)-style method.
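If you would rather not add a dependency and you are on Java 9 or later (an assumption; the question does not say which version is in use), the JDK's own InputStream.readAllBytes() may be enough. This is only a sketch: it holds the whole response in memory and assumes the page is UTF-8. ReadAll is a made-up class name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadAll {
    // Requires Java 9+: readAllBytes() blocks until end of stream is reached.
    static String readIt(InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for conn.getInputStream()
        byte[] page = "<html>example</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readIt(new ByteArrayInputStream(page)));
    }
}
```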