Java - Extract text outside of HTML tags

I have the following HTML code:

<div class=example>Text #1</div> "Another Text 1"
<div class=example>Text #2</div> "Another Text 2"

I want to extract the text outside the tags: “Another Text 1” and “Another Text 2”.

I’m using JSoup to achieve this.

Any ideas???

Thanks!

Solution

One solution is to use the ownText() method (see the Jsoup docs). It returns only the text owned directly by the specified element and ignores the text inside its child elements.

Using only the HTML you provided, you can extract the <body> element's own text:

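import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<div class='example'>Text #1</div> 'Another Text 1'<div class='example'>Text #2</div> 'Another Text 2'";

// Jsoup wraps the fragment in <html><body>, so the loose text ends up directly under <body>
Document doc = Jsoup.parse(html);
System.out.println(doc.body().ownText());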

Will output:

'Another Text 1' 'Another Text 2'
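
The output above joins both strings into one line. If each piece is needed separately, one option is to walk the text nodes directly under <body> with textNodes(). This is only a minimal sketch: the class name OutsideText and the helper outsideTexts are made up for illustration, and it assumes the wanted text always sits directly under <body>.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.TextNode;

class OutsideText {

    // Hypothetical helper: collects each non-blank text node that is a direct child of <body>.
    static List<String> outsideTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (TextNode node : doc.body().textNodes()) {
            if (!node.isBlank()) {
                // text() normalises whitespace; trim() drops the remaining edge spaces
                result.add(node.text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class='example'>Text #1</div> 'Another Text 1'"
                    + "<div class='example'>Text #2</div> 'Another Text 2'";
        for (String text : outsideTexts(html)) {
            System.out.println(text);
        }
    }
}
```

With the same input as before, this should print each quoted string on its own line instead of one combined string.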

Note that ownText() can be called on any Element, not just <body>; the Jsoup docs include another example.
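
To illustrate the point about arbitrary elements, here is a small sketch that compares ownText() with text() on a nested paragraph, using the kind of markup the ownText() Javadoc describes. The class OwnTextDemo and the helper ownVersusAll are hypothetical names, and the sketch assumes the selector matches at least one element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class OwnTextDemo {

    // Hypothetical helper: returns { ownText, text } for the first element matching cssQuery.
    // Assumes the query matches something; selectFirst returns null otherwise.
    static String[] ownVersusAll(String html, String cssQuery) {
        Element el = Jsoup.parse(html).selectFirst(cssQuery);
        return new String[] { el.ownText(), el.text() };
    }

    public static void main(String[] args) {
        String[] result = ownVersusAll("<p>Hello <b>there</b> now!</p>", "p");
        System.out.println(result[0]); // Hello now!
        System.out.println(result[1]); // Hello there now!
    }
}
```

The text inside <b> belongs to the child element, so ownText() skips it while text() includes it.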
