Java – How do I remove all inline styles and other attributes from html elements using Jsoup?

How can I use Jsoup to remove all inline styles and other attributes (class, onclick) from html elements?

Example input:

<div style="padding-top:25px; " onclick="javascript:alert('hi'); ">
This is a sample div <span class='sampleclass'> This is a sample span </span>
</div>

Sample output:

<div>This is a sample div <span> This is a sample span </span> </div>

My code (is this the right way, or is there something better?):

Document doc = Jsoup.parse(html);
Elements el = doc.getAllElements();
for (Element e : el) {
    Attributes at = e.attributes();
    for (Attribute a : at) {    
        e.removeAttr(a.getKey());    
    }
}

Solution

Yes, one way is indeed to iterate through the elements and call removeAttr() for each attribute.
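A minimal sketch of the iteration approach, assuming jsoup is on the classpath. It loops over a copy of each element's attributes (via Attributes.asList()) instead of the live collection, because removing attributes while iterating the live Attributes object may throw a ConcurrentModificationException or skip entries, depending on the jsoup version. The names AttributeStripper and stripAttributes are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttributeStripper {

    // Hypothetical helper: removes every attribute from every element
    // in the given HTML fragment and returns the resulting body HTML.
    static String stripAttributes(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element e : doc.getAllElements()) {
            // asList() returns a copy, so the element can be modified
            // safely while we loop over it.
            for (Attribute a : e.attributes().asList()) {
                e.removeAttr(a.getKey());
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'>This is a sample span</span></div>";
        System.out.println(stripAttributes(html));
    }
}
```

Unlike Jsoup.clean(), this keeps every tag in the document and only drops the attributes, so it may be the better fit when the markup itself is trusted.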

Another way is to use jsoup's Whitelist class (see docs) together with the Jsoup.clean() function, which removes any tags or attributes not specified in the whitelist from the document.

For example:

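import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

String html = "<html><head></head><body><div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">This is a sample div <span class='sampleclass'>This is a simple span</span></div></body></html>";

// simpleText() allows only basic text-formatting tags (b, em, i, strong, u) and no attributes
Whitelist wl = Whitelist.simpleText();
wl.addTags("div", "span"); // add additional tags here as necessary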
String clean = Jsoup.clean(html, wl);
System.out.println(clean);

This results in the following output:

11-05 19:56:39.302: I/System.out(414): <div>
11-05 19:56:39.302: I/System.out(414):  This is a sample div 
11-05 19:56:39.302: I/System.out(414):  <span>This is a simple span</span>
11-05 19:56:39.302: I/System.out(414): </div>
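In newer jsoup releases the Whitelist class has been renamed to Safelist (Whitelist was deprecated in 1.14.1 and later removed). The following is a sketch of the same cleanup using Safelist, assuming jsoup 1.14.1 or later. The names SafelistExample and cleanHtml are hypothetical and used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class SafelistExample {

    // Hypothetical helper: keeps only simple text tags plus div and span,
    // and drops all attributes, since none are allowed for those tags.
    static String cleanHtml(String html) {
        Safelist sl = Safelist.simpleText().addTags("div", "span");
        return Jsoup.clean(html, sl);
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'>This is a simple span</span></div>";
        System.out.println(cleanHtml(html));
    }
}
```

Attributes that should survive the cleanup would have to be allowed explicitly, for example with addAttributes on the same Safelist.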
