How to get XML declaration strings using lxml
I use lxml to parse XML documents. How do I get the declaration string?
<?xml version="1.0" encoding="utf-8" ?>
I want to check whether it exists, what encoding it has, and what XML version it has.
Solution
When parsing a document, the resulting ElementTree object should have a DocInfo object that contains information about the parsed XML or HTML document. For XML, you may be interested in the xml_version and encoding properties of this DocInfo:
>>> from lxml import etree
>>> tree = etree.parse('input.xml')
>>> tree.docinfo
<lxml.etree.DocInfo object at 0x7f8111f9ecc0>
>>> tree.docinfo.xml_version
'1.0'
>>> tree.docinfo.encoding
'UTF-8'
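The question also asks whether the declaration exists at all. As far as I can tell, docinfo describes the parsed document rather than the literal declaration text, so it may not be enough on its own to answer that. One possible workaround is to inspect the raw bytes before handing them to lxml. The following is only a minimal sketch using the standard library; the helper name read_xml_declaration is hypothetical (not part of lxml), and it assumes an ASCII-compatible encoding such as UTF-8, so it will not recognize UTF-16 input.

```python
import re

# Matches an XML declaration at the very start of the data,
# optionally preceded by a UTF-8 byte order mark.
_DECL_RE = re.compile(rb'^(?:\xef\xbb\xbf)?<\?xml\s+([^?]*)\?>')
# Matches pseudo-attributes such as version="1.0" or encoding='utf-8'.
_ATTR_RE = re.compile(rb'(\w+)\s*=\s*(["\'])(.*?)\2')


def read_xml_declaration(raw):
    """Hypothetical helper: return the declaration's pseudo-attributes
    as a dict, or None if the data does not start with a declaration."""
    match = _DECL_RE.match(raw)
    if match is None:
        return None
    return {
        name.decode('ascii'): value.decode('ascii', 'replace')
        for name, _quote, value in _ATTR_RE.findall(match.group(1))
    }


sample = b'<?xml version="1.0" encoding="utf-8" ?>\n<root/>'
print(read_xml_declaration(sample))      # {'version': '1.0', 'encoding': 'utf-8'}
print(read_xml_declaration(b'<root/>'))  # None
```

For a file on disk, you could read the first few hundred bytes in binary mode (open('input.xml', 'rb')) and pass them to the helper, then parse the file with etree.parse as shown above.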