Find non-root parent nodes whose child nodes contain some text
I have some XML:
<root>
<parent>
<child>foo987654</child>
</parent>
<parent>
<child>bar15245</child>
</parent>
<parent>
<child>baz87742</child>
</parent>
<parent>
<child>foo123456</child>
</parent>
</root>
I'm using Python with the etree module, and I want to select all <parent> nodes whose child's text starts with "foo". I know etree has limited support for XPath, and I'm new to XPath, so I'm struggling to find the best solution. I would expect something like:
parent[(contains(child,'foo'))]
But I'd like to reject parent nodes whose child text contains foo without starting with it (e.g. <child>125456foo</child>), so I'm not sure this would work. I'm also not sure whether etree supports this level of XPath.
Edit:
Another acceptable solution would be to select the parents whose child's text is in a list.
Pseudocode:
parent => child[text = "foo1" || "bar1" || "bar2"]
Is this possible?
Solution
This will get you what you want:
[elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]
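Note that the one-liner assumes every <parent> has a <child> with non-empty text; if find() returns None, or .text is None for an empty element, it raises AttributeError. Below is a minimal sketch of a more defensive variant, assuming you would rather skip such parents than fail. The helper name parents_with_prefix and the sample document are made up for illustration, and unlike the one-liner it checks every direct <child>, not just the first one.

```python
import xml.etree.ElementTree as ET

def parents_with_prefix(root, prefix, parent_tag="parent", child_tag="child"):
    """Return parents that have at least one direct child whose text starts with prefix."""
    matches = []
    for parent in root.findall(parent_tag):
        # child.text is None for an empty element, so fall back to ""
        if any((child.text or "").startswith(prefix)
               for child in parent.findall(child_tag)):
            matches.append(parent)
    return matches

# Hypothetical sample covering the awkward cases
sample = ET.fromstring(
    "<root>"
    "<parent><child>foo1</child></parent>"
    "<parent><child>125456foo</child></parent>"  # contains foo, does not start with it
    "<parent><child/></parent>"                  # empty child
    "<parent/>"                                  # no child at all
    "</root>"
)
print([p.find("child").text for p in parents_with_prefix(sample, "foo")])  # ['foo1']
```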
Watch it in action:
s = """<root>
<parent>
<child>foo987654</child>
</parent>
<parent>
<child>bar15245</child>
</parent>
<parent>
<child>baz87742</child>
</parent>
<parent>
<child>foo123456</child>
</parent>
</root>"""
import xml.etree.ElementTree as ET
root = ET.fromstring(s)
elems = [elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]
Check the data:
for elem in elems:
    print(elem.find('child').text)
>>>
foo987654
foo123456
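For the variant from the edit (select parents whose child's text is one of several values), the same pattern should work with a membership test. This is a sketch assuming exact matches are wanted; the sample document and the values in wanted are invented for illustration. ElementTree's XPath subset can also express an exact match for a single value, as in the last lines, but for several values it seems simplest to filter in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
doc = ET.fromstring(
    "<root>"
    "<parent><child>foo1</child></parent>"
    "<parent><child>bar1</child></parent>"
    "<parent><child>baz1</child></parent>"
    "<parent><child>bar2</child></parent>"
    "</root>"
)

wanted = {"foo1", "bar1", "bar2"}  # illustrative values taken from the pseudocode

# findtext returns the text of the first <child>, or None when there is none;
# None is never in the set, so parents without a child are skipped
selected = [p for p in doc.findall("parent") if p.findtext("child") in wanted]
print([p.findtext("child") for p in selected])  # ['foo1', 'bar1', 'bar2']

# A single exact value can use ElementTree's [tag='text'] predicate
single = doc.findall("parent[child='foo1']")
print(len(single))  # 1
```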