Python – Find non-root parent nodes whose child nodes contain some text

I have some XML:

<root>
    <parent>
        <child>foo987654</child>
    </parent>
    <parent>
        <child>bar15245</child>
    </parent>
    <parent>
        <child>baz87742</child>
    </parent>
    <parent>
        <child>foo123456</child>
    </parent>
</root>

I’m using Python and the etree module, and I want to select all <parent> nodes whose <child> text starts with “foo”. I know etree has limited support for XPath, and I’m new to XPath, so I’m struggling to find the best solution. I was thinking of something like:

parent[(contains(child,'foo'))] 

But I’d like to reject parent nodes whose child text contains foo without starting with it (e.g. <child>125456foo</child>), so I’m not sure this works. I’m also not sure whether etree supports this level of XPath.

Edit:

Another acceptable solution would be to select the parents whose child’s text is in a given list.
Pseudocode:
parent => child[text = "foo1" || "bar1" || "bar2"]

Is this possible?

Solution

This will get what you want:

[elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]
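Note that this one-liner assumes every <parent> has a <child> with non-empty text; otherwise elem.find('child') returns None (or its .text is None) and the call raises AttributeError. ElementTree's XPath subset has no string functions such as starts-with(), which is why the prefix test is done in Python. A more defensive sketch follows; the function name parents_with_prefix, its parameters, and the sample data are invented here for illustration:

```python
import xml.etree.ElementTree as ET


def parents_with_prefix(root, prefix, parent_tag="parent", child_tag="child"):
    """Return parents with at least one direct child whose text starts with prefix.

    Hypothetical helper, not part of ElementTree.
    """
    matches = []
    for parent in root.findall(parent_tag):
        for child in parent.findall(child_tag):
            # child.text is None for an empty element such as <child/>
            if (child.text or "").startswith(prefix):
                matches.append(parent)
                break  # one matching child is enough
    return matches


# Illustrative data covering the awkward cases
doc = ET.fromstring(
    "<root>"
    "<parent><child>foo1</child></parent>"
    "<parent><child>125456foo</child></parent>"  # contains foo, wrong position
    "<parent><child/></parent>"                  # child without text
    "<parent/>"                                  # parent without child
    "<parent><child>bar</child><child>foo2</child></parent>"
    "</root>"
)
found = parents_with_prefix(doc, "foo")
print(len(found))  # 2
```

Unlike the one-liner, this checks every <child> of a parent rather than only the first one, which may or may not be what you want.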

Watch it in action:

s = """<root>
    <parent>
        <child>foo987654</child>
    </parent>
    <parent>
        <child>bar15245</child>
    </parent>
    <parent>
        <child>baz87742</child>
    </parent>
    <parent>
        <child>foo123456</child>
    </parent>
</root>"""

import xml.etree.ElementTree as ET

root = ET.fromstring(s)
elems = [elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]

Check the data:

for elem in elems:
    print(elem.find('child').text)

Output:

foo987654
foo123456
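For the alternative raised in the question's edit (selecting parents whose child text is in a list), a similar comprehension with a set membership test should work. This is a minimal sketch; the sample values and the variable names wanted, selected, and exact are assumptions made for the example. For a single exact value, ElementTree's own path syntax also accepts a [tag='text'] predicate.

```python
import xml.etree.ElementTree as ET

xml_text = """<root>
    <parent><child>foo1</child></parent>
    <parent><child>bar1</child></parent>
    <parent><child>baz1</child></parent>
    <parent><child>bar2</child></parent>
</root>"""

root = ET.fromstring(xml_text)

# Values to accept; a set gives fast membership tests
wanted = {"foo1", "bar1", "bar2"}

# findtext() returns None when there is no <child>, which is simply not in the set
selected = [p for p in root.findall("parent") if p.findtext("child") in wanted]

# Exact match on one value using ElementTree's limited XPath support
exact = root.findall("parent[child='foo1']")

for p in selected:
    print(p.findtext("child"))
```

The predicate form only does whole-string equality, so the prefix case still needs the Python-side startswith() check.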
