Python BeautifulSoup: How to get text from self-closing tags

I’m trying to parse the contents of an Evernote checklist using BeautifulSoup. But when I run the HTML parser on the content, it keeps "correcting" the self-closing tag (en-todo) into an empty open/close pair, so when I try to get the text of an en-todo tag, it’s always blank.

note_body = '<en-todo checked="true" />window caulk<en-todo />cake pan<en-todo />cake mix<en-todo />salad mix<en-todo checked="true"/> painters tape<br />'

from bs4 import BeautifulSoup

soup = BeautifulSoup(note_body, 'html.parser')
checklist_items = soup.find_all('en-todo')
print(checklist_items)

The above code returns only the tags, without any text.

[<en-todo checked="true"></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo checked="true"></en-todo>]

Solution

The text you want is not inside the en-todo tag. Because each en-todo is closed immediately, the text that follows it is parsed as a separate sibling node.

Use tag.next_sibling to reach it:

>>> [each.next_sibling for each in checklist_items]
[u'window caulk', u'cake pan', u'cake mix', u'salad mix', u'painters tape']
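
As a minimal sketch building on the same idea, the snippet below pairs each item's text with its checked state. The helper name parse_checklist is hypothetical, and the code assumes every en-todo is followed directly by its text, as in the note_body above. Whitespace is stripped because the text after a tag can carry leading spaces (as " painters tape" does in the input), and the isinstance check guards against an en-todo that is followed by another tag or by nothing at all.

```python
from bs4 import BeautifulSoup, NavigableString

note_body = ('<en-todo checked="true" />window caulk<en-todo />cake pan'
             '<en-todo />cake mix<en-todo />salad mix'
             '<en-todo checked="true"/> painters tape<br />')


def parse_checklist(html):
    """Return a list of (text, checked) tuples, one per en-todo tag.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for todo in soup.find_all('en-todo'):
        sibling = todo.next_sibling
        # Only plain text nodes count as the item's label.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
        else:
            text = ''
        # Treat an item as checked only when checked="true" is present.
        checked = todo.get('checked') == 'true'
        items.append((text, checked))
    return items


checklist = parse_checklist(note_body)
for text, checked in checklist:
    print('[x]' if checked else '[ ]', text)
```

Returning tuples rather than bare strings keeps the checked attribute next to its label, which is usually what a checklist consumer needs.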
