Python – Using CSS selectors in Scrapy to get nested text

I have the following html code:

<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>

I want to get the text as: Lorem ipsum si ammet, so I tried:

response.css('div.article > p::text').extract()

But I only received Lorem si ammet

How can I get both <p> and <strong> text using CSS selectors?

Solution

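A one-line solution:

"".join(response.css("div.article *::text").extract()).strip()

div.article * selects every element nested inside div.article, and ::text returns the text nodes of those elements, so the text of both <p> and <strong> is included. The pieces are joined first and stripped once at the end, because stripping each piece separately would remove the spaces around "ipsum".

Or, written out as a loop:

text = ""
for a in response.css("div.article *::text").extract():
    text += a
text = text.strip()

Both versions produce the same result.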
