Using CSS selectors in Scrapy to grab nested text
I have the following HTML:
<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>
I want to get the text as: Lorem ipsum si ammet, so I tried:
response.css('div.article > p::text').extract()
But I only get: Lorem si ammet
How can I get the text of both <p> and <strong> using CSS selectors?
Solution
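A one-line solution:
"".join(response.css("div.article *::text").extract()).strip()
div.article * matches every element nested inside div.article (here <p> and <strong>), and ::text returns the text nodes of each. The pieces are joined before stripping so that the spaces between the words are preserved.
Or, written out as a loop:
text = ""
for a in response.css("div.article *::text").extract():
    text += a
text = text.strip()
Both forms give the same result.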
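With messier markup (line breaks, indentation inside the tags), the extracted fragments carry that whitespace with them, so it may help to collapse whitespace after joining. Below is a minimal sketch in plain Python. It assumes `fragments` is the list that the `div.article *::text` selector returns for the sample HTML above; the helper name `join_text_nodes` is hypothetical and not part of Scrapy.

```python
def join_text_nodes(fragments):
    """Join extracted text fragments and collapse runs of whitespace."""
    # Concatenate first so spaces at fragment boundaries are kept,
    # then split on any whitespace and rejoin with single spaces.
    return " ".join("".join(fragments).split())


# Assumed output of the ::text selector for the sample HTML (not fetched here).
fragments = ["Lorem ", "ipsum", " si ammet"]
print(join_text_nodes(fragments))
```

In a spider this would be called as join_text_nodes(response.css("div.article *::text").extract()).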