Python – How to get crawler return data with scrapy via external script?

When executing a script like the one below, how do I access the data returned by the spider's parse function?

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
# Stop the Twisted reactor once the spider finishes.
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
crawler.stats
# log.start()
reactor.run()  # blocks here until the spider is closed

I disabled logging so I could see the print messages from the spider, but even with logging enabled the returned data never shows up.

The spider's parse function returns a simple string.

How do I get this data? I tried printing the result of reactor.run(), but it is always None.

Solution

Here is how I found to collect the scraped items:

items = []
def add_item(item):
    items.append(item)

# item_passed is the signal name in old Scrapy versions;
# it was later renamed to item_scraped.
crawler.signals.connect(add_item, signals.item_passed)

My original answer, with more details, is on the linked question:
https://stackoverflow.com/a/23892650/2730032
