How to get crawler return data with scrapy via external script?
When executing a script like the one below, how do I access the data returned by the crawler's parse function?
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider
spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
crawler.stats
#log.start()
reactor.run()
I disabled logging so I could see print messages from the spider, but even with logging enabled the returned data never shows up.
The spider's parse function returns a simple string.
How do I get this data? I tried printing the result of reactor.run(), but it is always None.
Solution
Here is the way I found to collect the scraped items:
items = []

def add_item(item):
    items.append(item)

# Connect before crawler.start() so no items are missed.
crawler.signals.connect(add_item, signals.item_passed)
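To see why this works without having Scrapy or Twisted installed, here is a minimal stand-in for the signal machinery. The FakeSignals class below is a hypothetical illustration, not Scrapy code: it mimics what the real dispatcher does, which is call every connected callback each time an item event fires, so the connected callback can append items to a shared list that your script reads afterwards.

```python
# Hypothetical stand-in for Scrapy's signal dispatcher, for illustration only.
class FakeSignals:
    def __init__(self):
        self._callbacks = {}

    def connect(self, callback, signal):
        # Register a callback for a named signal (like crawler.signals.connect).
        self._callbacks.setdefault(signal, []).append(callback)

    def send(self, signal, item):
        # Fire the signal: every connected callback receives the item.
        for callback in self._callbacks.get(signal, []):
            callback(item)

item_passed = "item_passed"  # stand-in for scrapy.signals.item_passed

dispatcher = FakeSignals()
items = []
dispatcher.connect(items.append, signal=item_passed)

# Simulate the spider producing two items during the crawl.
for scraped in ({"url": "https://scrapinghub.com"}, {"url": "https://scrapy.org"}):
    dispatcher.send(item_passed, scraped)

print(items)  # both items are available once the "crawl" has finished
```

The same flow applies to the real crawler: connect the callback, start the crawl, and read the list after reactor.run() returns.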
I gave my original answer with more details in the linked question:
https://stackoverflow.com/a/23892650/2730032
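As for why printing the result of reactor.run() gives None: the values a parse callback yields are never returned to the script that started the crawl; the engine consumes them internally and dispatches each one through the signal system. The following sketch, with a hypothetical engine function standing in for Scrapy's crawl engine, shows that control flow without Scrapy or Twisted:

```python
def parse(response):
    # A Scrapy callback is typically a generator: it yields items to the engine.
    yield {"title": response["title"]}

def engine(responses, on_item):
    # Hypothetical stand-in for the crawl engine: it consumes the generator
    # and dispatches each item; nothing is returned to the calling script.
    for response in responses:
        for item in parse(response):
            on_item(item)

collected = []
engine([{"title": "a"}, {"title": "b"}], collected.append)
print(collected)
```

This is why a connected callback (the on_item hook here, the item_passed signal in real Scrapy) is the right place to capture results, rather than a return value.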