Python Scrapy: extracting data from a table that does not have an id attribute

I am new to Python and Scrapy, so I created a project to learn on my own. But at the moment I’m struggling to get data from the following page: Website to crawl

As I saw in the Developer Tools in Chrome/Firefox, there are 8 tables with the following class: <table class="sc-fHxwqH ddWfJE">

In this image (structure and table I’d like to extract) you can see the structure and the column (<td>) I want to extract, where the value is “Wheelchair accessible”. The value is in the second column, which is an image element.
It should be read like this: if I can find the label (in this case, “Wheelchair accessible”), the value is true; if I can’t find it at all, the value is false.

I managed the things around it, like navigating the parent nodes of the website tree. But now I can’t work out the correct XPath to find this table with class="sc-fHxwqH ddWfJE".

I tried to narrow it down to the most basic part in the Scrapy shell:

scrapy shell 'https://www.immoscout24.ch/de/d/wohnung-kaufen-bevilard/4761145?s=2&t=2&l=436&r=40&se=16&ci=3&ct=1290'
tables = response.xpath('//*[@class="sc-fHxwqH ddWfJE"]/table')
for table in tables[1:]:
    print("I found it!!") #this should be returned 8 times, once for each table
    table.xpath('tr/td[1]//text()').extract_first()

The full XPath to the “Wheelchair accessible” cell is:
//*[@id="root"]/div/div/div[1]/section/article[7]/table/tbody/tr[1]/td[1]

Unfortunately, the code above returns nothing. I didn’t get any errors, but it didn’t print the results I expected either.

  1. What am I doing wrong? I guess it won’t be that hard, right?
  2. Once I’ve found the right table, what is the easiest and fastest way to extract its data as JSON? I guess this would be slow, because basically I need to parse the entire HTML code multiple times to find out whether each attribute description exists?

Thank you very much for all your help or any tips! I’ve spent days trying to figure it out…
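
A note on the first question, as a sketch rather than a confirmed diagnosis: the selector `//*[@class="sc-fHxwqH ddWfJE"]/table` looks for a `<table>` that is a child of an element carrying that class, whereas the class described above sits on the `<table>` element itself, and `tables[1:]` then skips the first match. The example below uses the standard library's ElementTree on a made-up HTML fragment (the fragment and the "Balcony" label are assumptions, not the real page) to show the difference. In Scrapy the corresponding expression would be `response.xpath('//table[@class="sc-fHxwqH ddWfJE"]')`, and it can only match if the tables are actually present in the downloaded HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the class is on the <table> element itself.
SAMPLE = """
<div id="root">
  <table class="sc-fHxwqH ddWfJE">
    <tr><td>Wheelchair accessible</td><td><img src="check.png" /></td></tr>
  </table>
  <table class="sc-fHxwqH ddWfJE">
    <tr><td>Balcony</td><td><img src="check.png" /></td></tr>
  </table>
</div>
"""

root = ET.fromstring(SAMPLE)
CLASS = "sc-fHxwqH ddWfJE"

# Pattern from the question: a <table> *inside* an element with that class.
as_in_question = root.findall(f".//*[@class='{CLASS}']/table")

# Adjusted pattern: the <table> that *has* that class.
tables = root.findall(f".//table[@class='{CLASS}']")

# First cell of every row, across all matching tables (no [1:] slice).
labels = [row.find("td").text for table in tables for row in table.findall("tr")]

print(len(as_in_question), len(tables), labels)
# prints: 0 2 ['Wheelchair accessible', 'Balcony']
```

The `tbody` step in a path copied from the browser is often added by the browser itself, so relative paths such as `tr/td[1]` tend to be safer than the full copied path.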

Solution

There’s no need to request the HTML, grab node values, and put them into JSON, because the required data already comes from the API in JSON format.

Simply try:

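import requests

# The property details are served as JSON by this API endpoint
url = "https://react-api.immoscout24.ch/v1.3/properties/4761145?ci=3&ct=1290&l=436&lng=de&p=4761145&r=40&s=2&se=16&t=2"
# .json() parses the response body into a Python dict
response = requests.get(url).json()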
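
Comparing the page URL from the question with the API URL above suggests a pattern: the property id is the last path segment, the page's query parameters are carried over, and `lng` and `p` are added. The sketch below rebuilds the API URL on that assumption. The pattern is inferred from this single example only, and `build_api_url` and `API_BASE` are hypothetical names, so it may not hold for other listings.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: API base inferred from the single URL shown in this answer.
API_BASE = "https://react-api.immoscout24.ch/v1.3/properties/"


def build_api_url(page_url):
    """Derive the JSON API URL from a listing page URL (inferred pattern)."""
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    language = segments[0]        # "de" in the example
    property_id = segments[-1]    # "4761145" in the example
    params = dict(parse_qsl(parts.query))
    params["lng"] = language
    params["p"] = property_id
    # Sort the keys to match the ordering seen in the example API URL
    query = urlencode(sorted(params.items()))
    return f"{API_BASE}{property_id}?{query}"


page_url = ("https://www.immoscout24.ch/de/d/wohnung-kaufen-bevilard/4761145"
            "?s=2&t=2&l=436&r=40&se=16&ci=3&ct=1290")
print(build_api_url(page_url))
```

For the page URL from the question this reproduces the API URL used above, which is what makes the pattern plausible for looping over several listings.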

Then you can get the data you need, for example:

print(response['propertyDetails']['agency'])

Output:

{'companyCity': 'Bevilard', 'companyName1': 'avendre.ch ',
 'companyName2': 'Agence Berne', 'companyPhoneMobile': '078 868 60 64',
 'companyStreet': 'Rue Principale 41', 'companyZip': '2735',
 'email': '[email protected]', 'firstName': 'Verena', 'gender': 'f',
 'lastName': 'Pecaut-Steiner',
 'logoUrl': 'https://www.immoscout24.ch/resources/memberlogos/L356353-R.jpg',
 'nameFormatted': 'Verena Pecaut-Steiner', 'webUrl': 'http://www.avendre.ch'}
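
For the true/false flag asked about in the question, one option is to search the parsed JSON once instead of re-parsing HTML for each attribute. This is a minimal sketch: `contains_text` is a hypothetical helper, and the `characteristics` key in the sample payload is made up, since the real structure of the API response beyond `propertyDetails` and `agency` is not shown here. The API may also expose the feature as a boolean field or under a label in another language, in which case a direct lookup would be simpler.

```python
import json


def contains_text(data, needle):
    """Return True if any string value in nested dicts/lists equals needle."""
    if isinstance(data, dict):
        return any(contains_text(value, needle) for value in data.values())
    if isinstance(data, list):
        return any(contains_text(value, needle) for value in data)
    return isinstance(data, str) and data.strip() == needle


# Hypothetical payload: the "characteristics" key and its contents are made up.
sample = {
    "propertyDetails": {
        "agency": {"companyCity": "Bevilard"},
        "characteristics": [{"label": "Wheelchair accessible"}],
    }
}

item = {
    "city": sample["propertyDetails"]["agency"]["companyCity"],
    "wheelchair_accessible": contains_text(sample, "Wheelchair accessible"),
    "balcony": contains_text(sample, "Balcony"),
}
print(json.dumps(item))
# prints: {"city": "Bevilard", "wheelchair_accessible": true, "balcony": false}
```

With the real data, the same call would be `contains_text(response, "Wheelchair accessible")` on the dict returned by `.json()`.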
