Scraping GBFS Validator results from Python #165

@iaguerri

If you are new to the GBFS Validator, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

I'm working on a MaaS application and need to validate the GBFS feeds that public operators provide to me.

What is the issue and why is it an issue?

I'm trying to request the result of a validation from Python (https://gbfs-validator.mobilitydata.org/validator?url=https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json).
I have also tried the same request from Postman.

The problem is that the response is 200 (OK), but the information cannot be extracted (even by scraping) because the body only says "We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue"

The code used:

import requests
from bs4 import BeautifulSoup
 
url_validator = "https://gbfs-validator.mobilitydata.org/validator"
 
# Test JSON feeds
# Valid feed
json_main_full_brusels = "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json"
# Invalid feed (no last_updated)
json_main_nolastupdated_brusels = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoLastUpdated.json"
# Invalid feed (vehicle_types feed without last_updated)
json_main_vehiclyType_nolastupdated = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasVehiclyTypeCorrupted.json"
# Invalid feed (no system_information feed)
json_main_nofeed_systeminformation = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoSysteminformationfeed.json"
 
params = {
    "url": json_main_nolastupdated_brusels
}
 
url_completa = requests.Request('GET', url_validator, params=params).prepare().url
print("Request URL:", url_completa)
 

# APPROACH 1: read the body of the HTTP response directly
respuesta = requests.get(url_validator, params=params)

if respuesta.status_code == 200:
    datos_respuesta = respuesta.text
    print("Validator response:", datos_respuesta)
else:
    print("Request failed. Status code:", respuesta.status_code)
    print("Response content:", respuesta.text)


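# APPROACH 2: parse the returned HTML with BeautifulSoup
# (Selenium is not actually used in this snippet)
soup = BeautifulSoup(respuesta.content, 'html.parser')
 
# NOTE: 'data-v-7c2075bd' looks like a Vue scoped-style attribute rather than
# a class name; it is kept here exactly as originally tried.
for div_element in soup.find_all('div', class_='data-v-7c2075bd'):
    # Extract the text content of the div element
    div_text = div_element.get_text(strip=True)
   
    # Print the extracted text
    print("Value of k is:", div_text)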
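
The 200 response described above can be told apart from a page that actually contains results by checking whether the body holds nothing but the `<noscript>` fallback message. The following is a minimal sketch using only the standard library; the names `_ShellProbe` and `looks_like_js_shell` are hypothetical helpers introduced for illustration, and the heuristic (a `<noscript>` block mentioning JavaScript plus no other visible text in `<body>`) is an assumption about how such single-page apps are usually served, not something the validator documents.

```python
from html.parser import HTMLParser


class _ShellProbe(HTMLParser):
    """Collects <noscript> text and any other visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._in_noscript = False
        self._in_script_or_style = False
        self.noscript_text = ""
        self.body_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag == "noscript":
            self._in_noscript = True
        elif tag in ("script", "style"):
            self._in_script_or_style = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag == "noscript":
            self._in_noscript = False
        elif tag in ("script", "style"):
            self._in_script_or_style = False

    def handle_data(self, data):
        if self._in_script_or_style:
            return  # inline JS/CSS is not visible text
        if self._in_noscript:
            self.noscript_text += data
        elif self._in_body:
            self.body_text += data


def looks_like_js_shell(html: str) -> bool:
    """Heuristic: True when the page only shows a 'enable JavaScript' fallback."""
    probe = _ShellProbe()
    probe.feed(html)
    probe.close()  # flush any buffered text
    mentions_js = "javascript" in probe.noscript_text.lower()
    has_other_text = bool(probe.body_text.strip())
    return mentions_js and not has_other_text
```

Calling `looks_like_js_shell(respuesta.text)` right after Approach 1 would flag the response as an unrendered shell, which explains why BeautifulSoup finds no result elements in it: they are only created later by JavaScript running in a browser.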

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

I don't know why the HTML is not filled in afterwards; maybe enabling JavaScript would make it possible to get this information.
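
One way to "enable JavaScript" from Python is to load the page in a headless browser and read the DOM after the scripts have run. The sketch below uses Selenium with Chrome and is untested against the validator site. The helper names `build_validator_url` and `fetch_rendered_html` are hypothetical, and the default `wait_selector` of `#app *` is an assumption (that the app mounts into an element with id `app`); it would need to be replaced with a selector for an element that only appears once the validation results are displayed.

```python
from urllib.parse import urlencode

VALIDATOR_PAGE = "https://gbfs-validator.mobilitydata.org/validator"


def build_validator_url(feed_url: str) -> str:
    """Return the validator page URL with the feed passed as ?url=..."""
    return f"{VALIDATOR_PAGE}?{urlencode({'url': feed_url})}"


def fetch_rendered_html(feed_url: str, wait_selector: str = "#app *",
                        timeout: float = 60) -> str:
    """Load the validator page in headless Chrome and return the rendered HTML.

    wait_selector is an ASSUMPTION; adapt it to the real results markup.
    """
    # Imported lazily so build_validator_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_validator_url(feed_url))
        # Block until at least one element matching the selector exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned string could then be passed to `BeautifulSoup(html, 'html.parser')` as in Approach 2. This requires a local Chrome installation and is considerably slower than a plain HTTP request, so it is a workaround rather than a replacement for a proper API.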

Thanks!!
