Simple scraping of transparency state aid register from the EU State Aid Transparency Register.
-
Copy the example environment file:
cp .env.example .env -
Edit the
.envfile with your database credentials and other settings:DB_NAME=state_aid_db DB_USER=postgres DB_PASSWORD=your_password DB_HOST=localhost DB_PORT=5432
Run the scraper:
python main.py run
The scraper can be run recurrently (e.g., as a scheduled job) as it will:
- Skip records that have already been inserted
- Report how many new records were inserted and how many duplicates were skipped
The state aid data doesn't have a single unique identifier, so we use a composite key of these fields:
- SA Number (sa_number)
- Reference Number (ref_no)
- National ID (national_id)
- Beneficiary Name (beneficiary_name)
- Date Granted (date_granted)
This combination uniquely identifies each award. The scraper uses PostgreSQL's ON CONFLICT DO NOTHING to skip duplicates when inserting data.