State Aid

A simple scraper that collects state aid awards from the EU State Aid Transparency Register and stores them in PostgreSQL.

Environment Configuration

Copy the example environment file:
```
cp .env.example .env
```

Edit the .env file with your database credentials and other settings:

```
DB_NAME=state_aid_db
DB_USER=postgres
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
```
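As an illustration, these settings could be read in Python roughly as follows. This is a minimal sketch that assumes a plain KEY=VALUE .env format. The helpers `parse_env_file` and `db_settings` are hypothetical, and the project may load the file differently (for example with a library such as python-dotenv). In this sketch, real environment variables take precedence over the file, so a scheduler or container can override individual values.

```python
import os
from pathlib import Path

# Hypothetical defaults mirroring the example .env (the password has no default).
DEFAULTS = {
    "DB_NAME": "state_aid_db",
    "DB_USER": "postgres",
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
}
KEYS = ("DB_NAME", "DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT")


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. DB_PASSWORD="secret".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def db_settings(env_path=".env"):
    """Merge defaults, the .env file (if present) and real environment variables."""
    settings = dict(DEFAULTS)
    if Path(env_path).exists():
        settings.update(parse_env_file(env_path))
    # Environment variables win over the file in this sketch.
    for key in KEYS:
        if key in os.environ:
            settings[key] = os.environ[key]
    if "DB_PASSWORD" not in settings:
        raise KeyError("DB_PASSWORD is not set")
    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings
```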

Usage

Run the scraper:

```
python main.py run
```
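The `run` argument suggests a subcommand-style command line. Below is a minimal sketch of such an entry point using the standard-library argparse module. `scrape_and_store` is a hypothetical placeholder for the actual scraping and insertion logic in main.py, which may be organised differently, and the printed message format is likewise an assumption.

```python
import argparse
import sys


def scrape_and_store():
    """Hypothetical placeholder: fetch awards, insert them, return (inserted, skipped)."""
    return 0, 0


def main(argv=None):
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Scraper for the EU State Aid Transparency Register",
    )
    # Require a subcommand so that a bare `python main.py` prints usage and exits.
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("run", help="scrape the register and insert new awards")
    args = parser.parse_args(argv)

    if args.command == "run":
        inserted, skipped = scrape_and_store()
        # Hypothetical report format for the inserted/skipped counts.
        print(f"Inserted {inserted} new records, skipped {skipped} duplicates")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```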

The scraper is safe to run repeatedly (e.g., as a scheduled job) because it:

- Skips records that have already been inserted
- Reports how many new records were inserted and how many duplicates were skipped

Implementation Details

The state aid data doesn't have a single unique identifier, so we use a composite key of these fields:

- SA Number (sa_number)
- Reference Number (ref_no)
- National ID (national_id)
- Beneficiary Name (beneficiary_name)
- Date Granted (date_granted)

This combination uniquely identifies each award. The scraper uses PostgreSQL's ON CONFLICT DO NOTHING to skip duplicates when inserting data.
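The deduplication idea can be tried without a PostgreSQL server. Below is a self-contained sketch that uses Python's built-in sqlite3 module as a stand-in, because SQLite (3.24 or newer) also supports INSERT ... ON CONFLICT DO NOTHING. The table name `awards`, the column types and the `insert_awards` helper are assumptions, not the project's actual schema. A UNIQUE constraint over the five key columns is what makes a repeated award count as a conflict. The counts come from the cursor's rowcount, which only includes rows that were actually inserted.

```python
import sqlite3

# The five columns the README lists as the composite key.
KEY_COLUMNS = ("sa_number", "ref_no", "national_id", "beneficiary_name", "date_granted")

# Hypothetical schema: the real table presumably has more columns (amounts, countries, ...).
SCHEMA = """
CREATE TABLE IF NOT EXISTS awards (
    sa_number        TEXT NOT NULL,
    ref_no           TEXT NOT NULL,
    national_id      TEXT NOT NULL,
    beneficiary_name TEXT NOT NULL,
    date_granted     TEXT NOT NULL,
    UNIQUE (sa_number, ref_no, national_id, beneficiary_name, date_granted)
)
"""

INSERT = (
    f"INSERT INTO awards ({', '.join(KEY_COLUMNS)}) "
    f"VALUES ({', '.join('?' for _ in KEY_COLUMNS)}) "
    "ON CONFLICT DO NOTHING"
)


def insert_awards(conn, records):
    """Insert records (dicts keyed by KEY_COLUMNS) and return (inserted, skipped)."""
    rows = [tuple(record[col] for col in KEY_COLUMNS) for record in records]
    if not rows:
        return 0, 0
    with conn:  # commits on success, rolls back on error
        cur = conn.executemany(INSERT, rows)
    inserted = cur.rowcount  # rows ignored by ON CONFLICT DO NOTHING are not counted
    return inserted, len(rows) - inserted
```

Calling insert_awards twice with the same records returns (n, 0) on the first call and (0, n) on the second, which is what allows repeated runs to skip existing awards. One caveat if any key column can be empty: unique constraints in both PostgreSQL and SQLite treat NULLs as distinct by default, so two otherwise identical rows with a NULL national_id would both be inserted (PostgreSQL 15 added UNIQUE NULLS NOT DISTINCT for this case).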

Repository: DataFrosch/scraper-stateaid
