Data collection tools for scraping adoption websites and missing children databases. This repository contains the scrapers and raw datasets used in the missing-children-ai-search project for identifying missing Ukrainian children through facial recognition AI.
## Data Sources
This repository collects data from three main sources:
- Russian data: usynovite.ru (Russian adoption website)
- Belarusian data: dadomu.by (Belarusian adoption website)
- Ukrainian data: childrenofwar.gov.ua (Ukrainian government database for displaced children)
## Requirements

- Python 3.x
- Jupyter Notebook
- `curl`, `awk`, and `jq` (for image downloads)
## Usage

Install the Python dependencies:

```
pip install -r requirements.txt
```

Then run the Jupyter notebooks to download profile data from each website:

- `children_of_war.ipynb` - Ukrainian data
- `dadomy.ipynb` - Belarusian data
- `usynovite.ipynb` - Russian data
Use the bash script to download the images referenced in the scraped data files:

```
./download_images.sh
```

Images are saved with filenames matching the profile ID from the source URL.
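`download_images.sh` is the canonical downloader; as an illustration only, the same naming convention (image filename = profile ID taken from the source URL) could be sketched in Python. The assumption that the profile ID is the last path segment of `profile_link`, and the function names themselves, are hypothetical, not taken from the script:

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def profile_id_from_url(url: str) -> str:
    """Derive a profile ID from a source URL.

    Assumes the ID is the last path segment,
    e.g. https://example.org/profiles/12345 -> "12345".
    """
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def download_images(jsonl_path: str, out_dir: str) -> None:
    """Download every image_url in a JSONL file, naming each
    saved file after the ID extracted from profile_link."""
    os.makedirs(out_dir, exist_ok=True)
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            # Keep the original image extension when present.
            ext = os.path.splitext(urlparse(record["image_url"]).path)[1] or ".jpg"
            dest = os.path.join(out_dir, profile_id_from_url(record["profile_link"]) + ext)
            if not os.path.exists(dest):  # skip files already downloaded
                urllib.request.urlretrieve(record["image_url"], dest)
```

The skip-if-exists check makes re-runs cheap, mirroring how a `curl`-based loop can be resumed after an interrupted download.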
## Data

The `data/` directory contains:

- Profile data: JSONL files with `profile_link`, `image_url`, and `description` columns
- Images: ZIP archives of the downloaded profile images (image filename = profile ID)
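The JSONL layout makes the profile files easy to inspect programmatically. A minimal reader, assuming one JSON object per line with the three columns listed above (the function name is illustrative):

```python
import json


def load_profiles(jsonl_path: str) -> list[dict]:
    """Read a JSONL profile file into a list of dicts with
    profile_link, image_url, and description keys,
    skipping blank lines."""
    records = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Loading line by line (rather than parsing the whole file as one JSON array) keeps memory use flat even for large scrapes.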
## Notes

- The Russian website (usynovite.ru) was accessed via a VPN exiting in Serbia
- The Belarusian website (dadomu.by) requires login credentials, which are not shared in order to protect the account owner
The facial recognition analysis built on this data is available at texty/missing-children-ai-search.