texty/missing-children-scrapers

Missing Children Scrapers

Data collection tools for scraping adoption websites and missing children databases. This repository contains the scrapers and raw datasets used in the missing-children-ai-search project for identifying missing Ukrainian children through facial recognition AI.

Data Sources

This repository collects data from three main sources:

  1. Russian data: usynovite.ru (Russian adoption website)
  2. Belarusian data: dadomu.by (Belarusian adoption website)
  3. Ukrainian data: childrenofwar.gov.ua (Ukrainian government database for displaced children)

Requirements

  • Python 3.x
  • Jupyter Notebook
  • curl, awk, and jq (for image downloads)

Install Python dependencies:

pip install -r requirements.txt

Usage

1. Scrape Profile Data

Run the Jupyter notebooks to download profile data from each website:

  • children_of_war.ipynb - Ukrainian data
  • dadomy.ipynb - Belarusian data
  • usynovite.ipynb - Russian data

2. Download Images

Use the bash script to download images from URLs in the scraped data files:

./download_images.sh

Images are saved with filenames matching the profile ID from the source URL.
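The naming rule above can be sketched in Python. This is only an illustration, not the logic of download_images.sh: it assumes the profile ID is the last path segment of profile_link, and the helper names, the example.org URLs, and the ".jpg" fallback extension are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def profile_id_from_url(profile_link: str) -> str:
    """Return the last path segment of the URL.

    ASSUMPTION: the profile ID is the final path segment; the real
    sites may encode it differently (for example in a query string).
    """
    path = urlparse(profile_link).path.rstrip("/")
    return PurePosixPath(path).name


def image_filename(profile_link: str, image_url: str) -> str:
    """Build '<profile ID><extension>' for a downloaded image.

    ASSUMPTION: the extension is taken from the image URL and falls
    back to '.jpg' when the URL has none.
    """
    extension = PurePosixPath(urlparse(image_url).path).suffix or ".jpg"
    return f"{profile_id_from_url(profile_link)}{extension}"


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the mapping.
    print(image_filename("https://example.org/children/12345/",
                         "https://example.org/img/photo.png?size=large"))
```

Keeping the ID in the filename means an image can be joined back to its profile record without a separate lookup table.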

Data Structure

The data/ directory contains:

  • Profile data: JSONL files, one record per line, with profile_link, image_url, and description fields
  • Images: ZIP archives of downloaded profile images (image filename = profile ID)
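As a minimal sketch of reading the profile data, the snippet below parses JSONL lines and checks that each record carries the three fields listed above. The function name and the commented-out file path are hypothetical; the actual filenames in data/ are not stated here.

```python
import json
from typing import Iterable, Iterator

# Field names as listed in the Data Structure section.
EXPECTED_FIELDS = ("profile_link", "image_url", "description")


def iter_profiles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, checking the expected fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in EXPECTED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


# Usage with a hypothetical path:
# with open("data/profiles.jsonl", encoding="utf-8") as handle:
#     for profile in iter_profiles(handle):
#         print(profile["profile_link"])
```

Reading line by line keeps memory use flat even for large files, since only one record is held at a time.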

Notes

  • The Russian website (usynovite.ru) was accessed through a VPN connection located in Serbia.
  • The Belarusian website (dadomu.by) requires login credentials, which are not shared here to protect the account owner.

Related Repository

The facial recognition analysis using this data is available at texty/missing-children-ai-search

