PyMeta


PyMeta is a Python 3 rewrite of PowerMeta, the PowerShell tool created by dafthack. It uses specially crafted search queries to identify and download the following file types (pdf, xls, xlsx, csv, doc, docx, ppt, pptx) from a given domain by scraping Google and Bing.
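
For context, the crafted queries are ordinary search-engine dorks along these lines (illustrative only; the exact queries PyMeta builds may differ):

site:example.com filetype:pdf
site:example.com filetype:docx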

Once downloaded, metadata is extracted from these files using Phil Harvey's exiftool and compiled into a .csv report. Alternatively, PyMeta can be pointed at a directory of manually downloaded files using the -dir command-line argument. See the Usage or All Options sections for more information.
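
The extraction step boils down to running exiftool over each downloaded file and flattening the results into a single report. A minimal, hypothetical sketch of that idea (illustrative only, not PyMeta's actual implementation):

# Illustrative sketch: collect exiftool metadata from a directory into a CSV report
import csv
import json
import subprocess
from pathlib import Path

def extract_metadata(directory, report="pymeta_report.csv"):
    rows = []
    exts = {".pdf", ".xls", ".xlsx", ".csv", ".doc", ".docx", ".ppt", ".pptx"}
    for path in Path(directory).iterdir():
        if path.suffix.lower() not in exts:
            continue
        # 'exiftool -j' prints metadata as a JSON array (one object per file)
        result = subprocess.run(["exiftool", "-j", str(path)],
                                capture_output=True, text=True)
        if result.returncode == 0 and result.stdout:
            rows.extend(json.loads(result.stdout))

    if not rows:
        return
    # Use the union of all metadata keys as the CSV header
    fields = sorted({key for row in rows for key in row})
    with open(report, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

extract_metadata("Downloads/")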

Note: Due to Google's increasingly aggressive anti-bot measures, web scraping may yield limited results. For improved reliability and more consistent results, consider using the Google Custom Search API with the --api-key and --search-engine-id flags.

Why?

Metadata is a common place for penetration testers and red teamers to find: domains, user accounts, naming conventions, software/version numbers, and more!

Getting Started

Prerequisites

Exiftool is required and can be installed with:

    Ubuntu/Kali - apt-get install exiftool -y

    Mac OS - brew install exiftool
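
You can verify the installation with:

    exiftool -ver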

Install:

Recommended: Install with pipx

Using pipx is the preferred installation method as it installs PyMeta in an isolated environment, preventing conflicts with system Python packages:

pipx install pymetasec

If you don't have pipx installed:

# Ubuntu/Debian
sudo apt install pipx
pipx ensurepath

# Mac OS
brew install pipx
pipx ensurepath

# Or via pip (then restart your shell)
python3 -m pip install --user pipx
python3 -m pipx ensurepath

Alternative: Install with pip

You can also install directly with pip, though this may affect system Python packages:

pip3 install pymetasec

Install from source:

Clone the repository and install locally:

git clone https://github.com/m8sec/pymeta
cd pymeta
pipx install .
# Or with pip: pip3 install .

Or install directly from GitHub without cloning:

pipx install git+https://github.com/m8sec/pymeta
# Or with pip: pip3 install git+https://github.com/m8sec/pymeta

Usage

Standard Search (Web Scraping)

  • Search Google and Bing for files within example.com and extract metadata to a csv report:
    pymeta -d example.com

  • Extract metadata from files in a given directory and create a csv report:
    pymeta -dir Downloads/

Google API Search

Due to Google's aggressive anti-bot protections, web scraping may produce limited results. For better reliability, use the Google API option:

pymeta -d example.com --api-key "your_api_key_here" --search-engine-id "your_search_engine_id"
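
Under the hood, these flags map onto Google's Custom Search JSON API. A rough sketch of such a query (a hypothetical illustration, not PyMeta's internal code):

# Illustrative sketch: query the Google Custom Search JSON API for files on a domain
import requests

API_KEY = "your_api_key_here"           # value passed to --api-key
ENGINE_ID = "your_search_engine_id"     # value passed to --search-engine-id

def search_files(domain, file_type="pdf"):
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={
            "key": API_KEY,
            "cx": ENGINE_ID,
            "q": f"site:{domain} filetype:{file_type}",
            "num": 10,  # the API returns at most 10 results per request
        },
        timeout=8,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]

print(search_files("example.com"))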

Setting up Google API

Step 1: Create Google Cloud Project

  • Go to Google Cloud Console
  • Login with a Google account
  • Click "Select a project" → "New Project"
  • Enter a project name (e.g., "PyMeta-API")
  • Click "Create"

Step 2: Enable Custom Search API

  • In your project, go to "APIs & Services" → "Library"
  • Search for "Custom Search API"
  • Click on it and press "Enable"

Step 3: Create API Key

  • Go to "APIs & Services" → "Credentials"
  • Click "Create Credentials" → "API Key"
  • Copy your API key (you'll need this for the --api-key flag)

Step 4: Create Custom Search Engine

  • Go to Google Programmable Search Engine
  • Click "Add a search engine"
  • Enter any name (e.g., "PyMeta Search")
  • For "Sites to search", select "Search the entire web"
  • Click "Create"
  • Copy your Search Engine ID (you'll need this for the --search-engine-id flag)

API Usage Notes:

  • Google provides 100 free API calls per day
  • Additional requests cost $5 per 1000 queries
  • API searches are more reliable than web scraping and less likely to be blocked
  • When using API mode, only Google search is used (Bing searches are disabled)

NOTE: Thanks to Beau Bullock (@dafthack) and the PowerMeta project for the above steps on getting a Google API key.

All Options

options:
  -h, --help            show this help message and exit
  -T MAX_THREADS        Max threads for file download (Default=5)
  -t TIMEOUT            Max timeout per search (Default=8)
  -j JITTER             Jitter between requests (Default=1)

Search Options:
  -s ENGINE, --search ENGINE    Search Engine (Default='google,bing')
  --file-type FILE_TYPE         File types to search (default=pdf,xls,xlsx,csv,doc,docx,ppt,pptx)
  -m MAX_RESULTS                Max results per type search

Google API Options:
  --api-key API_KEY             Google API key for Custom Search API
  --search-engine-id ID         Google Custom Search Engine ID

Proxy Options:
  --proxy PROXY         Proxy requests (IP:Port)
  --proxy-file PROXY    Load proxies from file for rotation

Output Options:
  -o DWNLD_DIR          Path to create downloads directory (Default: ./)
  -f REPORT_FILE        Custom report name ("pymeta_report.csv")

Target Options:
  -d DOMAIN             Target domain
  -dir FILE_DIR         Pre-existing directory of files
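
Options can be combined; for example, to search Google only, cap results per file type, route requests through a proxy, and set a custom report name:

pymeta -d example.com -s google -m 50 --proxy 127.0.0.1:8080 -f example_report.csv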

Credit

PyMeta is based on dafthack's PowerMeta; thanks to Beau Bullock (@dafthack) and the PowerMeta project.
