
tammytorbert/fraudbuilder


Banking Fraud Detection Data Generator

Generates realistic banking transaction data with embedded fraud patterns, for testing fraud detection in Elastic Security.

Overview

This application generates:

  • 1000 banking accounts with realistic customer profiles
  • 6 months of transaction data (configurable)
  • Normal banking activities (deposits, withdrawals, purchases, credits)
  • Fraud patterns in 1-5% of accounts
  • ECS-compliant data format for Elasticsearch
  • Detection rules, ML jobs, dashboards, and saved searches

Features

Data Generation

  • Realistic Account Profiles: Generated customer names, account numbers, and demographics
  • Normal Transaction Patterns: Typical banking activities with realistic amounts and frequencies
  • Fraud Simulation: Multiple fraud patterns including:
    • Rapid succession transactions
    • Unusual transaction amounts
    • Off-hours activity
    • Geographic anomalies
    • Suspicious vendors

Vendor Behavior

  • Banks vs Retailers: Deposits/withdrawals route to vendors.banks; purchases, credits, and refunds route to vendors.retailers.
  • Suspicious Bias for Fraud: For fraud transactions, vendors are selected from suspicious lists about 30% of the time.
  • Refund Weighting per Retailer: Configure refund vendor bias via transactions.vendor_weights.refund.retailers.normal and .suspicious.
  • Transaction Type Weights: Configure rarity of types (e.g., refund) under transactions.normal_accounts.transaction_type_weights.
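The routing and suspicious-vendor bias described above can be sketched as follows. This is a minimal illustration, not the generator's actual code: the function name pick_vendor, the SUSPICIOUS_RATE constant, and the vendor names in the pools are assumptions, and the real pools are read from vendors.* in config.yaml.

```python
import random

# Hypothetical vendor pools; the real lists live under vendors.* in config.yaml.
VENDORS = {
    "banks": {
        "normal": ["First National Bank"],
        "suspicious": ["Offshore Holdings Bank"],
    },
    "retailers": {
        "normal": ["Amazon", "Walmart"],
        "suspicious": ["Unknown Merchant"],
    },
}

BANK_TYPES = {"deposit", "withdrawal"}  # everything else routes to retailers
SUSPICIOUS_RATE = 0.30  # approximate share of fraud transactions using a suspicious vendor


def pick_vendor(transaction_type, is_fraud, rng=random):
    """Pick a vendor name for one transaction."""
    group = "banks" if transaction_type in BANK_TYPES else "retailers"
    # Only fraud transactions are ever drawn from the suspicious pool.
    use_suspicious = is_fraud and rng.random() < SUSPICIOUS_RATE
    pool = VENDORS[group]["suspicious" if use_suspicious else "normal"]
    return rng.choice(pool)
```

Passing a seeded random.Random instance as rng makes the selection reproducible between runs.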

Elasticsearch Integration

  • ECS Compliance: All data follows Elastic Common Schema format
  • Index Templates: Pre-configured mappings for optimal search performance
  • Bulk Import Ready: NDJSON format for efficient data loading

Security Analytics

  • Detection Rules: Pre-built rules for common fraud patterns
  • Machine Learning Jobs: Anomaly detection configurations
  • Dashboards: Visualization for fraud monitoring
  • Saved Searches: Common investigation queries

Quick Start Guide

Follow these steps in order to set up the complete fraud detection system:

Step 1: Prerequisites

Ensure you have:

  • Python 3.7+ installed
  • Elasticsearch 8.x or 9.x running (with security enabled/disabled as needed)
  • Kibana 8.x or 9.x running and accessible
  • Network access to your Elasticsearch and Kibana instances

Step 2: Installation

# Navigate to the fraudbuilder directory
cd fraudbuilder

# Install Python dependencies
pip install -r requirements.txt

Step 3: Configuration

Edit config.yaml to match your environment. Key areas:

  • vendors.banks and vendors.retailers define bank vs retailer names.
  • transactions.normal_accounts.transaction_type_weights controls type rarity (e.g., refunds vs purchases).
  • transactions.vendor_weights.refund.retailers.{normal|suspicious} controls per-retailer refund weighting.

Connection settings:

elasticsearch:
  host: "${ELASTICSEARCH_HOST:localhost}"
  port: "${ELASTICSEARCH_PORT:9200}"
  scheme: "${ELASTICSEARCH_SCHEME:http}"
  username: "${ELASTICSEARCH_USERNAME:elastic}"
  password: "${ELASTICSEARCH_PASSWORD:changeme}"
  api_key: "${ELASTICSEARCH_API_KEY:}"  # Alternative to username/password
  verify_certs: "${ELASTICSEARCH_VERIFY_CERTS:true}"

kibana:
  host: "${KIBANA_HOST:localhost}"
  port: "${KIBANA_PORT:5601}"
  scheme: "${KIBANA_SCHEME:http}"
  username: "${KIBANA_USERNAME:elastic}"
  password: "${KIBANA_PASSWORD:changeme}"

Option 1: Use defaults (for local development)

  • No changes needed - the config uses sensible defaults

Option 2: Set environment variables (recommended for production)

export ELASTICSEARCH_HOST="your-es-host.com"
export ELASTICSEARCH_USERNAME="your-username"
export ELASTICSEARCH_PASSWORD="your-password"

Option 3: Edit config.yaml directly

  • Replace ${VAR_NAME:default} with actual values

Step 4: Deploy Elasticsearch Components

This must be done BEFORE generating data

python3 setup_elasticsearch.py

This script will:

  • ✅ Create index templates
  • ✅ Deploy detection rules
  • ✅ Set up ML jobs
  • ✅ Import dashboards
  • ✅ Import saved searches
  • ✅ Verify the setup

Step 5: Generate Fraud Data

python3 fraud_generator.py

This will create:

  • banking_transactions.ndjson - Transaction data ready for Elasticsearch
  • Summary report of generated accounts and transactions

Step 6: Load Data into Elasticsearch

The loader accepts either a single file or a directory.

# Load a single NDJSON file
python3 elasticsearch_loader.py --data-path output/banking_transactions.ndjson

# Load all supported files from a directory (JSON/NDJSON/JSONL)
python3 elasticsearch_loader.py --data-path output/

# Specify a custom config file
python3 elasticsearch_loader.py --data-path /path/to/data --config config.yaml

This will:

  • Bulk load all transaction data (optimized for performance)
  • Create index aliases
  • Verify data was loaded correctly
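How a --data-path argument might be resolved into a list of files can be sketched like this. The function name resolve_data_files and the non-recursive directory scan are assumptions; elasticsearch_loader.py may behave differently, for example by descending into subdirectories.

```python
from pathlib import Path

# Extensions the loader is documented to accept.
SUPPORTED_SUFFIXES = {".json", ".ndjson", ".jsonl"}


def resolve_data_files(data_path):
    """Return the data files named by a file path or a directory path."""
    path = Path(data_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Assumption: only the top level of the directory is scanned.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {path}")
```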

Step 7: Start Fraud Detection

  1. Open Kibana in your browser
  2. Navigate to Security > Dashboards
  3. Open "Banking Fraud Detection Overview"
  4. Navigate to Security > Rules to see active detection rules
  5. Check Machine Learning > Anomaly Detection for ML jobs

Alternative: Manual Setup

If you prefer manual setup or the automated script fails:

Manual Elasticsearch Setup

# Create index template
curl -X PUT "localhost:9200/_index_template/banking-transactions-template" \
  -H 'Content-Type: application/json' \
  -d @elasticsearch/index_template.json

# Load data manually
curl -X POST "localhost:9200/banking-transactions/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @banking_transactions.ndjson
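The _bulk endpoint expects every document line to be preceded by an action line such as {"index":{}}, and the body must end with a newline. If your NDJSON file holds one document per line with no action lines (an assumption; check the file first), a small conversion like the following sketch produces a body the curl command above can post. The function name to_bulk_body is hypothetical.

```python
import json


def to_bulk_body(ndjson_text):
    """Insert an {"index":{}} action line before every document line."""
    lines = []
    for raw in ndjson_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        doc = json.loads(raw)  # fails early on malformed JSON
        lines.append('{"index":{}}')
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Because the target index is already part of the URL (banking-transactions/_bulk), the action line can stay empty.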

Manual Kibana Setup

  1. Import dashboards via Stack Management > Saved Objects > Import
  2. Import saved searches the same way
  3. Manually create detection rules using the JSON files in detection_rules/
  4. Set up ML jobs using configurations in ml_jobs/

Advanced Configuration

Customize generation behavior via config.yaml:

  • accounts.total_count: Number of accounts to generate
  • accounts.fraud_percentage_min / accounts.fraud_percentage_max: Percent of accounts flagged as fraud
  • accounts.fraud_no_activity_percentage: Percent of fraud accounts with no fraud activity
  • vendors.banks.normal / vendors.banks.suspicious: Bank vendor pools
  • vendors.retailers.normal / vendors.retailers.suspicious: Retailer vendor pools
  • transactions.normal_accounts.transaction_types: Enabled transaction types
  • transactions.normal_accounts.transaction_type_weights: Rarity/weight per type (e.g., purchase, refund)
  • amounts.normal.refund: Typical refund amount range
  • transactions.vendor_weights.refund.retailers.normal and .suspicious: Per-retailer refund weighting

Example snippet:

accounts:
  total_count: 1000
  fraud_percentage_min: 1.0
  fraud_percentage_max: 5.0
  fraud_no_activity_percentage: 20

transactions:
  normal_accounts:
    transaction_type_weights:
      deposit: 20
      withdrawal: 15
      purchase: 50
      credit: 10
      refund: 5
  vendor_weights:
    refund:
      retailers:
        normal:
          Amazon: 12
          Walmart: 10
          Grocery Store: 7
        suspicious:
          "Unknown Merchant": 8
          "Cryptocurrency Exchange": 3
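One way such weights could be applied is a weighted random draw. The sketch below uses the transaction_type_weights values from the snippet above; the function name pick_transaction_type is an assumption, and the generator's own sampling code may differ.

```python
import random

# Values copied from the transaction_type_weights example above.
TYPE_WEIGHTS = {"deposit": 20, "withdrawal": 15, "purchase": 50, "credit": 10, "refund": 5}


def pick_transaction_type(weights, rng=random):
    """Draw one transaction type with probability proportional to its weight."""
    types = list(weights)
    return rng.choices(types, weights=[weights[t] for t in types], k=1)[0]
```

The weights are relative, so they do not need to sum to 100. With the values shown, about half of all draws are purchases and about one in twenty is a refund.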

Environment Variables

You can override connection settings using environment variables:

# Elasticsearch connection
export ELASTICSEARCH_HOST="your-es-host.com"
export ELASTICSEARCH_PORT="9200"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_USERNAME="your-username"
export ELASTICSEARCH_PASSWORD="your-password"
export ELASTICSEARCH_API_KEY="your-api-key"  # Alternative to username/password
export ELASTICSEARCH_CA_CERTS="/path/to/ca.crt"
export ELASTICSEARCH_VERIFY_CERTS="true"

# Kibana connection
export KIBANA_HOST="your-kibana-host.com"
export KIBANA_PORT="5601"
export KIBANA_SCHEME="https"
export KIBANA_USERNAME="your-username"
export KIBANA_PASSWORD="your-password"

The format ${VAR_NAME:default_value} allows fallback to default values when environment variables are not set.
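A minimal sketch of how ${VAR_NAME:default_value} placeholders could be expanded is shown below. The function name expand_placeholders and the regular expression are assumptions, not the project's actual loader code.

```python
import os
import re

# Matches ${VAR_NAME} and ${VAR_NAME:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_placeholders(value, env=None):
    """Replace each placeholder with the environment value or its default."""
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        # A placeholder without a default expands to an empty string.
        return env.get(name, default if default is not None else "")

    return _PLACEHOLDER.sub(_replace, value)
```

The result is always a string, so a caller would still need to convert values such as the port or verify_certs to an integer or boolean.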

Elastic Cloud environment example

Use the following environment variables when connecting to Elastic Cloud (replace values with yours):

export ELASTICSEARCH_HOST="latam-fraud-e1bf9e.es.us-central1.gcp.cloud.es.io"
export ELASTICSEARCH_PORT="9243"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_USERNAME="logaroo"
export ELASTICSEARCH_PASSWORD="password"  # replace with your real password

  • Port 9243 and https are typical for Elastic Cloud.
  • You can alternatively use an API key instead of username/password:

export ELASTICSEARCH_HOST="latam-fraud-e1bf9e.es.us-central1.gcp.cloud.es.io"
export ELASTICSEARCH_PORT="9243"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_API_KEY="your-base64-api-key"
export ELASTICSEARCH_VERIFY_CERTS="true"

If you also access Kibana via Elastic Cloud:

export KIBANA_HOST="your-kibana-host"
export KIBANA_PORT="9243"
export KIBANA_SCHEME="https"
export KIBANA_USERNAME="your-username"
export KIBANA_PASSWORD="your-password"

Generated Files and Components

After running the complete setup, you'll have:

Data Files

  • banking_transactions.ndjson - Transaction data for Elasticsearch (ECS format)
  • accounts_summary.json - Summary of generated accounts and fraud patterns

Elasticsearch Components

  • elasticsearch/index_template.json - Index template with ECS mappings
  • detection_rules/ - 4 detection rules for common fraud patterns
  • ml_jobs/ - 2 ML job configurations for anomaly detection
  • dashboards/ - Banking fraud overview dashboard
  • saved_searches/ - 7 pre-configured investigation queries

Transaction Schema (ECS Format)

Each transaction follows Elastic Common Schema and includes:

{
  "@timestamp": "2024-01-15T14:30:00Z",
  "banking": {
    "account_number": "ACC-789012345",
    "customer_name": "John Smith",
    "transaction_type": "purchase",
    "amount": 125.50,
    "vendor": "Amazon",
    "is_fraud": false,
    "fraud_indicators": []
  },
  "event": {
    "category": ["network"],
    "type": ["connection"],
    "risk_score": 25
  },
  "user": {
    "id": "user-12345",
    "name": "John Smith"
  },
  "source": {
    "geo": {
      "city_name": "New York",
      "country_name": "United States",
      "location": {"lat": 40.7128, "lon": -74.0060}
    }
  }
}
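A document of this shape can be assembled with a small helper, sketched below. The function names build_transaction_doc and to_ndjson_line are hypothetical; the field names and the event values mirror the example above.

```python
import json
from datetime import datetime, timezone


def build_transaction_doc(*, timestamp, account_number, customer_name, user_id,
                          transaction_type, amount, vendor, is_fraud=False,
                          fraud_indicators=None, risk_score=0, geo=None):
    """Assemble one document in the shape shown above.

    `timestamp` must be timezone-aware; it is rendered in UTC.
    """
    doc = {
        "@timestamp": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "banking": {
            "account_number": account_number,
            "customer_name": customer_name,
            "transaction_type": transaction_type,
            "amount": round(amount, 2),
            "vendor": vendor,
            "is_fraud": is_fraud,
            "fraud_indicators": list(fraud_indicators or []),
        },
        "event": {"category": ["network"], "type": ["connection"], "risk_score": risk_score},
        "user": {"id": user_id, "name": customer_name},
    }
    if geo is not None:
        doc["source"] = {"geo": geo}  # omitted when no location is known
    return doc


def to_ndjson_line(doc):
    """Serialize one document as a single NDJSON line."""
    return json.dumps(doc, separators=(",", ":")) + "\n"


example = build_transaction_doc(
    timestamp=datetime(2024, 1, 15, 14, 30, tzinfo=timezone.utc),
    account_number="ACC-789012345", customer_name="John Smith", user_id="user-12345",
    transaction_type="purchase", amount=125.50, vendor="Amazon", risk_score=25,
)
```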

Fraud Patterns Implemented

The generator produces five fraud scenarios:

  1. Rapid Succession Fraud (40% of fraud cases)

    • Multiple transactions within minutes
    • Triggers velocity-based detection rules
  2. Amount Anomalies (30% of fraud cases)

    • Unusually large transactions (>$5000)
    • Micro-transactions for testing limits
  3. Temporal Anomalies (20% of fraud cases)

    • Transactions between 11 PM and 5 AM
    • Weekend and holiday activity
  4. Geographic Anomalies (10% of fraud cases)

    • Transactions from high-risk countries
    • Impossible travel scenarios
  5. Vendor Anomalies (Cross-cutting)

    • Suspicious merchants (Cash Advance, Cryptocurrency)
    • Unknown or offshore vendors
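As an illustration of the first pattern, the sketch below generates timestamps for a burst of transactions inside a short window. The function name and the defaults (six transactions in ten minutes, chosen to exceed the velocity threshold of more than 5 transactions in 10 minutes) are assumptions, not the generator's actual parameters.

```python
import random
from datetime import datetime, timedelta, timezone


def rapid_succession_timestamps(start, count=6, window_minutes=10, rng=random):
    """Return `count` sorted timestamps within `window_minutes` of `start`."""
    window_seconds = window_minutes * 60
    # randrange excludes the upper bound, so every offset stays inside the window.
    offsets = sorted(rng.randrange(window_seconds) for _ in range(count))
    return [start + timedelta(seconds=offset) for offset in offsets]


burst = rapid_succession_timestamps(
    datetime(2024, 1, 15, 23, 50, tzinfo=timezone.utc), rng=random.Random(1)
)
```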

Detection Capabilities

Detection Rules

  • Rapid Succession Transactions: >5 transactions in 10 minutes
  • Unusual Transaction Amounts: Transactions >$5000 or <$1
  • Off-Hours Activity: Transactions between 11 PM and 5 AM
  • Geographic Anomalies: Transactions from high-risk countries
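To sanity-check generated data before loading it, the first three thresholds above can be approximated locally. This is only a sketch: the real rules are the JSON definitions in detection_rules/, and the function names, the inclusive window boundary, and evaluating off-hours in the timestamp's own time zone are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


def is_unusual_amount(amount):
    """Amounts above $5000 or below $1."""
    return amount > 5000 or amount < 1


def is_off_hours(ts):
    """11 PM to 5 AM, evaluated in the timestamp's own time zone."""
    return ts.hour >= 23 or ts.hour < 5


def has_rapid_succession(timestamps, max_count=5, window=timedelta(minutes=10)):
    """True if more than `max_count` timestamps fall within any single window."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop timestamps that are older than the window relative to `ts`.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_count:
            return True
    return False


base = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)
six_in_five_minutes = [base + timedelta(minutes=i) for i in range(6)]
```

has_rapid_succession should be called once per account, with only that account's timestamps.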

Machine Learning Jobs

  • Transaction Amount Anomaly: Detects unusual spending patterns
  • Transaction Frequency Anomaly: Identifies abnormal transaction velocity

Dashboards and Visualizations

  • Fraud vs Normal transaction ratios
  • Geographic distribution of transactions
  • Transaction amount distributions
  • Time-based fraud patterns
  • Account risk scoring

Elasticsearch Version Compatibility

This fraud detection system is compatible with both Elasticsearch 8.x and 9.x:

  • API Compatibility: All Elasticsearch Python client API calls used are compatible across both versions
  • Feature Support: Machine Learning jobs, detection rules, and dashboards work identically
  • Index Templates: Uses the modern _index_template API available in both versions
  • Security: Supports both basic authentication and API key authentication methods

Version-Specific Notes

  • Elasticsearch 8.x: Fully tested and supported
  • Elasticsearch 9.x: Compatible with all features, using the same API endpoints
  • Client Library: Uses elasticsearch>=8.0.0,<10.0.0 for broad compatibility

Troubleshooting

Common Issues

Connection Errors:

  • Verify Elasticsearch is running on the configured host/port
  • Check authentication credentials in config.yaml
  • Ensure network connectivity to Elasticsearch/Kibana

Data Loading Issues:

  • Run setup_elasticsearch.py before generating data
  • Check Elasticsearch disk space and cluster health
  • Verify index template was created successfully

Missing Dashboards:

  • Ensure Kibana is accessible
  • Check that saved objects were imported correctly
  • Verify user has appropriate Kibana permissions

Getting Help

  1. Check the console output for detailed error messages
  2. Verify all prerequisites are met
  3. Test Elasticsearch connectivity manually
  4. Review the generated log files for debugging information
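For step 3, a manual connectivity test can be done with the Python standard library alone. The sketch below builds a request against the cluster root endpoint using either basic authentication or an API key. The function names build_ping_request and ping are hypothetical, and a cluster with a private CA would additionally need an SSL context, which is not shown.

```python
import base64
import json
import urllib.request


def build_ping_request(scheme, host, port, username=None, password=None, api_key=None):
    """Build a GET request for the cluster root endpoint with auth headers."""
    request = urllib.request.Request(f"{scheme}://{host}:{port}/")
    if api_key:
        request.add_header("Authorization", f"ApiKey {api_key}")
    elif username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def ping(request, timeout=10):
    """Send the request and return the parsed cluster info."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling ping(build_ping_request("http", "localhost", 9200, username="elastic", password="changeme")) against a reachable cluster returns the cluster info document. An authentication problem surfaces as urllib.error.HTTPError, and an unreachable host as urllib.error.URLError.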

License

This project is for educational and testing purposes only. Use responsibly and in compliance with your organization's security policies.
