A comprehensive application for generating realistic banking transaction data with embedded fraud patterns for testing Elastic Security fraud detection capabilities.
This application generates:
- 1000 banking accounts with realistic customer profiles
- 6 months of transaction data (configurable)
- Normal banking activities (deposits, withdrawals, purchases, credits)
- Fraud patterns in 1-5% of accounts
- ECS-compliant data format for Elasticsearch
- Detection rules, ML jobs, dashboards, and saved searches
- Realistic Account Profiles: Generated customer names, account numbers, and demographics
- Normal Transaction Patterns: Typical banking activities with realistic amounts and frequencies
- Fraud Simulation: Multiple fraud patterns including:
- Rapid succession transactions
- Unusual transaction amounts
- Off-hours activity
- Geographic anomalies
- Suspicious vendors
- Banks vs Retailers: Deposits/withdrawals route to vendors.banks; purchases, credits, and refunds route to vendors.retailers.
- Suspicious Bias for Fraud: For fraud transactions, vendors are selected from suspicious lists about 30% of the time.
- Refund Weighting per Retailer: Configure refund vendor bias via transactions.vendor_weights.refund.retailers.normal and .suspicious.
- Transaction Type Weights: Configure rarity of types (e.g., refund) under transactions.normal_accounts.transaction_type_weights.
- ECS Compliance: All data follows Elastic Common Schema format
- Index Templates: Pre-configured mappings for optimal search performance
- Bulk Import Ready: NDJSON format for efficient data loading
- Detection Rules: Pre-built rules for common fraud patterns
- Machine Learning Jobs: Anomaly detection configurations
- Dashboards: Visualization for fraud monitoring
- Saved Searches: Common investigation queries
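The vendor routing described in the feature list (banks for deposits and withdrawals, retailers for everything else, and a roughly 30% suspicious bias for fraud) can be sketched as follows. This is a minimal illustration, not the generator's actual code: the vendor names, the `pick_vendor` function, and the constants are hypothetical stand-ins for whatever config.yaml and fraud_generator.py define.

```python
import random

# Hypothetical vendor pools mirroring the vendors.banks / vendors.retailers
# layout; the names are illustrative, not taken from the real config.yaml.
VENDORS = {
    "banks": {
        "normal": ["First National Bank", "City Credit Union"],
        "suspicious": ["Offshore Holdings Bank"],
    },
    "retailers": {
        "normal": ["Amazon", "Walmart", "Grocery Store"],
        "suspicious": ["Unknown Merchant", "Cryptocurrency Exchange"],
    },
}

BANK_TYPES = {"deposit", "withdrawal"}
SUSPICIOUS_PROBABILITY = 0.30  # "about 30% of the time" for fraud transactions


def pick_vendor(transaction_type, is_fraud, rng):
    """Choose the vendor group by transaction type, then a vendor from it."""
    group = "banks" if transaction_type in BANK_TYPES else "retailers"
    # Only fraud transactions are ever biased toward the suspicious pool.
    use_suspicious = is_fraud and rng.random() < SUSPICIOUS_PROBABILITY
    pool = VENDORS[group]["suspicious" if use_suspicious else "normal"]
    return rng.choice(pool)


rng = random.Random(42)
print(pick_vendor("deposit", False, rng))
print(pick_vendor("refund", True, rng))
```

Passing the random generator in explicitly keeps the sketch reproducible; the real generator may seed and weight its choices differently.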
Follow these steps in order to set up the complete fraud detection system:
Ensure you have:
- Python 3.7+ installed
- Elasticsearch 8.x or 9.x running (with security enabled/disabled as needed)
- Kibana 8.x or 9.x running and accessible
- Network access to your Elasticsearch and Kibana instances
# Navigate to the fraudbuilder directory
cd fraudbuilder
# Install Python dependencies
pip install -r requirements.txt
Edit config.yaml to match your environment. Key areas:
- vendors.banks and vendors.retailers define bank vs retailer names.
- transactions.normal_accounts.transaction_type_weights controls type rarity (e.g., refunds vs purchases).
- transactions.vendor_weights.refund.retailers.{normal|suspicious} controls per-retailer refund weighting.
elasticsearch:
  host: "${ELASTICSEARCH_HOST:localhost}"
  port: "${ELASTICSEARCH_PORT:9200}"
  scheme: "${ELASTICSEARCH_SCHEME:http}"
  username: "${ELASTICSEARCH_USERNAME:elastic}"
  password: "${ELASTICSEARCH_PASSWORD:changeme}"
  api_key: "${ELASTICSEARCH_API_KEY:}" # Alternative to username/password
  verify_certs: "${ELASTICSEARCH_VERIFY_CERTS:true}"
kibana:
  host: "${KIBANA_HOST:localhost}"
  port: "${KIBANA_PORT:5601}"
  scheme: "${KIBANA_SCHEME:http}"
  username: "${KIBANA_USERNAME:elastic}"
  password: "${KIBANA_PASSWORD:changeme}"
Option 1: Use defaults (for local development)
- No changes needed - the config uses sensible defaults
Option 2: Set environment variables (recommended for production)
export ELASTICSEARCH_HOST="your-es-host.com"
export ELASTICSEARCH_USERNAME="your-username"
export ELASTICSEARCH_PASSWORD="your-password"
Option 3: Edit config.yaml directly
- Replace ${VAR_NAME:default} with actual values
This must be done BEFORE generating data
python3 setup_elasticsearch.py
This script will:
- ✅ Create index templates
- ✅ Deploy detection rules
- ✅ Set up ML jobs
- ✅ Import dashboards
- ✅ Import saved searches
- ✅ Verify the setup
python3 fraud_generator.py
This will create:
- banking_transactions.ndjson - Transaction data ready for Elasticsearch
- Summary report of generated accounts and transactions
The loader now accepts either a single file or a directory.
# Load a single NDJSON file
python3 elasticsearch_loader.py --data-path output/banking_transactions.ndjson
# Load all supported files from a directory (JSON/NDJSON/JSONL)
python3 elasticsearch_loader.py --data-path output/
# Specify a custom config file
python3 elasticsearch_loader.py --data-path /path/to/data --config config.yaml
This will:
- Bulk load all transaction data (optimized for performance)
- Create index aliases
- Verify data was loaded correctly
- Open Kibana in your browser
- Navigate to Security > Dashboards
- Open "Banking Fraud Detection Overview"
- Navigate to Security > Rules to see active detection rules
- Check Machine Learning > Anomaly Detection for ML jobs
If you prefer manual setup or the automated script fails:
# Create index template
curl -X PUT "localhost:9200/_index_template/banking-transactions-template" \
-H 'Content-Type: application/json' \
-d @elasticsearch/index_template.json
# Load data manually
curl -X POST "localhost:9200/banking-transactions/_bulk" \
-H 'Content-Type: application/x-ndjson' \
--data-binary @banking_transactions.ndjson
- Import dashboards via Stack Management > Saved Objects > Import
- Import saved searches the same way
- Manually create detection rules using the JSON files in detection_rules/
- Set up ML jobs using configurations in ml_jobs/
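For the manual bulk load, the Elasticsearch `_bulk` endpoint expects each document line to be preceded by an action line and the body to end with a newline. The sketch below shows one way to build such a body in Python. It assumes the NDJSON input holds one plain document per line with no action lines; whether the generated file already contains action lines is not stated here, so check before using it. The helper names are hypothetical.

```python
import json


def iter_bulk_lines(ndjson_lines, index_name):
    """Yield bulk-API lines: one action line, then the document itself."""
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        doc = json.loads(raw)  # fails fast on malformed JSON
        yield json.dumps({"index": {"_index": index_name}})
        yield json.dumps(doc)


def build_bulk_body(ndjson_lines, index_name):
    """Join the lines; the bulk API requires a trailing newline."""
    return "\n".join(iter_bulk_lines(ndjson_lines, index_name)) + "\n"


sample = [
    '{"@timestamp": "2024-01-15T14:30:00Z", "banking": {"amount": 125.5}}',
    "",
    '{"@timestamp": "2024-01-15T14:31:00Z", "banking": {"amount": 9.99}}',
]
body = build_bulk_body(sample, "banking-transactions")
print(body)
```

For large files, sending the data in batches (for example with the `elasticsearch.helpers.bulk` helper from the official Python client) is usually preferable to building one large request body.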
Customize generation behavior via config.yaml:
- accounts.total_count: Number of accounts to generate
- accounts.fraud_percentage_min / accounts.fraud_percentage_max: Percent of accounts flagged as fraud
- accounts.fraud_no_activity_percentage: Percent of fraud accounts with no fraud activity
- vendors.banks.normal / vendors.banks.suspicious: Bank vendor pools
- vendors.retailers.normal / vendors.retailers.suspicious: Retailer vendor pools
- transactions.normal_accounts.transaction_types: Enabled transaction types
- transactions.normal_accounts.transaction_type_weights: Rarity/weight per type (e.g., purchase, refund)
- amounts.normal.refund: Typical refund amount range
- transactions.vendor_weights.refund.retailers.normal and .suspicious: Per-retailer refund weighting
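To make the effect of transaction_type_weights concrete, here is a small sketch of weighted sampling. It assumes the weights are relative (they need not sum to 100) and that types are drawn independently; the generator's real sampling logic may differ. The `sample_transaction_types` function is hypothetical, and the weight values are the ones used in this README's example configuration.

```python
import random
from collections import Counter

# Relative weights per transaction type (values from the example configuration).
TYPE_WEIGHTS = {
    "deposit": 20,
    "withdrawal": 15,
    "purchase": 50,
    "credit": 10,
    "refund": 5,
}


def sample_transaction_types(weights, count, rng):
    """Draw `count` transaction types in proportion to their weights."""
    types = list(weights)
    return rng.choices(types, weights=[weights[t] for t in types], k=count)


rng = random.Random(7)
counts = Counter(sample_transaction_types(TYPE_WEIGHTS, 10_000, rng))
for name, n in counts.most_common():
    print(f"{name:<10} {n / 10_000:.1%}")
```

With these weights, purchases should make up roughly half of the draws and refunds about one in twenty; lowering a weight makes that type rarer without touching the others.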
Example snippet:
accounts:
  total_count: 1000
  fraud_percentage_min: 1.0
  fraud_percentage_max: 5.0
  fraud_no_activity_percentage: 20
transactions:
  normal_accounts:
    transaction_type_weights:
      deposit: 20
      withdrawal: 15
      purchase: 50
      credit: 10
      refund: 5
  vendor_weights:
    refund:
      retailers:
        normal:
          Amazon: 12
          Walmart: 10
          Grocery Store: 7
        suspicious:
          "Unknown Merchant": 8
          "Cryptocurrency Exchange": 3
You can override connection settings using environment variables:
# Elasticsearch connection
export ELASTICSEARCH_HOST="your-es-host.com"
export ELASTICSEARCH_PORT="9200"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_USERNAME="your-username"
export ELASTICSEARCH_PASSWORD="your-password"
export ELASTICSEARCH_API_KEY="your-api-key" # Alternative to username/password
export ELASTICSEARCH_CA_CERTS="/path/to/ca.crt"
export ELASTICSEARCH_VERIFY_CERTS="true"
# Kibana connection
export KIBANA_HOST="your-kibana-host.com"
export KIBANA_PORT="5601"
export KIBANA_SCHEME="https"
export KIBANA_USERNAME="your-username"
export KIBANA_PASSWORD="your-password"
The format ${VAR_NAME:default_value} falls back to the default value when the environment variable is not set.
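The ${VAR_NAME:default_value} fallback can be illustrated with a short sketch. This is not the project's actual config loader; `expand_placeholders` is a hypothetical helper, and it assumes a variable that is set to an empty string counts as set.

```python
import os
import re

# Matches ${NAME:default}; the default may be empty, as in ${ELASTICSEARCH_API_KEY:}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def expand_placeholders(value, environ=None):
    """Replace each ${NAME:default} with environ[NAME], or the default if unset."""
    env = os.environ if environ is None else environ
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# Variable not set: the default is used.
print(expand_placeholders("${ELASTICSEARCH_HOST:localhost}", environ={}))  # localhost
# Variable set: its value wins over the default.
print(expand_placeholders("${ELASTICSEARCH_PORT:9200}",
                          environ={"ELASTICSEARCH_PORT": "9243"}))  # 9243
```

Note that expanded values are strings, so a loader built this way would still need to convert fields such as the port or verify_certs to the types it expects.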
Use the following environment variables when connecting to Elastic Cloud (replace values with yours):
export ELASTICSEARCH_HOST="latam-fraud-e1bf9e.es.us-central1.gcp.cloud.es.io"
export ELASTICSEARCH_PORT="9243"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_USERNAME="logaroo"
export ELASTICSEARCH_PASSWORD="password" # replace with your real password
- Port 9243 and https are typical for Elastic Cloud.
- You can alternatively use an API key instead of username/password:
export ELASTICSEARCH_HOST="latam-fraud-e1bf9e.es.us-central1.gcp.cloud.es.io"
export ELASTICSEARCH_PORT="9243"
export ELASTICSEARCH_SCHEME="https"
export ELASTICSEARCH_API_KEY="your-base64-api-key"
export ELASTICSEARCH_VERIFY_CERTS="true"
If you also access Kibana via Elastic Cloud:
export KIBANA_HOST="your-kibana-host"
export KIBANA_PORT="9243"
export KIBANA_SCHEME="https"
export KIBANA_USERNAME="your-username"
export KIBANA_PASSWORD="your-password"
After running the complete setup, you'll have:
- banking_transactions.ndjson - Transaction data for Elasticsearch (ECS format)
- accounts_summary.json - Summary of generated accounts and fraud patterns
- elasticsearch/index_template.json - Index template with ECS mappings
- detection_rules/ - 4 detection rules for common fraud patterns
- ml_jobs/ - 2 ML job configurations for anomaly detection
- dashboards/ - Banking fraud overview dashboard
- saved_searches/ - 7 pre-configured investigation queries
Each transaction follows Elastic Common Schema and includes:
{
"@timestamp": "2024-01-15T14:30:00Z",
"banking": {
"account_number": "ACC-789012345",
"customer_name": "John Smith",
"transaction_type": "purchase",
"amount": 125.50,
"vendor": "Amazon",
"is_fraud": false,
"fraud_indicators": []
},
"event": {
"category": ["network"],
"type": ["connection"],
"risk_score": 25
},
"user": {
"id": "user-12345",
"name": "John Smith"
},
"source": {
"geo": {
"city_name": "New York",
"country_name": "United States",
"location": {"lat": 40.7128, "lon": -74.0060}
}
}
}
The system generates sophisticated fraud scenarios:
- Rapid Succession Fraud (40% of fraud cases)
  - Multiple transactions within minutes
  - Triggers velocity-based detection rules
- Amount Anomalies (30% of fraud cases)
  - Unusually large transactions (>$5000)
  - Micro-transactions for testing limits
- Temporal Anomalies (20% of fraud cases)
  - Transactions between 11 PM - 5 AM
  - Weekend and holiday activity
- Geographic Anomalies (10% of fraud cases)
  - Transactions from high-risk countries
  - Impossible travel scenarios
- Vendor Anomalies (Cross-cutting)
  - Suspicious merchants (Cash Advance, Cryptocurrency)
  - Unknown or offshore vendors
- Rapid Succession Transactions: >5 transactions in 10 minutes
- Unusual Transaction Amounts: Transactions >$5000 or <$1
- Off-Hours Activity: Transactions between 11 PM - 5 AM
- Geographic Anomalies: Transactions from high-risk countries
- Transaction Amount Anomaly: Detects unusual spending patterns
- Transaction Frequency Anomaly: Identifies abnormal transaction velocity
- Fraud vs Normal transaction ratios
- Geographic distribution of transactions
- Transaction amount distributions
- Time-based fraud patterns
- Account risk scoring
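The detection rule thresholds listed above (more than 5 transactions in 10 minutes, amounts above $5000 or below $1, activity between 11 PM and 5 AM) can be restated in plain Python to sanity-check generated data. This sketch is not the shipped rule definitions, which live in detection_rules/ as JSON. The function names are hypothetical, and the boundary handling (a window of exactly 10 minutes counts as inside, 5:00 AM counts as outside) is an assumption.

```python
from datetime import datetime, timedelta


def has_rapid_succession(timestamps, max_count=5, window=timedelta(minutes=10)):
    """True if more than max_count timestamps fall within any single window."""
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Slide the window start forward until it spans at most `window`.
        while ts - ordered[start] > window:
            start += 1
        if end - start + 1 > max_count:
            return True
    return False


def is_unusual_amount(amount):
    """True for amounts above $5000 or below $1."""
    return amount > 5000 or amount < 1


def is_off_hours(ts):
    """True from 11:00 PM up to, but not including, 5:00 AM."""
    return ts.hour >= 23 or ts.hour < 5


base = datetime(2024, 1, 15, 14, 0)
burst = [base + timedelta(minutes=m) for m in range(6)]  # 6 transactions in 5 minutes
print(has_rapid_succession(burst))  # True
print(is_unusual_amount(0.5), is_off_hours(datetime(2024, 1, 15, 23, 30)))  # True True
```

The timestamps are treated as naive local times; real transaction data carries a UTC @timestamp, so an off-hours check would need the account's local time zone.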
This fraud detection system is compatible with both Elasticsearch 8.x and 9.x:
- API Compatibility: All Elasticsearch Python client API calls used are compatible across both versions
- Feature Support: Machine Learning jobs, detection rules, and dashboards work identically
- Index Templates: Uses the modern _index_template API available in both versions
- Security: Supports both basic authentication and API key authentication methods
- Elasticsearch 8.x: Fully tested and supported
- Elasticsearch 9.x: Compatible with all features, using the same API endpoints
- Client Library: Uses elasticsearch>=8.0.0,<10.0.0 for broad compatibility
Connection Errors:
- Verify Elasticsearch is running on the configured host/port
- Check authentication credentials in config.yaml
- Ensure network connectivity to Elasticsearch/Kibana
Data Loading Issues:
- Run setup_elasticsearch.py before generating data
- Check Elasticsearch disk space and cluster health
- Verify index template was created successfully
Missing Dashboards:
- Ensure Kibana is accessible
- Check that saved objects were imported correctly
- Verify user has appropriate Kibana permissions
- Check the console output for detailed error messages
- Verify all prerequisites are met
- Test Elasticsearch connectivity manually
- Review the generated log files for debugging information
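One way to test Elasticsearch connectivity manually is a small standard-library script that reads the same environment variables as config.yaml. The sketch below is an illustration under assumptions: `build_request` and `check_connection` are hypothetical helpers, the defaults mirror the config defaults shown earlier, and certificate options (ELASTICSEARCH_VERIFY_CERTS, ELASTICSEARCH_CA_CERTS) are not handled.

```python
import base64
import os
import urllib.request


def build_request(environ=None):
    """Build a GET request for the cluster root URL with an auth header."""
    env = os.environ if environ is None else environ
    scheme = env.get("ELASTICSEARCH_SCHEME", "http")
    host = env.get("ELASTICSEARCH_HOST", "localhost")
    port = env.get("ELASTICSEARCH_PORT", "9200")
    request = urllib.request.Request(f"{scheme}://{host}:{port}/")
    api_key = env.get("ELASTICSEARCH_API_KEY", "")
    if api_key:
        # An API key, when present, is used instead of username/password.
        request.add_header("Authorization", f"ApiKey {api_key}")
    else:
        user = env.get("ELASTICSEARCH_USERNAME", "elastic")
        password = env.get("ELASTICSEARCH_PASSWORD", "changeme")
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_connection(request, timeout=10):
    """Send the request and return the HTTP status (raises on failure)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_request({"ELASTICSEARCH_HOST": "localhost"})
print(request.full_url)  # http://localhost:9200/
# To probe the real cluster: print(check_connection(build_request()))
```

A 200 status indicates the host, port, and credentials are all working; a 401 points at credentials, and a connection error points at host, port, or network issues.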
This project is for educational and testing purposes only. Use responsibly and in compliance with your organization's security policies.