This repository contains Infrastructure-as-Code (IaC) definitions written in Terraform to deploy a fully automated, end-to-end data pipeline for fraud detection. The pipeline ingests raw data into S3, validates its schema, processes and transforms the data with AWS Glue, generates ML predictions through an Amazon SageMaker endpoint, and stores the enriched, fraud-classified records in DynamoDB.
The Terraform configuration provisions all required AWS resources, including:
- Amazon S3 (Landing, Processed, Curated buckets)
- AWS Lambda for schema validation and model inference
- AWS Glue Crawler + ETL Job
- Amazon SageMaker Model, Endpoint Configuration, Endpoint
- Amazon DynamoDB table for final results
- IAM Roles & Policies for Glue, Lambda, and SageMaker
Terraform does not deploy code scripts (Python files, Glue scripts, etc.). Instead, users must upload these ZIP/script files to S3 beforehand and supply their S3 paths in terraform.tfvars.
```
terraform/
│  main.tf
│  variables.tf
│  outputs.tf
│  terraform.tfvars   (user-created)
│
├── s3.tf         # Landing / Processed / Curated S3 buckets
├── iam.tf        # IAM roles + policies
├── lambda.tf     # Lambda functions + S3 triggers
├── glue.tf       # Glue crawler + ETL job
├── dynamodb.tf   # Final DynamoDB table
└── sagemaker.tf  # SageMaker model + endpoint
```
You must install the following before deploying:
- Terraform 1.5+
- AWS CLI v2+
- Python 3.12 (optional, for building Lambda ZIPs manually)
- A ZIP utility (to compress Lambda code for upload)
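To confirm the prerequisites are available on your PATH, you can check versions (standard commands; exact output format varies by version):

```bash
terraform version    # expect v1.5 or newer
aws --version        # expect aws-cli/2.x
python3 --version    # optional, only needed to build Lambda ZIPs locally
```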
The AWS account or IAM user running Terraform must have:
- S3: CreateBucket, PutBucketPolicy, PutObject
- IAM: CreateRole, AttachRolePolicy, PassRole
- Lambda: CreateFunction, AddPermission
- Glue: CreateCrawler, CreateJob
- SageMaker: CreateModel, CreateEndpoint
- DynamoDB: CreateTable
You may use AdministratorAccess if working in a sandbox environment.
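As a sketch, the actions above could be consolidated into a single Terraform policy document. The action list is abbreviated to what this README names; in practice the deployer also needs the matching Get/Describe/Delete permissions, and resources should be scoped to your ARNs:

```hcl
data "aws_iam_policy_document" "deployer" {
  statement {
    effect = "Allow"
    actions = [
      "s3:CreateBucket", "s3:PutBucketPolicy", "s3:PutObject",
      "iam:CreateRole", "iam:AttachRolePolicy", "iam:PassRole",
      "lambda:CreateFunction", "lambda:AddPermission",
      "glue:CreateCrawler", "glue:CreateJob",
      "sagemaker:CreateModel", "sagemaker:CreateEndpoint",
      "dynamodb:CreateTable",
    ]
    resources = ["*"] # scope to specific ARNs in production
  }
}
```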
Terraform expects you to pre-upload the following artifacts to S3 buckets you control:
- schema_validator.zip
- inference_lambda.zip
- lambda_layer.zip
- etl_transform.py (the Glue ETL script)
- model.tar.gz (must contain the XGBoost model + metadata)

Example paths:
```
s3://my-lambda-code-bucket/schema_validator.zip
s3://my-lambda-code-bucket/inference_lambda.zip
s3://my-lambda-code-bucket/lambda_layer.zip
s3://my-glue-script-bucket/etl/etl_transform.py
s3://my-model-artifacts/fraud/model.tar.gz
```
Alternatively, train the model in SageMaker using the provided Jupyter notebook: FraudDetectionXGB.ipynb
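A sketch of the uploads with the AWS CLI, assuming the example bucket names above and locally built artifacts:

```bash
# Package a Lambda function (repeat for each function)
zip -r schema_validator.zip schema_validator.py

# Upload Lambda packages and the shared layer
aws s3 cp schema_validator.zip s3://my-lambda-code-bucket/schema_validator.zip
aws s3 cp inference_lambda.zip s3://my-lambda-code-bucket/inference_lambda.zip
aws s3 cp lambda_layer.zip     s3://my-lambda-code-bucket/lambda_layer.zip

# Upload the Glue ETL script and the model artifact
aws s3 cp etl_transform.py s3://my-glue-script-bucket/etl/etl_transform.py
aws s3 cp model.tar.gz     s3://my-model-artifacts/fraud/model.tar.gz
```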
Terraform uses variables defined in variables.tf. You must create a new file:
terraform.tfvars
Example template:
```hcl
aws_region              = "us-east-1"
landing_bucket          = "my-landing-bucket"
processed_bucket        = "my-processed-bucket"
curated_bucket          = "my-curated-bucket"
lambda_code_bucket      = "my-lambda-code-bucket"
schema_lambda_s3_key    = "schema_validator.zip"
inference_lambda_s3_key = "inference_lambda.zip"
glue_script_bucket      = "my-glue-script-bucket"
glue_script_key         = "etl/etl_transform.py"
model_artifact_bucket   = "my-model-artifacts"
model_artifact_key      = "fraud/model.tar.gz"
dynamodb_table_name     = "fraud-detection-results"
```
All values must be updated to match your environment.
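For reference, the matching declarations in variables.tf look roughly like this (a sketch; descriptions and any defaults are illustrative):

```hcl
variable "aws_region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "us-east-1"
}

variable "landing_bucket" {
  description = "Name of the S3 landing bucket"
  type        = string
}

variable "dynamodb_table_name" {
  description = "Name of the DynamoDB results table"
  type        = string
}
# ...one variable block per key in terraform.tfvars
```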
Run inside the terraform/ directory:
terraform init
This:
- downloads the AWS provider
- sets up the Terraform working directory
Next, run:
terraform validate
If you see:
Success! The configuration is valid.
you may continue.
terraform plan
You should see resources such as:
- 3 S3 buckets
- 2 Lambda functions
- 1 Glue crawler
- 1 Glue job
- 1 SageMaker model + endpoint
- DynamoDB table
- IAM roles
Verify everything looks correct.
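Optionally, you can save the plan to a file so that apply executes exactly what you reviewed:

```bash
terraform plan -out=tfplan   # write the reviewed plan to disk
terraform apply tfplan       # apply that exact plan, no re-plan
```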
terraform apply
Confirm with:
yes
Terraform will now provision the entire pipeline.
Expected deployment time:
- IAM roles: immediate
- S3 buckets: immediate
- DynamoDB table: immediate
- Lambda functions: < 1 min
- Glue job & crawler: < 30 sec
- SageMaker endpoint: 6–10 minutes
Once finished, Terraform will output:
Apply complete! Resources: XX added, 0 changed, 0 destroyed.
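You can then inspect the values exported by outputs.tf (output names depend on that file; the one below is hypothetical):

```bash
terraform output                                # list all outputs
terraform output -raw sagemaker_endpoint_name   # hypothetical output name
```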
To test the pipeline end to end, upload a sample CSV (e.g. input.csv) to the landing bucket, as sketched below.
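For example, with the AWS CLI (bucket name per your terraform.tfvars):

```bash
aws s3 cp input.csv s3://landing-bucket/input.csv
```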
The schema-validation Lambda should trigger automatically and:
- Validate the schema
- Write the processed file to processed-bucket/
Within an hour (based on your schedule):
- The Glue Crawler updates the schema
- The Glue Job transforms the data and writes curated output
The final curated file triggers the inference Lambda, which:
- Calls the SageMaker fraud model endpoint
- Writes the final record to DynamoDB
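To verify, you can read a few records back from the results table (table name from the example terraform.tfvars):

```bash
aws dynamodb scan \
  --table-name fraud-detection-results \
  --max-items 5
```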
If you update Lambda Python code:
- ZIP it again
- Upload new ZIP to the same S3 path
- Run:
terraform apply
Terraform will detect that the file was updated in S3 and redeploy the Lambda function, provided the function resource tracks the package contents.
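A common pattern in lambda.tf, sketched here with hypothetical names, is to pin source_code_hash to the local ZIP (or use s3_object_version on a versioned bucket):

```hcl
resource "aws_lambda_function" "schema_validator" {
  function_name = "schema-validator"           # hypothetical name
  role          = aws_iam_role.lambda_exec.arn # hypothetical role
  handler       = "schema_validator.handler"   # hypothetical handler
  runtime       = "python3.12"
  s3_bucket     = var.lambda_code_bucket
  s3_key        = var.schema_lambda_s3_key

  # Forces an update whenever the local ZIP's contents change
  source_code_hash = filebase64sha256("schema_validator.zip")
}
```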
To clean up all AWS resources, run:
terraform destroy
Confirm with:
yes
This deletes:
- Buckets
- Lambdas
- IAM roles
- Glue jobs
- SageMaker endpoint
- DynamoDB table
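Note: terraform destroy cannot remove S3 buckets that still contain objects. Either empty them first (aws s3 rm --recursive) or, in a sandbox, declare the buckets with force_destroy, e.g.:

```hcl
resource "aws_s3_bucket" "landing" {
  bucket        = var.landing_bucket
  force_destroy = true # let destroy delete the bucket even if non-empty
}
```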
Member list:
- 66102010179
- 66102010250
- 66102010251
- 66102010252