
AWS Fraud Detection Data Pipeline (Terraform Deployment)

Overview

This repository contains Infrastructure-as-Code (IaC) definitions written in Terraform to deploy a fully automated, end-to-end data pipeline for fraud detection. The pipeline ingests raw data into S3, validates schema, processes and transforms data using AWS Glue, triggers ML predictions using Amazon SageMaker, and stores enriched fraud-classified records into DynamoDB.

The Terraform configuration provisions all required AWS resources, including:

  • Amazon S3 (Landing, Processed, Curated buckets)
  • AWS Lambda for schema validation and model inference
  • AWS Glue Crawler + ETL Job
  • Amazon SageMaker Model, Endpoint Configuration, Endpoint
  • Amazon DynamoDB table for final results
  • IAM Roles & Policies for Glue, Lambda, and SageMaker

Terraform does not deploy code scripts (Python files, Glue scripts, etc.). Instead, users must upload these ZIP/script files to S3 beforehand and supply their S3 paths in terraform.tfvars.

1. Repository Structure

terraform/
│ main.tf
│ variables.tf
│ outputs.tf
│ terraform.tfvars (user-created)
│
├── s3.tf                # Landing / Processed / Curated S3 buckets
├── iam.tf               # IAM roles + policies
├── lambda.tf            # Lambda functions + S3 triggers
├── glue.tf              # Glue crawler + ETL job
├── dynamodb.tf          # Final DynamoDB table
└── sagemaker.tf         # SageMaker model + endpoint

2. Prerequisites

2.1 Required Tools

You must install the following before deploying:

  • Terraform 1.5+
  • AWS CLI v2+
  • Python 3.12 (optional), for creating Lambda ZIPs manually
  • ZIP utility, to compress Lambda code for upload
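
If you prefer Python over a standalone ZIP utility, the following is a minimal packaging sketch. It assumes the handler code for each function lives in a local folder; the folder names schema_validator/ and inference_lambda/ are only illustrative:

# build_lambda_zips.py -- package Lambda source folders into deployment ZIPs
import zipfile
from pathlib import Path

def build_zip(source_dir: str, output_zip: str) -> None:
    # Add every file under source_dir to the archive, keeping paths relative to it
    source = Path(source_dir)
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in source.rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(source))

build_zip("schema_validator", "schema_validator.zip")
build_zip("inference_lambda", "inference_lambda.zip")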

2.2 AWS Permissions Needed

The AWS account or IAM user running Terraform must have:

  • S3: CreateBucket, PutBucketPolicy, PutObject
  • IAM: CreateRole, AttachRolePolicy, PassRole
  • Lambda: CreateFunction, AddPermission
  • Glue: CreateCrawler, CreateJob
  • SageMaker: CreateModel, CreateEndpoint
  • DynamoDB: CreateTable

You may use AdministratorAccess if working in a sandbox environment.

3. Upload Required Artifacts Before Deployment

Terraform expects you to pre-upload:

3.1 Lambda Code ZIPs

You must upload:
schema_validator.zip
inference_lambda.zip
lambda_layer.zip

to an S3 deployment bucket you control.

Example:
s3://my-lambda-code-bucket/schema_validator.zip
s3://my-lambda-code-bucket/inference_lambda.zip
s3://my-lambda-code-bucket/lambda_layer.zip
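
A minimal upload sketch using boto3 (the bucket name matches the example above; adjust it to your environment). The same pattern works for the Glue script in 3.2 and the model artifact in 3.3:

import boto3

s3 = boto3.client("s3")
for zip_name in ["schema_validator.zip", "inference_lambda.zip", "lambda_layer.zip"]:
    # Upload each deployment package to the S3 bucket referenced in terraform.tfvars
    s3.upload_file(zip_name, "my-lambda-code-bucket", zip_name)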

3.2 Glue Script

Upload your ETL script (e.g., etl_transform.py) to S3:
s3://my-glue-script-bucket/etl/etl_transform.py

3.3 SageMaker Model Artifact

Upload your trained model:

model.tar.gz (must contain XGBoost model + metadata)

Example:
s3://my-model-artifacts/fraud/model.tar.gz

Alternatively, train the model in SageMaker using the provided Jupyter notebook: FraudDetectionXGB.ipynb
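
Either way, before uploading you can sanity-check that the archive contains the expected model files. A minimal sketch (the exact file names inside model.tar.gz depend on how the model was packaged):

import tarfile

# List the contents of the model archive without extracting it
with tarfile.open("model.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        print(member.name, member.size)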

4. Terraform Configuration

4.1 Create terraform.tfvars

Terraform uses variables defined in variables.tf. You must create a new file:

terraform.tfvars

Example Template:

aws_region = "us-east-1"

landing_bucket     = "my-landing-bucket"
processed_bucket   = "my-processed-bucket"
curated_bucket     = "my-curated-bucket"

lambda_code_bucket      = "my-lambda-code-bucket"
schema_lambda_s3_key    = "schema_validator.zip"
inference_lambda_s3_key = "inference_lambda.zip"

glue_script_bucket = "my-glue-script-bucket"
glue_script_key    = "etl/etl_transform.py"

model_artifact_bucket = "my-model-artifacts"
model_artifact_key    = "fraud/model.tar.gz"

dynamodb_table_name = "fraud-detection-results"

Update all values to match your environment.

5. Deployment Instructions

Follow these steps exactly.

Step 1 — Initialize Terraform

Run inside the terraform/ directory:

terraform init

This:

  • downloads the AWS provider
  • sets up the Terraform working directory

Step 2 — Validate Configuration

terraform validate

If you see:

Success! The configuration is valid.

you may continue.

Step 3 — Review Deployment Plan

terraform plan

You should see resources such as:

  • 3 S3 buckets
  • 2 Lambda functions
  • 1 Glue crawler
  • 1 Glue job
  • 1 SageMaker model + endpoint
  • DynamoDB table
  • IAM roles

Verify everything looks correct.

Step 4 — Deploy

terraform apply

Confirm with:

yes

Terraform will now provision the entire pipeline.

Expected deployment time:

  • IAM roles: immediate
  • S3 buckets: immediate
  • DynamoDB table: immediate
  • Lambda functions: < 1 min
  • Glue job & crawler: < 30 sec
  • SageMaker endpoint: 6–10 minutes

Once finished, Terraform will output:

Apply complete! Resources: XX added, 0 changed, 0 destroyed.

6. Post-Deployment Verification

6.1 Upload a test file to the Landing Bucket

Upload a CSV file to the landing bucket created by Terraform, for example:
s3://my-landing-bucket/input.csv

The schema-validation Lambda should trigger automatically and:

  1. Validate the schema
  2. Write the processed file to the processed bucket

Within an hour (depending on your schedule):

  • The Glue Crawler updates the schema
  • The Glue Job transforms the data and writes curated output

The final curated file triggers the inference Lambda, which:

  • calls the SageMaker fraud-detection endpoint
  • writes the final record to DynamoDB
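
To confirm that results reached the final table, you can scan it with boto3. A minimal sketch (the table name comes from dynamodb_table_name in terraform.tfvars; the item attributes depend on what the inference Lambda writes):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("fraud-detection-results")

# Fetch a few items to confirm the pipeline wrote fraud-classified records
response = table.scan(Limit=10)
for item in response.get("Items", []):
    print(item)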

7. Redeploying Updated Code

Terraform manages infrastructure only, not code.

If you update Lambda Python code:

  1. ZIP it again
  2. Upload the new ZIP to the same S3 path
  3. Run:
terraform apply

Terraform will detect that the file was updated in S3 and update the Lambda function.

8. Destroying All Resources

To clean up all AWS resources:
terraform destroy

Confirm with:

yes

This deletes:

  • Buckets
  • Lambdas
  • IAM roles
  • Glue jobs
  • SageMaker endpoint
  • DynamoDB table

Member list:

  • 66102010179
  • 66102010250
  • 66102010251
  • 66102010252
