GitHub Repository Archive Script

A Python utility used to archive old, unused GitHub repositories from an organisation.

Prerequisites:

- A Docker Daemon (Colima is recommended)
- Terraform (for deployment)
- Python >3.12
- Make
This repository makes use of a Makefile to execute common commands. To view all available commands, execute:

```bash
make all
```

This project uses MkDocs for documentation, which is located in the docs directory. To view the documentation locally:

- Install MkDocs and its dependencies:

  ```bash
  make install-docs
  ```

- Serve the documentation locally:

  ```bash
  mkdocs serve
  ```

- Open your web browser and navigate to http://localhost:8000.
To work on this project, you need to:

- Create a virtual environment and activate it.

  Create:

  ```bash
  python3 -m venv venv
  ```

  Activate:

  ```bash
  source venv/bin/activate
  ```

- Install dependencies.

  Production dependencies only:

  ```bash
  make install
  ```

  All dependencies, including dev dependencies (used for linting and testing):

  ```bash
  make install-dev
  ```
To run the project during development, we recommend running it outside of a container (see the instructions further below). To run the project in a container, a Docker Daemon is required to containerise and execute it; we recommend using Colima.

Before following the steps below, make sure your daemon is running. If using Colima, run `colima start` to start it.
- Containerise the project.

  ```bash
  docker build -t github-repository-archive-script .
  ```

- Check the image exists (optional).

  ```bash
  docker images
  ```

  Example output:

  ```
  REPOSITORY                         TAG      IMAGE ID       CREATED          SIZE
  github-repository-archive-script   latest   b4a1e32ce51b   12 minutes ago   840MB
  ```

- Run the image.

  ```bash
  docker run --platform linux/amd64 -p 9000:8080 \
    -e AWS_ACCESS_KEY_ID=<access_key_id> \
    -e AWS_SECRET_ACCESS_KEY=<secret_access_key> \
    -e AWS_DEFAULT_REGION=<region> \
    -e AWS_SECRET_NAME=<secret_name> \
    -e GITHUB_ORG=<org> \
    -e GITHUB_APP_CLIENT_ID=<client_id> \
    -e S3_BUCKET_NAME=<bucket_name> \
    -e AWS_LAMBDA_FUNCTION_TIMEOUT=300 \
    github-repository-archive-script
  ```
  When running the container, you are required to pass some environment variables:

  | Variable | Description |
  | --- | --- |
  | GITHUB_ORG | The organisation you would like to run the tool in. |
  | GITHUB_APP_CLIENT_ID | The Client ID of the GitHub App which the tool uses to authenticate with the GitHub API. |
  | AWS_DEFAULT_REGION | The AWS Region which the Secrets Manager Secret is in. |
  | AWS_SECRET_NAME | The name of the AWS Secrets Manager Secret to get. |
  | S3_BUCKET_NAME | The name of the S3 bucket which has the cloud config in (only used when `use_local_config=False`). |
  | AWS_LAMBDA_FUNCTION_TIMEOUT | The timeout in seconds (default: 300s / 5 minutes). |

  Once the container is running, a local endpoint is created at `localhost:9000/2015-03-31/functions/function/invocations`.
- Check the container is running (optional).

  ```bash
  docker ps
  ```

  Example output:

  ```
  CONTAINER ID   IMAGE                              COMMAND                  CREATED         STATUS         PORTS                                       NAMES
  ca890d30e24d   github-repository-archive-script   "/lambda-entrypoint.…"   5 seconds ago   Up 4 seconds   0.0.0.0:9000->8080/tcp, :::9000->8080/tcp   recursing_bartik
  ```

- Post to the endpoint (`localhost:9000/2015-03-31/functions/function/invocations`).

  ```bash
  curl "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
  ```

  This will run the Lambda function and, once complete, will return a success message. (A Python alternative to curl is sketched after these steps.)

- After testing, stop the container.

  ```bash
  docker stop <container_id>
  ```
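If you prefer to invoke the local endpoint from Python rather than curl, here is a minimal sketch using only the standard library. The URL and the empty JSON event payload mirror the curl example above:

```python
import json
import urllib.request

# Invoke the locally running Lambda container, mirroring the curl example.
url = "http://localhost:9000/2015-03-31/functions/function/invocations"
request = urllib.request.Request(
    url,
    data=json.dumps({}).encode("utf-8"),  # empty event payload, same as -d '{}'
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # prints the function's response
```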
To run the Lambda function outside of a container, we need to execute the `handler()` function directly.

- Uncomment the following at the bottom of `main.py`:

  ```python
  ...
  # if __name__ == "__main__":
  #     handler(None, None)
  ...
  ```

  Please Note: If uncommenting the above in `main.py`, make sure you re-comment the code before pushing back to GitHub.

- Export the required environment variables:

  ```bash
  export AWS_ACCESS_KEY_ID=<access_key_id>
  export AWS_SECRET_ACCESS_KEY=<secret_access_key>
  export AWS_DEFAULT_REGION=eu-west-2
  export AWS_SECRET_NAME=<secret_name>
  export S3_BUCKET_NAME=<bucket_name>
  export GITHUB_ORG=<org>
  export GITHUB_APP_CLIENT_ID=<client_id>
  ```

  An explanation of each variable is available within the containerised instructions above.

- Run the script.

  ```bash
  python3 src/main.py
  ```
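For reference, the `handler()` function referred to above follows the standard AWS Lambda entrypoint signature, which is why the local run passes `(None, None)`. A minimal sketch of the shape involved (the body below is illustrative only, not the repository's actual logic):

```python
import os

def handler(event, context):
    """AWS Lambda entrypoint. Lambda supplies `event` and `context` when
    deployed; running locally, both are simply passed as None."""
    # Configuration comes from the environment variables exported above.
    org = os.environ["GITHUB_ORG"]
    # ... the archiving logic runs here ...
    return {"statusCode": 200, "body": f"Archive run complete for {org}"}

if __name__ == "__main__":
    handler(None, None)  # keep commented out in the committed code
```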
This repository is designed to be hosted on AWS Lambda using a container image as the Lambda's definition.
There are 2 parts to deployment:
- Updating the ECR Image.
- Updating the Lambda.
Before following the instructions below, we assume that:

- An ECR repository exists on AWS that aligns with the Lambda's naming convention, `{env_name}-{lambda_name}` (these can be set within the `.tfvars` file; see example_tfvars.txt).
- The AWS account contains underlying infrastructure to deploy on top of. This infrastructure is defined within sdp-infrastructure on GitHub.
- An AWS IAM user has been set up with appropriate permissions (a quick way to verify your exported credentials is sketched below).
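To confirm that the credentials you export actually resolve to the intended IAM user and account before pushing images or applying Terraform, you can run a quick identity check. This sketch assumes boto3 is installed in your environment (e.g. via `pip install boto3`):

```python
import boto3

# Print the AWS identity the current credentials resolve to.
# Useful before pushing to ECR or applying Terraform.
identity = boto3.client("sts").get_caller_identity()
print(f"Account:  {identity['Account']}")
print(f"User ARN: {identity['Arn']}")
```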
Additionally, we recommend that you keep the container versioning in sync with GitHub releases. Internal documentation for this is available on Confluence (GitHub Releases and AWS ECR Versions). We follow Semantic Versioning (Learn More).
When changes are made to the repository's source code, the code must be containerised and pushed to AWS for the Lambda to use.

The following instructions deploy to an ECR repository called `sdp-dev-github-repository-archive-script`. Please change this to `<env_name>-<lambda_name>` based on your AWS instance.

All of the commands (steps 2-5) are available for your environment within the AWS GUI. Navigate to ECR > {repository_name} > View push commands.
- Export AWS credentials into the environment. This makes it easier to ensure you are using the correct credentials.

  ```bash
  export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
  export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
  ```

- Login to AWS.

  ```bash
  aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com
  ```

- Ensuring you're at the root of the repository, build a docker image of the project.

  ```bash
  docker build -t sdp-dev-github-repository-archive-script .
  ```

  Please Note: Change `sdp-dev-github-repository-archive-script` within the above command to `<env_name>-<lambda_name>`.

- Tag the docker image to push to AWS, using the correct versioning mentioned in the prerequisites.

  ```bash
  docker tag sdp-dev-github-repository-archive-script:latest <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-dev-github-repository-archive-script:<semantic_version>
  ```

  Please Note: Change `sdp-dev-github-repository-archive-script` within the above command to `<env_name>-<lambda_name>`.

- Push the image to ECR.

  ```bash
  docker push <aws_account_id>.dkr.ecr.eu-west-2.amazonaws.com/sdp-dev-github-repository-archive-script:<semantic_version>
  ```

  Once pushed, you should be able to see your new image version within the ECR repository.
Once AWS ECR has the new container image, we need to update the Lambda's configuration to use it. To do this, use the repository's provided Terraform.

Within the terraform directory, there is a service subdirectory which contains the Terraform to set up the Lambda on AWS.
- Change directory to the service terraform.

  ```bash
  cd terraform/service
  ```

- Fill out the appropriate environment variables file:

  - `env/dev/dev.tfvars` for sdp-dev.
  - `env/prod/prod.tfvars` for sdp-prod.

  These files can be created based on `example_tfvars.txt`.

  It is crucial that the completed `.tfvars` file does not get committed to GitHub.

- Initialise the terraform using the appropriate `.tfbackend` file for the environment (`env/dev/backend-dev.tfbackend` or `env/prod/backend-prod.tfbackend`).

  ```bash
  terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
  ```

  Please Note: This step requires an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be loaded into the environment if not already in place. This can be done using:

  ```bash
  export AWS_ACCESS_KEY_ID="<aws_access_key_id>"
  export AWS_SECRET_ACCESS_KEY="<aws_secret_access_key>"
  ```

- Refresh the local state to ensure it is in sync with the backend, using the appropriate `.tfvars` file for the environment (`env/dev/dev.tfvars` or `env/prod/prod.tfvars`).

  ```bash
  terraform refresh -var-file=env/dev/dev.tfvars
  ```

- Plan the changes, using the appropriate `.tfvars` file. For example, for dev:

  ```bash
  terraform plan -var-file=env/dev/dev.tfvars
  ```

- Apply the changes, using the appropriate `.tfvars` file. For example, for dev:

  ```bash
  terraform apply -var-file=env/dev/dev.tfvars
  ```

Once applied successfully, the Lambda and EventBridge Schedule will be created.
To delete the service resources, run the following:

```bash
cd terraform/service
terraform init -backend-config=env/dev/backend-dev.tfbackend -reconfigure
terraform refresh -var-file=env/dev/dev.tfvars
terraform destroy -var-file=env/dev/dev.tfvars
```

Please Note: Make sure to use the correct `.tfbackend` and `.tfvars` files for your environment.
To set up the deployment pipeline with Concourse, you must first allowlist your IP address on the Concourse server. IP addresses are flushed every day at 00:00, so this must be done at the beginning of every working day on which the deployment pipeline needs to be used. Follow the instructions on the Confluence page (SDP Homepage > SDP Concourse > Concourse Login) to login.

All our pipelines run on sdp-pipeline-prod, whereas sdp-pipeline-dev is the account used for changes to the Concourse instance itself. Make sure to export all necessary environment variables from sdp-pipeline-prod (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).

When setting up our pipelines, we use ecs-infra-user on sdp-dev to interact with our infrastructure on AWS. The credentials for this are stored in AWS Secrets Manager, so you do not need to set up anything yourself.
To set the pipeline, run the following script:

```bash
chmod u+x ./concourse/scripts/set_pipeline.sh
./concourse/scripts/set_pipeline.sh github-repo-archive-script
```

Note that you only have to run chmod the first time you run the script, in order to give it execute permissions. This script will set the branch and pipeline name to whatever branch you are currently on. It will also set the image tag on ECR to the current commit hash at the time of setting the pipeline.

The pipeline name itself will usually follow the pattern <repo-name>-<branch-name> (e.g. github-repo-archive-script-main).

If you wish to set a pipeline for another branch without checking it out, you can run the following:

```bash
./concourse/scripts/set_pipeline.sh github-repo-archive-script <branch_name>
```

If the branch you are deploying is "main" or "master", it will trigger a deployment to the sdp-prod environment. To set the ECR image tag, you must draft a GitHub release pointing to the latest release of the main/master branch that has a tag in the form vX.Y.Z. Drafting a release will automatically deploy the latest version of the main/master branch with the associated release tag, but you can also manually trigger a build through the Concourse UI or the terminal prompt.

Once the pipeline has been set, you can manually trigger a build on the Concourse UI, or run the following command:

```bash
fly -t aws-sdp trigger-job -j github-repo-archive-script-<branch-name>/build-and-push
```

This repository contains 2 GitHub Actions to automatically lint and test code on pull request creation and on pushes to the main branch.
To lint and test locally, you need to:

- Install dev dependencies.

  ```bash
  make install-dev
  ```

- Run all the linters.

  ```bash
  make lint
  ```

- Run all the tests.

  ```bash
  make test
  ```

- Run MegaLinter.

  ```bash
  make megalint
  ```

  Please Note: This requires a docker daemon to be running. We recommend using Colima if on MacOS or Linux. A docker daemon is required because MegaLinter is run from a docker image.