feat: implement nested zips #12
Merged: ajaxbits merged 10 commits into BrandonLWhite:main from ajaxbits:INFRA-6259-implement-nested-zips on May 2, 2025
- 5399784 feat: implement reproducible_zip (ajaxbits)
- 2d595d6 refactor: use reproducible functions for dependency zip (ajaxbits)
- 12047f1 feat: use reproducible methods for nested zips (ajaxbits)
- ad3575e refactor: set up test data harness (ajaxbits)
- ea109de feat: handle source_date_epoch edge case (ajaxbits)
- d60029d feat: add tests for source_date_epoch (ajaxbits)
- 1686d54 refactor: parameterize tests (ajaxbits)
- 22f701b chore: improve pytest output and settings (ajaxbits)
- 4c48e0d chore: clean up README (ajaxbits)
- 076d0a9 chore: update readme with zip info (ajaxbits)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,47 +1,47 @@ | ||
| # package-python-function | ||
| Python command-line (CLI) tool to package a Python function for deploying to AWS Lambda, and possibly other | ||
| cloud platforms. | ||
| Python command-line (CLI) tool to package a Python function for deploying to AWS Lambda, and possibly other cloud platforms. | ||
|
|
||
| This tool builds a ZIP file from a virtual environment with all depedencies installed that are to be included in the final deployment asset. If the content is larger than AWS Lambda's maximum unzipped package size of 250 MiB, | ||
| then this tool will employ the ZIP-inside-ZIP (nested-ZIP) workaround. This allows deploying Lambdas with large | ||
| dependency packages, especially those with native code compiled extensions like Pandas, PyArrow, etc. | ||
| This tool builds a ZIP file from a virtual environment with all dependencies installed that are to be included in the final deployment asset. If the content is larger than AWS Lambda's maximum unzipped package size of 250 MiB, this tool will then employ the ZIP-inside-ZIP (nested-ZIP) workaround. This allows deploying Lambdas with large dependency packages, especially those with native code compiled extensions like Pandas, PyArrow, etc. The ZIP files are generated [reproducibly](#a-note-on-reproducibility), ensuring that the same source will always generate a ZIP file with the same hash. | ||
|
|
||
| This technique was originally pioneered by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements), which is a NodeJS (JavaScript) plugin for the [Serverless Framework](https://github.com/serverless/serverless). The technique has been improved here to not require any special imports in your entrypoint source file. That is, no changes are needed to your source code to leverage the nested ZIP deployment. | ||
| This technique was originally pioneered by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements), which is a NodeJS (JavaScript) plugin for the [Serverless Framework](https://github.com/serverless/serverless). The technique has been improved here to not require any special imports in your entrypoint source file. That is, no changes are needed to your source code to leverage the nested ZIP deployment. | ||
|
|
||
| The motivation for this Python tool is to achieve the same results as serverless-python-requirements but with a | ||
| purely Python tool. This can simplify and speed up developer and CI/CD workflows. | ||
| The motivation for this Python tool is to achieve the same results as [serverless-python-requirements](https://www.serverless.com/plugins/serverless-python-requirements) but with a purely Python tool. This can simplify and speed up developer and CI/CD workflows. | ||
|
|
||
| One important thing that this tool does not do is build the target virtual environment and install all of the | ||
| dependencies. You must first generate that with a tool like [Poetry](https://github.com/python-poetry/poetry) and the [poetry-plugin-bundle](https://github.com/python-poetry/poetry-plugin-bundle). | ||
| One important thing that this tool does not do is build the target virtual environment and install all of the dependencies. You must first generate that with a tool like [Poetry](https://github.com/python-poetry/poetry) and the [poetry-plugin-bundle](https://github.com/python-poetry/poetry-plugin-bundle). | ||
|
|
||
| ## Example command sequence | ||
| ``` | ||
| ```shell | ||
| poetry bundle venv .build/.venv --without dev | ||
| package-python-function .build/.venv --output-dir .build/lambda | ||
| ``` | ||
|
|
||
| The output will be a .zip file with the same name as your project from your pyproject.toml file (with dashes replaced | ||
| The output will be a .zip file with the same name as your project from your `pyproject.toml` file (with dashes replaced | ||
| with underscores). | ||
|
|
||
| ## Installation | ||
| Use [pipx](https://github.com/pypa/pipx) to install: | ||
|
|
||
| ``` | ||
| ```shell | ||
| pipx install package-python-function | ||
| ``` | ||
|
|
||
| ## Usage / Arguments | ||
| `package-python-function venv_dir [--project PROJECT] [--output-dir OUTPUT_DIR] [--output OUTPUT]` | ||
|
|
||
| - `venv_dir` [Required]: The path to the virtual environment to package. | ||
| ```shell | ||
| package-python-function venv_dir [--project PROJECT] [--output-dir OUTPUT_DIR] [--output OUTPUT] | ||
| ``` | ||
|
|
||
| - `--project` [Optional]: Path to the pyproject.toml file. Omit to use the pyproject.toml file in the current working directory. | ||
| - `venv_dir` [Required]: The path to the virtual environment to package. | ||
| - `--project` [Optional]: Path to the `pyproject.toml` file. Omit to use the `pyproject.toml` file in the current working directory. | ||
|
|
||
| One of the following must be specified: | ||
| - `--output`: The full output path of the final zip file. | ||
| - `--output-dir`: The output directory for the final zip file. The name of the zip file will be based on the project's | ||
| name in the `pyproject.toml` file (with dashes replaced with underscores). | ||
|
|
||
| - `--output-dir`: The output directory for the final zip file. The name of the zip file will be based on the project's | ||
| name in the pyproject.toml file (with dashes replaced with underscores). | ||
| ## A Note on Reproducibility | ||
|
|
||
| The generated ZIP files adhere to the [reproducible builds](https://reproducible-builds.org/docs/archives/) guidelines. This means that file permissions and timestamps are modified inside the ZIP, such that the ZIP will have a deterministic hash. By default, the date is set to `1980-01-01`. | ||
|
|
||
| Additionally, the tool respects the standardized `$SOURCE_DATE_EPOCH` [environment variable](https://reproducible-builds.org/docs/source-date-epoch/), which will allow you to set that date as needed. | ||
|
|
||
| One important caveat is that ZIP files do not support files with timestamps earlier than `1980-01-01` inside them, due to MS-DOS compatibility. Therefore, the tool will raise a `SourceDateEpochError` if `$SOURCE_DATE_EPOCH` is below `315532800`. |
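As a rough illustration of where the `315532800` threshold in the README comes from, the sketch below converts a UTC calendar date into a `SOURCE_DATE_EPOCH` value using only the standard library. The helper name `to_source_date_epoch` and the constant `ZIP_MIN_EPOCH` are hypothetical and not part of this PR.

```python
import calendar
import time


def to_source_date_epoch(year: int, month: int, day: int) -> int:
    """Hypothetical helper: seconds since the Unix epoch for midnight UTC on the given date."""
    return calendar.timegm((year, month, day, 0, 0, 0))


# The earliest date the MS-DOS date/time format inside a ZIP can represent.
ZIP_MIN_EPOCH = to_source_date_epoch(1980, 1, 1)
print(ZIP_MIN_EPOCH)  # 315532800

# One second earlier falls in 1979, which a ZIP entry cannot store.
print(time.gmtime(ZIP_MIN_EPOCH - 1)[:6])  # (1979, 12, 31, 23, 59, 59)
```

A value computed this way could then be exported as `SOURCE_DATE_EPOCH` before running the tool.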
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import os | ||
| import shutil | ||
| import time | ||
| import zipfile | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| if TYPE_CHECKING: | ||
|     from os import PathLike | ||
|     from pathlib import Path | ||
|     from typing import Optional, Tuple, Union | ||
|
|
||
| DEFAULT_DATE_TIME = (1980, 1, 1, 0, 0, 0) | ||
| DEFAULT_DIR_MODE = 0o755 | ||
| DEFAULT_FILE_MODE = 0o644 | ||
|
|
||
| class SourceDateEpochError(Exception): | ||
|     """Raise when there are issues with $SOURCE_DATE_EPOCH""" | ||
|
|
||
| def date_time() -> Tuple[int, int, int, int, int, int]: | ||
|     """Returns date_time value used to force overwrite on all ZipInfo objects. Defaults to | ||
|     1980-01-01 00:00:00. You can set this with the environment variable SOURCE_DATE_EPOCH as an | ||
|     integer value representing seconds since Epoch. | ||
|     """ | ||
|     source_date_epoch = os.environ.get("SOURCE_DATE_EPOCH", None) | ||
|     if source_date_epoch is not None: | ||
|         dt = time.gmtime(int(source_date_epoch))[:6] | ||
|         if dt[0] < 1980: | ||
|             raise SourceDateEpochError( | ||
|                 "$SOURCE_DATE_EPOCH must be >= 315532800, since ZIP files need MS-DOS date/time format, which can be 1/1/1980, at minimum." | ||
|             ) | ||
|         return dt | ||
|     return DEFAULT_DATE_TIME | ||
|
|
||
| class ZipFile(zipfile.ZipFile): | ||
|     def write_reproducibly( | ||
|         self, | ||
|         filename: PathLike, | ||
|         arcname: Optional[Union[Path, str]] = None, | ||
|         compress_type: Optional[int] = None, | ||
|         compresslevel: Optional[int] = None, | ||
|     ): | ||
|         if not self.fp: | ||
|             raise ValueError("Attempt to write to ZIP archive that was already closed") | ||
|         if self._writing: | ||
|             raise ValueError("Can't write to ZIP archive while an open writing handle exists") | ||
|
|
||
|         zinfo = zipfile.ZipInfo.from_file(filename, arcname, strict_timestamps=self._strict_timestamps) | ||
|         zinfo.date_time = date_time() | ||
|         if zinfo.is_dir(): | ||
|             zinfo.external_attr = (0o40000 | DEFAULT_DIR_MODE) << 16 | ||
|             zinfo.external_attr |= 0x10 # MS-DOS directory flag | ||
|         else: | ||
|             zinfo.external_attr = DEFAULT_FILE_MODE << 16 | ||
|
|
||
|         if zinfo.is_dir(): | ||
|             zinfo.compress_size = 0 | ||
|             zinfo.CRC = 0 | ||
|             self.mkdir(zinfo) | ||
|         else: | ||
|             if compress_type is not None: | ||
|                 zinfo.compress_type = compress_type | ||
|             else: | ||
|                 zinfo.compress_type = self.compression | ||
|
|
||
|             if compresslevel is not None: | ||
|                 zinfo._compresslevel = compresslevel | ||
|             else: | ||
|                 zinfo._compresslevel = self.compresslevel | ||
|
|
||
|             with open(filename, "rb") as src, self.open(zinfo, "w") as dest: | ||
|                 shutil.copyfileobj(src, dest, 1024 * 8) | ||
|
|
||
|     def writestr_reproducibly( | ||
|         self, | ||
|         zinfo_or_arcname: Union[str, zipfile.ZipInfo], | ||
|         data: Union[str, bytes], | ||
|         compress_type: Optional[int] = None, | ||
|         compresslevel: Optional[int] = None, | ||
|     ): | ||
|         if isinstance(data, str): | ||
|             data = data.encode("utf-8") | ||
|
|
||
|         if not isinstance(zinfo_or_arcname, zipfile.ZipInfo): | ||
|             zinfo = zipfile.ZipInfo(filename=zinfo_or_arcname, date_time=date_time()) | ||
|             zinfo.compress_type = self.compression | ||
|             zinfo._compresslevel = self.compresslevel | ||
|             if zinfo.is_dir(): | ||
|                 zinfo.external_attr = (0o40000 | DEFAULT_DIR_MODE) << 16 | ||
|                 zinfo.external_attr |= 0x10 # MS-DOS directory flag | ||
|             else: | ||
|                 zinfo.external_attr = DEFAULT_FILE_MODE << 16 | ||
|         else: | ||
|             zinfo = zinfo_or_arcname | ||
|
|
||
|         zinfo.file_size = len(data) | ||
|         if compress_type is not None: | ||
|             zinfo.compress_type = compress_type | ||
|
|
||
|         if compresslevel is not None: | ||
|             zinfo._compresslevel = compresslevel | ||
|
|
||
|         if not self.fp: | ||
|             raise ValueError("Attempt to write to ZIP archive that was already closed") | ||
|         if self._writing: | ||
|             raise ValueError("Can't write to ZIP archive while an open writing handle exists.") | ||
|
|
||
|         with self._lock: | ||
|             with self.open(zinfo, mode="w") as dest: | ||
|                 dest.write(data) |
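The idea behind the module above (pinning each entry's timestamp and permission bits so the archive bytes depend only on file contents) can be sketched with the standard library alone. This is a simplified illustration, not the PR's `ZipFile` subclass: `build_zip`, `FIXED_DATE_TIME`, and `FILE_MODE` are hypothetical names, and sorting the entries by name is an extra assumption added here that does not appear in the diff.

```python
from __future__ import annotations

import hashlib
import io
import zipfile

FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)  # same default date as in the diff above
FILE_MODE = 0o644  # same default file mode as in the diff above


def build_zip(files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an in-memory ZIP with pinned metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Assumption: sort names so insertion order cannot change the archive.
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = FILE_MODE << 16  # Unix permission bits
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {"b.py": b"print('b')\n", "a.py": b"print('a')\n"}
reordered = dict(reversed(list(files.items())))

first = hashlib.sha256(build_zip(files)).hexdigest()
second = hashlib.sha256(build_zip(reordered)).hexdigest()
print(first == second)  # True
```

Without the pinned `date_time` and `external_attr`, archives built from real files would typically pick up each file's modification time and mode, so two builds of the same source could hash differently.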
Holy moly I didn't know you could do this. TIL about PEP-515. These days it is very rare for me to learn something new while reviewing a PR. Thanks!
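For readers unfamiliar with PEP 515 (underscores in numeric literals, available since Python 3.6): the line the reviewer was commenting on is not shown in this excerpt, so the snippet below is only a generic illustration, using the epoch threshold from the README as sample data. The name `ZIP_MIN_EPOCH` is hypothetical.

```python
# PEP 515: underscores may separate digit groups in numeric literals.
ZIP_MIN_EPOCH = 315_532_800

print(ZIP_MIN_EPOCH == 315532800)  # True

# The same grouping is accepted by int() and available as a format option.
print(int("315_532_800"))  # 315532800
print(f"{ZIP_MIN_EPOCH:_}")  # 315_532_800
```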