38 changes: 19 additions & 19 deletions README.md
@@ -1,47 +1,47 @@
# package-python-function
Python command-line (CLI) tool to package a Python function for deploying to AWS Lambda, and possibly other
cloud platforms.
Python command-line (CLI) tool to package a Python function for deploying to AWS Lambda, and possibly other cloud platforms.

This tool builds a ZIP file from a virtual environment with all depedencies installed that are to be included in the final deployment asset. If the content is larger than AWS Lambda's maximum unzipped package size of 250 MiB,
then this tool will employ the ZIP-inside-ZIP (nested-ZIP) workaround. This allows deploying Lambdas with large
dependency packages, especially those with native code compiled extensions like Pandas, PyArrow, etc.
This tool builds a ZIP file from a virtual environment with all dependencies installed that are to be included in the final deployment asset. If the content is larger than AWS Lambda's maximum unzipped package size of 250 MiB, this tool will then employ the ZIP-inside-ZIP (nested-ZIP) workaround. This allows deploying Lambdas with large dependency packages, especially those with native code compiled extensions like Pandas, PyArrow, etc. The ZIP files are generated [reproducibly](#a-note-on-reproducibility), ensuring that the same source will always generate a ZIP file with the same hash.

This technique was originally pioneered by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements), which is a NodeJS (JavaScript) plugin for the [Serverless Framework](https://github.com/serverless/serverless). The technique has been improved here to not require any special imports in your entrypoint source file. That is, no changes are needed to your source code to leverage the nested ZIP deployment.
This technique was originally pioneered by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements), which is a NodeJS (JavaScript) plugin for the [Serverless Framework](https://github.com/serverless/serverless). The technique has been improved here to not require any special imports in your entrypoint source file. That is, no changes are needed to your source code to leverage the nested ZIP deployment.

The motivation for this Python tool is to achieve the same results as serverless-python-requirements but with a
purely Python tool. This can simplify and speed up developer and CI/CD workflows.
The motivation for this Python tool is to achieve the same results as [serverless-python-requirements](https://www.serverless.com/plugins/serverless-python-requirements) but with a purely Python tool. This can simplify and speed up developer and CI/CD workflows.

One important thing that this tool does not do is build the target virtual environment and install all of the
dependencies. You must first generate that with a tool like [Poetry](https://github.com/python-poetry/poetry) and the [poetry-plugin-bundle](https://github.com/python-poetry/poetry-plugin-bundle).
One important thing that this tool does not do is build the target virtual environment and install all of the dependencies. You must first generate that with a tool like [Poetry](https://github.com/python-poetry/poetry) and the [poetry-plugin-bundle](https://github.com/python-poetry/poetry-plugin-bundle).

## Example command sequence
```
```shell
poetry bundle venv .build/.venv --without dev
package-python-function .build/.venv --output-dir .build/lambda
```

The output will be a .zip file with the same name as your project from your pyproject.toml file (with dashes replaced
The output will be a .zip file with the same name as your project from your `pyproject.toml` file (with dashes replaced
with underscores).

## Installation
Use [pipx](https://github.com/pypa/pipx) to install:

```
```shell
pipx install package-python-function
```

## Usage / Arguments
`package-python-function venv_dir [--project PROJECT] [--output-dir OUTPUT_DIR] [--output OUTPUT]`

- `venv_dir` [Required]: The path to the virtual environment to package.
```shell
package-python-function venv_dir [--project PROJECT] [--output-dir OUTPUT_DIR] [--output OUTPUT]
```

- `--project` [Optional]: Path to the pyproject.toml file. Omit to use the pyproject.toml file in the current working directory.
- `venv_dir` [Required]: The path to the virtual environment to package.
- `--project` [Optional]: Path to the `pyproject.toml` file. Omit to use the `pyproject.toml` file in the current working directory.

One of the following must be specified:
- `--output`: The full output path of the final zip file.
- `--output-dir`: The output directory for the final zip file. The name of the zip file will be based on the project's
name in the `pyproject.toml` file (with dashes replaced with underscores).

- `--output-dir`: The output directory for the final zip file. The name of the zip file will be based on the project's
name in the pyproject.toml file (with dashes replaced with underscores).
## A Note on Reproducibility

The generated ZIP files adhere to the [reproducible builds](https://reproducible-builds.org/docs/archives/) guidelines. This means that file permissions and timestamps are normalized inside the ZIP, so that the ZIP has a deterministic hash. By default, the date is set to `1980-01-01`.

Additionally, the tool respects the standardized `$SOURCE_DATE_EPOCH` [environment variable](https://reproducible-builds.org/docs/source-date-epoch/), which will allow you to set that date as needed.

One important caveat is that ZIP files do not support files with timestamps earlier than `1980-01-01` inside them, due to MS-DOS compatibility. Therefore, the tool will raise a `SourceDateEpochError` if `$SOURCE_DATE_EPOCH` is below `315532800`.
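
To make the `315532800` bound concrete, the following is a minimal standard-library sketch of the check described above. `resolve_date_time` and `ZIP_EPOCH_MIN` are hypothetical names used only for illustration, and a plain `ValueError` stands in for the tool's own exception; this approximates the documented behavior rather than reproducing the tool's code.

```python
import calendar
import time

# Seconds from the Unix epoch to 1980-01-01 00:00:00 UTC, the earliest
# timestamp the MS-DOS date format used by ZIP can represent.
ZIP_EPOCH_MIN = calendar.timegm((1980, 1, 1, 0, 0, 0))


def resolve_date_time(source_date_epoch=None):
    """Return a (Y, M, D, h, m, s) tuple for ZIP entries (illustrative helper)."""
    if source_date_epoch is None:
        # Documented default when $SOURCE_DATE_EPOCH is unset.
        return (1980, 1, 1, 0, 0, 0)
    dt = time.gmtime(int(source_date_epoch))[:6]
    if dt[0] < 1980:
        # Assumption: ValueError stands in for the tool's SourceDateEpochError.
        raise ValueError(f"SOURCE_DATE_EPOCH must be >= {ZIP_EPOCH_MIN}")
    return dt
```

In a shell, this corresponds to exporting `SOURCE_DATE_EPOCH` with a value of at least `315532800` before running the packaging command.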
46 changes: 10 additions & 36 deletions package_python_function/packager.py
@@ -1,23 +1,18 @@
from __future__ import annotations

import logging
import os
import shutil
import time
import zipfile
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import TYPE_CHECKING
from zipfile import ZIP_DEFLATED, ZIP_STORED

from .python_project import PythonProject

if TYPE_CHECKING:
    from typing import Tuple
from .reproducible_zipfile import ZipFile

logger = logging.getLogger(__name__)

class Packager:
    AWS_LAMBDA_MAX_UNZIP_SIZE = 262144000
    AWS_LAMBDA_MAX_UNZIP_SIZE = 262_144_000
Owner


Holy moly I didn't know you could do this. TIL about PEP-515. These days it is very rare for me to learn something new while reviewing a PR. Thanks!
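
For anyone else who had not seen PEP 515 before, here is a small sketch showing that the underscored literal is the same integer and that it equals 250 MiB. `needs_nested_zip` is a hypothetical helper added for illustration; the strict `>` comparison is an assumption based on the README wording ("larger than"), not on the packager's actual code.

```python
# PEP 515: underscores in numeric literals are purely visual separators.
AWS_LAMBDA_MAX_UNZIP_SIZE = 262_144_000

# Same value as the old literal, and exactly 250 MiB.
assert AWS_LAMBDA_MAX_UNZIP_SIZE == 262144000 == 250 * 1024 * 1024

# int() also accepts underscores when parsing strings (Python 3.6+).
assert int("262_144_000") == AWS_LAMBDA_MAX_UNZIP_SIZE


def needs_nested_zip(uncompressed_bytes: int) -> bool:
    """Hypothetical check: is the content over Lambda's unzipped size limit?"""
    return uncompressed_bytes > AWS_LAMBDA_MAX_UNZIP_SIZE
```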


    def __init__(self, venv_path: Path, project_path: Path, output_dir: Path, output_file: Path | None):
        self.project = PythonProject(project_path)
@@ -46,35 +41,14 @@ def package(self) -> None:
    def zip_all_dependencies(self, target_path: Path) -> None:
        logger.info(f"Zipping to {target_path}...")

        def date_time() -> Tuple[int, int, int, int, int, int]:
            """Returns date_time value used to force overwrite on all ZipInfo objects. Defaults to
            1980-01-01 00:00:00. You can set this with the environment variable SOURCE_DATE_EPOCH as an
            integer value representing seconds since Epoch.
            """
            source_date_epoch = os.environ.get("SOURCE_DATE_EPOCH", None)
            if source_date_epoch is not None:
                return time.gmtime(int(source_date_epoch))[:6]
            return (1980, 1, 1, 0, 0, 0)

        with zipfile.ZipFile(target_path, "w", zipfile.ZIP_DEFLATED) as zip_file:

        with ZipFile(target_path, "w", ZIP_DEFLATED) as zip_file:
            def zip_dir(path: Path) -> None:
                for item in path.iterdir():
                    if item.is_dir():
                        zip_dir(item)
                    else:
                        zinfo = zipfile.ZipInfo.from_file(
                            item, item.relative_to(self.input_path)
                        )
                        zinfo.date_time = date_time()
                        zinfo.external_attr = 0o644 << 16
                        zinfo.compress_type = zipfile.ZIP_DEFLATED
                        self._uncompressed_bytes += item.stat().st_size
                        with (
                            open(item, "rb") as src,
                            zip_file.open(zinfo, "w") as dest,
                        ):
                            shutil.copyfileobj(src, dest, 1024 * 8)
                        zip_file.write_reproducibly(item, item.relative_to(self.input_path))

            zip_dir(self.input_path)

@@ -96,15 +70,15 @@ def zip_dir(path: Path) -> None:
    def generate_nested_zip(self, inner_zip_path: Path) -> None:
        logger.info(f"Generating nested-zip and __init__.py loader using entrypoint package '{self.project.entrypoint_package_name}'...")

        with zipfile.ZipFile(self.output_file, 'w') as outer_zip_file:
        with ZipFile(self.output_file, 'w') as outer_zip_file:
            entrypoint_dir = Path(self.project.entrypoint_package_name)
            outer_zip_file.write(
            outer_zip_file.write_reproducibly(
                inner_zip_path,
                arcname=str(entrypoint_dir / ".dependencies.zip"),
                compresslevel=zipfile.ZIP_STORED
                compresslevel=ZIP_STORED
            )
            outer_zip_file.writestr(
            outer_zip_file.writestr_reproducibly(
                str(entrypoint_dir / "__init__.py"),
                Path(__file__).parent.joinpath("nested_zip_loader.py").read_text(),
                compresslevel=zipfile.ZIP_DEFLATED
                compresslevel=ZIP_DEFLATED
            )
111 changes: 111 additions & 0 deletions package_python_function/reproducible_zipfile.py
@@ -0,0 +1,111 @@
from __future__ import annotations

import os
import shutil
import time
import zipfile
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from os import PathLike
    from pathlib import Path
    from typing import Optional, Tuple, Union

DEFAULT_DATE_TIME = (1980, 1, 1, 0, 0, 0)
DEFAULT_DIR_MODE = 0o755
DEFAULT_FILE_MODE = 0o644

class SourceDateEpochError(Exception):
    """Raise when there are issues with $SOURCE_DATE_EPOCH"""

def date_time() -> Tuple[int, int, int, int, int, int]:
    """Returns date_time value used to force overwrite on all ZipInfo objects. Defaults to
    1980-01-01 00:00:00. You can set this with the environment variable SOURCE_DATE_EPOCH as an
    integer value representing seconds since Epoch.
    """
    source_date_epoch = os.environ.get("SOURCE_DATE_EPOCH", None)
    if source_date_epoch is not None:
        dt = time.gmtime(int(source_date_epoch))[:6]
        if dt[0] < 1980:
            raise SourceDateEpochError(
                "$SOURCE_DATE_EPOCH must be >= 315532800, since ZIP files need MS-DOS date/time format, which can be 1/1/1980, at minimum."
            )
        return dt
    return DEFAULT_DATE_TIME

class ZipFile(zipfile.ZipFile):
    def write_reproducibly(
        self,
        filename: PathLike,
        arcname: Optional[Union[Path, str]] = None,
        compress_type: Optional[int] = None,
        compresslevel: Optional[int] = None,
    ):
        if not self.fp:
            raise ValueError("Attempt to write to ZIP archive that was already closed")
        if self._writing:
            raise ValueError("Can't write to ZIP archive while an open writing handle exists")

        zinfo = zipfile.ZipInfo.from_file(filename, arcname, strict_timestamps=self._strict_timestamps)
        zinfo.date_time = date_time()
        if zinfo.is_dir():
            zinfo.external_attr = (0o40000 | DEFAULT_DIR_MODE) << 16
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
        else:
            zinfo.external_attr = DEFAULT_FILE_MODE << 16

        if zinfo.is_dir():
            zinfo.compress_size = 0
            zinfo.CRC = 0
            self.mkdir(zinfo)
        else:
            if compress_type is not None:
                zinfo.compress_type = compress_type
            else:
                zinfo.compress_type = self.compression

            if compresslevel is not None:
                zinfo._compresslevel = compresslevel
            else:
                zinfo._compresslevel = self.compresslevel

            with open(filename, "rb") as src, self.open(zinfo, "w") as dest:
                shutil.copyfileobj(src, dest, 1024 * 8)

    def writestr_reproducibly(
        self,
        zinfo_or_arcname: Union[str, zipfile.ZipInfo],
        data: Union[str, bytes],
        compress_type: Optional[int] = None,
        compresslevel: Optional[int] = None,
    ):
        if isinstance(data, str):
            data = data.encode("utf-8")

        if not isinstance(zinfo_or_arcname, zipfile.ZipInfo):
            zinfo = zipfile.ZipInfo(filename=zinfo_or_arcname, date_time=date_time())
            zinfo.compress_type = self.compression
            zinfo._compresslevel = self.compresslevel
            if zinfo.is_dir():
                zinfo.external_attr = (0o40000 | DEFAULT_DIR_MODE) << 16
                zinfo.external_attr |= 0x10  # MS-DOS directory flag
            else:
                zinfo.external_attr = DEFAULT_FILE_MODE << 16
        else:
            zinfo = zinfo_or_arcname

        zinfo.file_size = len(data)
        if compress_type is not None:
            zinfo.compress_type = compress_type

        if compresslevel is not None:
            zinfo._compresslevel = compresslevel

        if not self.fp:
            raise ValueError("Attempt to write to ZIP archive that was already closed")
        if self._writing:
            raise ValueError("Can't write to ZIP archive while an open writing handle exists.")

        with self._lock:
            with self.open(zinfo, mode="w") as dest:
                dest.write(data)
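
The idea behind this module can be sketched with the standard library alone: pin each entry's timestamp and permission bits, then check that two builds of the same content hash identically. `build_zip`, `FIXED_DATE_TIME`, and `FILE_MODE` are hypothetical names for illustration. Sorting the entry names is an extra assumption of this sketch to make the entry order deterministic; it is not something the diff above does.

```python
import hashlib
import io
import zipfile
from typing import Mapping

FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)  # earliest date ZIP can store
FILE_MODE = 0o644                        # rw-r--r-- for every file entry


def build_zip(files: Mapping[str, bytes]) -> bytes:
    """Build an in-memory ZIP whose bytes depend only on names and contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        # Assumption of this sketch: sort names so entry order is stable.
        for name in sorted(files):
            zinfo = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            # Unix permission bits live in the upper 16 bits of external_attr.
            zinfo.external_attr = FILE_MODE << 16
            zinfo.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(zinfo, files[name])
    return buffer.getvalue()


files = {"b.txt": b"world", "a.txt": b"hello"}
reordered = {"a.txt": b"hello", "b.txt": b"world"}

first = hashlib.sha256(build_zip(files)).hexdigest()
second = hashlib.sha256(build_zip(reordered)).hexdigest()
print(first == second)
```

Because no filesystem metadata (mtime, mode) reaches the archive, rebuilding from a fresh checkout or on a different day should not change the digest, provided the same Python and zlib versions produce the same compressed bytes.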
89 changes: 86 additions & 3 deletions poetry.lock
