Merged
43 changes: 22 additions & 21 deletions API.md
@@ -317,7 +317,7 @@ Get public token information including uploads.
"allow_public_downloads": false,
"uploads": [
{
"id": 1,
"public_id": "rT72ZKGMPdldiEmA9eDI7kik",
"filename": "document.pdf",
"ext": "pdf",
"mimetype": "application/pdf",
@@ -328,8 +328,8 @@ Get public token information including uploads.
"status": "completed",
"created_at": "2025-12-23T12:00:00Z",
"completed_at": "2025-12-23T12:01:00Z",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/1",
"upload_url": "http://localhost:8000/api/uploads/1/tus"
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/rT72ZKGMPdldiEmA9eDI7kik",
"upload_url": "http://localhost:8000/api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus"
}
]
}
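A client consuming the response above mainly cares about the `status` and `download_url` fields of each upload entry. A minimal sketch of that — the entry is abbreviated to the fields shown in the example, and the helper name is illustrative, not part of the API:

```python
import json

# Abbreviated copy of one upload entry from the response example above.
sample_entry = """{
  "public_id": "rT72ZKGMPdldiEmA9eDI7kik",
  "filename": "document.pdf",
  "status": "completed",
  "download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/rT72ZKGMPdldiEmA9eDI7kik"
}"""


def completed_download_urls(uploads: list[dict]) -> list[str]:
    # Only completed uploads expose a usable download URL.
    return [u["download_url"] for u in uploads if u.get("status") == "completed"]


urls = completed_download_urls([json.loads(sample_entry)])
```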
@@ -364,7 +364,7 @@ List all uploads for a specific token.
```json
[
{
"id": 1,
"public_id": "rT72ZKGMPdldiEmA9eDI7kik",
"filename": "document.pdf",
"ext": "pdf",
"mimetype": "application/pdf",
@@ -375,8 +375,8 @@ List all uploads for a specific token.
"status": "completed",
"created_at": "2025-12-23T12:00:00Z",
"completed_at": "2025-12-23T12:01:00Z",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/1",
"upload_url": "http://localhost:8000/api/uploads/1/tus"
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/rT72ZKGMPdldiEmA9eDI7kik",
"upload_url": "http://localhost:8000/api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus"
}
]
```
@@ -399,7 +399,7 @@ Get metadata information about a completed upload.
**Response (200):**
```json
{
"id": 1,
"public_id": "rT72ZKGMPdldiEmA9eDI7kik",
"filename": "document.pdf",
"ext": "pdf",
"mimetype": "application/pdf",
@@ -412,8 +412,8 @@ Get metadata information about a completed upload.
"status": "completed",
"created_at": "2025-01-01T12:00:00Z",
"completed_at": "2025-01-01T12:05:00Z",
"upload_url": "http://localhost:8000/api/uploads/1/tus",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/1/download"
"upload_url": "http://localhost:8000/api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/rT72ZKGMPdldiEmA9eDI7kik/download"
}
```

@@ -476,9 +476,9 @@ Initiate a new file upload.
**Response (201):**
```json
{
"upload_id": 1,
"upload_url": "http://localhost:8000/api/uploads/1/tus",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/1",
"upload_id": "rT72ZKGMPdldiEmA9eDI7kik",
"upload_url": "http://localhost:8000/api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus",
"download_url": "http://localhost:8000/api/tokens/fbc_token/uploads/rT72ZKGMPdldiEmA9eDI7kik",
"meta_data": {
"title": "My Document",
"category": "reports"
@@ -545,7 +545,7 @@ Upload file chunk (TUS protocol).
**Authentication:** None

**Path Parameters:**
- `upload_id` (integer): The upload record ID
- `upload_id` (string): The upload record public ID (random string)

**Required Headers:**
- `Upload-Offset` (integer): Current upload offset (must match server state)
@@ -588,7 +588,7 @@ Delete an upload and its associated file (TUS protocol).
**Authentication:** None

**Path Parameters:**
- `upload_id` (integer): The upload record ID
- `upload_id` (string): The upload record public ID (random string)

**Response (204):**
No content
@@ -609,7 +609,7 @@ Cancel an in-progress upload and restore the token slot.
**Authentication:** Required via query parameter

**Path Parameters:**
- `upload_id` (integer): The upload record ID
- `upload_id` (string): The upload record public ID (random string)

**Query Parameters:**
- `token` (string, required): The upload token
@@ -640,12 +640,12 @@ Manually mark an upload as complete.
**Authentication:** None

**Path Parameters:**
- `upload_id` (integer): The upload record ID
- `upload_id` (string): The upload record public ID (random string)

**Response (200):**
```json
{
"id": 1,
"public_id": "rT72ZKGMPdldiEmA9eDI7kik",
"filename": "document.pdf",
"ext": "pdf",
"mimetype": "application/pdf",
@@ -819,7 +819,7 @@ Delete an upload record and its file (Admin only).
**Authentication:** Required (Admin)

**Path Parameters:**
- `upload_id` (integer): The upload record ID
- `upload_id` (string): The upload record public ID (random string)

**Response (204):**
No content
@@ -897,10 +897,10 @@ Typical upload flow:
4. **Upload File Chunks (TUS Protocol)**
```http
# Check current offset
HEAD /api/uploads/1/tus
HEAD /api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus

# Upload chunk
PATCH /api/uploads/1/tus
PATCH /api/uploads/rT72ZKGMPdldiEmA9eDI7kik/tus
Upload-Offset: 0
Tus-Resumable: 1.0.0
Content-Type: application/offset+octet-stream
@@ -910,7 +910,7 @@ Typical upload flow:

5. **Download File**
```http
GET /api/tokens/{download_token}/uploads/1
GET /api/tokens/{download_token}/uploads/rT72ZKGMPdldiEmA9eDI7kik
Authorization: Bearer YOUR_API_KEY
```
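The chunked-upload steps above can be sketched as a small client. The endpoint path, the `Tus-Resumable`/`Upload-Offset` headers, and the content type come from the flow and header tables in this document; the use of `requests`, the chunk size, and reading the new offset from the PATCH response's `Upload-Offset` header are assumptions based on standard TUS behavior, not this project's client code:

```python
import os

TUS_VERSION = "1.0.0"


def patch_headers(offset: int) -> dict[str, str]:
    """Headers required by the PATCH step shown in the flow above."""
    return {
        "Upload-Offset": str(offset),
        "Tus-Resumable": TUS_VERSION,
        "Content-Type": "application/offset+octet-stream",
    }


def upload_file(base_url: str, upload_id: str, path: str, chunk_size: int = 4 * 1024 * 1024) -> None:
    """Resume-or-start an upload: HEAD for the current offset, then PATCH chunks."""
    import requests  # assumed available in the client environment

    tus_url = f"{base_url}/api/uploads/{upload_id}/tus"
    # Ask the server where to resume from (standard TUS HEAD semantics).
    head = requests.head(tus_url, headers={"Tus-Resumable": TUS_VERSION})
    offset = int(head.headers["Upload-Offset"])
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(offset)
        while offset < size:
            chunk = fh.read(chunk_size)
            resp = requests.patch(tus_url, data=chunk, headers=patch_headers(offset))
            resp.raise_for_status()
            offset = int(resp.headers["Upload-Offset"])
```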

@@ -941,6 +941,7 @@ Typical upload flow:
- File paths are resolved and stored as absolute paths
- Upload tokens are 18-character URL-safe strings
- Download tokens are prefixed with `fbc_` followed by 16-character URL-safe strings
- Upload IDs (`public_id`) are 18-character URL-safe random strings (random rather than sequential integers, so uploads cannot be enumerated)
- Metadata is stored as JSON in the database (`meta_data` column)
- TUS protocol is recommended for files larger than a few MB for reliability
- Maximum chunk size is controlled by `FBC_MAX_CHUNK_BYTES` (default: 90MB)
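The 18-character URL-safe ID format noted above corresponds to 13 bytes of entropy under base64url encoding. The diff does not show how these IDs are generated; a minimal sketch of one way to produce them, assuming Python's `secrets` module:

```python
import secrets


def new_public_id() -> str:
    # 13 random bytes base64url-encode (without padding) to exactly
    # ceil(13 * 4 / 3) = 18 URL-safe characters.
    return secrets.token_urlsafe(13)
```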
187 changes: 103 additions & 84 deletions backend/app/cleanup.py
@@ -3,17 +3,26 @@
import logging
from datetime import UTC, datetime, timedelta
from pathlib import Path
from typing import TYPE_CHECKING, Any

from sqlalchemy import select, update
from sqlalchemy.engine.result import Result
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.sql.selectable import Select

from . import config, models
from .db import SessionLocal

if TYPE_CHECKING:
from sqlalchemy.engine.result import Result
from sqlalchemy.sql.dml import Update
from sqlalchemy.sql.selectable import Select

logger: logging.Logger = logging.getLogger(__name__)


async def _cleanup_once() -> None:
"""Perform a single cleanup operation."""
async with SessionLocal() as session:
await _disable_expired_tokens(session)
if config.settings.incomplete_ttl_hours > 0:
@@ -22,114 +31,122 @@ async def _cleanup_once() -> None:
await _remove_disabled_tokens(session)


async def _disable_expired_tokens(session: AsyncSession) -> None:
now = datetime.now(UTC)
stmt = (
async def _disable_expired_tokens(session: AsyncSession) -> int:
"""
Disable tokens that have expired.

Args:
session (AsyncSession): The database session.

Returns:
int: The number of tokens disabled.

"""
now: datetime = datetime.now(UTC)
stmt: Update = (
update(models.UploadToken)
.where(models.UploadToken.expires_at < now)
.where(models.UploadToken.disabled.is_(False))
.values(disabled=True)
)
res = await session.execute(stmt)

res: Result[Any] = await session.execute(stmt)

if res.rowcount:
logger.info("Disabled %d expired tokens", res.rowcount)

await session.commit()
return res.rowcount


async def _remove_stale_uploads(session: AsyncSession) -> None:
"""Remove stale uploads in batches to avoid loading millions of records into memory."""
cutoff = datetime.now(UTC) - timedelta(hours=config.settings.incomplete_ttl_hours)
cutoff_naive = cutoff.replace(tzinfo=None)
async def _remove_stale_uploads(session: AsyncSession) -> int:
"""
Remove stale uploads.

total_removed = 0
batch_size = 100
Args:
session (AsyncSession): The database session.

while True:
stmt = (
select(models.UploadRecord)
.where(models.UploadRecord.status != "completed")
.where(models.UploadRecord.created_at < cutoff_naive)
.limit(batch_size)
)
res = await session.execute(stmt)
batch = res.scalars().all()
Returns:
int: The number of uploads removed.

if not batch:
break
"""
cutoff: datetime = datetime.now(UTC) - timedelta(hours=config.settings.incomplete_ttl_hours)
cutoff_naive: datetime = cutoff.replace(tzinfo=None)

for record in batch:
if record.storage_path:
path = Path(record.storage_path)
if path.exists():
try:
path.unlink()
except OSError:
logger.warning("Failed to remove stale upload file: %s", path)
await session.delete(record)
total_removed = 0

await session.flush()
await session.commit()
total_removed += len(batch)
stmt: Select[tuple[models.UploadRecord]] = (
select(models.UploadRecord).where(models.UploadRecord.status != "completed").where(models.UploadRecord.created_at < cutoff_naive)
)
res: Result[tuple[models.UploadRecord]] = await session.execute(stmt)

if len(batch) < batch_size:
break
for record in res.scalars().all():
if record.storage_path:
path = Path(record.storage_path)
if path.exists():
try:
path.unlink()
except OSError:
logger.warning("Failed to remove stale upload file: %s", path)

total_removed += 1
await session.delete(record)

await session.flush()
await session.commit()

if total_removed > 0:
logger.info("Removed %d stale uploads", total_removed)

return total_removed

async def _remove_disabled_tokens(session: AsyncSession) -> None:
"""Remove old disabled tokens in batches to avoid loading millions of records into memory."""
cutoff = datetime.now(UTC) - timedelta(days=config.settings.disabled_tokens_ttl_days)

async def _remove_disabled_tokens(session: AsyncSession) -> int:
"""
Remove old disabled tokens.

Args:
session (AsyncSession): The database session.

Returns:
int: The number of tokens removed.

"""
cutoff: datetime = datetime.now(UTC) - timedelta(days=config.settings.disabled_tokens_ttl_days)

total_removed = 0
batch_size = 50

while True:
stmt = (
select(models.UploadToken)
.where(models.UploadToken.disabled.is_(True))
.where(models.UploadToken.expires_at < cutoff)
.limit(batch_size)
)
res = await session.execute(stmt)
batch = res.scalars().all()

if not batch:
break

for token in batch:
if config.settings.delete_files_on_token_cleanup:
uploads_stmt = select(models.UploadRecord).where(models.UploadRecord.token_id == token.id)
uploads_res = await session.execute(uploads_stmt)
uploads = uploads_res.scalars().all()

for upload in uploads:
if upload.storage_path:
path = Path(upload.storage_path)
if path.exists():
try:
path.unlink()
except OSError:
logger.warning(
"Failed to remove upload file during token cleanup: %s",
path,
)
await session.delete(upload)

storage_dir = Path(config.settings.storage_path).expanduser().resolve() / token.token
if storage_dir.exists() and storage_dir.is_dir():
with contextlib.suppress(OSError):
storage_dir.rmdir()

await session.delete(token)

await session.flush()
await session.commit()
total_removed += len(batch)

if len(batch) < batch_size:
break
stmt: Select[tuple[models.UploadToken]] = (
select(models.UploadToken).where(models.UploadToken.disabled.is_(True)).where(models.UploadToken.expires_at < cutoff)
)
res: Result[tuple[models.UploadToken]] = await session.execute(stmt)

for token in res.scalars().all():
if config.settings.delete_files_on_token_cleanup:
uploads_stmt: Select[tuple[models.UploadRecord]] = select(models.UploadRecord).where(models.UploadRecord.token_id == token.id)
uploads_res: Result[tuple[models.UploadRecord]] = await session.execute(uploads_stmt)

for upload in uploads_res.scalars().all():
if upload.storage_path:
path = Path(upload.storage_path)
if path.exists():
try:
path.unlink()
except OSError:
logger.warning("Failed to remove upload file during token cleanup: %s", path)

total_removed += 1
await session.delete(upload)

storage_dir: Path = Path(config.settings.storage_path).expanduser().resolve() / token.token
if storage_dir.exists() and storage_dir.is_dir():
with contextlib.suppress(OSError):
storage_dir.rmdir()

await session.delete(token)

await session.flush()
await session.commit()

if total_removed > 0:
logger.info(
@@ -138,6 +155,8 @@ async def _remove_disabled_tokens(session: AsyncSession) -> None:
config.settings.delete_files_on_token_cleanup,
)

return total_removed


async def start_cleanup_loop() -> None:
while True:
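The body of `start_cleanup_loop` is truncated in the diff. A typical shape for such a loop — the guarded single pass, the interval default, and the stand-in `_cleanup_once` body are all illustrative assumptions, not the project's actual code:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def _cleanup_once() -> None:
    """Stand-in for the real cleanup pass defined earlier in the module."""


async def run_cleanup_pass() -> bool:
    """Run one guarded cleanup pass; return True if it succeeded."""
    try:
        await _cleanup_once()
    except Exception:
        # A failed pass should be logged, not kill the loop.
        logger.exception("Cleanup pass failed")
        return False
    return True


async def start_cleanup_loop(interval_seconds: int = 3600) -> None:
    """Run cleanup forever, sleeping between passes."""
    while True:
        await run_cleanup_pass()
        await asyncio.sleep(interval_seconds)
```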