Claude/test concierge modal rew gs #85

Open
mirai-gpro wants to merge 123 commits into aigc3d:master from
mirai-gpro:claude/test-concierge-modal-rewGs

@mirai-gpro

No description provided.

claude and others added 30 commits February 7, 2026 13:02
Root cause: defaults.py's default_setup() and default_config_parser()
assume a distributed training environment with writable filesystem.
On Cloud Run (read-only /app), this causes silent init failures.

Changes:
- app.py: Skip default_setup() entirely, manually set CPU/single-process config
- app.py: Redirect save_path to /tmp (only writable dir on Cloud Run)
- app.py: Add GCS FUSE mount path resolution with Docker-baked fallback
- cloudbuild.yaml: Add Cloud Storage FUSE volume mount for model serving
- cloudbuild.yaml: Increase max-instances to 4
- Include handoff docs and full LAM_Audio2Expression codebase
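
The path handling described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function names and the candidate directory list are hypothetical, and the only facts taken from the commit are that /tmp is the writable location and that a mounted path is tried before a Docker-baked fallback.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_model_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate directory that exists, else None.

    Callers would list the GCS FUSE mount first and the image-baked
    directory second (both paths are assumptions of this sketch).
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def writable_save_path(subdir: str = "lam_output") -> Path:
    """Create a save directory under the system temp dir (/tmp on Linux)."""
    path = Path(tempfile.gettempdir()) / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Returning None instead of raising lets the caller decide whether a missing model directory is fatal or should be logged and retried.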

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
The LAM model file was misidentified as .tar but is actually a PyTorch
weights file. Gemini renamed it to .pth on GCS. Also source wav2vec2
config.json from the model directory instead of LAM configs/.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Import gourmet-sp from implementation-testing branch
- Add sendAudioToExpression() to shop introduction TTS flow
  (firstShop and remainingShops now get lip sync data before playback)
- Remove legacy event hooks in concierge-controller init()
  (replaced with clean linkTtsPlayer helper)
- Clean up LAMAvatar.astro: remove legacy frame playback code
  (startFramePlaybackFromQueue, stopFramePlayback, frameQueue, etc.)
- Simplify to single sync mechanism: frameBuffer + ttsPlayer.currentTime
- Reduce health check interval from 2s to 10s

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Using official LAM sample avatar as placeholder. Will be replaced with
custom-generated avatar later.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Add fade-in/fade-out smoothing (6 frames / 200ms) to prevent
  Gaussian Splat visual distortion at speech start/end
- Parallelize expression generation with TTS synthesis:
  remaining sentence expression is pre-fetched during first
  sentence playback, eliminating wait time between segments
- Add fetchExpressionFrames() for background expression fetch
  with pendingExpressionFrames buffer swap pattern
- Apply same optimization to shop introduction flow
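
The fade-in/fade-out smoothing can be illustrated with a short sketch. The real implementation lives in the frontend; this Python version only shows the idea, assumes a linear ramp (the commit does not say which curve is used), and treats each frame as a flat list of blendshape weights.

```python
from typing import List


def apply_fade(frames: List[List[float]], fade_frames: int = 6) -> List[List[float]]:
    """Scale blendshape weights by a linear ramp at both ends of a clip.

    fade_frames=6 matches the "6 frames / 200ms" figure above, which
    implies 30 frames per second.
    """
    n = len(frames)
    faded = []
    for i, frame in enumerate(frames):
        ramp_in = min(1.0, (i + 1) / fade_frames)   # rises over the first frames
        ramp_out = min(1.0, (n - i) / fade_frames)  # falls over the last frames
        gain = min(ramp_in, ramp_out)
        faded.append([weight * gain for weight in frame])
    return faded
```

Taking the minimum of the two ramps keeps very short clips (fewer than 12 frames) from overshooting.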

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
sendAudioToExpression fetch could hang indefinitely (Cloud Run cold
start / service down), blocking await and preventing TTS play().

- Add AbortController timeout (8s) to all expression API fetches
- Wrap expression await with Promise.race so TTS plays even if
  expression API is slow/down (lip sync degrades gracefully)
- Applied to speakTextGCP, speakResponseInChunks, and shop flow

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Root cause: sendAudioToExpression fetch hung in browser, blocking
await and preventing TTS play() from ever being called.

Fix: all expression API calls are now fire-and-forget - TTS playback
starts immediately without waiting for expression frames. Frames
arrive asynchronously and getExpressionData() picks them up in
real-time from the frameBuffer.

- Remove await/Promise.race from all sendAudioToExpression calls
- Remove fetchExpressionFrames and pendingExpressionFrames
  (no longer needed - direct fire-and-forget is simpler)
- Keep AbortController timeout (8s) inside sendAudioToExpression
  to prevent leaked connections

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
… calls

Architecture change: expression frames are now returned WITH TTS audio
from the backend, instead of the frontend calling audio2exp directly.

Backend (app_customer_support_modified.py):
- Replace fire-and-forget send_to_audio2exp with get_expression_frames
  that returns {names, frames, frame_rate}
- Send MP3 directly to audio2exp (no separate PCM generation needed)
- TTS response: {success, audio, expression: {...}}
- Server-to-server communication: no CORS, stable, fast

Frontend (concierge-controller.ts):
- New queueExpressionFromTtsResponse() reads expression from TTS response
- Remove sendAudioToExpression (direct browser→audio2exp REST calls)
- Remove audio2expApiUrl, audio2expWsUrl, connectLAMAvatarWebSocket
- Remove EXPRESSION_API_TIMEOUT_MS, AbortController timeout
- Existing 1st-sentence-ahead pattern now automatically includes
  expression data (no separate API call needed)

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…orget proxy

- Backend: TTS endpoint no longer blocks on expression generation
- Backend: New /api/audio2expression proxy (server-to-server, CORS-free)
- Frontend: All expression calls use fireAndForgetExpression() (never blocks TTS play)
- Removes ~2s first-sentence delay caused by synchronous expression in TTS

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…aining

Two bugs fixed:
1. Buffer corruption: frames from segment 1 mixed with segment 2
   (ttsPlayer.currentTime resets but frameBuffer was concatenated)
   → Now clear buffer before each new TTS segment

2. 3-second delay: expression frames arrived after TTS started playing
   → Pre-fetch remaining segment's expression during first segment playback
   → When second segment starts, pre-fetched frames are immediately available

New prefetchExpression() method returns Promise with parsed frames,
applied non-blocking via .then() to never delay TTS playback.
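
The buffer corruption in bug 1 comes from indexing a concatenated buffer with a clock that restarts at zero. A minimal sketch of the per-segment reset, written in Python for illustration only (the class name, method names, and the 30 fps default are assumptions; the actual code is in the TypeScript frontend):

```python
from typing import List, Optional


class FrameBuffer:
    """Expression frames for the TTS segment currently playing."""

    def __init__(self, frame_rate: float = 30.0) -> None:
        self.frame_rate = frame_rate
        self.frames: List[List[float]] = []

    def start_segment(self, frames: Optional[List[List[float]]] = None) -> None:
        """Replace the buffer so indices line up with the player's reset clock."""
        self.frames = list(frames or [])

    def frame_at(self, current_time: float) -> Optional[List[float]]:
        """Look up the frame for the player's current time, if it has arrived."""
        index = int(current_time * self.frame_rate)
        if 0 <= index < len(self.frames):
            return self.frames[index]
        return None
```

Returning None for a frame that has not arrived yet lets the renderer hold a neutral pose rather than reuse frames from the previous segment.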

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Architecture change: backend includes expression data in TTS response
(server-to-server audio2exp call ~150ms) instead of separate proxy.

- Backend TTS endpoint calls audio2exp synchronously, includes result
- Frontend applyExpressionFromTts(): instant buffer queue from TTS data
- Proxy fireAndForgetExpression kept as fallback (timeout/error cases)
- All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop x2) updated
- Removes prefetch complexity (TTS response already carries expression)

Result: lip sync starts from frame 0, no 2-3 second gap.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Architecture redesign for true zero-delay TTS playback:
- Backend TTS endpoint starts audio2exp in background thread, returns
  audio + expression_token immediately (no blocking)
- New /api/expression/poll endpoint: frontend polls for result
- Frontend pollExpression(): fire-and-forget polling at 150ms intervals
- Removes sync expression, proxy, and prefetch approaches

Timeline: the TTS response returns in ~500ms, audio2exp completes ~150ms later
(in the background), and the frontend's first poll arrives ~200ms after the TTS
response, so expression data is available ~350ms after playback starts.
Previously: a 2-3 second delay, or TTS blocked.
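
The token-and-poll design can be sketched with the standard library alone. This is an illustration under assumptions: the class and method names are hypothetical, the result store is an in-process dict, and the web framework wiring for the TTS and /api/expression/poll endpoints is left out.

```python
import threading
import uuid
from typing import Callable, Dict


class ExpressionJobs:
    """Run expression generation in the background and hand out poll tokens."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def start(self, work: Callable[[], dict]) -> str:
        """Start work() in a daemon thread and return a token immediately."""
        token = uuid.uuid4().hex

        def run() -> None:
            try:
                result = {"status": "done", "expression": work()}
            except Exception as exc:  # report failures instead of hanging the poller
                result = {"status": "error", "error": str(exc)}
            with self._lock:
                self._results[token] = result

        threading.Thread(target=run, daemon=True).start()
        return token

    def poll(self, token: str) -> dict:
        """Return the finished result once, or a pending marker."""
        with self._lock:
            return self._results.pop(token, {"status": "pending"})
```

Popping the result on first delivery keeps the store from growing, at the cost that a lost response cannot be fetched a second time.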

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…aster response

Backend: revert to sync expression in TTS response (remove async cache/polling).
Frontend: replace pollExpression with applyExpressionFromTts (sync from TTS response).
Frontend: fire sendMessage() immediately while ack plays (don't await firstAckPromise).
pendingAckPromise is awaited before TTS playback to prevent ttsPlayer conflict.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…nterrupt)

unlockAudioParams() does play→pause→reset on ttsPlayer for iOS unlock.
When called during ack playback (parallel LLM mode), it kills the ack audio.
Skip it when pendingAckPromise is active (audio already unlocked by ack).

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…rentAudio safety

Root cause: ack "はい" gets paused (not ended) by some interruption, so
pendingAckPromise never resolves → speakResponseInChunks stuck forever.
Fix 1: resolve pendingAckPromise on both 'ended' and 'pause' events.
Fix 2: call stopCurrentAudio() after pendingAckPromise resolves to ensure
ttsPlayer is clean before new TTS playback.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Container: max-height 650px → height calc(100dvh - 40px), max-height 960px
- Avatar stage: 140px → 300px (desktop), 100px → 200px (mobile)
- Chat area: min-height 150px guaranteed for message display

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Post-init camera: Z 1→0.6 (closer), Y 1.8→1.75 (slight down), FOV 50→36 (zoom in).
Eliminates wasted space above avatar head in the 300px avatar-stage.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Previous: lookAt y=1.8 (head center) + tight zoom → mouth cut off at bottom.
Fix: lower target to y=1.62 (nose/mouth center), adjust OrbitControls target
to match. Camera Z=0.55, FOV=38 for balanced framing.

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
targetY 1.62→1.66 (avatar lower in frame), camera Y 1.62→1.72
(above target, slight downward angle instead of looking up from below)

https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Key improvements over existing lam_modal.py:
- @modal.asgi_app() + Gradio 4.x instead of subprocess + patching
- Direct Python integration with LAM pipeline (no regex patching)
- Blender 4.2 included for GLB generation (OpenAvatarChat format)
- Focused UI for concierge.zip generation with progress feedback
- Proper ASGI serving resolves Gradio UI display issue on Modal

Pipeline: Image → FLAME Tracking → LAM Inference → Blender GLB → ZIP

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Major update to concierge_modal.py:
- Custom video upload: VHAP FLAME tracking extracts per-frame
  expression/pose parameters from user's own motion video
- Video preprocessing pipeline: frame extraction, face detection
  (VGGHead), background matting, landmark detection per frame
- VHAP GlobalTracker integration for multi-frame optimization
- Export to NeRF dataset format (transforms.json + flame_param/*.npz)
- Gradio UI: motion source selector (custom video or sample)
- Preview video with optional audio from source video
- Max 300 frames (10s@30fps) cap for manageable processing

This enables generating high-quality concierge.zip with custom
expressions/movements instead of being limited to pre-set samples.

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
- Replace add_local_dir("./assets") with HuggingFace downloads for all
  required model assets (FLAME tracking, parametric models, LAM assets)
- Remove REQUIRED_ASSET local check since assets are fetched at build time
- Build VHAP config programmatically instead of loading from YAML file
- Remove deprecated allow_concurrent_inputs parameter
- Add flame_vhap symlink for VHAP tracking compatibility
- Add critical file verification in _download_models()

Fixes FileNotFoundError: flame2023.pkl not found in container

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Replace container-build-time HuggingFace downloads with add_local_dir
to mount model files from the user's local LAM repo. This is faster
and avoids dependency on HuggingFace availability.

- Add _has_model_zoo / _has_assets detection at module level
- Mount ./model_zoo and ./assets via add_local_dir (conditional)
- Add _setup_paths() to bridge directory layout differences:
  - assets/human_parametric_models → model_zoo/human_parametric_models
  - flame_assets/flame2023.pkl → flame_assets/flame/ (flat layout)
  - flame_vhap symlink for VHAP tracker
- Add model file verification with find-based search

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Modal requires add_local_dir to be the last image build step.
Move _setup_model_paths() from run_function (build time) to
_init_lam_pipeline() (container startup) to comply with this.

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
User keeps all models under assets/ (not model_zoo/).
Instead of symlinking individual subdirectories, symlink the entire
model_zoo -> assets when model_zoo doesn't exist. This bridges
lam_models, flame_tracking_models, and human_parametric_models
all at once.

Also adds model.safetensors to the verification checklist.
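
A minimal sketch of the whole-directory bridge described above, assuming both directories sit under one root. The function name is hypothetical and the real _setup_model_paths() may do more.

```python
import os
from pathlib import Path


def bridge_model_zoo(root: Path) -> bool:
    """Symlink root/model_zoo to root/assets when only assets exists.

    Returns True if a link was created, False if nothing was done.
    """
    assets = root / "assets"
    model_zoo = root / "model_zoo"
    # Leave an existing directory or link alone, including a dangling link.
    if model_zoo.exists() or model_zoo.is_symlink() or not assets.is_dir():
        return False
    os.symlink(assets, model_zoo, target_is_directory=True)
    return True
```

Because the link covers the whole directory, subdirectories added to assets/ later are visible under model_zoo/ without further setup.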

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Three files are not available locally and must be downloaded:
- model.safetensors (LAM-20K model weights from 3DAIGC/LAM-20K)
- template_file.fbx, animation.glb (from Ethan18/test_model LAM_assets.tar)

Download runs via run_function BEFORE add_local_dir to satisfy
Modal's ordering constraint. Downloads are cached in the image layer.

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
1. Downloaded LAM assets (template_file.fbx, animation.glb) were
   being overwritten by the add_local_dir mount of assets/.
   Fix: copy extracted assets into model_zoo/ during build so they
   survive the mount. Update all path references accordingly.

2. Pin gradio==4.44.0 and gradio_client==1.3.0 to avoid the
   json_schema_to_python_type TypeError on additionalProperties.

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
1. Switch assets download from Ethan18/test_model (incomplete) to
   official 3DAIGC/LAM-assets which includes sample_oac/ with
   template_file.fbx and animation.glb.

2. Monkey-patch gradio_client._json_schema_to_python_type to handle
   boolean additionalProperties schema (TypeError on bool).

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
claude and others added 30 commits February 15, 2026 08:31
FLAME model buffers are loaded from .pkl files during init, so missing
from the checkpoint is expected. Only flag non-FLAME missing keys as
critical issues that indicate randomly initialized parameters.
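
The filtering rule can be sketched as below. PyTorch's load_state_dict(strict=False) reports missing_keys; how FLAME buffers are named in the checkpoint is an assumption here (a case-insensitive "flame" substring), as are the function name and the example keys.

```python
from typing import Iterable, List, Sequence, Tuple


def split_missing_keys(
    missing: Iterable[str],
    expected_markers: Sequence[str] = ("flame",),
) -> Tuple[List[str], List[str]]:
    """Split missing checkpoint keys into (expected, critical).

    Keys containing an expected marker are assumed to be filled from
    .pkl files at init; everything else is treated as critical.
    """
    expected: List[str] = []
    critical: List[str] = []
    for key in missing:
        if any(marker in key.lower() for marker in expected_markers):
            expected.append(key)
        else:
            critical.append(key)
    return expected, critical
```

A caller would log the expected list at a low level and fail or warn loudly only when the critical list is non-empty.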

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Tests model loading + inference directly on Modal GPU, using app_lam.py's
core code path. Reports weight loading status, Gaussian quality stats,
and a verdict on whether the issue is in our pipeline or the environment.

Usage:
  modal run concierge_modal.py --smoke-test
  modal run concierge_modal.py --smoke-test --image face.png

This isolates: is the model on Modal producing bad Gaussians (environment
problem) or is it our pipeline code that's wrong?

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
…M env

Root cause: DINOv2's MemEffAttention (attention.py:72-89) uses
xformers.ops.memory_efficient_attention when available, but falls back
to standard Attention.forward() when xformers is not installed.

The LAM model was TRAINED with xformers. Without it, the fallback
attention produces different features across 24 DINOv2 ViT-L layers,
causing the GS decoder to output ~83% opacity > 0.9 instead of ~4%.

Changes:
- PyTorch 2.2.0 → 2.3.0 (matches scripts/install/install_cu118.sh)
- Add xformers 0.0.26.post1 (matches scripts/install/install_cu118.sh)
- pytorch3d: unpin version (matches official requirements.txt)
- Add xformers availability check at container startup
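
A startup availability check can be done without importing the heavy modules. The sketch below uses only the standard library; the function names and the report format are assumptions, not the output of the actual diagnostics.

```python
import importlib.util
from typing import Sequence


def module_available(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def env_report(modules: Sequence[str] = ("torch", "xformers")) -> str:
    """Build a one-line summary suitable for a startup log."""
    parts = [f"{m}={'yes' if module_available(m) else 'MISSING'}" for m in modules]
    return "[ENV] " + " ".join(parts)
```

find_spec is used with top-level names only; for dotted names it imports the parent package, which would defeat the purpose of a cheap check.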

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Shows [ENV] line in diagnostics so we can verify the Modal image
was actually rebuilt with PyTorch 2.3.0 + xformers 0.0.26.post1.

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Extracts the verified inference pipeline from concierge_modal.py into
a standalone Dockerfile + app_concierge.py that runs on HF Spaces
Docker SDK or any Docker+GPU environment.

- Dockerfile: nvidia/cuda:11.8 base with PyTorch 2.3.0, xformers 0.0.26.post1,
  Blender 4.2, all CUDA extensions pre-built
- app_concierge.py: Single-process Gradio app, no Modal dependencies,
  same generation pipeline (FLAME tracking + VHAP + LAM + Blender GLB)
- download_models.py: Fetches all model weights during Docker build

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Integrated several fixes to improve model loading and GLB export. Updated comments for clarity and ensured proper usage of official tools.
The concierge_modal.py had a hand-written inline convert_and_order.py Blender
script that re-implemented the GLB generation logic from
tools/generateARKITGLBWithBlender.py. This "re-invention" diverged from the
official pipeline in subtle ways:

1. Vertex order was generated from the FBX mesh (using matrix_world for Z),
   while the official generateVertexIndices.py imports the OBJ and applies a
   90-degree rotation before sorting by Z.

2. The inline script combined GLB export + vertex order in one Blender session,
   bypassing the official convertFBX2GLB.py and generateVertexIndices.py scripts.

Now we call generate_glb() directly — the same function app_lam.py uses — which
runs the official Blender scripts (convertFBX2GLB.py, generateVertexIndices.py)
and handles temp file cleanup internally.

This eliminates the inline Blender script as a potential source of quality
divergence, while keeping all other improvements intact (xformers, weight
validation, diagnostics, torch.compile disabled).

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
…leanup

Root cause: lam/, vhap/, configs/, external/, app_lam.py were never mounted
into the Modal container. The container was using stale upstream code from
`git clone github.com/aigc3d/LAM`. Local modifications had no effect.

Changes:
- Mount lam/, configs/, vhap/, external/, app_lam.py, app_concierge.py
- Add BUILD_VERSION env var to force image rebuild on every deploy
- Clear old Volume output (concierge.zip, preview.mp4, etc.) before each
  generation to prevent returning stale cached results
- Log BUILD_VERSION on GPU worker startup for verification

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
…ling

- Replace bare-bones UI with polished design using gr.themes.Soft() and
  custom CSS (gradient header, step labels, tip/usage boxes)
- Add step-by-step guided input flow (Step 1/2/3 labels with tips)
- GPU worker now writes intermediate progress to per-job JSON on Volume,
  so UI displays real pipeline step names instead of just elapsed time
- Poll interval reduced from 5s to 2s for snappier feedback
- Per-job file scoping (tracked_{job_id}.png etc.) for multi-user safety
- Proper error propagation: GPU thread errors surface in UI instead of
  being silently swallowed by except/pass
- 30-minute timeout guard to prevent infinite polling
- Output section with autoplay preview, usage instructions, and labeled
  visualization panels

https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Updated the concierge ZIP generator script with final fixes and optimizations. Adjusted error handling, improved file management, and ensured consistent behavior with the official tools.
generate_glb() already produces correct vertex_order.json via Blender's
gen_vertex_order_with_blender(). The trimesh-based overwrite replaced it
with naive sequential ordering (list(range(n_verts))), which is wrong
because Blender reorders vertices on FBX import. This mismatch caused
mesh vertices to map to wrong animation bones → "bird monster" avatar.

Also adds 49 tests covering:
- ZIP structure validation (GLB magic, PLY header, vertex_order permutation)
- Code correctness static analysis (no sequential vertex_order pattern)
- Comparison of fne (working) vs now (broken) ZIPs
- Pipeline logic and code consistency checks
- Bug regression tests
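
The structural checks listed above reduce to a few small predicates. This sketch is not the project's test suite; the function names are hypothetical, and it works on raw bytes and lists so it does not depend on the file names inside the ZIP.

```python
from typing import Sequence


def has_glb_magic(data: bytes) -> bool:
    """Binary glTF files start with the ASCII magic 'glTF'."""
    return data[:4] == b"glTF"


def has_ply_header(data: bytes) -> bool:
    """PLY files start with the line 'ply'."""
    return data[:3] == b"ply"


def is_permutation(order: Sequence[int], n_verts: int) -> bool:
    """Every vertex index 0..n_verts-1 must appear exactly once."""
    return sorted(order) == list(range(n_verts))


def is_sequential(order: Sequence[int]) -> bool:
    """Flag the naive list(range(n)) ordering that caused the bug above."""
    return list(order) == list(range(len(order)))
```

A sequential order is still a valid permutation, so the two checks are kept separate: the first catches corruption, the second catches the specific regression.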

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Previous runs' results persisted across generations because:
1. Modal Volume files (concierge.zip, preview.mp4, etc.) were never cleaned
2. FLAME tracking output/tracking/ was only partially cleaned (subdirs only)
3. generate_glb() temp files (temp_ascii.fbx, temp_bin.fbx) could survive crashes
4. Leftover status_*.json from previous jobs confused the polling logic

Fixes applied at 3 layers (defense in depth):
- Generator.generate(): Wipes volume + tracking dir + temp files BEFORE work
- _generate_concierge_zip(): rmtree entire output/tracking/ + clean temp files
- Web UI process(): Clears volume before launching GPU job
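
The cleanup step can be sketched with two helpers. The file patterns come from the list of stale artifacts above; the function names and the directory layout are assumptions.

```python
import shutil
from pathlib import Path
from typing import Sequence


def reset_dir(path: Path) -> None:
    """Delete a directory tree if present and recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    path.mkdir(parents=True, exist_ok=True)


def remove_stale_files(
    root: Path,
    patterns: Sequence[str] = ("status_*.json", "temp_ascii.fbx", "temp_bin.fbx"),
) -> int:
    """Remove leftover files directly under root; return how many were removed."""
    removed = 0
    for pattern in patterns:
        for stale in root.glob(pattern):
            if stale.is_file():
                stale.unlink()
                removed += 1
    return removed
```

Running both before any work starts means a crash mid-pipeline cannot leave files that the next run mistakes for fresh output.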

Added 6 cache prevention tests (55 total, all passing).

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Root cause: _call_gpu() swallowed exceptions with print(), so when
the GPU container crashed/timed out, the UI polling loop waited
30 minutes for a status file that would never be written.

Three fixes:
1. _call_gpu() now writes error status to volume AND shared dict
   on failure, instead of just print()
2. UI polling loop detects dead GPU thread via shared gpu_error
   state — immediately reports error instead of waiting 30 min
3. Generator.generate() has finally block that writes status file
   as last resort even if except block also fails
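
The three fixes share one pattern: always leave a status behind, whatever happens. A minimal sketch, assuming a JSON status file and a plain dict for the shared state (the function name and the status fields are hypothetical; gpu_error is the key named above):

```python
import json
from pathlib import Path
from typing import Callable


def run_with_status(job: Callable[[], object], status_path: Path, shared: dict) -> None:
    """Run job() and always write a status file, even on failure."""
    # Default covers the case where neither branch below completes.
    status = {"state": "error", "error": "job exited without reporting"}
    try:
        result = job()
        status = {"state": "done", "result": result}
    except Exception as exc:
        status = {"state": "error", "error": f"{type(exc).__name__}: {exc}"}
        shared["gpu_error"] = status["error"]  # lets the poller stop immediately
    finally:
        status_path.write_text(json.dumps(status))
```

The sketch assumes the job result is JSON-serializable; a real pipeline would more likely store a path to the generated file.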

Also:
- GPU timeout: 600s → 1800s (pipeline takes 10-25 min)
- scaledown_window: 10s → 60s (avoid cold starts during iteration)

60 tests passing.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Root cause: gen.generate.remote() blocks for the ENTIRE duration
(GPU provisioning + cold start + @modal.enter() + pipeline).
UI had a 30-min wall-clock timeout that was shorter than the
actual total time (provisioning ~5min + setup ~10min + pipeline ~20min).

Fixes:
1. GPU writes progress_<job_id>.json heartbeat at each pipeline step
   via _write_progress() — proves to UI that GPU is alive
2. UI reads heartbeat and resets idle timer on each update
3. Two-tier timeout:
   - IDLE_TIMEOUT=600s: no heartbeat for 10min = GPU dead
   - MAX_TIMEOUT=3600s: absolute 1-hour cap for everything
4. UI now shows "Waiting for GPU... (provisioning/startup)" when
   no heartbeat received yet, then shows actual pipeline step name
   once GPU starts reporting progress

Before: "Processing... (25m00s)" → timeout at 30min
After:  "[5m30s] Step 1: FLAME tracking..." → "[15m20s] Step 4: Running LAM inference..."
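
The two-tier timeout can be expressed as a pure function of timestamps, which makes it easy to test. The constants come from the list above; the function name is hypothetical, and the choice to apply only the absolute cap before the first heartbeat is an assumption of this sketch.

```python
from typing import Optional

IDLE_TIMEOUT = 600.0   # seconds without a heartbeat before the GPU is presumed dead
MAX_TIMEOUT = 3600.0   # absolute cap on the whole job


def check_timeouts(
    now: float,
    started_at: float,
    last_heartbeat_at: Optional[float],
    idle_timeout: float = IDLE_TIMEOUT,
    max_timeout: float = MAX_TIMEOUT,
) -> Optional[str]:
    """Return 'max_timeout', 'idle_timeout', or None if the job may continue."""
    if now - started_at >= max_timeout:
        return "max_timeout"
    # No heartbeat yet means the GPU is still provisioning or starting up.
    if last_heartbeat_at is not None and now - last_heartbeat_at >= idle_timeout:
        return "idle_timeout"
    return None
```

The polling loop would call this on each tick and update last_heartbeat_at whenever the progress file changes.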

64 tests passing.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Root cause: _track_video_to_motion() blocks for 5-15 minutes during
VHAP tracker.optimize() with NO heartbeat sent to volume. The
status_callback parameter was available but never connected. UI's
IDLE_TIMEOUT (10 min) triggers "No GPU heartbeat" error.

Fixes:
1. Connect progress_callback through the full chain:
   Generator.generate() → _generate_concierge_zip(progress_callback=)
   → _track_video_to_motion(status_callback=)
2. VHAP optimize() now runs in a thread with a background heartbeat
   thread that sends "VHAP tracking in progress... (Nm)" every 60s
3. Frame extraction reports every 30 frames
4. All sub-step progress forwarded to volume via _write_progress()

Before: Step 2 starts → 10 min silence → idle timeout
After:  Step 2 starts → "Extracting frames (30 done)" → "VHAP tracking
        in progress (1m)" → "VHAP tracking (2m)" → ... → Step 3
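
A background heartbeat around a long blocking call can be sketched as below. It differs from the description above in one respect: the blocking work stays in the calling thread and only the heartbeat runs in a separate thread. The function name, parameters, and message format are assumptions.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(
    work: Callable[[], T],
    report: Callable[[str], None],
    interval: float = 60.0,
    label: str = "VHAP tracking in progress",
) -> T:
    """Call work() while a daemon thread reports progress every interval seconds."""
    stop = threading.Event()
    started = time.monotonic()

    def beat() -> None:
        # Event.wait returns False on timeout, so this loops once per interval
        # until stop is set.
        while not stop.wait(interval):
            minutes = int((time.monotonic() - started) // 60)
            report(f"{label}... ({minutes}m)")

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        stop.set()
        thread.join()
```

Joining the thread in the finally block guarantees that no heartbeat is reported after the work has returned or raised.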

68 tests passing.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Eliminate 2-container split (UI + GPU) that caused all timeout/heartbeat
bugs. Now uses @app.cls with @modal.enter() + @modal.asgi_app() in one
GPU container — process() calls pipeline directly, no Volume/polling/threading.
scaledown_window=300 handles GPU cost optimization (auto-shutdown after 5min idle).

Tests updated from 68 → 64 (removed Volume/heartbeat/polling tests,
added single-container architecture validation).

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Root cause: nvdiffrast uses torch.utils.cpp_extension.load() which
JIT-compiles CUDA C++ code on first import. Without caching, this
recompiles on EVERY container cold start (~10-30 min each time).

Fix:
- Add image build step that triggers nvdiffrast JIT compilation
  with -Wno-c++11-narrowing flag (required for clang)
- Set TORCH_EXTENSIONS_DIR=/root/.cache/torch_extensions so compiled
  .so files persist in the image layer
- Container startup reuses pre-compiled cache instead of recompiling
- Add [TIMING] logs to @modal.enter() and _init_lam_pipeline() to
  identify remaining bottlenecks

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Shell quoting broke with inline python -c containing def statement.
Use Modal's run_function() which serializes Python directly, avoiding
all shell escaping issues.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
27349.6s (7.6h) wait was caused by cold start: container scales down
after scaledown_window, next request triggers full @modal.enter()
(model loading + possible CUDA JIT). keep_warm=1 maintains one
container always running so users never hit cold start.

Also includes [TIMING] logs to identify remaining bottlenecks -
check with: modal app logs concierge-zip-generator

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Root cause: DINOv2 weights (1.13GB) were not baked into the image,
causing every container to download them via torch.hub at runtime.
Without max_containers, Modal auto-scaled to 10+ containers, each
downloading 1.13GB simultaneously and competing for bandwidth,
inflating setup from ~30s to 100s+ and causing crash cascades.

- Pre-download dinov2_vitl14_reg4_pretrain.pth in image build step
- Add max_containers=1 to prevent cascade spawning

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
…ontainer

The container was using upstream git-cloned code for the LAM package,
configs, VHAP tracker, and external dependencies. The local repo has
critical fixes (compile disable in config, potential attention/model
differences) that the upstream clone lacks.

Mount all local source directories so the container runs identical
code to what works locally.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Three changes to fix the corrupted 3D output:

1. Mount local app_lam.py into container — `from app_lam import
   parse_configs` was using the upstream git-cloned version instead of
   the local one.

2. Set TORCHDYNAMO_DISABLE=1 as image-level env var — this is a more
   reliable kill-switch than torch._dynamo.config.disable = True for
   the @torch.compile decorators on Dinov2FusionWrapper.forward and
   ModelLAM.forward_latent_points.

3. Add runtime diagnostics that print xformers availability and dynamo
   disable state during container startup, making future debugging
   easier.

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
Setting TORCHDYNAMO_DISABLE=1 in the image .env() invalidated Modal's
layer cache, forcing a full rebuild (~2 hours) of nvdiffrast compilation
and all model downloads.

Move the env var to _init_lam_pipeline() before `import torch._dynamo`
so it still takes effect but doesn't bust the image cache.
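
The ordering constraint is that the variable must be set before the module that reads it is first imported. A generic sketch of that pattern follows; the helper name is hypothetical, and the real code simply sets the variable inline in _init_lam_pipeline().

```python
import importlib
import os
import sys
from types import ModuleType
from typing import Dict, Tuple


def import_with_env(module_name: str, env: Dict[str, str]) -> Tuple[ModuleType, bool]:
    """Set environment variables, then import a module.

    Returns (module, was_already_loaded). If the module was already
    imported, variables it reads at import time will not be re-read.
    """
    was_loaded = module_name in sys.modules
    os.environ.update(env)
    return importlib.import_module(module_name), was_loaded
```

In this setting the call would look like import_with_env("torch._dynamo", {"TORCHDYNAMO_DISABLE": "1"}), with a warning logged when was_already_loaded is True.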

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB
ROOT CAUSE: The LAM-20K model was fine-tuned with encoder_freeze=false
using standard attention (xformers is NOT in requirements.txt and not
installed locally). On Modal, xformers was explicitly installed, causing
DINOv2 to use memory_efficient_attention instead of standard attention.

While mathematically equivalent, the different floating-point computation
order compounds over 24 transformer layers, producing completely wrong
features when weights were optimized for the other attention path.

Fix: Set XFORMERS_DISABLED=1 before importing the DINOv2 attention
module, forcing the same standard attention path used during training.
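
The premise that mathematically equivalent computations can differ in floating point is easy to verify in isolation. The sketch below shows only that summation order changes the rounded result; it does not reproduce the attention kernels or measure how such differences grow across layers.

```python
def sum_left_to_right(values):
    """Accumulate from the first element to the last."""
    total = 0.0
    for value in values:
        total += value
    return total


def sum_right_to_left(values):
    """Accumulate from the last element to the first."""
    total = 0.0
    for value in reversed(values):
        total += value
    return total


values = [0.1, 0.2, 0.3]
left = sum_left_to_right(values)    # (0.1 + 0.2) + 0.3
right = sum_right_to_left(values)   # (0.3 + 0.2) + 0.1
```

The two results differ in the last bit for this input, which is the kind of discrepancy two attention implementations can introduce at every layer.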

https://claude.ai/code/session_013XCjSHD8gSKrUKs77SM9TB

2 participants