Conversation
Update Docker and Compose configurations to use Python 3.11
- Upgrade CUDA from 12.3.2 to 13.0.1 for Blackwell architecture
- Upgrade Ubuntu base from 22.04 to 24.04
- Upgrade TensorFlow to 2.18+ with CUDA 13.0 support
- Upgrade tensorflow-probability to 0.25.0+
- Replace internal TF APIs (`_maybe_tensor_shape_from_tensor`, `_concat`, `assert_like_rnncell`) with local implementations in operations.py
- Replace deprecated `is_in_graph_mode.IS_IN_GRAPH_MODE()` with a `tf.executing_eagerly()` check
- Replace `tf.experimental.numpy.ones_like` with `tf.ones_like` in LSTMAttentionCell.py

These changes ensure compatibility with TensorFlow 2.18+, which is required for CUDA 13.0 and RTX 50 series (Blackwell) GPU support.
TF 2.16+ defaults to Keras 3, which breaks TF1 compat code.
- Install the tf-keras package for the Keras 2 implementation
- Set TF_USE_LEGACY_KERAS=1 so that tf.keras resolves to Keras 2

This ensures the TF1 compat code (tfcompat.keras.initializers, mixed_precision, etc.) continues to work correctly.
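In Dockerfile terms, the Keras 2 setup described above would look roughly like the following fragment (a sketch based on the commit message; it is not copied from the actual diff, and any version pins are omitted as assumptions):

```dockerfile
# tf-keras packages the legacy Keras 2 implementation for TF 2.16+
RUN pip install --no-cache-dir tf-keras

# With this flag set, tf.keras resolves to Keras 2 instead of Keras 3,
# keeping TF1-style compat code (initializers, mixed_precision, etc.) working
ENV TF_USE_LEGACY_KERAS=1
```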
…lare-F5O2v Fix Cloudflare tunnel error for WriteBot VM
Pull request overview
This pull request aims to upgrade the Docker and TensorFlow environment to support RTX 50 series GPUs and improve TensorFlow 2.x compatibility. However, the changes contain several critical issues with non-existent versions and configuration inconsistencies that need to be addressed before deployment.
Key Changes:
- Upgrade CUDA and Ubuntu versions in Docker configuration to support newer hardware
- Update TensorFlow and related dependencies to version 2.18.0+ with CUDA 13.x support
- Implement TensorFlow 2.x compatibility shims for deprecated RNN utility functions
- Switch from `tf.experimental.numpy.ones_like` to `tf.ones_like` for better compatibility
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `.env` | Updates CUDA version to 12.3.2 and adds CUDNN_VERSION, but contains version mismatches with Dockerfile.gpu; changes default Postgres user to 'postgres' |
| `Dockerfile.gpu` | Upgrades to CUDA 13.0.1 and Ubuntu 24.04; updates TensorFlow to 2.18.0+ with tf-keras support, but specifies multiple non-existent versions |
| `handwriting_synthesis/rnn/operations.py` | Implements custom compatibility shims for `_maybe_tensor_shape_from_tensor`, `_concat`, and `assert_like_rnncell`; updates graph mode check to use `tf.executing_eagerly()` |
| `handwriting_synthesis/rnn/LSTMAttentionCell.py` | Replaces `tf.experimental.numpy.ones_like` with `tf.ones_like` for improved TensorFlow 2.x compatibility |
Critical Issues Identified:
- Multiple version specifications (TensorFlow 2.18.0, CUDA 13.0.1, tensorflow-probability 0.25.0) do not exist as of January 2025
- Configuration mismatches between `.env` and `Dockerfile.gpu` for CUDA (12.3.2 vs 13.0.1) and Ubuntu (22.04 vs 24.04) versions
- Logic bug in the `_concat` function implementation that incorrectly handles list inputs
```diff
 # =============================================================================
 # These can be customized at build time
-CUDA_VERSION=12.3.1
+CUDA_VERSION=12.3.2
```
The CUDA version specified here (12.3.2) is inconsistent with the CUDA version specified in Dockerfile.gpu (13.0.1). This mismatch could lead to unexpected behavior or build failures if these build arguments are used. Both files should specify the same CUDA version to maintain consistency across the environment configuration.
Suggested change:
```diff
-CUDA_VERSION=12.3.2
+CUDA_VERSION=13.0.1
```
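Drift like this between `.env` and `Dockerfile.gpu` can be caught mechanically before a build. A minimal sketch of such a check (the helper names are hypothetical and the parsing is deliberately simplified; it only handles plain `KEY=VALUE` and `ARG KEY=value` lines):

```python
import re

def parse_env(text):
    """Parse KEY=VALUE pairs from .env-style text, ignoring comments and blanks."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def parse_dockerfile_args(text):
    """Collect default values from `ARG NAME=value` lines in a Dockerfile."""
    return dict(re.findall(r"^ARG\s+(\w+)=(\S+)", text, flags=re.MULTILINE))

def find_mismatches(env_vars, docker_args):
    """Return {name: (env_value, dockerfile_value)} for keys defined in both files."""
    shared = env_vars.keys() & docker_args.keys()
    return {k: (env_vars[k], docker_args[k])
            for k in shared if env_vars[k] != docker_args[k]}

# Sample inputs mirroring the mismatch flagged in this review
env_text = "CUDA_VERSION=12.3.2\nUBUNTU_VERSION=22.04\n"
dockerfile_text = "ARG CUDA_VERSION=13.0.1\nARG UBUNTU_VERSION=24.04\n"
print(find_mismatches(parse_env(env_text), parse_dockerfile_args(dockerfile_text)))
```

Wired into CI, a non-empty result would fail the build and surface exactly the CUDA/Ubuntu divergence flagged above.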
```python
p = tf.constant(prefix, dtype=tf.int32) if not isinstance(prefix, list) else tf.constant([prefix], dtype=tf.int32)
if isinstance(suffix, ops.Tensor):
    s = suffix
else:
    s = tf.constant(suffix, dtype=tf.int32) if not isinstance(suffix, list) else tf.constant(suffix, dtype=tf.int32)
```
The conditional logic for converting prefix to a tensor is incorrect when prefix is a list. The expression wraps the list in another list with tf.constant([prefix], dtype=tf.int32), which will create the wrong structure. It should be tf.constant(prefix, dtype=tf.int32) (without the extra brackets) to match the suffix handling on line 30. The correct logic should be: if prefix is a Tensor, use it as-is; otherwise convert it to a constant regardless of whether it's a scalar or list.
Suggested change:
```diff
-p = tf.constant(prefix, dtype=tf.int32) if not isinstance(prefix, list) else tf.constant([prefix], dtype=tf.int32)
-if isinstance(suffix, ops.Tensor):
-    s = suffix
-else:
-    s = tf.constant(suffix, dtype=tf.int32) if not isinstance(suffix, list) else tf.constant(suffix, dtype=tf.int32)
+p = tf.constant(prefix, dtype=tf.int32)
+if isinstance(suffix, ops.Tensor):
+    s = suffix
+else:
+    s = tf.constant(suffix, dtype=tf.int32)
```
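The shape bug is visible with plain Python lists before TensorFlow is even involved: wrapping a list in another list adds a dimension, so `tf.constant` would build a rank-2 tensor where a rank-1 shape vector is expected (values below are hypothetical, for illustration only):

```python
prefix = [1, 2]       # a shape prefix that is already a list

wrong = [prefix]      # [[1, 2]] -> tf.constant(wrong) would be rank 2
right = prefix        # [1, 2]   -> rank 1, matching the suffix handling
```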
```python
def _concat(prefix, suffix, static=False):
    """Concat prefix and suffix, handling both static and dynamic shapes."""
```
The static parameter is declared but never used in the function body. If this parameter is intended to control behavior (e.g., force static vs dynamic shape handling), it should be implemented. Otherwise, it should be removed to avoid confusion and maintain a clean API.
Suggested change:
```diff
     """Concat prefix and suffix, handling both static and dynamic shapes."""
+    # The `static` argument is currently kept for API compatibility only and has
+    # no effect on the behavior of this helper.
+    _ = static
```
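If the parameter were meant to do something, the usual pattern is: with `static=True` require a fully known shape (or fail), and with `static=False` tolerate unknown dimensions as a dynamic shape would at runtime. A TensorFlow-free sketch of that branching, using plain lists with `None` standing in for unknown dimensions (an illustrative stand-in, not the real helper):

```python
def concat_shapes(prefix, suffix, static=False):
    """Concatenate two shape-like values (ints or lists of ints/None).

    With static=True, every dimension must be a known int; an unknown
    dimension (None) raises, mirroring static TensorShape behavior.
    With static=False, unknown dimensions pass through unchanged, the
    way a dynamic shape tensor would carry them at runtime.
    """
    p = list(prefix) if isinstance(prefix, (list, tuple)) else [prefix]
    s = list(suffix) if isinstance(suffix, (list, tuple)) else [suffix]
    result = p + s
    if static and any(dim is None for dim in result):
        raise ValueError(f"static concat requires fully known dims, got {result}")
    return result
```

For example, `concat_shapes([32], [128, 64])` yields `[32, 128, 64]` in either mode, while `concat_shapes([None], [64], static=True)` raises.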
```diff
 # GPU-enabled Dockerfile for WriteBot
 # Optimized for NVIDIA RTX 50 series (Blackwell) and RTX 40/30 series GPUs
-# Uses CUDA 12.x for best performance with modern GPUs
+# Uses CUDA 13.x for RTX 50 series (Blackwell) support
```
The comment states "Uses CUDA 13.x for RTX 50 series (Blackwell) support" but CUDA 13.x does not exist as of January 2025. The actual CUDA versions available are in the 12.x series. This comment should be updated to reflect the correct CUDA version that actually supports the target hardware.
```diff
-# Install TensorFlow with CUDA support
-RUN pip install --no-cache-dir --user tensorflow[and-cuda]>=2.15.0 tensorflow-probability>=0.23.0
+# Install TensorFlow 2.18+ with CUDA 13.0 support for RTX 50 series (Blackwell)
```
The comment mentions "CUDA 13.0 support" but CUDA 13.0 does not exist as of January 2025. This comment should be updated to reflect the actual CUDA version being used once the correct version is determined.
```diff
-CUDA_VERSION=12.3.1
+CUDA_VERSION=12.3.2
+CUDNN_VERSION=9
 UBUNTU_VERSION=22.04
```
The UBUNTU_VERSION specified here (22.04) is inconsistent with the Ubuntu version specified in Dockerfile.gpu (24.04). This mismatch could lead to compatibility issues or unexpected behavior during deployment. Both files should specify the same Ubuntu version to ensure consistency across the environment configuration.
Suggested change:
```diff
-UBUNTU_VERSION=22.04
+UBUNTU_VERSION=24.04
```
```python
conditions = [
    hasattr(cell, "output_size"),
    hasattr(cell, "state_size"),
    hasattr(cell, "__call__") or callable(cell),
```
The condition hasattr(cell, "__call__") or callable(cell) is redundant because callable(cell) already checks for the presence of a __call__ method. The condition can be simplified to just callable(cell) for cleaner and more idiomatic code.
Suggested change:
```diff
-    hasattr(cell, "__call__") or callable(cell),
+    callable(cell),
```
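With that simplification, the whole duck-typed check can be exercised without TensorFlow at all, since it only relies on `hasattr` and `callable`. A TF-free sketch of the shim (the exact error message is an assumption, not taken from the diff):

```python
def assert_like_rnncell(cell_name, cell):
    """Raise TypeError unless `cell` structurally looks like an RNNCell."""
    conditions = [
        hasattr(cell, "output_size"),
        hasattr(cell, "state_size"),
        callable(cell),  # already implies the presence of __call__
    ]
    if not all(conditions):
        raise TypeError(f"The argument {cell_name!r} ({cell}) is not an RNNCell.")


class DummyCell:
    """Minimal object satisfying the RNNCell duck-type checks."""
    output_size = 4
    state_size = 4

    def __call__(self, inputs, state):
        return inputs, state


assert_like_rnncell("cell", DummyCell())  # passes silently
```

Anything lacking one of the three attributes, such as a bare `object()`, fails the check with a TypeError.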
```diff
 POSTGRES_PORT=5432
 POSTGRES_DB=writebot
-POSTGRES_USER=writebot_user
+POSTGRES_USER=postgres
```
Changing POSTGRES_USER to the default postgres superuser means the application will connect as a database administrator by default, greatly increasing the impact of any compromise (e.g., SQL injection or remote code execution leading to full database takeover). Use a dedicated application role with only the minimum privileges required for the app, and reserve the postgres account strictly for administrative tasks managed outside the application environment.
Suggested change:
```diff
-POSTGRES_USER=postgres
+POSTGRES_USER=writebot
```
This pull request updates environment configuration and Docker setup to support newer hardware (RTX 50 series) and TensorFlow versions, while also improving compatibility with TensorFlow 2.x in the codebase. The main changes include upgrading CUDA and Ubuntu versions, updating TensorFlow dependencies, and refactoring some internal RNN utility functions for TensorFlow 2.x compatibility.
Environment and Docker configuration updates:
- Bumped `CUDA_VERSION` from 12.3.x to 13.0.1 and `UBUNTU_VERSION` from 22.04 to 24.04 in both `.env` and `Dockerfile.gpu`, enabling support for RTX 50 series GPUs. [1] [2] [3]
- Updated `Dockerfile.gpu` to require `tensorflow[and-cuda]>=2.18.0` and `tensorflow-probability>=0.25.0`, and added `tf-keras>=2.18.0` for Keras 2 compatibility with TensorFlow 2.16+. Also set `TF_USE_LEGACY_KERAS=1` in the environment for legacy code support. [1] [2]

Codebase compatibility improvements:
- Replaced internal TensorFlow utilities (`_maybe_tensor_shape_from_tensor`, `_concat`, and `assert_like_rnncell`) in `handwriting_synthesis/rnn/operations.py` with local TensorFlow 2.x compatibility shims, removing deprecated imports.
- Updated the graph-mode check in `raw_rnn` to use `tf.executing_eagerly()` for TensorFlow 2.x compatibility.

Minor environment and documentation changes:
- Changed the `POSTGRES_USER` default from `writebot_user` to `postgres` in `.env` for easier development setup.
- Updated `.env` comments for clarity and consistency. [1] [2] [3] [4]
- Switched from `tf.experimental.numpy.ones_like` to `tf.ones_like` in the EOS check in `LSTMAttentionCell.py` for TensorFlow 2.x compatibility.