44 changes: 43 additions & 1 deletion CHANGELOG.md
@@ -5,6 +5,47 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2025-07-06

### Added
- **Advanced Learning Rate Scheduling**: Comprehensive expansion of learning rate scheduling capabilities
- **PolynomialLR**: Polynomial decay with configurable power for smooth learning rate transitions
- **CyclicalLR**: Cyclical learning rates with triangular, triangular2, and exponential range modes
- **WarmupScheduler**: Generic warmup wrapper that can be applied to any base scheduler
- **LRScheduleVisualizer**: ASCII visualization tool for learning rate schedules
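
For intuition, polynomial decay interpolates from the initial rate down to a floor, with the curvature set by the power. A minimal standalone sketch (the function name and signature are illustrative, not the crate's actual API):

```rust
/// Polynomial decay: interpolate from `initial_lr` down to `end_lr`
/// over `total_epochs`, with curvature controlled by `power`.
fn polynomial_lr(initial_lr: f64, end_lr: f64, power: f64, epoch: usize, total_epochs: usize) -> f64 {
    if epoch >= total_epochs {
        return end_lr; // clamp once the schedule is exhausted
    }
    let progress = epoch as f64 / total_epochs as f64;
    (initial_lr - end_lr) * (1.0 - progress).powf(power) + end_lr
}

fn main() {
    // power = 1.0 reduces to plain linear decay; power > 1.0 decays faster early on
    for epoch in [0, 25, 50, 75, 100] {
        println!("epoch {epoch:>3}: lr = {:.6}", polynomial_lr(0.01, 0.001, 2.0, epoch, 100));
    }
}
```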

- **Enhanced Scheduler Integration**:
- Convenience factory methods for new schedulers in `ScheduledOptimizer`
- Helper functions: `polynomial`, `cyclical`, `cyclical_triangular2`, `cyclical_exp_range`
- Complete integration with existing training infrastructure
- Comprehensive test coverage for all new schedulers

- **Learning Rate Visualization**:
- ASCII-based schedule visualization with customizable dimensions
- Schedule generation utilities for analysis and debugging
- Visual comparison tools for different scheduler behaviors
- Integration examples showing visualization usage

- **Advanced Training Examples**:
- `advanced_lr_scheduling.rs`: Comprehensive demonstration of new schedulers
- Warmup + cyclical learning rate combinations
- Best practices example with dropout + gradient clipping + advanced scheduling
- Performance comparison between different scheduling strategies

### Technical Improvements
- Extended scheduler trait system to support generic warmup wrapper
- Robust cyclical learning rate computation with proper cycle handling
- Polynomial decay implementation with numerical stability
- Comprehensive error handling and edge case management
- Enhanced documentation with visual examples and mathematical formulations
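
The "proper cycle handling" above presumably follows the standard triangular policy; one way to sketch it (after Smith's cyclical learning rates, not the crate's exact code):

```rust
/// Triangular cyclical LR (after Smith, 2015): oscillate between
/// `base_lr` and `max_lr`, taking `step_size` iterations per half-cycle.
fn cyclical_lr(base_lr: f64, max_lr: f64, step_size: usize, iteration: usize) -> f64 {
    let step = step_size as f64;
    let t = iteration as f64;
    let cycle = (1.0 + t / (2.0 * step)).floor(); // 1-based cycle index
    let x = (t / step - 2.0 * cycle + 1.0).abs(); // position within the cycle, 0 at the peak
    base_lr + (max_lr - base_lr) * (1.0 - x).max(0.0)
}

fn main() {
    // rises from 0.001 to 0.01 over 10 iterations, then falls back
    for it in [0, 5, 10, 15, 20] {
        println!("iter {it:>2}: lr = {:.4}", cyclical_lr(0.001, 0.01, 10, it));
    }
}
```

The triangular2 and exp_range modes mentioned in the changelog would additionally scale the amplitude per cycle.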

### Benefits
- More sophisticated learning rate control for better training quality
- Modern scheduling techniques used in state-of-the-art deep learning
- Visualization capabilities for schedule analysis and debugging
- Flexible warmup support for any existing scheduler
- Production-ready implementations with comprehensive testing

## [0.3.0] - 2025-07-03

### Added
@@ -189,4 +230,5 @@ When contributing to this project, please:

- **v0.1.0**: Initial LSTM implementation with forward pass
- **v0.2.0**: Complete training system with BPTT and optimizers
- **v0.3.0**: Learning rate scheduling, GRU implementation, BiLSTM, enhanced dropout, and model persistence
- **v0.4.0**: Advanced learning rate scheduling with 12 different schedulers, warmup support, cyclical rates, and visualization
6 changes: 5 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rust-lstm"
version = "0.4.0"
authors = ["Alex Kholodniak <alexandrkholodniak@gmail.com>"]
edition = "2021"
rust-version = "1.70"
@@ -65,6 +65,10 @@ path = "examples/text_classification_bilstm.rs"
name = "learning_rate_scheduling"
path = "examples/learning_rate_scheduling.rs"

[[example]]
name = "advanced_lr_scheduling"
path = "examples/advanced_lr_scheduling.rs"

[[example]]
name = "gru_example"
path = "examples/gru_example.rs"
52 changes: 46 additions & 6 deletions README.md
@@ -35,9 +35,11 @@ graph TD

- **LSTM, BiLSTM & GRU Networks** with multi-layer support
- **Complete Training System** with backpropagation through time (BPTT)
- **Multiple Optimizers**: SGD, Adam, RMSprop with comprehensive learning rate scheduling
- **Advanced Learning Rate Scheduling**: 12 different schedulers including OneCycle, Warmup, Cyclical, and Polynomial
- **Loss Functions**: MSE, MAE, Cross-entropy with softmax
- **Advanced Dropout**: Input, recurrent, output dropout, variational dropout, and zoneout
- **Schedule Visualization**: ASCII visualization of learning rate schedules
- **Model Persistence**: Save/load models in JSON or binary format
- **Peephole LSTM variant** for enhanced performance

@@ -47,7 +49,7 @@ Add to your `Cargo.toml`:

```toml
[dependencies]
rust-lstm = "0.4.0"
```

### Basic Usage
@@ -185,18 +187,50 @@ graph LR
style D2 fill:#fff3e0
```

### Advanced Learning Rate Scheduling

The library includes 12 different learning rate schedulers with visualization capabilities:

```rust
use rust_lstm::{create_step_lr_trainer, create_one_cycle_trainer};
use rust_lstm::{
create_step_lr_trainer, create_one_cycle_trainer, create_cosine_annealing_trainer,
ScheduledOptimizer, PolynomialLR, CyclicalLR, WarmupScheduler,
LRScheduleVisualizer, Adam
};

// Step decay: reduce LR by 50% every 10 epochs
let mut trainer = create_step_lr_trainer(network, 0.01, 10, 0.5);

// OneCycle policy for modern deep learning
let mut trainer = create_one_cycle_trainer(network, 0.1, 100);

// Cosine annealing with warm restarts
let mut trainer = create_cosine_annealing_trainer(network, 0.01, 20, 1e-6);

// Advanced combinations - Warmup + Cyclical scheduling
let base_scheduler = CyclicalLR::new(0.001, 0.01, 10);
let warmup_scheduler = WarmupScheduler::new(5, base_scheduler, 0.0001);
let optimizer = ScheduledOptimizer::new(Adam::new(0.01), warmup_scheduler, 0.01);

// Polynomial decay with visualization
let poly_scheduler = PolynomialLR::new(100, 2.0, 0.001);
LRScheduleVisualizer::print_schedule(poly_scheduler, 0.01, 100, 60, 10);
```
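
The ASCII visualization can be pictured as one text row per epoch, with bar length proportional to the learning rate. A toy sketch (`LRScheduleVisualizer`'s actual signature may differ; `schedule_rows` is a name invented here):

```rust
/// Render one text row per epoch, with bar length proportional to the
/// learning rate (scaled so the maximum rate fills `width` columns).
fn schedule_rows(lrs: &[f64], width: usize) -> Vec<String> {
    let max = lrs.iter().cloned().fold(f64::MIN, f64::max);
    lrs.iter()
        .enumerate()
        .map(|(epoch, lr)| {
            let bar = ((lr / max) * width as f64).round() as usize;
            format!("{epoch:>4} | {}", "*".repeat(bar))
        })
        .collect()
}

fn main() {
    // exponential decay: each bar is 70% the height of the previous one
    let lrs: Vec<f64> = (0..8).map(|e| 0.01 * 0.7f64.powi(e)).collect();
    for row in schedule_rows(&lrs, 40) {
        println!("{row}");
    }
}
```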

#### Available Schedulers:
- **ConstantLR**: No scheduling (baseline)
- **StepLR**: Step decay at regular intervals
- **MultiStepLR**: Multi-step decay at specific milestones
- **ExponentialLR**: Exponential decay each epoch
- **CosineAnnealingLR**: Smooth cosine oscillation
- **CosineAnnealingWarmRestarts**: Cosine with periodic restarts
- **OneCycleLR**: One cycle policy for super-convergence
- **ReduceLROnPlateau**: Adaptive reduction on validation plateaus
- **LinearLR**: Linear interpolation between rates
- **PolynomialLR** ✨: Polynomial decay with configurable power
- **CyclicalLR** ✨: Triangular, triangular2, and exponential range modes
- **WarmupScheduler** ✨: Gradual warmup wrapper for any base scheduler
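
The warmup wrapper listed above can be sketched as a scheduler that linearly ramps up, then delegates to the wrapped schedule. This is a hedged illustration under an assumed trait shape — the trait, struct, and field names here are not the crate's actual API:

```rust
/// Any schedule exposed through this trait can be wrapped with warmup.
/// (Trait and names are illustrative, not the crate's actual API.)
trait LrSchedule {
    fn lr_at(&self, epoch: usize) -> f64;
}

/// Linearly ramp from `start_lr` up to the wrapped schedule's first value,
/// then delegate with the epoch counter shifted past the warmup phase.
struct Warmup<S: LrSchedule> {
    warmup_epochs: usize,
    start_lr: f64,
    inner: S,
}

impl<S: LrSchedule> LrSchedule for Warmup<S> {
    fn lr_at(&self, epoch: usize) -> f64 {
        if epoch < self.warmup_epochs {
            let target = self.inner.lr_at(0);
            let frac = (epoch + 1) as f64 / self.warmup_epochs as f64;
            self.start_lr + (target - self.start_lr) * frac
        } else {
            self.inner.lr_at(epoch - self.warmup_epochs)
        }
    }
}

/// Trivial base schedule used to exercise the wrapper.
struct Constant(f64);
impl LrSchedule for Constant {
    fn lr_at(&self, _epoch: usize) -> f64 {
        self.0
    }
}

fn main() {
    let sched = Warmup { warmup_epochs: 5, start_lr: 0.0001, inner: Constant(0.01) };
    for epoch in 0..8 {
        println!("epoch {epoch}: lr = {:.5}", sched.lr_at(epoch));
    }
}
```

Because the wrapper only needs `lr_at`, it composes with any base scheduler, which is what makes a generic warmup wrapper possible.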

## Architecture

- **`layers`**: LSTM and GRU cells (standard, peephole, bidirectional) with dropout
@@ -223,7 +257,8 @@ cargo run --example bilstm_example # Bidirectional LSTM
cargo run --example dropout_example # Comprehensive dropout demo

# Learning and scheduling
cargo run --example learning_rate_scheduling # Basic schedulers
cargo run --example advanced_lr_scheduling # Advanced schedulers with visualization

# Real-world applications
cargo run --example stock_prediction
@@ -257,8 +292,12 @@ cargo run --example model_inspection
### Learning Rate Schedulers
- **StepLR**: Decay by factor every N epochs
- **OneCycleLR**: One cycle policy (warmup + annealing)
- **CosineAnnealingLR**: Smooth cosine oscillation with warm restarts
- **ReduceLROnPlateau**: Reduce when validation loss plateaus
- **PolynomialLR**: Polynomial decay with configurable power
- **CyclicalLR**: Triangular oscillation with multiple modes
- **WarmupScheduler**: Gradual increase wrapper for any scheduler
- **LinearLR**: Linear interpolation between learning rates
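
Of these, ReduceLROnPlateau is the only stateful, loss-driven scheduler; the core idea fits in a few lines. A minimal sketch (illustrative only, not the crate's implementation):

```rust
/// Scale the learning rate by `factor` whenever the validation loss
/// has not improved for more than `patience` consecutive epochs.
/// (Illustrative sketch, not the crate's ReduceLROnPlateau.)
struct ReduceOnPlateau {
    lr: f64,
    factor: f64,
    patience: usize,
    best: f64,
    bad_epochs: usize,
}

impl ReduceOnPlateau {
    fn new(lr: f64, factor: f64, patience: usize) -> Self {
        Self { lr, factor, patience, best: f64::INFINITY, bad_epochs: 0 }
    }

    /// Call once per epoch with the current validation loss; returns the LR to use.
    fn step(&mut self, val_loss: f64) -> f64 {
        if val_loss < self.best {
            self.best = val_loss; // improvement: reset the counter
            self.bad_epochs = 0;
        } else {
            self.bad_epochs += 1;
            if self.bad_epochs > self.patience {
                self.lr *= self.factor; // plateau: decay and start counting again
                self.bad_epochs = 0;
            }
        }
        self.lr
    }
}

fn main() {
    let mut sched = ReduceOnPlateau::new(0.01, 0.5, 1);
    for loss in [1.0, 0.8, 0.8, 0.8, 0.79] {
        println!("loss {loss}: lr = {}", sched.step(loss));
    }
}
```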

## Testing

@@ -295,6 +334,7 @@ cargo run --example text_classification_bilstm # Classification accuracy

## Version History

- **v0.4.0**: Advanced learning rate scheduling with 12 different schedulers, warmup support, cyclical learning rates, polynomial decay, and ASCII visualization
- **v0.3.0**: Bidirectional LSTM networks with flexible combine modes
- **v0.2.0**: Complete training system with BPTT and comprehensive dropout
- **v0.1.0**: Initial LSTM implementation with forward pass