
Conversation

@dapopov-st (Contributor) commented Apr 23, 2025

Llama Training with Adapter Integration

Overview

This PR introduces a custom Llama implementation with an adapter pattern for integration with LLM Foundry and MosaicML Composer. The changes provide an end-to-end workflow for training Llama models (tested locally; still needs to be ported to Modal).

Key Changes

Custom Llama Implementation (llmfoundry/models/llama/*)

  • Updated model architecture
  • Model definition and training utilities
  • Support for both LoRA fine-tuning and full model training

Adapter Pattern Integration (llmfoundry/models/llama/model.py)

  • Custom adapter implementation that plugs into Composer's training framework (a minimal sketch follows this list)
  • Weight transfer between the HuggingFace checkpoint and the custom implementation
  • Parameter management to keep memory usage in check
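
The wrapper itself is not reproduced in this description; the following is a minimal sketch of the adapter idea, assuming a custom Llama module and a HuggingFace checkpoint. CustomLlama is implied but not shown, and map_hf_key, ComposerCustomLlama, and the model id are hypothetical placeholders rather than names from this PR; Composer's ComposerModel interface centers on forward() and loss().

# A minimal sketch, not the PR's actual adapter. map_hf_key stands in for the
# key-remapping logic of the custom implementation in llmfoundry/models/llama/.
import torch
import torch.nn.functional as F
from composer.models import ComposerModel
from transformers import AutoModelForCausalLM


class ComposerCustomLlama(ComposerModel):
    """Adapter exposing a custom Llama module through Composer's interface."""

    def __init__(self, model: torch.nn.Module, pad_token_id: int):
        super().__init__()
        self.model = model
        self.pad_token_id = pad_token_id

    def forward(self, batch):
        # Composer passes the dataloader batch straight through.
        return self.model(batch["input_ids"])  # -> logits

    def loss(self, outputs, batch):
        # Standard next-token cross entropy; pad positions are ignored.
        return F.cross_entropy(
            outputs.view(-1, outputs.size(-1)),
            batch["labels"].view(-1),
            ignore_index=self.pad_token_id,
        )


def map_hf_key(key: str) -> str:
    # Hypothetical key remapping; the real mapping depends on the custom module names.
    return key.replace("model.", "")


def load_hf_weights(custom_model: torch.nn.Module, hf_name: str = "meta-llama/Llama-2-7b-hf"):
    """Copy pretrained HuggingFace weights into the custom implementation."""
    hf_state = AutoModelForCausalLM.from_pretrained(hf_name).state_dict()
    custom_model.load_state_dict({map_hf_key(k): v for k, v in hf_state.items()}, strict=False)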

Local Training Workflow (local_llama_training_instruct.py)

  • End-to-end script for local model training (an illustrative orchestration sketch follows this list)
  • Dataset preparation, training, conversion, evaluation, and inference
  • Configured for memory-constrained hardware
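
As a rough illustration only (the actual logic lives in local_llama_training_instruct.py), the stages can be chained through LLM Foundry's standard entry points roughly as follows; the YAML paths, checkpoint paths, and flags shown are abbreviated placeholders.

# Illustrative orchestration of the local pipeline; script paths follow
# LLM Foundry's layout, but the YAMLs and flags here are placeholders.
import subprocess

def run(cmd: list[str]) -> None:
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Train with Composer's launcher against one of the YAML templates.
run(["composer", "scripts/train/train.py", "scripts/train/yamls/llama/llama_lora.yaml"])

# 2. Convert the Composer checkpoint to HuggingFace format.
run(["python", "scripts/inference/convert_composer_to_hf.py",
     "--composer_path", "checkpoints/latest-rank0.pt",
     "--hf_output_path", "hf_export/"])

# 3. Evaluate the exported model.
run(["composer", "scripts/eval/eval.py", "scripts/eval/yamls/hf_eval.yaml"])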

Evaluation Improvements (llmfoundry/command_utils/eval.py)

  • PEFT adapter format handling to address device metadata issues
  • Conversion between safetensors and bin formats (sketched below)
  • Error handling for model evaluation
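
The safetensors-to-bin conversion amounts to something like the sketch below; the adapter directory is a placeholder, the filenames are PEFT's defaults, and eval.py may wrap this differently.

# A minimal sketch of converting a PEFT adapter from safetensors to .bin.
import os
import torch
from safetensors.torch import load_file

def convert_adapter_to_bin(adapter_dir: str) -> str:
    src = os.path.join(adapter_dir, "adapter_model.safetensors")
    dst = os.path.join(adapter_dir, "adapter_model.bin")
    # Loading onto CPU strips per-tensor device metadata before re-saving.
    state_dict = load_file(src, device="cpu")
    torch.save(state_dict, dst)
    return dst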

Configuration Templates (scripts/train/yamls/llama/*)

  • YAML templates for various training scenarios
  • Settings for training performance
  • Examples for both LoRA and full model fine-tuning (see the PEFT sketch below)
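
For orientation, the LoRA scenario ultimately reduces to wrapping the model with a PEFT config along the lines below; the rank, alpha, dropout, and target modules shown are illustrative defaults that the YAML templates would normally control.

# Illustrative LoRA wrapping via PEFT; every hyperparameter here is a
# placeholder that the YAML templates would normally supply.
from peft import LoraConfig, get_peft_model

def maybe_apply_lora(model, use_lora: bool):
    if not use_lora:
        return model  # full fine-tuning: all parameters remain trainable
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)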

Modal Integration

For Modal deployment, the integration follows the same pattern with these key considerations:

  • Model accessibility through HF_TOKEN and secrets management
  • Using get_hf_token() and download_model_if_needed() functions where appropriate
  • Container setup based on Dockerfile-dpv-branch
  • Training configuration using the YAML template structure

The implementation is designed to work in both local and Modal environments.
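
A rough sketch of that Modal wiring, assuming a Modal secret that exposes HF_TOKEN and an image built from Dockerfile-dpv-branch; the secret name, GPU type, timeout, and training entry point are assumptions rather than what this PR ships.

# Hedged sketch of the Modal setup; secret name, GPU type, and the training
# command are assumptions for illustration.
import modal

image = modal.Image.from_dockerfile("Dockerfile-dpv-branch")
app = modal.App("llama-training", image=image)

@app.function(gpu="A100", secrets=[modal.Secret.from_name("hf-secret")], timeout=8 * 60 * 60)
def train(yaml_path: str = "scripts/train/yamls/llama/llama_lora.yaml"):
    import os
    import subprocess
    assert os.environ.get("HF_TOKEN"), "HF_TOKEN should be provided by the Modal secret"
    # Same launcher as the local workflow; the YAML path is a placeholder.
    subprocess.run(["composer", "scripts/train/train.py", yaml_path], check=True)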

Additional Documentation

Please refer to the updated README.md for implementation details, including:

  • Custom training configuration options
  • Weight loading mechanisms
  • Model architecture approaches
  • Adapter design pattern for bridging the custom model files with LLM Foundry's training architecture
  • Model registration and framework integration

Testing

The implementation has been tested on 2x RTX 3090 GPUs for both LoRA fine-tuning and full model training scenarios. A single 24 GB GPU should suffice for training; in case of OOM errors, adjust the YAMLs accordingly.

@galopyz (Contributor) commented Apr 23, 2025

Awesome. Here is a simple sequence packing snippet that can be added to _generate_batches(self) in GreedyBestFitSequencePacker for decoder models:

import numpy as np  # these imports live at module level in practice
import torch

if self.suppress_masking:
    # Next-token-prediction labels: shift the packed batch left by one token
    # and fill the last position of each row with the pad token.
    labels = np.full_like(batch, self.pad_token_id)
    labels[:, :-1] = batch[:, 1:]
    yieldval = {
        "input_ids": torch.from_numpy(batch),
        "labels": torch.from_numpy(labels),
        "cu_seqlens": cu_seq_lens,
        "max_seqlen": max_seq_lens,
    }

Basically, this creates the labels when masking is suppressed.

If the Llama model accepts input batches shaped like yieldval, then adding sequence packing would be straightforward.
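
To make the consumption side concrete, here is a hedged sketch of how a decoder forward pass and loss could use a batch shaped like yieldval; the model's keyword arguments are assumed, not taken from this PR.

# A sketch of consuming a packed batch shaped like yieldval; the model's
# signature is an assumption about the custom Llama, not its real API.
import torch
import torch.nn.functional as F

def training_step(model, batch, pad_token_id: int) -> torch.Tensor:
    # cu_seqlens / max_seqlen delimit the packed sequences so that attention
    # (e.g. a varlen flash-attention kernel) never crosses sequence boundaries.
    logits = model(
        input_ids=batch["input_ids"],
        cu_seqlens=batch["cu_seqlens"],
        max_seqlen=batch["max_seqlen"],
    )
    # Labels are already shifted left by one in the packer, so no extra shift
    # here; positions filled with pad_token_id drop out of the loss.
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        batch["labels"].view(-1),
        ignore_index=pad_token_id,
    )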
