This repository was archived by the owner on Mar 3, 2026. It is now read-only.
It seems sharding along the sequence length isn't supported when FSDP and DP are both enabled: the sharding logic can't be resolved once three sharding specs are in play.
After printing out the sharding produced by convert_fn, it looks correct, but the torch-xla side can't complete the sharding; the root cause is lower down, in the _xla_tensors_from_aten function.
To use context parallelism, we need to bypass the parallel_loader and directly feed in activations that are already sharded correctly.
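To illustrate the intended layout, here is a minimal sketch (plain numpy, not torch-xla code) of how activations of shape (batch, seq, hidden) would be partitioned across a 3-axis mesh, with batch split over the DP axis and sequence length split over the context-parallel axis. The function name `shard_activations` and the mesh sizes are hypothetical, chosen only for this example:

```python
import numpy as np

def shard_activations(x, dp, sp):
    """Split batch across dp ranks and sequence across sp ranks.

    Returns a dict mapping (dp_rank, sp_rank) -> local shard,
    mimicking the per-device activation layout that context
    parallelism expects (the fsdp axis would shard parameters,
    not these activations, so it is omitted here).
    """
    b, s, h = x.shape
    assert b % dp == 0 and s % sp == 0, "axes must divide evenly"
    shards = {}
    for i, batch_chunk in enumerate(np.split(x, dp, axis=0)):
        for j, seq_chunk in enumerate(np.split(batch_chunk, sp, axis=1)):
            shards[(i, j)] = seq_chunk
    return shards

# Example: batch=2, seq=4, hidden=3 on a dp=2 x sp=2 mesh.
x = np.arange(2 * 4 * 3).reshape(2, 4, 3)
shards = shard_activations(x, dp=2, sp=2)
# Each device holds a (1, 2, 3) shard of the activations.
```

If activations are pre-sharded this way before entering the model, the ambiguous three-spec case never reaches the loader, which is the point of bypassing parallel_loader.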