Add RayClusterHandler to enable fairseq2 run on Ray#1096
Add RayClusterHandler to enable fairseq2 run on Ray#1096dongwang218 wants to merge 7 commits intomainfrom
Conversation
|
nice contribution ! |
setup.py
Outdated
| # listed as optional in tiktoken's pyproject.toml | ||
| # (https://github.com/openai/tiktoken/blob/main/pyproject.toml#L9) | ||
| "blobfile~=3.0.0", | ||
| "ray~=2.40", |
There was a problem hiding this comment.
I believe this should be optional. In fact, instead of making it optional, we can just do a quick import check in the cluster implementation, and raise an error advising the use to install Ray themselves. For majority of use cases, people won't be using Ray. See this as an example.
There was a problem hiding this comment.
made ray as extra. also make sure the test and existing code works without install ray.
src/fairseq2/cluster.py
Outdated
| def set_torch_distributed_variables(self) -> None: | ||
| env = self._env | ||
|
|
||
| rank_str = env.get("RANK") |
There was a problem hiding this comment.
You can (ideally should) use the get_rank, get_local_rank, get_world_size, and get_local_world_size utility functions defined here: https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/utils/env.py#L85
They also raise an appropriate exception that causes the process to terminate gracefully.
There was a problem hiding this comment.
thanks, changed to use utility functions.
23b04af to
e20c5cf
Compare
|
I would be interested in reanimating this work! |
sure, will merge master and push the latest. |
What does this PR do? Please describe:
Allow fairseq2 to run inside ray cluster.
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: