Implement GCS Checkpoint #463
This reverts commit 6ddfe52.
Co-authored-by: Pedro Fontana <fontana.pedro93@gmail.com>
@jquesnelle @crypto-vincent @arilotter From what I see, we'd need to redeploy the program since Checkpoint now accepts the new Gcs and P2PGcs variants. So can we remove the storage change tag?
Yeah, I agree that it should be marked as a contract change only, and that the storage test passing is evidence that the representation hasn't changed.
Implement GCS Checkpoint to download model from Google Cloud Storage
Now, in the run config, we can set the model URL as a Google Cloud Storage bucket:
The GCS checkpoint is completely decoupled from the Hugging Face one because, in the medium term, the plan is to remove the HF checkpoint and keep only the GCS one.
To access the bucket, pass the credentials as an env var:
GOOGLE_APPLICATION_CREDENTIALS=<CREDENTIAL_PATH> just start-training-localnet-light-client
Also supports evaluation with models stored in GCS:
Summary
- Checkpoint::Gcs(GcsRepo)
- Checkpoint::GcsP2P(gcs_repo) to fall back to GCS if P2P fails
- pub async fn download_model_from_gcs_async() to download model from GCS
Includes
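The fallback behaviour named in the summary (try P2P first, then fall back to GCS) could look roughly like the sketch below. It is not the PR's code: the `GcsRepo` fields, the `Source` and `DownloadError` types, and the `resolve_model` function are all hypothetical, and the fetchers are passed in as plain synchronous closures so the example stays self-contained, whereas the PR's `download_model_from_gcs_async()` is async.

```rust
// Hypothetical stand-in for the PR's GcsRepo; the real fields are not shown in the source.
#[derive(Debug, Clone, PartialEq)]
pub struct GcsRepo {
    pub bucket: String,
    pub prefix: Option<String>,
}

// Only the two GCS-related variants named in the summary are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub enum Checkpoint {
    Gcs(GcsRepo),
    GcsP2P(GcsRepo),
}

// Hypothetical: records where the model bytes ended up coming from.
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    Peers,
    Gcs,
}

// Hypothetical error type used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct DownloadError(pub String);

/// Resolve a checkpoint to model bytes.
/// `Gcs` goes straight to the bucket; `GcsP2P` tries peers first and
/// only calls the GCS fetcher if the peer fetch returns an error.
pub fn resolve_model<P, G>(
    checkpoint: &Checkpoint,
    fetch_from_peers: P,
    fetch_from_gcs: G,
) -> Result<(Source, Vec<u8>), DownloadError>
where
    P: FnOnce() -> Result<Vec<u8>, DownloadError>,
    G: FnOnce(&GcsRepo) -> Result<Vec<u8>, DownloadError>,
{
    match checkpoint {
        Checkpoint::Gcs(repo) => fetch_from_gcs(repo).map(|bytes| (Source::Gcs, bytes)),
        Checkpoint::GcsP2P(repo) => match fetch_from_peers() {
            Ok(bytes) => Ok((Source::Peers, bytes)),
            // The peer error is discarded; GCS is the fallback of last resort.
            Err(_) => fetch_from_gcs(repo).map(|bytes| (Source::Gcs, bytes)),
        },
    }
}
```

Carrying the repo inside the P2P variant means the fallback needs no extra configuration lookup. A real implementation would probably also log the peer error before falling back.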