|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Hi Haolin Chen @hlnchen, thanks so much for starting this PR! There are a few things I'd ask to get started on this.
Thanks so much! Hopefully the above isn't too much work to get a contribution in, we'd love to have your support. |
|
One more comment: please make sure that if checkpointing is not enabled, the PR has no effect on the way models are currently trained. We can relax this limitation later, but for now let's play it safe and make sure the new functionality doesn't break anything. |
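To make that request concrete, here is a minimal sketch of the kind of guard it implies. Everything in it is an assumption for illustration, not code from this PR: it assumes a config where `checkpoint_dir` is `None` when checkpointing is off, plus a hypothetical `save_every` interval, and `train_step` / `save_fn` are placeholder callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class TrainConfig:
    # Assumption: None means checkpointing is disabled.
    checkpoint_dir: Optional[str] = None
    # Hypothetical option: save every N steps when checkpointing is enabled.
    save_every: int = 2


def train(
    config: TrainConfig,
    batches: Iterable[Any],
    train_step: Callable[[Any], Any],
    save_fn: Callable[[str, int], None],
) -> List[Any]:
    """Runs train_step on every batch; save_fn is only called when checkpointing is enabled."""
    enabled = config.checkpoint_dir is not None
    results = []
    for step, batch in enumerate(batches, start=1):
        # The training computation is identical in both modes.
        results.append(train_step(batch))
        # All checkpoint work sits behind a single flag.
        if enabled and step % config.save_every == 0:
            save_fn(config.checkpoint_dir, step)
    return results
```

With the default config the loop never touches `save_fn`, so the training path is the same as before; enabling checkpointing only adds the save calls.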
|
@yaoshiang actually, there is something we are not sure about: when using |
3 configs added to control checkpoints:
- `checkpoint_dir`: directory of the checkpoint; use `gs://bucket/path/to/checkpoint` to save to a GCS bucket.
- `resume_from_checkpoint`:
  - `null`: no checkpoint is loaded; the Hugging Face pretrained weights are loaded instead.
  - otherwise, the checkpoint is loaded from `checkpoint_dir/resume_from_checkpoint/`; if the value is `latest` or the step is not found by the manager, the last checkpoint is loaded.
  - After the checkpoint is loaded, the first `step` iterations are skipped by looping over the dataloader.

What's not included: