When training the model in DDP mode, it raises the error "This error indicates that your module has parameters that were not used in producing loss".
Since MiniLM distills only the attention parameters, the student model's parameters such as "bert.encoder.layer.-1.output.xx" and "cls.predictions.transform.xx" receive no gradient updates. How can this be fixed? Thanks.
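For context, a minimal sketch of the usual PyTorch workaround for this situation: wrapping the student with `find_unused_parameters=True` so DDP tolerates parameters that do not contribute to the loss. This is a generic illustration, not this repo's training script; `build_student_model()` is a hypothetical placeholder for however the student is constructed here.

```python
# Minimal sketch, assuming a standard PyTorch DDP setup.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Hypothetical constructor standing in for the student model used here,
# whose non-attention layers (output/prediction heads) get no gradients.
student = build_student_model().cuda(local_rank)

# find_unused_parameters=True lets DDP skip the all-reduce for parameters
# that were not involved in producing the loss, avoiding the error above.
student = DDP(student, device_ids=[local_rank], find_unused_parameters=True)
```

An alternative is to freeze the unused submodules (e.g. call `requires_grad_(False)` on the output and prediction-head layers) so DDP never expects gradients for them; that avoids the overhead of the unused-parameter search.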