When training the model in DDP mode, it raises the error "This error indicates that your module has parameters that were not used in producing loss".
Since MiniLM distills only the attention parameters, the student model's parameters such as "bert.encoder.layer.-1.output.xx" and "cls.predictions.transform.xx" receive no gradient updates. How can this be fixed? Thanks.
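For context, a minimal sketch of the usual PyTorch workaround for this situation: wrapping the student with `find_unused_parameters=True` so DDP tolerates parameters that do not contribute to the loss. This is a generic illustration, not this repo's training script; `build_student_model()` is a hypothetical placeholder for however the student is constructed here.

```python
# Minimal sketch, assuming a standard PyTorch DDP setup.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Hypothetical constructor standing in for the student model used here,
# whose non-attention layers (output/prediction heads) get no gradients.
student = build_student_model().cuda(local_rank)

# find_unused_parameters=True lets DDP skip the all-reduce for parameters
# that were not involved in producing the loss, avoiding the error above.
student = DDP(student, device_ids=[local_rank], find_unused_parameters=True)
```

An alternative is to freeze the unused submodules (e.g. call `requires_grad_(False)` on the output and prediction-head layers) so DDP never expects gradients for them; that avoids the overhead of the unused-parameter search.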