Error in the DDP mode #3

@jinxinglu

When training the model in DDP mode, it raises the error "This error indicates that your module has parameters that were not used in producing loss".
Since the MiniLM model only uses the attention parameters, student model parameters such as "bert.encoder.layer.-1.output.xx" and "cls.predictions.transform.xx" receive no gradient updates. How can this be fixed? Thanks.
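
One possible workaround, sketched below under stated assumptions, is to freeze the parameters that never receive a gradient before wrapping the student in DistributedDataParallel, since DDP only tracks parameters with requires_grad=True. `ToyStudent` and `freeze_unused_parameters` are hypothetical names used for illustration, not part of this repository; the toy module stands in for a student whose output layer is skipped by the distillation loss. The other commonly used option, passing `find_unused_parameters=True` to DistributedDataParallel, is shown only in the comments because it needs an initialized process group.

```python
import torch
from torch import nn


def freeze_unused_parameters(model: nn.Module, loss: torch.Tensor) -> list:
    """Backpropagate once, then freeze every parameter that got no gradient."""
    model.zero_grad(set_to_none=True)
    loss.backward()
    # Parameters that did not take part in producing the loss keep grad=None.
    unused = [name for name, p in model.named_parameters()
              if p.requires_grad and p.grad is None]
    for name, p in model.named_parameters():
        if name in unused:
            p.requires_grad_(False)
    model.zero_grad(set_to_none=True)
    return unused


class ToyStudent(nn.Module):
    """Hypothetical stand-in: only the attention layer feeds the loss."""

    def __init__(self):
        super().__init__()
        self.attention = nn.Linear(4, 4)
        self.output = nn.Linear(4, 4)  # never called in forward()

    def forward(self, x):
        return self.attention(x)


student = ToyStudent()
loss = student(torch.randn(2, 4)).sum()
unused = freeze_unused_parameters(student, loss)
print(unused)

# After freezing, wrap as usual inside each DDP worker (assumes an
# initialized process group and a local_rank variable):
#   ddp_model = torch.nn.parallel.DistributedDataParallel(
#       student, device_ids=[local_rank])
# Alternative without freezing, at some extra cost per iteration:
#   ddp_model = torch.nn.parallel.DistributedDataParallel(
#       student, device_ids=[local_rank], find_unused_parameters=True)
```

Freezing avoids the extra graph traversal that `find_unused_parameters=True` performs on every iteration, but it only fits when the set of unused parameters is the same in every step.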
