Why can the LSTM be discarded during inference? #43

@MXuer

Description

I am confused about this sentence in your paper "GPT Understands, Too":

Moreover, in the inference, we only need the output embedding h and can discard the LSTM head.

If the LSTM encoder is used during training, and the final embeddings are a combination of the LSTM encoder's outputs and the original embeddings, but the LSTM is discarded at inference, then the final embeddings would just be the outputs of the two embedding layers. Wouldn't that change the performance?
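One way to read the quoted sentence: after training, the prompt encoder's inputs (the learned pseudo-token embeddings) are fixed, so its output h is a constant tensor that can be computed once and cached; inference then uses the cached h and never calls the LSTM. A minimal NumPy sketch of that idea (the sizes, `W`, and `lstm_head` are illustrative stand-ins, not the paper's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
num_prompt_tokens, emb_dim = 4, 8

# Trained raw prompt embeddings: the fixed inputs to the prompt encoder.
raw_prompt = rng.normal(size=(num_prompt_tokens, emb_dim))

# Stand-in for the trained LSTM head: any fixed deterministic map works
# for the argument, since its weights no longer change after training.
W = rng.normal(size=(emb_dim, emb_dim))

def lstm_head(x):
    # Placeholder for the real LSTM + MLP reparameterization.
    return np.tanh(x @ W)

# After training, both raw_prompt and the head are frozen,
# so h is a constant: compute it once and cache it.
h = lstm_head(raw_prompt)
h_cached = h.copy()  # e.g. saved to disk alongside the model

# At inference, the cached h is used directly; the LSTM head is
# never invoked, yet the embeddings are identical to training time.
assert np.allclose(h, lstm_head(raw_prompt))
assert np.allclose(h, h_cached)
```

So discarding the LSTM does not change the embeddings the model sees, because its output for the (fixed) prompt tokens is precomputed rather than recomputed.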

So why can the LSTM be discarded at inference time?

Thanks a lot.
