I am confused about this sentence in your paper "GPT Understands, Too":
Moreover, in the inference, we only need the output embedding h and can discard the LSTM head.
If the LSTM encoder is used during training, and the final embeddings are a combination of the LSTM encoder's outputs and the original embeddings, then discarding the LSTM at inference would mean the final embeddings are just the outputs of the two embedding layers. Wouldn't this change the performance?
So why can the LSTM be discarded at inference?
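To make my understanding concrete, here is a minimal PyTorch sketch of how I picture the prompt encoder (all names and dimensions are my own guesses, not from your released code):

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """My guess at the training-time prompt encoder: trainable pseudo-token
    embeddings passed through a bi-LSTM and an MLP to produce h."""

    def __init__(self, num_prompt_tokens=8, dim=128):
        super().__init__()
        # trainable input embeddings for the pseudo prompt tokens
        self.input_embeds = nn.Parameter(torch.randn(num_prompt_tokens, dim))
        # bidirectional LSTM: hidden size dim//2 per direction -> output dim
        self.lstm = nn.LSTM(dim, dim // 2, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self):
        # (1, num_prompt_tokens, dim) -> LSTM -> MLP -> prompt embeddings h
        out, _ = self.lstm(self.input_embeds.unsqueeze(0))
        return self.mlp(out).squeeze(0)

enc = PromptEncoder()
h = enc()  # shape: (num_prompt_tokens, dim)
```

If this sketch is roughly right, h depends only on the trained parameters and not on the input text, so perhaps it can be computed once and cached. But I am not sure whether that is the reasoning in the paper, which is why I am asking.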
Thanks a lot.