I tried adapting the model by adding a new parameter inside a submodule, as follows:
# Added within the __init__ of ESMplusplusForMaskedLM
self.some_submodel = Submodel()

# Added within the __init__ of the new Submodel class
self.register_parameter(
    "some_param", torch.nn.Parameter(torch.randn(n1, n2)))
Although torch.randn(n1, n2) essentially never returns exact zeros, the new parameter actually contains many zero values after from_pretrained and has to be reinitialized manually (otherwise the losses do not behave well):
model = ESMplusplusForMaskedLM.from_pretrained(
    path_esm, local_files_only=True, trust_remote_code=True,
)
print((model.some_submodel.some_param == 0).sum())
torch.nn.init.normal_(model.some_submodel.some_param)
print((model.some_submodel.some_param == 0).sum())
# Output:
# tensor(61426)  # after loading
# tensor(0)      # after reinitialization
Is there a recommended way to handle this in from_pretrained rather than reinitializing the parameter manually?
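One approach I considered is to teach the model's _init_weights hook about the new submodule, since from_pretrained routes weights that are missing from the checkpoint through that hook. This is only a sketch, assuming ESMplusplusForMaskedLM follows the usual transformers PreTrainedModel conventions (which I have not verified for this remote-code model), with Submodel being the placeholder class from above:

from transformers import PreTrainedModel

class ESMplusplusForMaskedLM(PreTrainedModel):  # simplified; the real base class may differ
    def _init_weights(self, module):
        super()._init_weights(module)  # keep the stock initialization
        if isinstance(module, Submodel):
            # from_pretrained calls _init_weights for modules whose weights
            # are missing from the checkpoint, so the new parameter would get
            # a proper normal init here instead of staying zero.
            torch.nn.init.normal_(module.some_param)

Would that be the intended pattern here, or is there a better-supported mechanism?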