Labels: enhancement (New feature or request)
Description
After merging #80, further optimizations can possibly be applied:
- Evaluate whether the `attn_mask` (which masks out padded inputs and the cls token) is necessary for training.
  - If yes, try FlexAttention (https://pytorch.org/blog/flexattention/); see the sketch after this list.
  - If no, remove the mask during training and benefit from flash attention; see the SDPA sketch after this list.
- Evaluate whether we can change the feed-forward dimension back to 512 (as in torch.v1).
- Try to implement `torch.compile` for deployment (probably not working due to variable input shapes; see the dynamic-shapes sketch after this list) and for preprocessing.
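
If the mask turns out to be necessary, a minimal FlexAttention sketch (PyTorch >= 2.5) could keep it while still using a fused, block-sparse kernel. The `seq_lens` tensor and the assumption that the cls token sits at position 0 are hypothetical stand-ins for however the model actually tracks padding:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 4, 8, 128, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
seq_lens = torch.tensor([128, 100, 90, 64], device="cuda")  # hypothetical per-sample lengths

def mask_mod(b, h, q_idx, kv_idx):
    # Keep only real (non-padded) keys and drop the cls token,
    # assumed here to sit at position 0.
    return (kv_idx < seq_lens[b]) & (kv_idx != 0)

# The block mask lets FlexAttention skip fully-masked tiles entirely;
# wrap flex_attention in torch.compile for production speed.
block_mask = create_block_mask(mask_mod, B=B, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)
```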
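If the mask can be dropped, plain scaled-dot-product attention with no `attn_mask` becomes eligible for the flash-attention backend. A sketch (shapes made up), pinning the dispatcher so a silent fallback to a slower kernel shows up as an error:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

B, H, S, D = 4, 8, 128, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restricting SDPA to the flash backend fails loudly if the inputs
# (dtype, head dim, presence of a mask) would prevent its use.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)  # no attn_mask passed
```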
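For the `torch.compile` experiment, passing `dynamic=True` (or marking the sequence dimension as dynamic) may sidestep the recompile-per-shape problem that variable input shapes would otherwise cause. A sketch with a placeholder module; the real deployment model and dimensions would go here:

```python
import torch

# Placeholder module; dim_feedforward=512 mirrors the torch.v1 value
# the feed-forward bullet above asks about.
model = torch.nn.TransformerEncoderLayer(
    d_model=256, nhead=8, dim_feedforward=512, batch_first=True
).eval()
compiled = torch.compile(model, dynamic=True)  # compile with symbolic shapes up front

x = torch.randn(4, 100, 256)
torch._dynamo.mark_dynamic(x, 1)  # sequence length varies between calls
with torch.no_grad():
    out = compiled(x)
```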