Have you tried using ReLU activations and residual connections in your network? They might make it easier to train, reduce the time needed to optimize it, and maybe even improve the final results. Eight layers is a lot to train with a tanh activation and no residual connections. I am really curious about this possibility.
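
As a rough illustration of the idea, here is a minimal sketch of a residual block with ReLU (assuming a PyTorch-style MLP; the `ResidualBlock` name, the hidden size of 128, and the block count are placeholders, not the actual architecture in this repo):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical residual block: Linear -> ReLU -> Linear, plus a skip connection."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # The skip connection adds the input back, which tends to ease
        # gradient flow through deeper stacks compared to plain tanh layers.
        return x + self.fc2(self.act(self.fc1(x)))

# Example: four residual blocks give eight linear layers in total,
# roughly matching the depth mentioned above.
model = nn.Sequential(*[ResidualBlock(128) for _ in range(4)])
```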