Have you tried using ReLU activations and residual connections in your network? They might make it easier to train, reduce the time needed to optimize it, and maybe even improve the final results. Eight layers is a lot to train with a tanh activation and no residual connections. I am really curious about this possibility.
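
As a rough illustration of the idea, here is a minimal sketch of a residual block with ReLU (assuming a PyTorch-style MLP; the `ResidualBlock` name, the hidden size of 128, and the block count are placeholders, not the actual architecture in this repo):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical residual block: Linear -> ReLU -> Linear, plus a skip connection."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # The skip connection adds the input back, which tends to ease
        # gradient flow through deeper stacks compared to plain tanh layers.
        return x + self.fc2(self.act(self.fc1(x)))

# Example: four residual blocks give eight linear layers in total,
# roughly matching the depth mentioned above.
model = nn.Sequential(*[ResidualBlock(128) for _ in range(4)])
```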