The current implementation only has a hand-optimized path for size 3 windows and a general implementation for sizes >= 5. Maybe the larger sizes should be optimized as well.