Various implementations are given and described in this Mersenne Forum thread: http://www.mersenneforum.org/showthread.php?p=182647 Some of them are faster than BPSW for numbers that fit in 32 or 64 bits.
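
For a rough idea of what a test specialized to fixed-width inputs can look like, here is a minimal sketch of a deterministic Miller-Rabin test for n < 2^64. It is not taken from the linked thread, and I am only assuming it resembles the approaches discussed there. The function name `is_prime_u64` is made up for illustration, and the base set (the first 12 primes) is one known-sufficient choice for this range, not necessarily the fastest.

```python
# Illustrative sketch only; not code from the linked thread.
# Testing against the first 12 primes as bases is known to be
# deterministic for every n below 2**64.
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime_u64(n: int) -> bool:
    """Return True if n is prime, for 0 <= n < 2**64."""
    if n < 2:
        return False
    # Trial division by the bases handles small n and cheap composites.
    for p in _BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # n passes for this base
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break  # n passes for this base
        else:
            return False  # a is a witness that n is composite
    return True
```

In a compiled language the same idea needs a 128-bit intermediate (or Montgomery multiplication) for the modular products; Python's arbitrary-precision integers hide that detail.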