I recently needed to use Qualcomm’s Turing stream cipher for a Java project. Some Googling found a Java port of Qualcomm’s C++ reference implementation, which behaves almost exactly as expected: its output is always correct, but its runtime is significantly slower than the C++ implementation. As in, an order of magnitude slower. If this were the late ’90s, such a performance difference between C++ and Java might be expected… but it isn’t, so it’s not.
Digging into the code revealed several areas that required sequential execution, thus preventing HotSpot’s JIT-compiler from really going to town in its optimizations. Qualcomm’s implementation, by contrast, makes liberal use of loop unrolling and bit-parallel operations. By turing as many of these sequential operations as possible into bit-parallel operations, we can help the JVM perform more operations per clock cycle and allow it to inline critical methods.
Starting with the existing Java port as a base, I ported over the bit-parallel optimizations from Qualcomm’s implementation, in the process eliminating most of the loops requiring serial execution. The results are stark: with the original Java port, VisualVM showed that decrypting a 14 GB file resulted in 38.5 seconds spent in the Turing code. By contrast, the optimized variant spent only 5.8 seconds in the same code.
Qualcomm’s license allows both source and binary modification and redistribution, provided they are credited for Turing’s development. If that license suits your needs and you need a fast Java implementation of the Turing cipher, my port is available on GitHub as QuickTuring. Comments and bug reports are welcome.