Bitbashing (Posts about simd)

4x4 integer matrix transpose in SSE2

Jack Lloyd — Thu, 08 Oct 2009 04:00:00 GMT

The Intel SSE2 intrinsics has a macro _MM_TRANSPOSE4_PS which performs a matrix transposition on a 4x4 array represented by elements in 4 SSE registers. However, it doesn't work with integer registers because Intel intrinsics make a distinction between integer and floating point SSE registers. Theoretically one could cast and use the floating point operations, but it seems quite plausible that this will not round trip properly; for instance if one of your integer values happens to have the same value as a 32-bit IEEE denormal.

However it is easy to do with the punpckldq, punpckhdq, punpcklqdq, and punpckhqdq instructions; code and diagrams ahoy.

Speeding up Serpent: SIMD Edition

Jack Lloyd — Wed, 09 Sep 2009 04:00:00 GMT

The Serpent block cipher was one of the 5 finalists in the AES competition, and is widely thought to be the most secure of them due to its conservative design. It was also considered the slowest candidate, which is one major reason it did not win the AES contest. However, it turns out that on modern machines one can use SIMD operations to implement Serpent at speeds quite close to AES.

Optimizing Forward Error Correction Coding Using SIMD Instructions

Jack Lloyd — Mon, 19 Jan 2009 05:00:00 GMT

Forward error correction (FEC) is a technique for handling lossy storage devices or transmission channels. A FEC code takes k blocks of data and produces an additional m blocks of encoding information, such that any set of k of the blocks (out of the k+m total) is sufficient to recover the original data. One can think of RAID5 as a FEC with arbitrary k and m fixed at 1; most FEC algorithms allow wide latitude for the values that can be sent, allowing the code to be adjusted for the reliability expectations and needs of the particular channel and application. For instance, the Tahoe distributed filesystem splits stored files using k of 3 and m of 7, so as long as at least 30% of the devices storing the file survive, the original file can be recoved.