<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bitbashing (Posts about simd)</title><link>https://randombit.net/bitbashing/</link><description></description><atom:link href="https://randombit.net/bitbashing/categories/simd.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2019 &lt;a href="mailto:jack@randombit.net"&gt;Jack Lloyd&lt;/a&gt; </copyright><lastBuildDate>Fri, 02 Aug 2019 22:27:07 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>4x4 integer matrix transpose in SSE2</title><link>https://randombit.net/bitbashing/posts/integer_matrix_transpose_in_sse2.html</link><dc:creator>Jack Lloyd</dc:creator><description>&lt;div&gt;&lt;p&gt;The Intel SSE intrinsics include a macro &lt;tt class="docutils literal"&gt;_MM_TRANSPOSE4_PS&lt;/tt&gt;
which performs a matrix transposition on a 4x4 array represented by
elements in 4 SSE registers. However, it doesn't work with integer
registers because the Intel intrinsics distinguish between integer
and floating-point SSE register types. In theory one could cast and use
the floating-point operations, but it seems quite plausible that this
will not round-trip properly, for instance if one of your integer
values happens to have the same bit pattern as a 32-bit IEEE denormal.&lt;/p&gt;
&lt;p&gt;However, the integer transpose is easy to do with the punpckldq, punpckhdq,
punpcklqdq, and punpckhqdq instructions; code and diagrams ahoy.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://randombit.net/bitbashing/posts/integer_matrix_transpose_in_sse2.html"&gt;Read more…&lt;/a&gt; (1 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>programming</category><category>simd</category><guid>https://randombit.net/bitbashing/posts/integer_matrix_transpose_in_sse2.html</guid><pubDate>Thu, 08 Oct 2009 04:00:00 GMT</pubDate></item><item><title>Speeding up Serpent: SIMD Edition</title><link>https://randombit.net/bitbashing/posts/serpent_in_simd.html</link><dc:creator>Jack Lloyd</dc:creator><description>&lt;div&gt;&lt;p&gt;The &lt;a class="reference external" href="http://www.cl.cam.ac.uk/~rja14/serpent.html"&gt;Serpent&lt;/a&gt;
block cipher was one of the five finalists in the AES competition, and is
widely thought to be the most secure of them due to its conservative
design. It was also considered the slowest candidate, which is one
major reason it did not win the AES contest. However, it turns out
that on modern machines one can use SIMD operations to implement
Serpent at speeds quite close to AES.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://randombit.net/bitbashing/posts/serpent_in_simd.html"&gt;Read more…&lt;/a&gt; (3 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>crypto</category><category>simd</category><guid>https://randombit.net/bitbashing/posts/serpent_in_simd.html</guid><pubDate>Wed, 09 Sep 2009 04:00:00 GMT</pubDate></item><item><title>Optimizing Forward Error Correction Coding Using SIMD Instructions</title><link>https://randombit.net/bitbashing/posts/forward_error_correction_using_simd.html</link><dc:creator>Jack Lloyd</dc:creator><description>&lt;div&gt;&lt;p&gt;Forward error correction (FEC) is a technique for handling lossy
storage devices or transmission channels. A FEC code takes &lt;em&gt;k&lt;/em&gt; blocks
of data and produces an additional &lt;em&gt;m&lt;/em&gt; blocks of encoding information,
such that any set of &lt;em&gt;k&lt;/em&gt; of the blocks (out of the &lt;em&gt;k+m&lt;/em&gt; total) is
sufficient to recover the original data. One can think of RAID5 as a
FEC with arbitrary &lt;em&gt;k&lt;/em&gt; and &lt;em&gt;m&lt;/em&gt; fixed at 1; most FEC algorithms give
wide latitude in choosing both values, so the code can be
adjusted for the reliability expectations and needs of the particular
channel and application. For instance, the &lt;a class="reference external" href="http://allmydata.org/trac/tahoe"&gt;Tahoe&lt;/a&gt; distributed filesystem splits
stored files using &lt;em&gt;k&lt;/em&gt; of 3 and &lt;em&gt;m&lt;/em&gt; of 7, so as long as any 3 of the
10 devices storing the file (30%) survive, the original file can be
recovered.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://randombit.net/bitbashing/posts/forward_error_correction_using_simd.html"&gt;Read more…&lt;/a&gt; (9 min remaining to read)&lt;/p&gt;&lt;/div&gt;</description><category>programming</category><category>simd</category><guid>https://randombit.net/bitbashing/posts/forward_error_correction_using_simd.html</guid><pubDate>Mon, 19 Jan 2009 05:00:00 GMT</pubDate></item></channel></rss>