Fake SSE stubs
Why?
I was bored, and my friend asked me, "How slow would this be without SSE?"
I replied, "Really slow, because I'm not rewriting an optimal implementation to make it run."
This is where I got the idea for trunk/fakesse.h.
Disclaimer
These functions are REALLY slow: at least 100x slower than using XMM registers and SSE intrinsics.
What exactly are these?
Well, these are scalar implementations of SSE intrinsics. If __SSE2__ isn't defined, neither __m128i nor any of the SSE2 intrinsics will be available.
In that case, I define my own __m128i:
#include <stdint.h>  /* uint32_t */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} __m128i;
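A minimal sketch of how such a header could switch between the real intrinsics and the fallbacks. Only the __SSE2__ check and the __m128i layout come from the text above; the three helper stubs (_mm_setzero_si128, _mm_loadu_si128, _mm_storeu_si128) are assumptions about what a fallback branch might contain, not the actual contents of trunk/fakesse.h.

```c
#include <stdint.h>
#include <string.h>

#ifdef __SSE2__
#include <emmintrin.h> /* real SSE2 type and intrinsics */
#else
/* Scalar stand-in: 16 bytes, also viewable as four 32-bit lanes. */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} __m128i;

/* Assumed fallback: all-zero vector. */
static inline __m128i _mm_setzero_si128(void)
{
    __m128i r;
    memset(&r, 0, sizeof r);
    return r;
}

/* Assumed fallback: unaligned 16-byte load. */
static inline __m128i _mm_loadu_si128(const __m128i *p)
{
    __m128i r;
    memcpy(&r, p, sizeof r);
    return r;
}

/* Assumed fallback: unaligned 16-byte store. */
static inline void _mm_storeu_si128(__m128i *p, __m128i a)
{
    memcpy(p, &a, sizeof a);
}
#endif
```

Calling code stays identical on both paths; only the compiler flags (for example GCC's -mno-sse2 on x86) decide which branch gets compiled.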
Then I re-implement the SSE2 intrinsics I use, such as:
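static inline __m128i _mm_srli_si128( __m128i A, int imm )
{
    int i;
    __m128i ret;
    /* Like PSRLDQ, counts above 15 clear everything (and avoid out-of-bounds writes). */
    if( imm > 16 ) imm = 16;
    /* Little endian: byte i of the result comes from byte i+imm. */
    for( i = 0; i < 16-imm; i++ ) { ret.u.byte[i] = A.u.byte[i+imm]; }
    /* Zero-fill the vacated high bytes. */
    for( i = 16-imm; i < 16; i++ ) { ret.u.byte[i] = 0x00; }
    return ret;
}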
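Lane-wise intrinsics can follow the same pattern through the union's int32 view. The sketch below is a hypothetical _mm_add_epi32 fallback, not one taken from fakesse.h; it uses the made-up names fake_m128i and fake_mm_add_epi32 so it compiles next to a real <emmintrin.h>. One caveat: the int32 view only lines up with x86's lane order (lane 0 is bytes 0 to 3, least significant byte first) on a little-endian host.

```c
#include <stdint.h>

/* Hypothetical copy of the fake vector type under a non-reserved name. */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} fake_m128i;

/* Scalar stand-in for _mm_add_epi32: four independent 32-bit adds.
   uint32_t arithmetic wraps modulo 2^32, matching PADDD. */
static inline fake_m128i fake_mm_add_epi32(fake_m128i a, fake_m128i b)
{
    fake_m128i r;
    int i;
    for (i = 0; i < 4; i++) {
        r.u.int32[i] = a.u.int32[i] + b.u.int32[i];
    }
    return r;
}
```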
See? This is going to be super slow. A real SSE2 instruction takes a couple of clock cycles; this loop will take a couple hundred.
Would I use these in a production environment?
No! I just wrote them for fun, and they're way too slow. They also cover only a very small subset of SSE2, just enough for my application to run, and they probably don't behave exactly like the original intrinsics. For example, I didn't account for saturating addition or for signed vs. unsigned behavior.
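To show what the missing saturation means: SSE2 has both a wrapping byte add (_mm_add_epi8) and saturating byte adds (_mm_adds_epu8 for unsigned, _mm_adds_epi8 for signed), and they disagree on overflow. A sketch of the per-byte semantics, using hypothetical fake_ names; these are illustrations, not stubs from fakesse.h.

```c
#include <stdint.h>

/* Hypothetical stand-in type, same layout as the fake __m128i. */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} fake_m128i;

/* Reinterpret a stored byte as a two's-complement int8 value. */
static inline int as_int8(unsigned char v)
{
    return v >= 128 ? (int)v - 256 : (int)v;
}

/* _mm_add_epi8 semantics: each byte wraps modulo 256. */
static inline fake_m128i fake_mm_add_epi8(fake_m128i a, fake_m128i b)
{
    fake_m128i r;
    int i;
    for (i = 0; i < 16; i++)
        r.u.byte[i] = (unsigned char)(a.u.byte[i] + b.u.byte[i]);
    return r;
}

/* _mm_adds_epu8 semantics: unsigned add, clamped to 255. */
static inline fake_m128i fake_mm_adds_epu8(fake_m128i a, fake_m128i b)
{
    fake_m128i r;
    int i;
    for (i = 0; i < 16; i++) {
        int s = a.u.byte[i] + b.u.byte[i];
        r.u.byte[i] = (unsigned char)(s > 255 ? 255 : s);
    }
    return r;
}

/* _mm_adds_epi8 semantics: signed add, clamped to [-128, 127]. */
static inline fake_m128i fake_mm_adds_epi8(fake_m128i a, fake_m128i b)
{
    fake_m128i r;
    int i;
    for (i = 0; i < 16; i++) {
        int s = as_int8(a.u.byte[i]) + as_int8(b.u.byte[i]);
        if (s > 127) s = 127;
        if (s < -128) s = -128;
        r.u.byte[i] = (unsigned char)s; /* -1 is stored as 0xFF, etc. */
    }
    return r;
}
```

With bytes 200 + 100, the wrapping add gives 44 while _mm_adds_epu8 gives 255, so a stub that only wraps would silently diverge from code written for the saturating version.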
Still, I think there's something beautiful about the compatibility layer they provide, mostly in the irony of it.
I bet some people will cry after seeing the file... It defeats the entire purpose of SIMD.
I guess there is an upside: every SSE2 intrinsic I use is re-implemented here, so if you're not familiar with the intrinsics, you can always figure out what the heck my code is doing.
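As an illustration of reading a stub as documentation (these two intrinsics are not necessarily among the ones fakesse.h covers), here is how the common _mm_cmpeq_epi8 + _mm_movemask_epi8 pair reads in scalar form. The fake_ names are hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical stand-in type, same layout as the fake __m128i. */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} fake_m128i;

/* _mm_cmpeq_epi8 semantics: 0xFF where bytes are equal, 0x00 elsewhere. */
static inline fake_m128i fake_mm_cmpeq_epi8(fake_m128i a, fake_m128i b)
{
    fake_m128i r;
    int i;
    for (i = 0; i < 16; i++)
        r.u.byte[i] = (a.u.byte[i] == b.u.byte[i]) ? 0xFF : 0x00;
    return r;
}

/* _mm_movemask_epi8 semantics: bit i of the result is the top bit of byte i. */
static inline int fake_mm_movemask_epi8(fake_m128i a)
{
    int mask = 0;
    int i;
    for (i = 0; i < 16; i++)
        mask |= ((a.u.byte[i] >> 7) & 1) << i;
    return mask;
}
```

Read together, the two loops spell out the idiom: one result bit per matching byte, with the lowest set bit marking the first match.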
Little Endian isn't Good Times
Getting these to work was miserable, especially _mm_slli_si128, _mm_srli_si128, and the other byte-wise instructions. Everything is completely backwards. To shift the 128-bit value left, you actually have to move each byte to the right in the array (toward higher indices), because the bytes are stored in little endian order. It makes my brain hurt.
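A sketch of the left-shift counterpart, mirroring the _mm_srli_si128 stub above but with hypothetical fake_ names. _mm_slli_si128 shifts the whole 128-bit value left by imm bytes; since byte 0 is the least significant byte, byte i lands at index i + imm. The le64 helper (also hypothetical) rebuilds a 64-bit half from bytes explicitly, so a byte-wise shift can be checked against ordinary integer shifts regardless of host endianness.

```c
#include <stdint.h>

/* Hypothetical stand-in type, same layout as the fake __m128i. */
typedef struct {
    union {
        unsigned char byte[16];
        uint32_t int32[4];
    } u;
} fake_m128i;

/* Scalar stand-in for _mm_slli_si128. imm is assumed non-negative;
   counts above 15 give all zeros, as with PSLLDQ. */
static inline fake_m128i fake_mm_slli_si128(fake_m128i a, int imm)
{
    fake_m128i r;
    int i;
    if (imm > 16) imm = 16;
    for (i = 0; i < imm; i++)      /* vacated low bytes become zero */
        r.u.byte[i] = 0x00;
    for (i = imm; i < 16; i++)     /* byte i - imm moves up to byte i */
        r.u.byte[i] = a.u.byte[i - imm];
    return r;
}

/* Read 8 bytes as a little-endian 64-bit integer (b[0] least significant). */
static uint64_t le64(const unsigned char *b)
{
    uint64_t v = 0;
    int i;
    for (i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

For 0 < imm < 8 the shifted halves should equal lo << 8*imm and (hi << 8*imm) | (lo >> (64 - 8*imm)), which is exactly a 128-bit left shift written with two 64-bit words.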
