SSE4



**SSE4** is an ambiguous name for a set of almost disjoint x86 instruction set extensions from Intel and AMD: [|SSE4.1] and [|SSE4.2], both by Intel, and [|SSE4a] by AMD.

=Intel= 

SSE4.1
Intel introduced SSE4.1 in 2007 with the [|Penryn] generation of the [|Core 2] brand, based on the [|Core microarchitecture]. It added 47 new instructions.
||~ Mnemonic ||~ Description ||~ Return Type ||~ C-Intrinsic ||~ Parameters ||
|| pcmpeqq || packed compare equal qword || __m128i || [|_mm_cmpeq_epi64] || (__m128i a, __m128i b) ||
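
As an illustration of the intrinsic in the table, here is a minimal sketch that compares two pairs of 64-bit words with `_mm_cmpeq_epi64` and condenses the result into a 2-bit mask. The helper name `equal_qword_mask` is hypothetical, and the `target` attribute assumes GCC or Clang on x86-64, so that no extra compiler flag is needed; the code must still run on a CPU with SSE4.1.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics */

/* Hypothetical helper: bit i of the result is set if the i-th 64-bit
   lane of a equals the i-th 64-bit lane of b (lane 0 = a[0]). */
__attribute__((target("sse4.1")))
int equal_qword_mask(const uint64_t a[2], const uint64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi64(va, vb);          /* pcmpeqq: all ones per equal lane */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));  /* sign bit of each 64-bit lane */
}
```

The mask form is convenient for branching on the comparison; code that stays in the SIMD domain would use the all-ones/all-zeros lanes of `eq` directly.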

//see Vulnerable on distant Checks with SSE4.// 

SSE4.2
[|SSE4.2] was introduced in 2008 with the [|Nehalem-based] [|Core i7] and added 7 new instructions.

STTNI
SSE4.2 includes four //String and Text New Instructions// (STTNI), working on 128-bit XMM SIMD registers as well as general purpose registers and flags, to perform character searches and comparisons on two operands of 16 bytes at a time, e.g. PCMPESTRI (Packed Compare Explicit Length Strings, Return Index).
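
A character search with PCMPESTRI might look like the following sketch, which returns the index of the first byte of a text chunk that occurs in a small character set. The function name `find_first_of16` is hypothetical, both inputs are assumed to be at most 16 bytes, and the `target` attribute assumes GCC or Clang on x86-64.

```c
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics */

/* Hypothetical helper: index of the first byte in text[0..text_len) that
   also occurs in set[0..set_len), or 16 if there is none.
   Lengths above 16 are clamped to one XMM register. */
__attribute__((target("sse4.2")))
int find_first_of16(const char *set, int set_len, const char *text, int text_len)
{
    char sbuf[16] = {0}, tbuf[16] = {0};
    if (set_len > 16) set_len = 16;
    if (text_len > 16) text_len = 16;
    memcpy(sbuf, set, (size_t)set_len);    /* copy to avoid reading past short inputs */
    memcpy(tbuf, text, (size_t)text_len);
    __m128i vs = _mm_loadu_si128((const __m128i *)sbuf);
    __m128i vt = _mm_loadu_si128((const __m128i *)tbuf);
    /* pcmpestri: explicit lengths, unsigned bytes, "equal any" aggregation,
       return the least significant matching index */
    return _mm_cmpestri(vs, set_len, vt, text_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT);
}
```

Because the lengths are passed explicitly, the zero padding in the buffers never takes part in the comparison.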

ATAI
Popcnt and crc32, working on general purpose registers, were dubbed Application-Targeted Accelerator Instructions (ATAI) as a subset of SSE4.2, but should be considered a disjoint instruction set with respect to SSE4 compiler optimizations.
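
Treating popcnt as its own feature could be sketched as follows: the hardware path is compiled for the `popcnt` target only and selected at run time, independent of any SSE4.2 setting. The function names are hypothetical; `__builtin_cpu_supports` and the `target` attribute are GCC/Clang extensions, so this is an assumption about the toolchain rather than a portable recipe.

```c
#include <stdint.h>
#include <nmmintrin.h>  /* pulls in _mm_popcnt_u64 on GCC and Clang */

/* Hardware path, compiled with only the popcnt feature enabled. */
__attribute__((target("popcnt")))
static int popcount_hw(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);
}

/* Portable fallback: clear the lowest set bit until none is left. */
static int popcount_sw(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* Hypothetical dispatcher: queries the popcnt CPUID feature at run time. */
int popcount64(uint64_t x)
{
    return __builtin_cpu_supports("popcnt") ? popcount_hw(x) : popcount_sw(x);
}
```

In a hot loop one would resolve the check once, for example through a function pointer, instead of testing the feature on every call.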

=AMD SSE4a=
[|SSE4a] was introduced by AMD with the [|K10] (Barcelona) microarchitecture.
||~ Mnemonic ||~ Description ||~ Return Type ||~ C-Intrinsic ||~ Parameters ||
|| popcnt || Population Count || __int64 || [|_mm_popcnt_u64] || (unsigned __int64 a) ||

SIMD
The new SIMD instructions, working on XMM registers, are the combined mask-shift instructions (EXTRQ/INSERTQ) and the scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not available in Intel's SSE4.
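
Since the Intel and AMD extensions do not imply each other, code has to test each one separately. A minimal sketch, assuming GCC or Clang with `<cpuid.h>`: SSE4.1 and SSE4.2 are reported in ECX bits 19 and 20 of CPUID leaf 1, SSE4a in ECX bit 6 of leaf 0x80000001. The struct and function names are hypothetical.

```c
#include <cpuid.h>  /* __get_cpuid, GCC/Clang specific */

/* Hypothetical container for the three SSE4 flavors; each field is 0 or 1. */
struct sse4_features {
    int sse41;
    int sse42;
    int sse4a;
};

struct sse4_features detect_sse4(void)
{
    struct sse4_features f = {0, 0, 0};
    unsigned eax, ebx, ecx, edx;

    /* Standard leaf 1: Intel-defined feature bits. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse41 = (ecx >> 19) & 1;
        f.sse42 = (ecx >> 20) & 1;
    }
    /* Extended leaf 0x80000001: AMD-defined feature bits. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f.sse4a = (ecx >> 6) & 1;

    return f;
}
```

A program would take the EXTRQ/INSERTQ or MOVNTSD/MOVNTSS path only when `sse4a` is set, regardless of the other two fields.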

Advanced Bit Manipulation
The two Advanced Bit Manipulation instructions, LZCNT and POPCNT, work on general purpose registers. Leading Zero Count was not available in Intel's Application-Targeted Accelerator Instructions of SSE4.2, but was later incorporated with BMI.


||~ Mnemonic ||~ Description ||~ Return Type ||~ C-Intrinsic ||~ Parameters ||
|| lzcnt || Leading Zero Count || unsigned __int64 || [|__lzcnt64] || (unsigned __int64 a) ||
|| popcnt || Population Count || unsigned __int64 || [|__popcnt64] || (unsigned __int64 a) ||
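
A guarded use of LZCNT might look like the sketch below. It assumes GCC or Clang on x86-64 and therefore uses their `_lzcnt_u64` intrinsic instead of the MSVC-style name listed in the table; the function names are hypothetical. The run-time check matters because on processors without LZCNT the same encoding executes as BSR and yields a different result.

```c
#include <stdint.h>
#include <cpuid.h>      /* __get_cpuid, GCC/Clang specific */
#include <immintrin.h>  /* _lzcnt_u64 */

/* LZCNT (ABM) is reported in ECX bit 5 of CPUID leaf 0x80000001. */
static int has_lzcnt(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 5) & 1);
}

/* Hardware path, compiled with only the lzcnt feature enabled. */
__attribute__((target("lzcnt")))
static unsigned lzcnt_hw(uint64_t x)
{
    return (unsigned)_lzcnt_u64(x);
}

/* Fallback: __builtin_clzll is undefined for 0, so handle that case explicitly. */
static unsigned lzcnt_sw(uint64_t x)
{
    return x ? (unsigned)__builtin_clzll(x) : 64u;
}

/* Hypothetical dispatcher: number of leading zero bits, 64 for x == 0. */
unsigned leading_zeros64(uint64_t x)
{
    return has_lzcnt() ? lzcnt_hw(x) : lzcnt_sw(x);
}
```

Both paths agree on the zero input, which is the main practical difference between LZCNT and a plain bit scan.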

=See also=
 * AltiVec
 * AVX
 * BMI
 * MMX
 * SIMD and SWAR Techniques
 * SSE
 * SSE2
 * SSE3
 * SSSE3
 * SSE5
 * TBM
 * Vulnerable on distant Checks with SSE4
 * XOP

=Manuals=
 * [[http://home.ustc.edu.cn/~shengjie/REFERENCE/sse4_instruction_set.pdf|Intel® SSE4 Programming Reference]] (pdf)
 * [|Software Optimization Guide for AMD Family 10h and 12h Processors] (pdf)

=Forum Posts=
 * [|using Popcount and Prefetch with SSE4 hardware support] by Engin Üstün, CCC, May 19, 2012 » Population Count, Memory

=External Links=
 * [|SSE4 from Wikipedia]
 * [|MSDN - Streaming SIMD Extensions 4 Instructions]
 * [|MSDN - SSE4A and Advanced Bit Manipulation Intrinsics]
 * [|AMD and Intel incompatible - What to do?] from [|AMD Developer Central]
 * [|SSEPlus Project] from [|AMD Developer Central]
 * [|SSEPlus Project Documentation]
 * [|Agner's CPU blog] by [|Agner Fog]
 * [|Intel Intrinsics Guide]
