
x86-64 or x64,
a 64-bit extension of x86, designed by AMD as the Hammer or K8 architecture and introduced with the Athlon 64 and Opteron CPUs. It was cloned by Intel under the name EM64T and later Intel 64. Beside the 64-bit general purpose extensions, x86-64 supports the MMX, x87, and 128-bit SSE and SSE2 instruction sets. Depending on the feature flags reported by the CPUID instruction, further SIMD Streaming Extensions may be available, such as SSE3, SSSE3 (Intel only), SSE4 (Core 2, K10), AVX, AVX2 and AVX-512, as well as AMD's 3DNow!, Enhanced 3DNow! and XOP.
Quad-core AMD Opteron processor [1]

Register File

x86-64 doubles the number of x86 general purpose- and XMM registers.

General Purpose

The 16 general purpose registers may be treated as 64-bit Quad Word (bitboard), 32-bit Double Word, 16-bit Word, and high or low Byte:

64    32    16    8 high  8 low   Purpose
RAX   EAX   AX    AH      AL      GP, Accumulator
RBX   EBX   BX    BH      BL      GP, Index Register
RCX   ECX   CX    CH      CL      GP, Counter, variable shift and rotate via CL
RDX   EDX   DX    DH      DL      GP, high Accumulator for mul/div
RSI   ESI   SI    -       SIL     GP, Source Index
RDI   EDI   DI    -       DIL     GP, Destination Index
RSP   ESP   SP    -       SPL     Stack Pointer
RBP   EBP   BP    -       BPL     GP, Base Pointer
R8    R8D   R8W   -       R8B     GP
..    ..    ..    ..      ..      GP
R15   R15D  R15W  -       R15B    GP

The low bytes SIL, DIL, SPL, BPL and R8B - R15B are only addressable in 64-bit mode; the new registers R8 - R15 have no high-byte aliases.

MMX

Eight 64-bit MMX-Registers: MM0 - MM7.
Treated as one Quad Word, or as a vector of two Double Words (two Floats with 3DNow!), four Words, or eight Bytes.

SSE/SSE*

Sixteen 128-bit XMM-Registers: XMM0 - XMM15.
Treated as vector of two Doubles or Quad Words, as vector of four Floats or Double Words, and as vector of eight Words or 16 Bytes.

AVX, AVX2/XOP

Intel Sandy Bridge and AMD Bulldozer
Sixteen 256-bit YMM-Registers: YMM0 - YMM15 (shared by XMM as lower half).
Treated as vector of four Doubles or Quad Words, as vector of eight Floats or Double Words, and as vector of 16 Words or 32 Bytes.

AVX-512

Intel Xeon Phi (2015)
Thirty-two 512-bit ZMM-Registers: ZMM0 - ZMM31 (shared by XMM and YMM as lower parts).
Eight vector mask registers: K0 - K7.

Instructions

Instructions useful for bitboard applications are by default not exposed by high-level programming languages. They are available through (inline) assembly or the compiler intrinsics of various C compilers [2].

General Purpose

x86-64 C-intrinsic reference from the MSDN Library:

Mnemonic  Description               C-Intrinsic                                        Remark
bsf       bit scan forward          _BitScanForward64
bsr       bit scan reverse          _BitScanReverse64
bswap     byte swap                 _byteswap_uint64
bt        bit test                  _bittest64
btc       bit test and complement   _bittestandcomplement64
btr       bit test and reset        _bittestandreset64
bts       bit test and set          _bittestandset64
cpuid     cpuid                     __cpuid                                            cpuid
imul      signed multiplication     __mulh, _mul128
lzcnt     leading zero count        __lzcnt16, __lzcnt, __lzcnt64                      cpuid, SSE4a
mul       unsigned multiplication   __umulh, _umul128
popcnt    population count          _mm_popcnt_u64, __popcnt16, __popcnt, __popcnt64   cpuid, SSE4.2, SSE4a
rdtsc     read time-stamp counter   __rdtsc
rol, ror  rotate left, right        _rotl, _rotl64, _rotr, _rotr64


Bit-Manipulation


SSE2

x86 and x86-64 - SSE2 C integer intrinsic reference from the MSDN Library. The intrinsic data type __m128i refers to an XMM register or memory location:

bitwise logical
pand       packed and, r := a & b
           __m128i _mm_and_si128 (__m128i a, __m128i b)
pandn      packed and not, r := ~a & b
           __m128i _mm_andnot_si128 (__m128i a, __m128i b)
por        packed or, r := a | b
           __m128i _mm_or_si128 (__m128i a, __m128i b)
pxor       packed xor, r := a ^ b
           __m128i _mm_xor_si128 (__m128i a, __m128i b)

quad word shifts
psrlq      packed shift right logical quad
           __m128i _mm_srl_epi64 (__m128i a, __m128i cnt)
           immediate: __m128i _mm_srli_epi64 (__m128i a, int cnt)
psllq      packed shift left logical quad
           __m128i _mm_sll_epi64 (__m128i a, __m128i cnt)
           immediate: __m128i _mm_slli_epi64 (__m128i a, int cnt)

arithmetical
paddb      packed add bytes
           __m128i _mm_add_epi8 (__m128i a, __m128i b)
psubb      packed subtract bytes
           __m128i _mm_sub_epi8 (__m128i a, __m128i b)
psadbw     packed sum of absolute differences of bytes into a word
           __m128i _mm_sad_epu8 (__m128i a, __m128i b)
pmaxsw     packed maximum signed words
           __m128i _mm_max_epi16 (__m128i a, __m128i b)
pmaxub     packed maximum unsigned bytes
           __m128i _mm_max_epu8 (__m128i a, __m128i b)
pminsw     packed minimum signed words
           __m128i _mm_min_epi16 (__m128i a, __m128i b)
pminub     packed minimum unsigned bytes
           __m128i _mm_min_epu8 (__m128i a, __m128i b)
pcmpeqb    packed compare equal bytes
           __m128i _mm_cmpeq_epi8 (__m128i a, __m128i b)
pmullw     packed multiply low signed (unsigned) word
           __m128i _mm_mullo_epi16 (__m128i a, __m128i b)
pmulhw     packed multiply high signed word
           __m128i _mm_mulhi_epi16 (__m128i a, __m128i b)
pmulhuw    packed multiply high unsigned word
           __m128i _mm_mulhi_epu16 (__m128i a, __m128i b)
pmaddwd    packed multiply words and add doublewords
           __m128i _mm_madd_epi16 (__m128i a, __m128i b)

unpack, shuffle
punpcklbw  unpack and interleave low bytes
           hHgGfFeE:dDcCbBaA := xxxxxxxx:HGFEDCBA # xxxxxxxx:hgfedcba
           __m128i _mm_unpacklo_epi8 (__m128i A, __m128i a)
punpckhbw  unpack and interleave high bytes
           hHgGfFeE:dDcCbBaA := HGFEDCBA:xxxxxxxx # hgfedcba:xxxxxxxx
           __m128i _mm_unpackhi_epi8 (__m128i A, __m128i a)
punpcklwd  unpack and interleave low words
           dDcC:bBaA := xxxx:DCBA # xxxx:dcba
           __m128i _mm_unpacklo_epi16 (__m128i A, __m128i a)
punpckhwd  unpack and interleave high words
           dDcC:bBaA := DCBA:xxxx # dcba:xxxx
           __m128i _mm_unpackhi_epi16 (__m128i A, __m128i a)
punpckldq  unpack and interleave low doublewords
           bB:aA := xx:BA # xx:ba
           __m128i _mm_unpacklo_epi32 (__m128i A, __m128i a)
punpckhdq  unpack and interleave high doublewords
           bB:aA := BA:xx # ba:xx
           __m128i _mm_unpackhi_epi32 (__m128i A, __m128i a)
punpcklqdq unpack and interleave low quadwords
           a:A := x:A # x:a
           __m128i _mm_unpacklo_epi64 (__m128i A, __m128i a)
punpckhqdq unpack and interleave high quadwords
           a:A := A:x # a:x
           __m128i _mm_unpackhi_epi64 (__m128i A, __m128i a)
pshuflw    packed shuffle low words
           __m128i _mm_shufflelo_epi16 (__m128i a, int imm)
pshufhw    packed shuffle high words
           __m128i _mm_shufflehi_epi16 (__m128i a, int imm)
pshufd     packed shuffle doublewords
           __m128i _mm_shuffle_epi32 (__m128i a, int imm)

load, store, moves
movdqa     move aligned double quadword, xmm := *p
           __m128i _mm_load_si128 (__m128i *p)
movdqu     move unaligned double quadword, xmm := *p
           __m128i _mm_loadu_si128 (__m128i *p)
movdqa     move aligned double quadword, *p := xmm
           void _mm_store_si128 (__m128i *p, __m128i a)
movdqu     move unaligned double quadword, *p := xmm
           void _mm_storeu_si128 (__m128i *p, __m128i a)
movq       move quadword, xmm := gp64
           __m128i _mm_cvtsi64_si128 (__int64 a)
movq       move quadword, gp64 := xmm
           __int64 _mm_cvtsi128_si64 (__m128i a)
movd       move doubleword or quadword, xmm := gp64
           __m128i _mm_cvtsi64x_si128 (__int64 value)
movd       move doubleword, xmm := gp32
           __m128i _mm_cvtsi32_si128 (int a)
movd       move doubleword, gp32 := xmm
           int _mm_cvtsi128_si32 (__m128i a)
pextrw     extract packed word, gp16 := xmm[i]
           int _mm_extract_epi16 (__m128i a, int imm)
pinsrw     packed insert word, xmm[i] := gp16
           __m128i _mm_insert_epi16 (__m128i a, int b, int imm)
pmovmskb   packed move mask byte, gp32 := 16 sign-bits(xmm)
           int _mm_movemask_epi8 (__m128i a)

cache support
prefetch   prefetch a cache line
           void _mm_prefetch (char *p, int i)


Software

Operating Systems

Development

Assembly

C-Compiler


See also


Publications


Manuals

Agner Fog


AMD

Instructions

Optimization Guides


Intel

Instructions

Optimization Guides


Forum Posts


External Links


AMD


Intel


Instruction sets


References

  1. ^ Die shot of AMD Opteron quad-core processor, Wikimedia Commons
  2. ^ Intel(R) C++ Compiler User and Reference Guides covers Intrinsics
  3. ^ Georg Hager's Blog | Random thoughts on High Performance Computing
  4. ^ Intel Nehalem Core i3
  5. ^ Application binary interface from Wikipedia
