AVX-512

**AVX-512**,
an expansion of Intel's AVX and AVX2 instructions using the [|EVEX prefix], featuring **32** 512-bit wide vector SIMD registers zmm0 through zmm31, each holding either eight doubles or eight integer quad words such as bitboards, and eight dedicated mask registers k0 through k7 (seven usable as write masks, since encoding k0 selects no masking) that specify which vector elements are operated on and written. If the Nth bit of a vector mask register is set, the Nth element of the destination vector is overwritten with the result of the operation; otherwise, depending on the instruction, the element is zeroed or overwritten by an element from another source register (it remains unchanged if that source is the destination itself). A vector mask register can be set using vector compare instructions, instructions that move contents from a GP register, or a special subset of vector mask arithmetic instructions.
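
The two masking behaviours described above can be modelled in plain scalar C++. This is only a minimal sketch of the semantics, not intrinsic-level code: the type `Vec8` and the functions `mask_merge`, `mask_zero` and `example_masking` are hypothetical names introduced here for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for one zmm register holding eight quad words.
using Vec8 = std::array<std::uint64_t, 8>;

// Merge masking: element i takes the result if mask bit i is set,
// otherwise it keeps the value of the pass-through vector src.
inline Vec8 mask_merge(const Vec8& src, std::uint8_t mask, const Vec8& result) {
    Vec8 dst{};
    for (int i = 0; i < 8; ++i)
        dst[i] = ((mask >> i) & 1) ? result[i] : src[i];
    return dst;
}

// Zero masking: element i takes the result if mask bit i is set, otherwise 0.
inline Vec8 mask_zero(std::uint8_t mask, const Vec8& result) {
    return mask_merge(Vec8{}, mask, result);
}

// Usage: mask 0x05 selects elements 0 and 2 only.
inline void example_masking() {
    const Vec8 src{1, 2, 3, 4, 5, 6, 7, 8};
    const Vec8 res{10, 20, 30, 40, 50, 60, 70, 80};
    assert(mask_merge(src, 0x05, res)[1] == 2);  // bit 1 clear: kept from src
    assert(mask_zero(0x05, res)[1] == 0);        // bit 1 clear: zeroed
}
```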

=Extensions=
AVX-512 consists of multiple extensions, not all of which are meant to be supported by every AVX-512 capable processor. Only the core extension AVX-512F (AVX-512 Foundation) is required by all implementations. AVX-512F and AVX-512CD were first implemented in the [|Xeon Phi] processor and coprocessor known by the code name [|Knights Landing], launched on June 20, 2016.

||~ Extension ||~ Description ||~ Architecture ||~ [|CPUID 7] Reg:Bit ||
||~ AVX-512F || Foundation || [|Knights Landing] || EBX:16 ||
||~ AVX-512CD || Conflict Detection Instructions ||^ || EBX:28 ||
||~ AVX-512ER || Exponential and Reciprocal Instructions ||^ || EBX:27 ||
||~ AVX-512PF || Prefetch Instructions ||^ || EBX:26 ||
||~ AVX-512BW || Byte and Word Instructions || [|Skylake X] || EBX:30 ||
||~ AVX-512DQ || Doubleword and Quadword Instructions ||^ || EBX:17 ||
||~ AVX-512VL || Vector Length Extensions ||^ || EBX:31 ||
||~ AVX-512IFMA || Integer Fused Multiply Add || [|Cannonlake] || EBX:21 ||
||~ AVX-512VBMI || Vector Byte Manipulation Instructions ||^ || ECX:01 ||
||~ AVX-512VPOPCNTDQ || Vector Population Count || [|Knights Mill] || ECX:14 ||
||~ AVX-512-4VNNIW || Vector Neural Network Instructions Word variable precision ||^ || EDX:02 ||
||~ AVX-512-4FMAPS || Fused Multiply Accumulation Packed Single precision ||^ || EDX:03 ||
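
The Reg:Bit column of the table above can be turned into a small decoder. The sketch below assumes the caller has already obtained the EBX and ECX values reported by CPUID leaf 7 (sub-leaf 0); how to execute CPUID is compiler specific and deliberately left out. The struct `Avx512Features` and the functions `test_bit`, `decode_avx512` and `example_decode` are hypothetical names, and only a subset of the extensions is covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for a subset of the flags listed in the table.
struct Avx512Features {
    bool f;          // AVX-512F,         EBX bit 16
    bool dq;         // AVX-512DQ,        EBX bit 17
    bool cd;         // AVX-512CD,        EBX bit 28
    bool bw;         // AVX-512BW,        EBX bit 30
    bool vl;         // AVX-512VL,        EBX bit 31
    bool vpopcntdq;  // AVX-512VPOPCNTDQ, ECX bit 14
};

// Returns true if bit n of reg is set.
constexpr bool test_bit(std::uint32_t reg, unsigned n) {
    return ((reg >> n) & 1u) != 0;
}

// Decodes register values as reported by CPUID leaf 7, sub-leaf 0.
inline Avx512Features decode_avx512(std::uint32_t ebx, std::uint32_t ecx) {
    Avx512Features r;
    r.f = test_bit(ebx, 16);
    r.dq = test_bit(ebx, 17);
    r.cd = test_bit(ebx, 28);
    r.bw = test_bit(ebx, 30);
    r.vl = test_bit(ebx, 31);
    r.vpopcntdq = test_bit(ecx, 14);
    return r;
}

// Usage with a made-up EBX value that has only the F and CD bits set.
inline void example_decode() {
    const Avx512Features x = decode_avx512((1u << 16) | (1u << 28), 0);
    assert(x.f && x.cd && !x.bw && !x.vpopcntdq);
}
```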

=Selected Instructions=

VPTERNLOG
AVX-512F features the instruction VPTERNLOGQ (or VPTERNLOGD) to perform bitwise [|ternary logic], for instance to operate on vectors of bitboards. Three input vectors are combined bitwise by an operation determined by an immediate byte operand (**imm8**). Each of its 256 possible values corresponds to the boolean output column of the [|truth table] for all eight combinations of the three input bits, as demonstrated with some selected imm8 values in the table below:

||||||||~ Input ||~ ||||||||||||||||||||||||~ Output of Operations ||
||~ ||~ ||~ ||~ ||~ imm8 ||= 0x00 ||= 0x01 ||= 0x16 ||= 0x17 ||= 0x28 ||= 0x80 ||= 0x88 ||= 0x96 ||= 0xca ||= 0xe8 ||= 0xfe ||= 0xff ||
||~ # ||~ a ||~ b ||~ c ||~ C-exp ||= false ||= ~(a|b|c) ||= a?~(b|c):b^c ||= minor(a,b,c) ||= c&(a^b) ||= a&b&c ||= b&c ||= a^b^c ||= a?b:c ||= major(a,b,c) ||= a|b|c ||= true ||
||~ 0 ||= 0 ||= 0 ||= 0 ||~ ||= 0 ||= 1 ||= 0 ||= 1 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||
||~ 1 ||= 0 ||= 0 ||= 1 ||~ ||= 0 ||= 0 ||= 1 ||= 1 ||= 0 ||= 0 ||= 0 ||= 1 ||= 1 ||= 0 ||= 1 ||= 1 ||
||~ 2 ||= 0 ||= 1 ||= 0 ||~ ||= 0 ||= 0 ||= 1 ||= 1 ||= 0 ||= 0 ||= 0 ||= 1 ||= 0 ||= 0 ||= 1 ||= 1 ||
||~ 3 ||= 0 ||= 1 ||= 1 ||~ ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||= 0 ||= 1 ||= 0 ||= 1 ||= 1 ||= 1 ||= 1 ||
||~ 4 ||= 1 ||= 0 ||= 0 ||~ ||= 0 ||= 0 ||= 1 ||= 1 ||= 0 ||= 0 ||= 0 ||= 1 ||= 0 ||= 0 ||= 1 ||= 1 ||
||~ 5 ||= 1 ||= 0 ||= 1 ||~ ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||= 1 ||= 1 ||
||~ 6 ||= 1 ||= 1 ||= 0 ||~ ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||= 1 ||= 1 ||= 1 ||
||~ 7 ||= 1 ||= 1 ||= 1 ||~ ||= 0 ||= 0 ||= 0 ||= 0 ||= 0 ||= 1 ||= 1 ||= 1 ||= 1 ||= 1 ||= 1 ||= 1 ||

The following VPTERNLOGQ intrinsics are declared, where the maskz version sets destination quad word elements whose mask bit is clear to zero, while the mask version copies them from a, which serves as both the first input and the pass-through source:
[[code format="cpp"]]
__m512i _mm512_ternarylogic_epi64(__m512i a, __m512i b, __m512i c, int imm8);
__m512i _mm512_maskz_ternarylogic_epi64(__mmask8 m, __m512i a, __m512i b, __m512i c, int imm8);
__m512i _mm512_mask_ternarylogic_epi64(__m512i a, __mmask8 m, __m512i b, __m512i c, int imm8);
[[code]]
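
The truth-table lookup described above can be modelled for a single quad word in scalar C++. This is a sketch of the semantics only, not how the hardware or the intrinsic is implemented; `ternlog64`, `example_ternlog` and the constants `TL_A`, `TL_B`, `TL_C` are hypothetical names. The three constants are the truth-table columns of the inputs themselves, so evaluating a C expression on them yields the imm8 for that expression.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table columns of the three inputs (rows 0..7 map to bits 0..7).
constexpr unsigned TL_A = 0xF0;  // a is 1 in rows 4..7
constexpr unsigned TL_B = 0xCC;  // b is 1 in rows 2, 3, 6, 7
constexpr unsigned TL_C = 0xAA;  // c is 1 in rows 1, 3, 5, 7

// imm8 for a?b:c, computed from the columns; matches the 0xca table entry.
static_assert((((TL_A & TL_B) | (~TL_A & TL_C)) & 0xFF) == 0xCA, "a?b:c");

// Scalar model of one 64-bit element: for every bit position, the bits of
// a, b and c form a row index (a is the most significant bit), and the
// imm8 bit at that index becomes the result bit.
inline std::uint64_t ternlog64(std::uint64_t a, std::uint64_t b,
                               std::uint64_t c, std::uint8_t imm8) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        const unsigned idx = static_cast<unsigned>(
            (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1));
        r |= static_cast<std::uint64_t>((imm8 >> idx) & 1) << i;
    }
    return r;
}

// Usage: 0x96 is the three-way xor, 0xe8 the bitwise majority.
inline void example_ternlog() {
    const std::uint64_t a = 0xFF00, b = 0x0FF0, c = 0x3C3C;
    assert(ternlog64(a, b, c, 0x96) == (a ^ b ^ c));
    assert(ternlog64(a, b, c, 0xE8) == ((a & b) | (a & c) | (b & c)));
}
```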

VPLZCNT
AVX-512CD has Vector Leading Zero Count. VPLZCNTQ counts the leading zeroes of eight bitboards in a vector in parallel, using the following intrinsics, where the maskz version sets destination elements whose mask bit is clear to zero, while the mask version copies them from s:
[[code format="cpp"]]
__m512i _mm512_lzcnt_epi64(__m512i a);
__m512i _mm512_maskz_lzcnt_epi64(__mmask8 m, __m512i a);
__m512i _mm512_mask_lzcnt_epi64(__m512i s, __mmask8 m, __m512i a);
[[code]]
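
As a rough scalar model of what the three variants compute per element, the following sketch may help. It is not the intrinsic itself; `Vec8`, `lzcnt64`, `mask_lzcnt`, `maskz_lzcnt` and `example_lzcnt` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for one zmm register holding eight quad words.
using Vec8 = std::array<std::uint64_t, 8>;

// Number of zero bits above the most significant one bit; 64 for x == 0.
inline std::uint64_t lzcnt64(std::uint64_t x) {
    std::uint64_t n = 0;
    for (std::uint64_t probe = 1ULL << 63; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

// Model of the mask variant: elements with a clear mask bit are copied from s.
inline Vec8 mask_lzcnt(const Vec8& s, std::uint8_t m, const Vec8& a) {
    Vec8 r = s;
    for (int i = 0; i < 8; ++i)
        if ((m >> i) & 1) r[i] = lzcnt64(a[i]);
    return r;
}

// Model of the maskz variant: elements with a clear mask bit become zero.
inline Vec8 maskz_lzcnt(std::uint8_t m, const Vec8& a) {
    return mask_lzcnt(Vec8{}, m, a);
}

// Usage: the count of an empty bitboard is 64, of bit 0 alone it is 63.
inline void example_lzcnt() {
    assert(lzcnt64(0) == 64);
    assert(lzcnt64(1) == 63);
}
```

For a non-empty bitboard, 63 minus the leading zero count gives the index of its most significant one bit.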

VPOPCNT
The AVX-512VPOPCNTDQ extension has a vector population count instruction that counts the one bits of either sixteen 32-bit double words (VPOPCNTD) or eight 64-bit quad words, aka bitboards (VPOPCNTQ), in parallel.
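
The quad word variant can be pictured as eight independent population counts. The scalar sketch below only models that result and makes no claim about the instruction's implementation; `Vec8`, `popcnt64`, `popcnt_vec8` and `example_popcnt` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for one zmm register holding eight quad words.
using Vec8 = std::array<std::uint64_t, 8>;

// Counts one bits by repeatedly clearing the least significant one bit.
inline std::uint64_t popcnt64(std::uint64_t x) {
    std::uint64_t n = 0;
    for (; x != 0; x &= x - 1)
        ++n;
    return n;
}

// Model of a VPOPCNTQ result: each element holds the count of its input element.
inline Vec8 popcnt_vec8(const Vec8& a) {
    Vec8 r{};
    for (int i = 0; i < 8; ++i)
        r[i] = popcnt64(a[i]);
    return r;
}

// Usage: a bitboard with two full ranks set has 16 one bits.
inline void example_popcnt() {
    assert(popcnt64(0x00FF00000000FF00ULL) == 16);
}
```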

=See also=
 * AltiVec
 * AVX
 * AVX2
 * SIMD and SWAR Techniques
 * SSE2
 * XOP

=Manuals=
 * [|Intel® Architecture Instruction Set Extensions Programming Reference] (pdf)

=External Links=
 * [|AVX-512 from Wikipedia]
 * [|Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Overview]
 * [|Intel Instruction Set Architecture Extensions]

Blog Postings

 * [|AVX-512 instructions | Intel® Developer Zone] by [|James Reinders], July 23, 2013
 * [|Future instruction set: AVX-512] by [|Agner Fog], October 09, 2013
 * [|Additional AVX-512 instructions | Intel® Developer Zone] by [|James Reinders], July 17, 2014
 * [|Processing Arrays of Bits with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) | Intel® Developer Zone] by [|Thomas Willhalm], July 24, 2014
 * [|AVX-512 May Be a Hidden Gem” in Intel Xeon Scalable Processors] by [|James Reinders], [|HPCwire], June 29, 2017

Compiler Support

 * [|Intel Intrinsics Guide - AVX-512]
 * [|Intel® Advanced Vector Extensions 2015/2016 Support in GNU Compiler Collection] (pdf) by [|Kirill Yukhin], July 2014
 * [|Guide to Automatic Vectorization with Intel AVX-512 Instructions in Knights Landing Processors - Colfax Research], May 11, 2016
 * [|Microsoft Visual Studio 2017 Supports Intel® AVX-512 | Visual C++ Team Blog] by Eric Battalio, July 11, 2017
