Advanced Vector Extensions 2 (AVX2) is an expansion of the AVX instruction set. Support for 256-bit expansions of the SSE2 128-bit integer instructions will be added in AVX2, which will along with BMI2 part of Intel'sHaswell architecture planned for 2013.

Features

Beside expanding most integer AVX instructions to 256 bit, AVX2 has 3-operand general-purpose bit manipulation and multiply, vector shifts, Double- and Quad Word-granularity any-to-any permutes, and 3-operand fused multiply-accumulate support. An important catch is that not all of the instructions are simply generalizations of their 128-bit equivalents: many work "in-lane", applying the same 128-bit operation to each 128-bit half of the register instead of a 256-bit generalization of the operation. For example:

Set

Instruction

Result

AVX

vpunpckldq xmm0, ABCD, EFGH

xmm1 := AEBF

AVX2

vpunpckldq ymm0, ABCDEFGH, IJKLMNOP

xmm1 := AIBJEMFN

If vpunpckldq had been expanded in the more intuitive fashion, the result of the AVX2 operation would be AIBJCKDL. The reason for this design might be to allow AVX to be implemented more easily with two separate 128-bit arithmetic units.

Some AVX2 instructions, such as type conversion instructions, take both xmm and ymm registers as arguments. For example:

With AVX2 each data element, such as a bitboard of a quad-bitboard, may be shifted left or right individually, as specified by the second source operand, with following Assemblymnemonics and C intrinsic equivalents:

For each bitboard in a destination quad-bitboard, the Qwords Element Permutation (VPERMQ) instruction ^{[1]} selects one bitboard of a source quad-bitboard. This permits a bitboard in the source operand to be copied to multiple locations in the destination.

Following code extracts the piece-code as "vertical nibble" from a quad-bitboard as board representation inside a register, "indexed" by square. The idea is to shift the square bits to the leftmost bit, the sign bit of each bitboard, to perform the VPMOVMSKB instruction to get the sign bits of all 32 bytes into a general purpose register. Unfortunately, there is no VPMOVMSKQ to get only the signs of four bitboards, so some more masking and mapping is required to get the four-bit piece code ...

int getPiece (__m256i qbb, U64 sq){
__m128i shift = _mm_cvtsi32x_si128( sq ^63);/* left shift amount 63-sq */
qbb = _mm256_sll_epi64( qbb, shift );/* squares to signs */uint32 qbbsigns = _mm256_movemask_epi8( qbb );/* get sign bits of 32 bytes */return((qbbsigns &0x80808080)*0x00204081)>>28;/* mask, nibble-map, shift */}

## Table of Contents

Home * Hardware * x86 * AVX2Advanced Vector Extensions 2(AVX2) is an expansion of the AVX instruction set. Support for 256-bit expansions of the SSE2 128-bit integer instructions will be added in AVX2, which will along with BMI2 part of Intel's Haswell architecture planned for 2013.## Features

Beside expanding most integer AVX instructions to 256 bit, AVX2 has 3-operand general-purpose bit manipulation and multiply, vector shifts, Double- and Quad Word-granularity any-to-any permutes, and 3-operand fused multiply-accumulate support. An important catch is that not all of the instructions are simply generalizations of their 128-bit equivalents: many work "in-lane", applying the same 128-bit operation to each 128-bit half of the register instead of a 256-bit generalization of the operation. For example:vpunpckldqxmm0, ABCD, EFGHvpunpckldqymm0, ABCDEFGH, IJKLMNOPIf vpunpckldq had been expanded in the more intuitive fashion, the result of the AVX2 operation would be AIBJCKDL. The reason for this design might be to allow AVX to be implemented more easily with two separate 128-bit arithmetic units.

Some AVX2 instructions, such as type conversion instructions, take both xmm and ymm registers as arguments. For example:

vpmovzxbwxmm1, xmm2/m64vpmovzxbwymm1, xmm2/m128## Individual Vector Shifts

With AVX2 each data element, such as a bitboard of a quad-bitboard, may be shifted left or right individually, as specified by the second source operand, with following Assembly mnemonics and C intrinsic equivalents:VPSRLVQymm1, ymm2, ymm3/m256_m256i _mm256_srlv_epi64 (_m256i m, _m256i count)VPSLLVQymm1, ymm2, ymm3/m256_m256i _mm256_sllv_epi64 (_m256i m, _m256i count)## Applications

With an appropriate quad-bitboard class, one may generate attacks of up to four different directions using individual shifts, for instance knight attacks or sliding piece attacks with Dumb7Fill to generate all positive or negative sliding ray attacks passing two times orthogonal and diagonal sliding pieces.Code samples and bitboard diagrams rely on Little endian file and rank mapping.## Knight Attacks

## Dumb7Fill

## Bitboard Permutation

For each bitboard in a destination quad-bitboard, the Qwords Element Permutation (VPERMQ) instruction^{[1]}selects one bitboard of a source quad-bitboard. This permits a bitboard in the source operand to be copied to multiple locations in the destination.## Vertical Nibble

Following code extracts the piece-code as "vertical nibble" from a quad-bitboard as board representation inside a register, "indexed" by square. The idea is to shift the square bits to the leftmost bit, the sign bit of each bitboard, to perform theVPMOVMSKBinstruction to get the sign bits of all 32 bytes into a general purpose register. Unfortunately, there is noVPMOVMSKQto get only the signs of four bitboards, so some more masking and mapping is required to get the four-bit piece code ...... using these intrinsics ...

... with seven assembly instructions intended, assuming the quad-bitboard passed in ymm0 and the square in rcx

## See also

## Manuals

## External Links

## References

## What links here?

Up one Level