Hyperbola+Quintessence

include page="MappingHint" || toc =Reverse Math= Assume following masked occupancy on a file, diagonal or anti-diagonal - for simplicity as a flat byte (in a real bitboard with masked files or diagonals you have 6..8 scratch-bits between the bits of this byte). Thus, vertical flip reverses the bits of this byte. code o' = reverse(o) r' = reverse(r)
 * Home * Board Representation * Bitboards * Sliding Piece Attacks * Hyperbola Quintessence**
 * [[image:Reflexion.jpg width="313" height="258" link="http://www.puckergallery.com/artists/bak_chess/bak_chess_all.html"]] ||~  || **Hyperbola Quintessence** applies the o^(o-2r)-trick also for vertical or diagonal negative Rays - by reversing the bit-order of up to one bit per rank or byte with a vertical flip aka x86-64 bswap . It is somehow a resurrection of the reverse bitboards idea of Ryan Mack's //Hyperbola Project// on the fly, and was created by Gerd Isenberg. Improvements by Aleks Peshkov made it applicable and competitive.
 * Samuel Bak - Reflexion, 1990 ||~  ||^   ||

normal   reversed o    11010101  10101011 o' occupancy including slider r    00010000  00001000 r' slider o-r  11000101  10100011 o'-r'  1. sub clears the slider o-2r 10110101  10011011 o'-2r' 2. sub borrows "one" from next blocker |......| \....../ normal 10110101   \..../ 11011001 <--XXXX  re-reverse single xor   01101100 -> to get the attack set code The first subtraction of (o-2r) is done implicitly by masking off the line, removing the slider from the occupied set. The second subtraction borrows a "one" from the next nearest blocker in msb-direction, falling through all unset bits outside the line. Of course, if no blocker is available, it borrows a "one" in usual arithmetical manner from the hidden 2^N. Only the changed bits (from original o, o') are the appropriate sliding attacks, including the blocker but excluding the slider. The result finally needs to be intersected with the same line mask as previously the occupancy, to clear the wrapped borrow one bits outside the file or diagonal. The fine optimization by Aleks Peshkov covers the final union of positive and negative ray-attacks. Since opposed ray-directions are always disjoint sets, using xor instead of //bitwise or// safes two instructions per line-attack. That is because bit-reversal or any mirroring or flipping is own inverse and distributive over xor. code reverse(a ^ b) == reverse (a) ^ reverse(b) code thus code lineAttacks = o^(o-2r) ^ reverse((o'-2r')^o') lineAttacks = o^(o-2r) ^ reverse( o'-2r') ^ reverse(o') lineAttacks = o^(o-2r) ^ reverse( o'-2r') ^ o code and finally code lineAttacks =  (o-2r) ^ reverse( o'-2r') code Beside shorter code this reduces register pressure - and clearly outperforms kindergarten bitboards - [|ipc]-wise, in code size and memory requirements.

=Source Code=

C
The three C-routines only differ by the line-mask applied:

code format="cpp" U64 diagonalAttacks(U64 occ, enumSquare sq) { U64 forward, reverse; forward = occ & smsk[sq].diagonalMaskEx; reverse = _byteswap_uint64(forward); forward -= smsk[sq].bitMask; reverse -= _byteswap_uint64(smsk[sq].bitMask); forward ^= _byteswap_uint64(reverse); forward &= smsk[sq].diagonalMaskEx; return forward; }

U64 antiDiagAttacks(U64 occ, enumSquare sq) { U64 forward, reverse; forward = occ & smsk[sq].antidiagMaskEx; reverse = _byteswap_uint64(forward); forward -= smsk[sq].bitMask; reverse -= _byteswap_uint64(smsk[sq].bitMask); forward ^= _byteswap_uint64(reverse); forward &= smsk[sq].antidiagMaskEx; return forward; }

U64 fileAttacks(U64 occ, enumSquare sq) { U64 forward, reverse; forward = occ & smsk[sq].fileMaskEx; reverse = _byteswap_uint64(forward); forward -= smsk[sq].bitMask; reverse -= _byteswap_uint64(smsk[sq].bitMask); forward ^= _byteswap_uint64(reverse); forward &= smsk[sq].fileMaskEx; return forward; }

U64 bishopAttacks(U64 occ, enumSquare sq) { return diagonalAttacks (occ, sq) + antiDiagAttacks (occ, sq); } code  For better locality of the line-attacks on the otherwise empty board, we may use an properly aligned array of structs. code format="cpp" struct {  U64 bitMask;         // 1 << sq for convenience U64 diagonalMaskEx; U64 antidiagMaskEx; U64 fileMaskEx; } smsk[64]; // 2 KByte code Using x86-64 bswap makes it quite competitive for bishops and files, on AMD [|K8] or [|K10] it has a latency of one cycle with a throughput of 1/3, like other cheap instructions. However, Intel is tad slower - while the recent [|Core 2 duo] processors perform 128-bit SIMD-instructions with 128-bit alus, that is bitwise logical instructions with a latency of one cycle and throughput of 1/3, the general purpose bswap-instruction takes four cycles with a throughput of one. In //Intel 64 and IA32 Architectures Optimization Reference Manual//, it is therefor recommend (5.6.5. endian conversion) to use the SSSE3 pshufb instruction to swap bytes, available through intrinsic , see SSSE3 Hyperbola Quintessence for bishop attacks.

As long there is no fast bit reversal instruction, there is no general solution for all four lines, and the rook attack-getter still needs some standard technique for the rank-attacks. Tim Cooijmans proposed to map the rank to the main diagonal before applying HQ, and to re-map the calculated attacks back to the original rank.

Generalized set-wise attacks
Hyperbola quintessence can be generalized to work on whole sets of sliding pieces instead on individual pieces, whose ranks to be masked. The problem arising, when not masking the rank of the piece is that attacks wrap around the board during subtraction. This is shown below: code ........      ........                            11111111     ........       ........                            11111111     ........       ........                            11111111 r = ........, o = ........ this leads to  o - 2*r =  11111111 ........      ........                            11111111     ........       ........                            11111111     ....1...       ....1...                            11111...     ........       ........                            ........

instead of

........ ........ ........ ........ ........ ........ 11111... ........ code This is not the intended result. It can be avioded, by bitwise adding an overflow barrier on the right-hand side. Afterwards this barrier needs to be removed from the attack set: code u64 right = 0x0101010101010101ULL;

......1.     1..1..1.                              1...1111     ....1...      1...1...                              .1111..1     ......1.      11....1.                              1.111111 r =  .....1.. o = .11..1.. now: ((o | right) - 2*r) = .1.111.1 ........     ........                              ........     ......1.      ......1.                              1111111.     .......1      .......1                              1111111.     1.......      1.......                              1......1

Note, that the 4th rank was not flooded by the subtraction! Next, the blockers are removed as usual:

...111.1                         1111...1                          .11111.1 o ^ ((o | right) - 2*r) = ..111..1 ........                         111111..                          11111111                          .......1

The last step is to remove the barrier at the right side that became visible after the last operation.

...111..                                    1111...                                     .11111.. (o ^ ((o | right) - 2*r) & ~right =  ..111...                                     ........                                     111111..                                     1111111.                                     ........

This is the correct attack set for the left direction.

code the complete algorithm for the left direction is therefore: code const u64 right = 0x0101010101010101ULL;

u64 leftAttacks = ((o ^ ((o | right) - 2*r) & ~right); code For the right-hand direction, the bits need to be reversed rank-wise.

x86-64 assembly
The VC2005 generated x86-64 assembly of bishopAttacks indicates what [|ipc]-monster Hyperbola Quintessence is: code occ$ = 16 sq$ = 24 ?bishopAttacks@@YA_K_KI@Z PROC 00000  40 53                push    rbx 00002  8b c2                mov    eax, edx 00004  4c 8d 15 00 00 00 00 lea    r10, OFFSET FLAT:?smsk 0000b  48 c1 e0 05          shl    rax, 5 0000f  4a 8b 5c 10 08       mov    rbx, QWORD PTR [rax+r10+8]  ; diagonalMaskEx 00014  4e 8b 4c 10 10       mov    r9,  QWORD PTR [rax+r10+16] ; antidiagMaskEx 00019  4e 8b 14 10          mov    r10, QWORD PTR [rax+r10]    ; r := 1 << sq  0001d   4c 8b db             mov    r11, rbx                    ; diagonalMaskEx 00020  49 8b d1             mov    rdx, r9                     ; antidiagMaskEx 00023  4d 8b c2             mov    r8, r10                     ; r := 1 << sq  00026   48 23 d1             and    rdx, rcx                    ; anti & occ 00029  4c 23 d9             and    r11, rcx                    ; dia  & occ 0002c  49 0f c8             bswap  r8                          ; r'  0002f   48 8b c2             mov    rax, rdx                    ; ant 00032  49 8b cb             mov    rcx, r11                    ; dia 00035  49 2b d2             sub    rdx, r10                    ; ant - r  00038   48 0f c8             bswap  rax                         ; ant' 0003b  48 0f c9             bswap  rcx                         ; dia' 0003e  4d 2b da             sub    r11, r10                    ; dia - r  00041   49 2b c0             sub    rax, r8                     ; ant' - r'  00044   49 2b c8             sub    rcx, r8                     ; dia' - r'  00047   48 0f c8             bswap  rax                         ;(ant' - r')' 0004a  48 0f c9             bswap  rcx                         ;(dia' - r')' 0004d  48 33 c2             xor    rax, rdx                    ; ant := (ant' - r')' ^ (ant - r)  00050   49 33 cb             xor    rcx, r11                    ; dia := (dia' - r')' ^ (dia - r)  00053   49 23 c1             and    rax, r9                     ; ant &= antidiagMaskEx 00056  48 23 cb             and    rcx, rbx                    ; dia &= diagonalMaskEx 00059  48 03 c1             add    rax, rcx                    ; attacks := dia + ant 0005c  5b                   pop    rbx 0005d  c3                   ret    0 ?bishopAttacks@@YA_K_KI@Z ENDP code



Java
Java programmer may try [|Long.reverseBytes]: code format="java5" static private final long[] bitMask = { 0x0000000000000001, 0x0000000000000002, 0x0000000000000004, 0x0000000000000008, 0x0000000000000010, 0x0000000000000020, 0x0000000000000040, 0x0000000000000080, ...   };

static private final long[] diagonalMaskEx = { 0x8040201008040200, 0x0080402010080400, 0x0000804020100800, 0x0000008040201000, 0x0000000080402000, 0x0000000000804000, 0x0000000000008000, 0x0000000000000000, ...   };

/**    * @param occ - occupancy *       sq  - from square * @return diagonal attacks from sq with occupancy occ */   static public long diagonalAttacks(long occ, int sq) {      long forward = occ & diagonalMaskEx[sq]; long reverse = Long.reverseBytes(forward); forward -= bitMask[sq]; reverse -= bitMask[sq^56]; forward ^= Long.reverseBytes(reverse); forward &= diagonalMaskEx[sq]; return forward; } code [|Long.reverse] for a generalized attack getter even for ranks is too expensive, except a [|JVM] can use a machine instruction rather than a bit-reversal routine: code format="java5" /**    * @param occ  - occupancy *       line - {0..3} {rank, file, diagonal, antidiagonal} *       sq   - from square * @return attacks from sq on line with occupancy occ */   static public long attacks(long occ, int line, int sq) {      long forward = occ & maskEx[sq][line]; long reverse = Long.reverse(forward); forward -= bitMask[sq]; reverse -= bitMask[sq^63]; forward ^= Long.reverse(reverse); forward &= maskEx[sq][line]; return forward; } code =AMD's SIMD Future in 2011= In 2007 AMD announced the hypothetical SSE5 instruction set with a PPERM instruction, which would be able to reverse vectors of two bitboards. In May 2009 AMD revised SSE5, to become more compatible with Intel's Advanced Vector Extensions, and also expanding the SIMD-register width from 128 to 256-bit, the renamed XOP instruction set. However, the successor of the planned PPERM SSE5-instruction is now called VPPERM, but still works on 128-bit XMM registers. VPPERM instruction requires four operands (xmm-registers): code VPPERM dest, src1, src2, selector code For each of 16 destination bytes the corresponding selector-byte addresses one of 32 input bytes (from src1, src2) and a logical operation including bit-reversal: code format="cpp" char src[32];  // src2:src1 char select[16]; char dest[16]; for (int i = 0; i < 16; i++) { char opera = select[i] >>> 5; // unsigned shift char idx32 = select[i] & 31;

switch ( opera ) { case 0: dest[i] = src[idx32]; break; case 1: dest[i] = ~src[idx32]; break; case 2: dest[i] = bitreverse( src[idx32]); break; case 3: dest[i] = ~bitreverse( src[idx32]); break; case 4: dest[i] = 0x00; break; case 5: dest[i] = 0xFF; break; case 6: dest[i] = src[idx32] >> 7;  break; // signed shift case 7: dest[i] = ~src[idx32] >> 7; break; // signed shift } } code Since VPPERM can simultaneously reverse bits and bytes, it can for instance reverse two bitboards in one run, even from different sources, to make Hyperbola Quintessence work for all four lines.

=See also=
 * Reverse Bitboards
 * Obstruction Difference
 * SBAMG
 * SSSE3 Hyperbola Quintessence
 * Subtracting a Rook from a Blocking Piece

=Forum Posts=
 * [|Re: BitBoard Tests Magic v Non-Rotated 32 Bits v 64 Bits] by Aleks Peshkov, CCC, August 25, 2007
 * [|Hyperbola Quiesscene: hardly any improvement] by trojanfoe, CCC, January 13, 2009
 * [|Comparison of bitboard attack-getter variants] by Sven Schüle, CCC, January 04, 2016
 * [|Re: The wrong way] by Aleks Peshkov, CCC, January 05, 2016 » SSSE3 Hyperbola Quintessence

=External Links=
 * [|Sliding Pieces (Part 1) - Advanced Java Chess Engine Tutorial 8] by Jonathan Warkentin
 * [|Hyperbola Quintessence for rooks along ranks] by [|Tim Cooijmans], April 6, 2014

Hyperbola

 * [|Hyperbola from Wikipedia]
 * [|The Eulerian Hyperbola]
 * [|Foci of a Hyperbola] from [|mathwords.com]

Quintessence

 * [|Quintessence from Wikipedia]
 * [|Quintessence (physics) from Wikipedia]
 * [|Quintessence the Fifth Element from Wikipedia]

Misc
> media type="custom" key="23950926"
 * Focus - [|Hocus Pocus], [|Pinkpop Festival] 1972, [|Geleen], [|YouTube] Video

=References= =What links here?= include page="Hyperbola Quintessence" component="backlinks" limit="60"
 * Up one Level**