Kogge-Stone Algorithm

=Parallel Prefix=
 * [[image:300px-4_bit_Kogge_Stone_Adder_Example_new.png width="240" link="https://en.wikipedia.org/wiki/Kogge-Stone_adder"]]
 * 4-bit Kogge-Stone adder

The **Kogge-Stone Algorithm** for set-wise sliding piece attack generation was first introduced by Steffan Westcott in CCC. It is a parallel prefix approach to an occluded Dumb7Fill flood fill, propagating sliding piece attacks in software like the carries of a Kogge-Stone hardware adder. One passes the sliding pieces as generator set "g" and the set of empty squares as propagator set "p". To obtain the actual attack sets, the occluded fill has to be shifted one step further in ray direction, taking care of wraps.

code format="cpp"
U64 fillUpOccluded(U64 g, U64 p) {
   g |= p & (g <<  8);
   p &=     (p <<  8);
   g |= p & (g << 16);
   p &=     (p << 16);
   g |= p & (g << 32);
   return g;
}
code

code
x1 x2 x3 x4 x5 x6 x7 x8        Input : g, p
 V  V  V  V  V  V  V  V
 # #  #  #  #  #  #            g |= p & (g <<  8);
 | |  |  |  |  |  |            p &=     (p <<  8);
 | #  #  #  #  #  #            g |= p & (g << 16);
 | |  |  |  |  |  |            p &=     (p << 16);
 | |  |  #  #  #  #            g |= p & (g << 32);
 V  V  V  V  V  V  V  V
q1 q2 q3 q4 q5 q6 q7 q8        Output : g
code

=Direction-wise Fill=
We further rely on the compass rose to identify ray directions.
code
noWe         nort         noEa
        +7    +8    +9
            \  |  /
west    -1 <-  0 -> +1    east
            /  |  \
        -9    -8    -7
soWe         sout         soEa
code
This assumes the little-endian file mapping; with big-endian file mapping one has to swap the A- and H-file masks as well as the east and west directions. As a reminder, shifting bitboards is the basis of all the routines that follow.



==Fill on an empty Board==
We already used the south- and north-directed parallel prefix fill routines while calculating pawn spans. For ray attacks on the otherwise empty board we need one additional step in ray direction to exclude the sliders themselves - convenient for initializing ray-attack arrays of single sliders. One may derive those routines from the general occluded fills by passing a universal propagator set. The vertical fills already look familiar.
code format="cpp"
U64 soutFill(U64 gen) {
   gen |= (gen >>  8);
   gen |= (gen >> 16);
   gen |= (gen >> 32);
   return gen;
}

U64 nortFill(U64 gen) {
   gen |= (gen <<  8);
   gen |= (gen << 16);
   gen |= (gen << 32);
   return gen;
}
code
The horizontal and diagonal fills use explicit propagators as compile-time constants to manage the A- and H-file wraps.
code format="cpp"
U64 eastFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 << 1);
   const U64 pr2 = pr1 & (pr1 << 2);
   gen |= pr0 & (gen << 1);
   gen |= pr1 & (gen << 2);
   gen |= pr2 & (gen << 4);
   return gen;
}

U64 noEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 <<  9);
   const U64 pr2 = pr1 & (pr1 << 18);
   gen |= pr0 & (gen <<  9);
   gen |= pr1 & (gen << 18);
   gen |= pr2 & (gen << 36);
   return gen;
}

U64 soEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 >>  7);
   const U64 pr2 = pr1 & (pr1 >> 14);
   gen |= pr0 & (gen >>  7);
   gen |= pr1 & (gen >> 14);
   gen |= pr2 & (gen >> 28);
   return gen;
}

U64 westFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >> 1);
   const U64 pr2 = pr1 & (pr1 >> 2);
   gen |= pr0 & (gen >> 1);
   gen |= pr1 & (gen >> 2);
   gen |= pr2 & (gen >> 4);
   return gen;
}

U64 soWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >>  9);
   const U64 pr2 = pr1 & (pr1 >> 18);
   gen |= pr0 & (gen >>  9);
   gen |= pr1 & (gen >> 18);
   gen |= pr2 & (gen >> 36);
   return gen;
}

U64 noWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 <<  7);
   const U64 pr2 = pr1 & (pr1 << 14);
   gen |= pr0 & (gen <<  7);
   gen |= pr1 & (gen << 14);
   gen |= pr2 & (gen << 28);
   return gen;
}
code

==Occluded Fill==
Occluded fills include the sliders, but exclude blockers and all squares behind them.
code format="cpp"
U64 soutOccl(U64 gen, U64 pro) {
   gen |= pro & (gen >>  8);
   pro &=       (pro >>  8);
   gen |= pro & (gen >> 16);
   pro &=       (pro >> 16);
   gen |= pro & (gen >> 32);
   return gen;
}

U64 nortOccl(U64 gen, U64 pro) {
   gen |= pro & (gen <<  8);
   pro &=       (pro <<  8);
   gen |= pro & (gen << 16);
   pro &=       (pro << 16);
   gen |= pro & (gen << 32);
   return gen;
}

U64 eastOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen << 1);
   pro &=       (pro << 1);
   gen |= pro & (gen << 2);
   pro &=       (pro << 2);
   gen |= pro & (gen << 4);
   return gen;
}

U64 noEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen <<  9);
   pro &=       (pro <<  9);
   gen |= pro & (gen << 18);
   pro &=       (pro << 18);
   gen |= pro & (gen << 36);
   return gen;
}

U64 soEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen >>  7);
   pro &=       (pro >>  7);
   gen |= pro & (gen >> 14);
   pro &=       (pro >> 14);
   gen |= pro & (gen >> 28);
   return gen;
}

U64 westOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >> 1);
   pro &=       (pro >> 1);
   gen |= pro & (gen >> 2);
   pro &=       (pro >> 2);
   gen |= pro & (gen >> 4);
   return gen;
}

U64 soWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >>  9);
   pro &=       (pro >>  9);
   gen |= pro & (gen >> 18);
   pro &=       (pro >> 18);
   gen |= pro & (gen >> 36);
   return gen;
}

U64 noWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen <<  7);
   pro &=       (pro <<  7);
   gen |= pro & (gen << 14);
   pro &=       (pro << 14);
   gen |= pro & (gen << 28);
   return gen;
}
code

==Ray-wise Attacks==
Ray-wise attacks need the occluded fills shifted one step further in ray direction, taking care of wraps.
code format="cpp"
U64 soutAttacks(U64 rooks,   U64 empty) {return soutOne(soutOccl(rooks,   empty));}
U64 nortAttacks(U64 rooks,   U64 empty) {return nortOne(nortOccl(rooks,   empty));}
U64 eastAttacks(U64 rooks,   U64 empty) {return eastOne(eastOccl(rooks,   empty));}
U64 noEaAttacks(U64 bishops, U64 empty) {return noEaOne(noEaOccl(bishops, empty));}
U64 soEaAttacks(U64 bishops, U64 empty) {return soEaOne(soEaOccl(bishops, empty));}
U64 westAttacks(U64 rooks,   U64 empty) {return westOne(westOccl(rooks,   empty));}
U64 soWeAttacks(U64 bishops, U64 empty) {return soWeOne(soWeOccl(bishops, empty));}
U64 noWeAttacks(U64 bishops, U64 empty) {return noWeOne(noWeOccl(bishops, empty));}
code

=Generalized Rays=
Since a 64-bit rotate works like a generalized shift with positive or negative shift amounts, it might be applied to get pawn attacks for both sides - or a Kogge-Stone fill with a direction parameter and small lookups for shift amount and wrap mask, instead of separate code for eight directions. Of course the generalized shift will be a bit slower due to the lookups and the use of cl as shift amount register on x86.

code format="cpp"
U64 occludedFill(U64 gen, U64 pro, int dir8) {
   int r = shift[dir8]; // {+-1, +-7, +-8, +-9}
   pro &= avoidWrap[dir8];
   gen |= pro & rotateLeft(gen, r);
   pro &=       rotateLeft(pro, r);
   gen |= pro & rotateLeft(gen, 2*r);
   pro &=       rotateLeft(pro, 2*r);
   gen |= pro & rotateLeft(gen, 4*r);
   return gen;
}

U64 shiftOne(U64 b, int dir8) {
   int r = shift[dir8]; // {+-1, +-7, +-8, +-9}
   return rotateLeft(b, r) & avoidWrap[dir8];
}

U64 slidingAttacks(U64 sliders, U64 empty, int dir8) {
   U64 fill = occludedFill(sliders, empty, dir8);
   return shiftOne(fill, dir8);
}

// positive left, negative right shifts
int shift[8] = {9, 1,-7,-8,-9,-1, 7, 8};

U64 avoidWrap[8] = {
   0xfefefefefefefe00,
   0xfefefefefefefefe,
   0x00fefefefefefefe,
   0x00ffffffffffffff,
   0x007f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f00,
   0xffffffffffffff00
};
code

=See also=
 * Steffan Westcott's Elaboration on Kogge-Stone algorithm
 * Add/Sub versus Attacks
 * Design Principles by Steffan Westcott
 * Dumb7Fill
 * Fill by Subtraction
 * Fill Algorithms
 * Pieces versus Directions
 * SSE2-Wrapper in C++

=Publications=
 * Peter M. Kogge, Harold S. Stone (**1973**). //A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations//. IEEE Transactions on Computers, C-22, 783-791

=Forum Posts=
 * Re: flood fill attack bitboards by Steffan Westcott, CCC, September 15, 2002
 * Kogge Stone Algorithm mistake on chessprogramming Wiki by Christopher Conkie, CCC, June 29, 2008
 * Re: Hyperbola Quiesscene: hardly any improvement by Karlo Bala Jr., CCC, January 14, 2009 » Hyperbola Quintessence
 * Kogge Stone, Vector Based by Srdja Matovic, CCC, January 22, 2013 » GPU
 * Fast perft on GPU (upto 20 Billion nps w/o hashing) by Ankan Banerjee, CCC, June 22, 2013 » Perft, GPU

=External Links=
 * Kogge-Stone adder from Wikipedia
 * Kogge-Stone algorithm from Terra Help and Information - Theories by Peter Fendrich
 * Gong - Shapeshifter, 1992, YouTube Video
