Kogge-Stone Algorithm

Home * Board Representation * Bitboards * Sliding Piece Attacks * Kogge-Stone Algorithm
300px-4_bit_Kogge_Stone_Adder_Example_new.png

The Kogge-Stone Algorithm for set-wise sliding piece attack generation was first introduced by Steffan Westcott in CCC [1] . It is a parallel prefix approach of a occluded dumb7 flood-fill, propagating sliding piece attacks like carries of a Kogge-Stone hardware adder [2] in software. One needs to pass sliding pieces as generator set "g" and the set of empty squares as propagator set "p". For appropriate attacks we need to shift the occluded fill one step further, considering wraps.
4-bit Kogge-Stone adder [3]

mapping.JPG
Code samples and bitboard diagrams rely on Little endian file and rank mapping.

Parallel Prefix

The routine fillUpOccluded() smears the set bits of bitboard g upwards, but only along set bits of p; a reset bit in p is enough to halt a smear. Other routines have a similar effect.
U64 fillUpOccluded(U64 g, U64 p) {
   g |= p & (g <<  8);
   p &=     (p <<  8);
   g |= p & (g << 16);
   p &=     (p << 16);
   g |= p & (g << 32);
   return g;
}
The method chosen in FillUpOccluded() is based on a Kogge-Stone parallel prefix network, because it can be implemented very easily in software. The diagram below (trust me, it really _is_ supposed to look like that) is an illustration of how it works. The corresponding lines of program code are shown on the right. The inputs are fed into the network at the top, pass along the connecting lines, are combined by the # operator at various points, and the outputs appear at the bottom.
x1 x2 x3 x4 x5 x6 x7 x8         Input : g, p
|  |  |  |  |  |  |  |
V  V  V  V  V  V  V  V
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
|\ |\ |\ |\ |\ |\ |\ |
| \| \| \| \| \| \| \|
|  #  #  #  #  #  #  #          g |= p & (g <<  8);
|  |  |  |  |  |  |  |          p &=     (p <<  8);
|\ |\ |\ |\ |\ |\ |  |
| \: \: \: \: \: \:  |
|  \  \  \  \  \  \  |
|  :\ :\ :\ :\ :\ :\ |
|  | \| \| \| \| \| \|
|  |  #  #  #  #  #  #          g |= p & (g << 16);
|  |  |  |  |  |  |  |          p &=     (p << 16);
|\ |\ |\ |\ |  |  |  |
| \: \: \: \:  |  |  |
|  \  \  \  \  |  |  |
|  :\ :\ :\ :\ |  |  |
|  | \: \: \: \:  |  |
|  |  \  \  \  \  |  |
|  |  :\ :\ :\ :\ |  |
|  |  | \: \: \: \:  |
|  |  |  \  \  \  \  |
|  |  |  ;\ ;\ :\ :\ |
|  |  |  | \| \| \| \|
|  |  |  |  #  #  #  #          g |= p & (g << 32);
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
V  V  V  V  V  V  V  V
q1 q2 q3 q4 q5 q6 q7 q8         Output : g

Direction-wise Fill

We further rely on the compass rose to identify ray-directions.
  noWe         nort         noEa
          +7    +8    +9
              \  |  /
  west    -1 <-  0 -> +1    east
              /  |  \
          -9    -8    -7
  soWe         sout         soEa
Assuming little-endian file mapping. Big-endian file mapping has to swap A and H and east and west. As a reminder - shifting bitboards - the base of further stuff.


Fill on an empty Board

We already used the south- and north- parallel prefix fill-routines while calculating pawn spans. We need one additional direction step for ray-attacks on the otherwise empty board to exclude sliders. Convenient to initialize ray-attacks arrays of single sliders. We may conduct those routines from the general occluded fills by passing the universal propagator set. The vertical fills already look familiar.
U64 soutFill(U64 gen) {
   gen |= (gen >>  8);
   gen |= (gen >> 16);
   gen |= (gen >> 32);
   return gen;
}
 
U64 nortFill(U64 gen) {
   gen |= (gen <<  8);
   gen |= (gen << 16);
   gen |= (gen << 32);
   return gen;
}
Using explicit propagators as compile time constants to manage the A-, H-file-wraps.
U64 eastFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 << 1);
   const U64 pr2 = pr1 & (pr1 << 2);
   gen |= pr0 & (gen  << 1);
   gen |= pr1 & (gen  << 2);
   gen |= pr2 & (gen  << 4);
   return gen;
}
 
U64 noEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 <<  9);
   const U64 pr2 = pr1 & (pr1 << 18);
   gen |= pr0 & (gen <<  9);
   gen |= pr1 & (gen << 18);
   gen |= pr2 & (gen << 36);
   return gen;
}
 
U64 soEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 >>  7);
   const U64 pr2 = pr1 & (pr1 >> 14);
   gen |= pr0 & (gen >>  7);
   gen |= pr1 & (gen >> 14);
   gen |= pr2 & (gen >> 28);
   return gen;
}
 
U64 westFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >> 1);
   const U64 pr2 = pr1 & (pr1 >> 2);
   gen |= pr0 & (gen >> 1);
   gen |= pr1 & (gen >> 2);
   gen |= pr2 & (gen >> 4);
   return gen;
}
 
U64 soWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >>  9);
   const U64 pr2 = pr1 & (pr1 >> 18);
   gen |= pr0 & (gen >>  9);
   gen |= pr1 & (gen >> 18);
   gen |= pr2 & (gen >> 36);
   return gen;
}
 
U64 noWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 <<  7);
   const U64 pr2 = pr1 & (pr1 << 14);
   gen |= pr0 & (gen <<  7);
   gen |= pr1 & (gen << 14);
   gen |= pr2 & (gen << 28);
   return gen;
}

Occluded Fill

Occluded fills include sliders, but exclude blockers [4].
U64 soutOccl(U64 gen, U64 pro) {
   gen |= pro & (gen >>  8);
   pro &=       (pro >>  8);
   gen |= pro & (gen >> 16);
   pro &=       (pro >> 16);
   gen |= pro & (gen >> 32);
   return gen;
}
 
U64 nortOccl(U64 gen, U64 pro) {
   gen |= pro & (gen <<  8);
   pro &=       (pro <<  8);
   gen |= pro & (gen << 16);
   pro &=       (pro << 16);
   gen |= pro & (gen << 32);
   return gen;
}
 
U64 eastOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen << 1);
   pro &=       (pro << 1);
   gen |= pro & (gen << 2);
   pro &=       (pro << 2);
   gen |= pro & (gen << 4);
   return gen;
}
 
U64 noEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen <<  9);
   pro &=       (pro <<  9);
   gen |= pro & (gen << 18);
   pro &=       (pro << 18);
   gen |= pro & (gen << 36);
   return gen;
}
 
U64 soEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen >>  7);
   pro &=       (pro >>  7);
   gen |= pro & (gen >> 14);
   pro &=       (pro >> 14);
   gen |= pro & (gen >> 28);
   return gen;
}
 
U64 westOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >> 1);
   pro &=       (pro >> 1);
   gen |= pro & (gen >> 2);
   pro &=       (pro >> 2);
   gen |= pro & (gen >> 4);
   return gen;
}
 
U64 soWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >>  9);
   pro &=       (pro >>  9);
   gen |= pro & (gen >> 18);
   pro &=       (pro >> 18);
   gen |= pro & (gen >> 36);
   return gen;
}
 
U64 noWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen <<  7);
   pro &=       (pro <<  7);
   gen |= pro & (gen << 14);
   pro &=       (pro << 14);
   gen |= pro & (gen << 28);
   return gen;
}

Ray-wise Attacks

Ray-wise attacks need the occluded fills shift one further, considering wraps.
U64 soutAttacks (U64 rooks,   U64 empty) {return soutOne(soutOccl(rooks,   empty));}
U64 nortAttacks (U64 rooks,   U64 empty) {return nortOne(nortOccl(rooks,   empty));}
U64 eastAttacks (U64 rooks,   U64 empty) {return eastOne(eastOccl(rooks,   empty));}
U64 noEaAttacks (U64 bishops, U64 empty) {return noEaOne(noEaOccl(bishops, empty));}
U64 soEaAttacks (U64 bishops, U64 empty) {return soEaOne(soEaOccl(bishops, empty));}
U64 westAttacks (U64 rooks,   U64 empty) {return westOne(westOccl(rooks,   empty));}
U64 soWeAttacks (U64 bishops, U64 empty) {return soWeOne(soWeOccl(bishops, empty));}
U64 noWeAttacks (U64 bishops, U64 empty) {return noWeOne(noWeOccl(bishops, empty));}


Generalized Rays

Since rotate 64 works like a generalized shift with positive or negative shift amount, it might be applied to get pawn-attacks for both sides - or a Kogge-Stone fill with a direction parameter and small lookups for shift amount and wrap ands, instead of multiple code for eight directions. Of course generalized shift will be a bit slower due to lookups and using cl as the shift amount register.

U64 occludedFill (U64 gen, U64 pro, int dir8)
{
   int r = shift[dir8]; // {+-1,7,8,9}
   pro &= avoidWrap[dir8];
   gen |= pro & rotateLeft(gen, r);
   pro &=       rotateLeft(pro, r);
   gen |= pro & rotateLeft(gen, 2*r);
   pro &=       rotateLeft(pro, 2*r);
   gen |= pro & rotateLeft(gen, 4*r);
   return gen;
}
 
U64 shiftOne (U64 b, int dir8)
{
   int r = shift[dir8]; // {+-1,7,8,9}
   return rotateLeft(b, r) & avoidWrap[dir8];
}
 
U64 slidingAttacks (U64 sliders, U64 empty, int dir8)
{
   U64 fill = occludedFill(slider, empty, dir8)
   return shiftOne(fill, dir8);
}
 
// positve left, negative right shifts
int shift[8] = {9, 1,-7,-8,-9,-1, 7, 8};
 
U64 avoidWrap[8] =
{
   0xfefefefefefefe00,
   0xfefefefefefefefe,
   0x00fefefefefefefe,
   0x00ffffffffffffff,
   0x007f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f00,
   0xffffffffffffff00,
};

See also


Publications

  • Peter M. Kogge, Harold S. Stone (1973). A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Transactions on Computers, 1973, C-22, 783-791, pdf

Forum Posts


External Links


References

  1. ^ Re: flood fill attack bitboards by Steffan Westcott, CCC, September 15, 2002
  2. ^ Hardware algorithms for arithmetic modules from the ARITH research group, Aoki lab., Tohoku University
  3. ^ Kogge-Stone adder from Wikipedia
  4. ^ Kogge Stone Algorithm mistake on chessprogramming Wiki by Christopher Conkie, CCC, June 29, 2008
  5. ^ Parallel Thread Execution from Wikipedia
  6. ^ NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, pdf
  7. ^ Shapeshifting from Wikipedia

What links here?


Up one Level