Kindergarten bitboards [1] grew out of an interactive forum discussion [2] that took a lot of detours [3]. Two issues were involved: first, calculating the occupancy of any line from the occupied bitboard [4], and second, finding compact and dense lookup tables.
As the quintessence of that discussion, Gerd Isenberg came up with this name. The technique relies on fast 64-bit multiplication, but is otherwise quite resource-friendly, a compromise between calculation and table size.
Ranks and Diagonals
Ranks and diagonals, or rather their line-masks indexed by square, are first intersected with the occupancy of the whole board. It doesn't matter whether the slider's own square is cleared or not; that bit is redundant and is accounted for in the pre-calculated lookup table.
Since there is at most one bit per file, a north-fill multiplication by the A-file maps the line to the 8th rank. Since we only need the inner six bits, we fold in the required shift left by one by multiplying with the B-file instead. Shifting the product right by 58 (64-6) leaves the six-bit occupancy index in the 0..63 range. Take, for instance, the diagonal attacks of a bishop on d4. 'A'-'H' represent the masked occupied bits along this diagonal, each either zero or one; we need 'B'-'G' as a six-bit number:
masked line       *  B-File           =  B-G upper six        occupancy 6 bit
. . . . . . . H      . 1 . . . . . .     . A[B C . E F G]     . . . . . . . .
. . . . . . G .      . 1 . . . . . .     . A B C . E F G      . . . . . . . .
. . . . . F . .      . 1 . . . . . .     . A B C . E F .      . . . . . . . .
. . . . E . . .      . 1 . . . . . .     . A B C . E . .  >>  . . . . . . . .
. . . . . . . .   *  . 1 . . . . . .  =  . A B C . . . .  58  . . . . . . . .
. . C . . . . .      . 1 . . . . . .     . A B C . . . .      . . . . . . . .
. B . . . . . .      . 1 . . . . . .     . A B . . . . .      . . . . . . . .
A . . . . . . .      . 1 . . . . . .     . A . . . . . .     [B C . E F G]. .
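The index extraction pictured above can be sketched in a few lines of C. This is only an illustration: the names `occupancyIndex` and `diagA1H8ExD4` are assumptions made for this example, and the mask constant is simply the a1-h8 diagonal with the d4 bit removed.

```c
#include <stdint.h>

typedef uint64_t U64;

/* a1-h8 diagonal without the slider square d4 (hypothetical name). */
static const U64 diagA1H8ExD4 = 0x8040201000040201ULL;

/* Six-bit occupancy index (bits 'B'..'G') of a masked rank or diagonal.
   Multiplying by the B-file fills north and shifts left by one file, so the
   inner six bits of the line end up in bits 58..63 of the product. */
unsigned occupancyIndex(U64 occupied, U64 lineMaskEx) {
    const U64 bFile = 0x0202020202020202ULL;
    return (unsigned)(((occupied & lineMaskEx) * bFile) >> 58);
}
```

With blockers on b2 and g7, bits 'B' and 'G' are set, so the index is 1 + 32 = 33. The outer squares a1 and h8 never reach the index.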
The pre-calculated lookup table contains the attacks of the first rank, replicated eight times, once in each rank (byte). It is indexed by the six-bit occupied state ('B'-'G') and the file of the slider's square. The result only needs to be intersected with the same line-mask that was applied to the occupancy, which maps the first-rank attack bits onto the appropriate line. The pre-calculated attack bits are represented by 'a'-'h':
8 copies of rank       the attack set
attacks & l-mask   ->  of this line
a b c . e f g h        . . . . . . . h
a b c . e f g h        . . . . . . g .
a b c . e f g h        . . . . . f . .
a b c . e f g h        . . . . e . . .
a b c . e f g h        . . . . . . . .
a b c . e f g h        . . c . . . . .
a b c . e f g h        . b . . . . . .
a b c . e f g h        a . . . . . . .
Since all ranks, diagonals and anti-diagonals are properly file-aligned, this also works for shorter diagonals, where some bits of the index are redundant, like the outer bit 'B' here:
masked line       *  B-File           =  B-G upper six        occupancy 6 bit
. . . . . . . .      . 1 . . . . . .     H .[B C . E F G]     . . . . . . . .
. . . . . . . H      . 1 . . . . . .     . . B C . E F G      . . . . . . . .
. . . . . . G .      . 1 . . . . . .     . . B C . E F G      . . . . . . . .
. . . . . F . .      . 1 . . . . . .     . . B C . E F .  >>  . . . . . . . .
. . . . E . . .   *  . 1 . . . . . .  =  . . B C . E . .  58  . . . . . . . .
. . . . . . . .      . 1 . . . . . .     . . B C . . . .      . . . . . . . .
. . C . . . . .      . 1 . . . . . .     . . B C . . . .      . . . . . . . .
. B . . . . . .      . 1 . . . . . .     . . B . . . . .     [B C . E F G]. .
Appropriate pre-calculated attack bits are represented by 'b'-'h' here:
8 copies of rank       the attack set        or the attack set
attacks & l-mask   ->  of this line      ->  of the shorter diagonal
a b c . e f g h        . . . . . . . h       . . . . . . . .
a b c . e f g h        . . . . . . g .       . . . . . . . h
a b c . e f g h        . . . . . f . .       . . . . . . g .
a b c . e f g h        . . . . e . . .       . . . . . f . .
a b c . e f g h        . . . . . . . .       . . . . e . . .
a b c . e f g h        . . c . . . . .       . . . . . . . .
a b c . e f g h        . b . . . . . .       . . c . . . . .
a b c . e f g h        a . . . . . . .       . b . . . . . .
Wasn't that simple? That is why it is called kindergarten bitboards!
The trick is to share one 4 KByte table among three line directions by re-using the mask for a final intersection. One may of course use the calculated occupied state to index rotated-bitboard-like tables of 32 KByte each. But dividing the table size by 3*8, at the cost of one additional 'and' (and of keeping the mask in a register), is tempting. As always with computation-versus-memory trade-offs, which solution is preferable depends on the cache and memory footprint of the particular chess program and on the hardware architecture. L1 cache is still a scarce resource, and so is the translation lookaside buffer.
The three routines differ only in the line-mask applied. As pointed out by Aleks Peshkov, it is smarter to index by file first and occupancy second, since fillUpAttacks[sq&7] may then be shared by two (bishop) or three (queen) line-attack getters.
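A minimal sketch of the three getters might look as follows. Only `fillUpAttacks[sq&7]` is named in the text; the mask arrays, the init routine and the helper names are assumptions for this example. Squares are assumed to be numbered a1 = 0 up to h8 = 63.

```c
#include <stdint.h>

typedef uint64_t U64;

static U64 fillUpAttacks[8][64];   /* [file][six-bit occupancy], 4 KByte */
static U64 rankMaskEx[64], diagonalMaskEx[64], antidiagMaskEx[64];

/* First-rank attacks of a slider on `file` for an eight-bit rank occupancy. */
unsigned firstRankAttackBits(int file, unsigned occ8) {
    unsigned attacks = 0;
    for (int f = file + 1; f < 8; ++f) {   /* towards the H-file */
        attacks |= 1u << f;
        if (occ8 >> f & 1) break;
    }
    for (int f = file - 1; f >= 0; --f) {  /* towards the A-file */
        attacks |= 1u << f;
        if (occ8 >> f & 1) break;
    }
    return attacks;
}

void initKindergarten(void) {
    const U64 aFile = 0x0101010101010101ULL;
    for (int file = 0; file < 8; ++file)
        for (unsigned occ = 0; occ < 64; ++occ)
            /* the index covers files B..G; replicate the byte into all ranks */
            fillUpAttacks[file][occ] = firstRankAttackBits(file, occ << 1) * aFile;
    for (int sq = 0; sq < 64; ++sq) {
        int r = sq >> 3, f = sq & 7;
        rankMaskEx[sq] = diagonalMaskEx[sq] = antidiagMaskEx[sq] = 0;
        for (int t = 0; t < 64; ++t) {
            int tr = t >> 3, tf = t & 7;
            if (t == sq) continue;         /* "Ex" masks exclude the slider */
            if (tr == r)          rankMaskEx[sq]     |= 1ULL << t;
            if (tr - tf == r - f) diagonalMaskEx[sq] |= 1ULL << t;
            if (tr + tf == r + f) antidiagMaskEx[sq] |= 1ULL << t;
        }
    }
}

/* Shared body: the three getters differ only in the mask passed in. */
U64 lineAttacks(U64 occ, U64 maskEx, int sq) {
    const U64 bFile = 0x0202020202020202ULL;
    U64 index = ((occ & maskEx) * bFile) >> 58;
    return maskEx & fillUpAttacks[sq & 7][index];
}

U64 rankAttacks(U64 occ, int sq)     { return lineAttacks(occ, rankMaskEx[sq], sq); }
U64 diagonalAttacks(U64 occ, int sq) { return lineAttacks(occ, diagonalMaskEx[sq], sq); }
U64 antidiagAttacks(U64 occ, int sq) { return lineAttacks(occ, antidiagMaskEx[sq], sq); }
```

`initKindergarten()` has to run once before the getters are used. A bishop's attack set is then the union of `diagonalAttacks` and `antidiagAttacks`.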
File-Attacks
Files need a tad more work. Shift the board left towards the A-file (arithmetically a shift right by the file index) and mask the A-file. To get the inner six bits, a flip-multiplication by the c2-h7 diagonal is applied, followed by a shift right by 58. The lookup table contains the A-file attacks, which are shifted "back" to the original file.
masked A-file     *  c2-h7 Diagonal   =                       occupancy
H . . . . . . .      . . . . . . . .     . .[G F E D C B]     . . . . . . . .
G . . . . . . .      . . . . . . . 1     . . F E D C B A      . . . . . . . .
F . . . . . . .      . . . . . . 1 .     . . E D C B A .      . . . . . . . .
E . . . . . . .      . . . . . 1 . .     . . D C B A . .  >>  . . . . . . . .
D . . . . . . .   *  . . . . 1 . . .  =  . . C B A . . .  58  . . . . . . . .
C . . . . . . .      . . . 1 . . . .     . . B A . . . .      . . . . . . . .
B . . . . . . .      . . 1 . . . . .     . . A . . . . .      . . . . . . . .
A . . . . . . .      . . . . . . . .     . . . . . . . .     [G F E D C B]. .
Note that the six-bit inner occupancy comes out reversed, which is accounted for in the pre-calculated aFileAttacks array. The reversed lookup was originally justified by sharing the first-rank attacks among all directions, with a dense lookup of 512 bytes. But the 4 KByte tables outperform the dense version with its additional multiplication and shift, and one may alternatively multiply with the flipped diagonal, the c7-h2 anti-diagonal:
masked A-file     *  c7-h2 AntiDiag   =                       occupancy
H . . . . . . .      . . . . . . . .     . .[B C D E F G]     . . . . . . . .
G . . . . . . .      . . 1 . . . . .     . . A B C D E F      . . . . . . . .
F . . . . . . .      . . . 1 . . . .     . . . A B C D E      . . . . . . . .
E . . . . . . .      . . . . 1 . . .     . . . . A B C D  >>  . . . . . . . .
D . . . . . . .   *  . . . . . 1 . .  =  . . . . . A B C  58  . . . . . . . .
C . . . . . . .      . . . . . . 1 .     . . . . . . A B      . . . . . . . .
B . . . . . . .      . . . . . . . 1     . . . . . . . A      . . . . . . . .
A . . . . . . .      . . . . . . . .     . . . . . . . .     [B C D E F G]. .
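Both multiplications can be sketched as below. `aFileAttacks` is named in the text, but its layout here ([rank][occupancy], filled for the non-reversed c7-h2 index), the init routine and the function names are assumptions for this example. Squares are assumed to be numbered a1 = 0 up to h8 = 63.

```c
#include <stdint.h>

typedef uint64_t U64;

static U64 aFileAttacks[8][64];    /* [rank][six-bit occupancy], 4 KByte */

/* Reversed index: a7 lands in bit 0, a2 in bit 5 (c2-h7 diagonal). */
unsigned fileIndexReversed(U64 occ, int file) {
    const U64 aFile    = 0x0101010101010101ULL;
    const U64 diagC2H7 = 0x0080402010080400ULL;
    return (unsigned)(((aFile & (occ >> file)) * diagC2H7) >> 58);
}

/* Straight index: a2 lands in bit 0, a7 in bit 5 (c7-h2 anti-diagonal). */
unsigned fileIndex(U64 occ, int file) {
    const U64 aFile        = 0x0101010101010101ULL;
    const U64 antiDiagC7H2 = 0x0004081020408000ULL;
    return (unsigned)(((aFile & (occ >> file)) * antiDiagC7H2) >> 58);
}

void initFileAttacks(void) {
    for (int rank = 0; rank < 8; ++rank) {
        for (unsigned occ = 0; occ < 64; ++occ) {
            unsigned occ8 = occ << 1;      /* index bit i is the rank with index i+1 */
            U64 attacks = 0;
            for (int r = rank + 1; r < 8; ++r) {   /* northwards */
                attacks |= 1ULL << (8 * r);
                if (occ8 >> r & 1) break;
            }
            for (int r = rank - 1; r >= 0; --r) {  /* southwards */
                attacks |= 1ULL << (8 * r);
                if (occ8 >> r & 1) break;
            }
            aFileAttacks[rank][occ] = attacks;
        }
    }
}

/* A-file attacks, shifted "back" to the slider's file. */
U64 fileAttacks(U64 occ, int sq) {
    return aFileAttacks[sq >> 3][fileIndex(occ, sq & 7)] << (sq & 7);
}
```

A table filled for the reversed index would work just as well with `fileIndexReversed`. The straight variant is used here only because it keeps the init loop easier to read.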
Shared Rank Lookup
As so often, it is computation versus memory size. One may share a 512-byte lookup of the first rank among all lines at the cost of some trailing computation: multiply with the A-file (fill north) for ranks and diagonals, and with the diagonal for files. The additional multiplication likely does not pay off.
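For ranks and diagonals, the dense variant could be sketched like this. The table name `firstRankAttacks`, its [occupancy][file] layout and the function names are assumptions for this example; the file variant with its extra diagonal multiplication is left out.

```c
#include <stdint.h>

typedef uint64_t U64;

static uint8_t firstRankAttacks[64][8];   /* [six-bit occupancy][file], 512 bytes */

void initFirstRankAttacks(void) {
    for (unsigned occ = 0; occ < 64; ++occ) {
        for (int file = 0; file < 8; ++file) {
            unsigned occ8 = occ << 1, attacks = 0;   /* index covers files B..G */
            for (int f = file + 1; f < 8; ++f) {
                attacks |= 1u << f;
                if (occ8 >> f & 1) break;
            }
            for (int f = file - 1; f >= 0; --f) {
                attacks |= 1u << f;
                if (occ8 >> f & 1) break;
            }
            firstRankAttacks[occ][file] = (uint8_t)attacks;
        }
    }
}

/* Rank or diagonal attacks from the shared first-rank byte: the trailing
   multiplication by the A-file copies the byte into every rank before the
   final intersection with the line-mask. */
U64 sharedLineAttacks(U64 occupied, U64 lineMaskEx, int sq) {
    const U64 aFile = 0x0101010101010101ULL;
    const U64 bFile = 0x0202020202020202ULL;
    U64 index = ((occupied & lineMaskEx) * bFile) >> 58;
    return lineMaskEx & (aFile * firstRankAttacks[index][sq & 7]);
}
```

Compared with the 4 KByte version, the only difference is that the north-fill happens at lookup time instead of at table-build time.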
32-bit Versions
Another variation on the memory-versus-computation theme was prompted by 32-bit mode. 64-bit multiplication is quite expensive in 32-bit mode, where it becomes a call using three imuls. It is therefore more efficient to use shift-or plus a 32-bit multiplication, which may in fact be used in 64-bit mode as well. Piotr Cichy proposed a multiplication-free parallel prefix shift approach similar to Occupancy of any Line [6], which is a good alternative for processors with slow multiplication.
An efficient and tricky file approach was introduced by Zach Wegner [7], using a 32 KByte, rotated-like lookup table. Wegner explains: "It is quite strange, yes, but it is an out of order mapping. There are only 5 bits because each bit in the factor maps more than one bit. The trick here is the odd shift 29, so that the multiply does not overflow individual bits. I have since found that 25 and 27 will work with the same magic:"
      occ
. . . . . . . .
a . . . . . . .
b . . . . . . .     occ | occ >> 29   *  0x01041041          with the index bracketed
c . . . . . . .     ...\                 ...\                ...\
d . . . . . . .     d . . . . . . .      1 . . . . . . .     d a[f c e b d a]
e . . . . . . .     e . . a . . . .      . . 1 . . . . .     e b . a f c e b
f . . . . . . .     f . . b . . . .      . . . . 1 . . .     f c . b . . f c
. . . . . . . .     . . . c . . . .      1 . . . . . 1 .     . . . c . . . .
The interesting thing is that this works for any masked file. In fact if it was shifted to the a-file, you could get away with the 3-bit factor 0x00041040 (but using a shift of 23).
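The diagram can be turned into a small sketch. To stay on safe ground it first shifts the occupancy to the A-file and masks a2-a7, which is an assumption of this example rather than Wegner's original routine. The final shift by 26 is read off the bracketed index in the diagram, and the function name is hypothetical.

```c
#include <stdint.h>

/* Six-bit file index using only a 32-bit multiplication.
   After folding, a2, a3, a4 sit at bits 8, 16, 24 and a5, a6, a7 at bits
   3, 11, 19. The factor has bits 0, 6, 12, 18, 24 set, and no two partial
   products land on the same bit below 32, so there are no carries. */
unsigned fileIndex32(uint64_t occ, int file) {
    const uint64_t innerAFile = 0x0001010101010100ULL;   /* a2..a7 */
    uint64_t a = (occ >> file) & innerAFile;
    uint32_t folded = (uint32_t)a | (uint32_t)(a >> 29);
    return (folded * 0x01041041u) >> 26;
}
```

The resulting mapping is out of order: reading the index from bit 0 to bit 5, the squares are a2, a5, a3, a6, a4, a7. Since the mapping is still one-to-one, a lookup table built with the same function absorbs the permutation.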
Ranks and diagonals are trivial by comparison. This version trades rotated-like memory size for less computation, and uses the same operations as for file attacks. One may therefore generalize the routine with a line-direction parameter.
A similar approach was proposed by Andrew Fan in 2009; it had already been in use in his own engine for a few years (earliest recorded file time 2006) [8].
Magic Compression
So far, Kindergarten bitboards perform a perfect hashing of the up to six relevant, scattered occupied bits of any line to a six-bit index: a bijective mapping of the 64 different occupancies per line to 64 indices into the pre-calculated attack sets.
Taking a closer look at the attack sets, say of a rook on the a-file, we find far fewer distinct sets. A rook on a1 (a8) has seven different attack sets on that file, depending on the occupancy of a2-a7. On a2 (a7) there is one attack set fewer; on a3 (a6) there are 2 times 5, and on a4 (a5) 3 times 4 attack sets. Thus there are {7, 6, 10, 12, 12, 10, 6, 7} distinct attack sets per square on the line, or 70 in total over all eight squares.
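These counts are easy to check by brute force. The sketch below enumerates all 64 inner occupancies for each slider position on one line and counts the distinct attack sets; the function names are made up for this example.

```c
/* Attack set on one line for a slider at `pos` (0..7), given the eight-bit
   line occupancy; the slider's own bit is ignored. */
unsigned lineAttackSet(int pos, unsigned occ8) {
    unsigned attacks = 0;
    for (int i = pos + 1; i < 8; ++i) {
        attacks |= 1u << i;
        if (occ8 >> i & 1) break;
    }
    for (int i = pos - 1; i >= 0; --i) {
        attacks |= 1u << i;
        if (occ8 >> i & 1) break;
    }
    return attacks;
}

/* Number of distinct attack sets over all 64 inner six-bit occupancies. */
int countAttackSets(int pos) {
    unsigned char seen[256] = {0};
    int count = 0;
    for (unsigned occ = 0; occ < 64; ++occ) {
        unsigned set = lineAttackSet(pos, occ << 1);
        if (!seen[set]) {
            seen[set] = 1;
            ++count;
        }
    }
    return count;
}
```

Calling `countAttackSets` for positions 0 to 7 reproduces the sequence 7, 6, 10, 12, 12, 10, 6, 7.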
While Kindergarten bitboards apply a minimal perfect mapping of scattered bits to a six-bit index, the mapping from occupancies to attack sets is surjective but not injective, since the 64 occupancies of a square map to at most 12 distinct sets. That is because occupancies "behind" the first blocker are redundant and map to the same attack set.
Grant Osborne came up with the idea, derived from magic bitboards, of using a different "magic" factor per square (rank), where the multiplication may produce carries and enough so-called constructive collisions to get by with five- or even four-bit indices, and therefore denser tables. Since different squares may have different table sizes (16 or 32 entries), a Java-like array is used for the attacks, implemented in C as an array of pointers to the individually sized attack tables. The variable right shift, by either 60 or 59, is encoded in the otherwise redundant upper six bits of the magic factor, as described under incorporating the shift of magic bitboards.
Grant's proposal, so far with {5,4,4,5,5,4,4,5} bit ranges for the per-square lookups of vertical rook attacks, results in a 1.5 KByte array instead of the 4 KByte of the initial Kindergarten file-attack getter [9]. Whether the memory saving pays for the rank-indexed magic factor plus the additional pointer indirection is another question, and should be tried inside a concrete chess program with its individual cache and memory footprint.
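The shift encoding and the table-size arithmetic can be illustrated as follows. This is a sketch only: the function names are hypothetical, and which masked occupancy is actually fed into the multiplication is left open here.

```c
#include <stdint.h>

/* The right shift (59 or 60) is stored in the upper six bits of the factor. */
unsigned magicShift(uint64_t magic) {
    return (unsigned)(magic >> 58);
}

/* Index into the per-square attack table: four or five bits, depending on
   the shift encoded in the factor. */
unsigned magicIndex(uint64_t lineOcc, uint64_t magic) {
    return (unsigned)((lineOcc * magic) >> magicShift(magic));
}

/* Total table size in bytes for the given per-square index widths,
   assuming one 64-bit attack set per entry. */
unsigned tableBytes(const int bits[8]) {
    unsigned entries = 0;
    for (int i = 0; i < 8; ++i)
        entries += 1u << bits[i];
    return entries * (unsigned)sizeof(uint64_t);
}
```

For the bit ranges {5,4,4,5,5,4,4,5} this gives 4 * 32 + 4 * 16 = 192 entries of eight bytes each, which is the 1.5 KByte quoted above.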
The table demonstrates how it works for the file attacks of a rook on a3 with a four-bit range. Only five occupied bits are relevant, since a3 itself is one of the inner six bits. The empirically determined factor is 0xF2808817CAD6FF0C; its six upper bits contain the right shift for the product, which is 60 for this square:
Like other non-rotated approaches, namely magic bitboards and hyperbola quintessence, Kindergarten bitboards can hide their implementation details behind a stateless interface. In C/C++ one may use header files with mutually exclusive, conditionally compiled inline routines, as combinations and variations of the approaches mentioned. The line-masks may be kept in per-square structs similar to those used for hyperbola quintessence.
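File attacks of a rook on a3 with the factor 0xF2808817CAD6FF0C, where the index is the upper nibble of the product. Squares are listed from a7 down to a1:

occupancy (A-file)      attack set
a7 a6 a5 a4 a3 a2 a1    a7 a6 a5 a4 a3 a2 a1
x  x  x  b  R  b  o     .  .  .  1  .  1  .
x  x  x  b  R  .  o     .  .  .  1  .  1  1
x  x  b  .  R  b  o     .  .  1  1  .  1  .
x  x  b  .  R  .  o     .  .  1  1  .  1  1
x  b  .  .  R  b  o     .  1  1  1  .  1  .
x  b  .  .  R  .  o     .  1  1  1  .  1  1
b  .  .  .  R  b  o     1  1  1  1  .  1  .
b  .  .  .  R  .  o     1  1  1  1  .  1  1
.  .  .  .  R  b  o     1  1  1  1  .  1  .
.  .  .  .  R  .  o     1  1  1  1  .  1  1    (no blocker)

Occupancy column:
o - outer square, don't care
x - empty or any piece
. - empty
b - blocker, any piece
R - rook
Attack set column:
1 - attacked
. - not attacked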