Magic bitboards,
a multiply-right-shift perfect hashing algorithm to index an attack bitboard database - which leaves both line-attacks of bishop or rook in one run. Thanks to the fast 64-bit multiplication and fast and huge caches of recent processors, Magic Bitboards has become a de facto standard of modern bitboard engines, as used for instance in Crafty, Arasan, Stockfish and Houdini. While Robert Hyatt reported no immediate speed gain over Rotated Bitboards in Crafty ^{[1]} , Jon Dart mentioned a 20-25% speedup ^{[2]} in Arasan over rotated.

The magic bitboard approach was motivated by Gerd Isenberg's multi-direction hashing technique kindergarten bitboards and probably by Gerd's and Tony van Roon-Werten's early trials to map line-wise occupancies by De Bruijn- or random number multiplication ^{[4]} . Lasse Hansen had the idea to hash the up to twelve relevant occupied bits of both directions of a rook- or bishop movement simultaneously ^{[5]} .

Pradu Kannan's improvements to Lasse Hansen's initial approach was to introduce a Java-like, two-dimensional array with individual size for each square and all it's relevant occupancies ^{[6]} . Big savings in table-size - since many squares on either orthogonal or diagonal lines require less bits than others, especially considering the inner six bits. While center squares are more dense for rooks, it is the opposite for bishops ^{[7]} .

Recently, Robert Purves coined the names Plain and Fancy Magics ^{[8]} , and found Hansen's initial Plain Magics with 2 MiB table for rooks and 256 KiB for bishops nearly indistinguishable from Fancy (about 800 KiB and 38 KiB) on his Intel i5 with huge L3 smart cache, see plain versus fancy source code. In the same CCC forum thread, Robert Houdart proposed a byte lookup per square for further table reductions.

How it works

A magic move-bitboard generation technique consists of four key steps:

Mask the relevant occupancy bits to form a key. For example if you had a rook on a1, the relevant occupancy bits will be from a2-a7 and b1-g1.

Multiply the key by a "magic number" to obtain an index mapping. This magic number can be generated by brute-force trial and error quite easily although it isn't 100% certain that the magic number is the best possible (see step 3).

Right shift the index mapping by 64-n bits to create an index, where n is the number of bits in the index. A better magic number will have less bits required in the index.

Use the index to reference a preinitialized move database.

The following illustration should give an impression, how magic bitboards work. All masked relevant occupied bits are perfectly hashed to the consecutive occupied state to index the pre-calculated attack-sets. Constructive collisions, where different occupancies map same attack-sets - since different bits are outer redundant bits "behind" the first blocker, are desired and even necessary to apply a perfect hashing with N bits.

The above illustration is correct for the b1 bishop, since it has only one ray and one bit per file and works kindergarten like. In general a one to one mapping of N scattered occupied bits to N consecutive bits is not always possible due to overflows. A perfect mapping of N scattered bits to N consecutive bits is likely not minimal for most squares. It requires one or two gaps inside the consecutive N bits, to avoid collisions, blowing up the table size.

But the purpose is to perfectly hash attack-sets rather than consecutive occupied bits. The number of distinct attack-sets is much smaller than the relevant occupancies. Thus, with the help of constructive collisions, some initial guess how to map the bits, and/or trial and error, using exactly N bits is always possible. If one tries hard enough to maximize constructive collisions - even less. There are some N-1 ones reported at Best Magics so far, which will half the individual table size for some squares in the widespread Fancy Magic Bitboards approach.

Perfect Hashing

Magic bitboards applies perfect hashing. A surjective function, to map the vector of all relevant occupancies to a range of attack-sets per square. The less bits the attack-set - the closer the blockers, the more those attack-sets are shared by occupancies with different, but redundant outer squares.

The cardinality of all relevant occupancies is determined by the number of bits to map, varying from five to twelve - thus, the cardinality is the power of two the number of bits, varying from 32 to 4096.

The cardinality of distinct attack-sets is determined by the product of the length of each of the max four direction rays greater than zero (or one). The rook on d4 has 3*4*3*4 = 144 distinct attack-sets, a bishop on a8 has only 7.

The ratio of both cardinalities, that is all relevant occupancies versus the all distinct attack-sets is illustrated below: As a quarter of a board - due to the symmetry, the other squares may deduced by flipping and mirroring. Noticeable is the huge 4096/49 ratio of 2^12 occupied states versus 7 times 7 attack-sets of the edge rooks - 12 bits instead of 6. Those "expensive" squares make constructive collisions very desirable. To become more "minimal" by saving an index bit - to halve down the table for one square or the other.

bishop on square

#occs/
#attset

rook on square

A

B

C

D

A

B

C

D

8

64/7

32/6

32/10

32/ 12

8

4096/49

2048/42

2048/ 70

2048/ 84

7

32/6

32/6

32/10

32/ 12

7

2048/42

1024/36

1024/ 60

1024/ 72

6

32/10

32/10

128/40

128/ 48

6

2048/70

1024/60

1024/100

1024/120

5

32/12

32/12

128/48

512/108

5

2048/84

1024/72

1024/120

1024/144

bishop on square

rook on square

A

B

C

D

A

B

C

D

8

9.14

5.33

3.20

2.67

8

83.59

48.76

29.26

24.38

7

5.33

5.33

3.20

2.67

7

48.76

28.44

17.07

14.22

6

3.20

3.20

3.20

2.67

6

29.26

17.07

10.24

8.53

5

2.67

2.67

2.67

4.74

5

24.38

14.22

8.53

7.11

The idea to implement minimal perfect hashing by an additional 16-bit indirection turned out to be slower (see conditional compiles in Pradu Kannan's source ^{[9]} ).

Recent table sizes were about 38 KiB for the bishop attacks, but still about 800 KiB for rook attacks (Fancy). That sounds huge, considering L1 and L2 (L3) cache-sizes and number of cachelines and pages needed - we likely fetch distinct cachelines for each different square or occupancy. On the other hand caches and pages become larger in future processors. And occupancy and squares of the lookups don't change that randomly inside a search that we can still expect a lot of L1-hits. Unfortunately changes in occupancy outside the blockers and therefor not affecting the attack-set will introduce some more cache misses.

Wishing Dreams

Since there are 1428/4900 distinct attack sets for the bishop/rook attacks, we can use 11(2048 > 1428)/13(8196 > 4900) index bits for all bishop/rook attack bitboards. The table size will be 16KB for the bishop attacks and 64KB for rook attacks. To make it possible, we must find a group of magics (one per square and dependent each other). All the magics in the group must hash its all relevant occupancies into the same 11/13 bit index without any collision with others (anyhow for each square, different occupancies can hash to the same attack sets).

But the idea seems like a wishing dream. Can we find ONE of the magic group of magics? Does it exist in theory? If not, can we use more bits to reach the condition? If yes, can we find a good magic group? A good magic group means that all the hash value <=n, so we can reduce the table size to n, until it reach to 1428/4900. Enough? If all hash values are grouped by square, can we think about using the Sharing Attacks method?

Implementations

Despite its huge table size, register usage and code size are important issues as well - and here Magic Bitboards are unbeatable. There are enough variations of space-time tradeoff and implementation details of that theme for all who like to play the optimization game. C-source code with various precompiler options is available from Pradu Kannan's site. MINIMIZE_MAGIC is about Plain versus Fancy ^{[10]} , while PERFECT_MAGIC_HASH enables an additional indirection via 16-bit indices. As always, with space-time tradeoffs - it depends on the individual cache- and memory footprint inside a particular chess program and the hardware architecture, which solution is preferable.

Fancy

Fancy Magic Bitboards is the main stream implementation as proposed by Pradu Kannan with individual table sizes for each square. It requires an individual shift, to leave that many index bits remaining as determined by the population of relevant occupancies. As mentioned, if one tries hard enough to maximize constructive collisions while looking for magics, one may half the size for several individual squares, see Best Magics so far.

U64 attack_table[...];// ~840 KiB all rook and bishop attacks, less with constructive collisions optimizationstruct SMagic {
U64* ptr;// pointer to attack_table for each particular square
U64 mask;// to mask relevant squares of both lines (no outer squares)
U64 magic;// magic 64-bit factorint shift;// shift right};
SMagic mBishopTbl[64];
SMagic mRookTbl[64];
U64 bishopAttacks(U64 occ, enumSquare sq){
U64* aptr = mBishopTbl[sq].ptr;
occ &= mBishopTbl[sq].mask;
occ *= mBishopTbl[sq].magic;
occ >>= mBishopTbl[sq].shift;return aptr[occ];}
U64 rookAttacks(U64 occ, enumSquare sq){
U64* aptr = mRookTbl[sq].ptr;
occ &= mRookTbl[sq].mask;
occ *= mRookTbl[sq].magic;
occ >>= mRookTbl[sq].shift;return aptr[occ];}

Plain

Plain Magic Bitboards, Lasse Hansen's initial approach, takes much more memory for homogeneous two-dimensional attack arrays. 32 KiB per rook square, 4 KiB for each bishop square. It does not need a variable shift and therefor needs one processor register less to further reduce register pressure, despite shorter code. Unfortunately, so far, there are two rook corner squares (0 and 7), which prohibit dividing the rook table size by two.

U64 mBishopAttacks[64][512];// 256 K
U64 mRookAttacks [64][4096];// 2048Kstruct SMagic {
U64 mask;// to mask relevant squares of both lines (no outer squares)
U64 magic;// magic 64-bit factor};
SMagic mBishopTbl[64];
SMagic mRookTbl [64];
U64 bishopAttacks(U64 occ, enumSquare sq){
occ &= mBishopTbl[sq].mask;
occ *= mBishopTbl[sq].magic;
occ >>=64-9;return mBishopAttacks[sq][occ];}

Fixed shift Fancy

A fancy fixed shift variation was introduced by Onno Garms, looking for magics producing factors with appropriate reduced value ranges. That is, leave 12 bit rook indices for all squares, but with upper bit(s) clear for all possible occupancies for all the squares with less than 12 relevant bits, so that it efficiently becomes a less 12 bit index. Onno believes that it is possible to find magic factors with that property for most of those squares, to come close to the fancy table size with the advantage of a fixed shift ^{[11]} . Even an "overlapping" of square arrays is feasible, if they contain accordant unused slot gaps on both ends. Volker Annuss came up with a fixed shift solution of only 800 KiB and below ^{[12]}^{[13]}, and more recently even innovated with Black Magic Bitboards.

U64 attack_table[...];// > 800 KiB - 1.x MiB - depending on the effort on looking for magicsstruct SMagic {
U64* ptr;// pointer to attack_table for each particular square
U64 mask;// to mask relevant squares of both lines (no outer squares)
U64 magic;// magic 64-bit factor};
SMagic mBishopTbl[64];
SMagic mRookTbl[64];
U64 rookAttacks(U64 occ, enumSquare sq){
U64* aptr = mRookTbl[sq].ptr;
occ &= mRookTbl[sq].mask;
occ *= mRookTbl[sq].magic;
occ >>=64-12;return aptr[occ];}

Black Magic Bitboards

Volker Annuss' most recent innovation and further improvement of his already dense 700 KiB fixed shift fancy reference implementation is called Black Magic Bitboards, introduced in CCC in August 2017 ^{[14]}. Instead of intersection with a mask to clear the not relevant occupancy bits, he used the union with the complement of mask to set all not relevant occupancy bits.

This trick, which basically adds a huge constant to the occupancy factor, enables to find Black Magics yielding in even smaller array sizes per square, determined by maximum and minimum index - the latter no longer zero for the empty occupancy! Unfortunately so far, the individual savings with denser square arrays don't completely advance to the size of the combined overlapping table, since overlapping became harder due to smaller gaps ^{[15]} . However, 692 KiB for the complete rook and bishop attack table is the new fixed shift reference. Black Magic Bitboards are now used in Volker's engine Arminius^{[16]}^{[17]}:

U64 attack_table[88507];// 692 KiB for published black magics by Volker Annuss (new size 88316 aka 690 KiB) struct SBlackMagic {
U64* ptr;// pointer to attack_table for each particular square
U64 notmask;// to mask relevant squares of both lines (no outer squares)
U64 blackmagic;// black magic 64-bit factor};
SBlackMagic mBishopTbl[64];
SBlackMagic mRookTbl[64];
U64 rookAttacks(U64 occ, enumSquare sq){
U64* aptr = mRookTbl[sq].ptr;
occ |= mRookTbl[sq].notmask;
occ *= mRookTbl[sq].blackmagic;
occ >>=64-12;return aptr[occ];}

Byte Lookup

Since there are only up to 144 different attack sets per square, Robert Houdart proposed to lookup bytes by square and hashed occupancy for a minimal perfect hashing with an additional indirection via an offset per square, which is successfully used in his engine Houdini^{[18]} . Following sample code uses the Plain fixed shift method, while it may also applied for the dense Fancy one.

byte mBishopAttacks[64][512];// 32 K
byte mRookAttacks [64][4096];// 256 K
U64 mRookLookup[4900];// 39 K
U64 mBishopLookup[1428];// 11 Kstruct SMagic {
U64 mask;// to mask relevant squares of both lines (no outer squares)
U64 magic;// magic 64-bit factorint offset;// offset into lookup table};
SMagic mBishopTbl[64];
SMagic mRookTbl [64];
U64 bishopAttacks(U64 occ, enumSquare sq){
occ &= mBishopTbl[sq].mask;
occ *= mBishopTbl[sq].magic;
occ >>=64-9;return mBishopLookup[mBishopTbl[sq].offset+ mBishopAttack[sq][occ]];}

32-bit Magics

Optimizations are possible for 32-bit systems, as proposed by Grant Osborne^{[19]} . Instead of letting the compiler split up a 64-bit multiply, two 32-bit multiplications can be manually implemented. Multiply the low bits of the magic with the low bits of the key and add it with the product of the high bits of the magic and the high bits of the key and use the upper bits of the sum to index the move database. For the bishops one may try only one 32-bit multiplication after xoring the masked high and low half.

struct{unsignedint rookMaskLo;unsignedint rookMaskHi;unsignedint rookMagicLo;unsignedint rookMagicHi;unsignedint rookShift;
U64 *pRookAttacks;} sm[64];
U64 rookAttacks(U64 occ, enumSquare sq){unsignedint lo =(int) occ;unsignedint hi =(int)(occ >>32);
lo &= sm[sq].rookMaskLo;
hi &= sm[sq].rookMaskHi;
lo *= sm[sq].rookMagicLo;
hi *= sm[sq].rookMagicHi;
lo =(lo ^ hi)>> sm[sq].rookShift;return sm[sq].pRookAttacks[lo];}

Sharing Attacks

The development has not finished. Lasse Hansen came up with another stunning idea ^{[20]} , to use a final intersection by attacks on an empty board, to share tables by two (rooks) or even four (bishops) squares. Bishop sharing is simple to understand, since there are light and dark colored bishops with disjoint attacks-sets, able to share the pre-calculated union of two square-attacks.

The trick is to share even equal colored bishops and rooks where both attack-sets are not disjoint - but all members of the intersection are always set, since they are direct neighbors of both sliders. This is how rooks and bishops may share union-attack-sets by two resp. four squares:

On the cost of one additional and-instruction, post-masking has the potential to almost halve the rook tables - and even better for the already small bishop tables. Additionally, four union attack bitboards can be shared by eight rook- and six bishop-squares.

If the additional post-mask becomes member of a structure, it exceeds 32 byte in 64-bit mode. A structure of arrays may be preferable than an array of structures.

U64 attack_table[...];// ~400K Byte for all rook and (shared) bishop attacksstruct SMagic {
U64* ptr;// pointer to attack_table for each particular square
U64 magic;// magic 64-bit factor
U64 premask;// to mask relevant squares of both lines (no outer squares)
U64 postmask;// attacks on the otherwise empty boardint shift;// shift right};
SMagic mBishopTbl[64];
SMagic mRookTbl[64];
U64 bishopAttacks(U64 occ, enumSquare sq){
U64* aptr = mBishopTbl[sq].ptr;
occ &= mBishopTbl[sq].premask;
occ *= mBishopTbl[sq].magic;
occ >>= mBishopTbl[sq].shift;return mBishopTbl[sq].postmask& aptr[occ];}
U64 rookAttacks(U64 occ, enumSquare sq){
U64* aptr = mRookTbl[sq].ptr;
occ &= mRookTbl[sq].premask;
occ *= mRookTbl[sq].magic;
occ >>= mRookTbl[sq].shift;return mRookTbl[sq].postmask& aptr[occ];}

Incorporating the Shift

Another possible improvement was introduced by Grant Osborne^{[21]} - to don't store the variable right shift inside an own array or structure element, but to use the almost redundant upper six bit of the magic factor instead - incorporating the shift into the magic number. While using a structure for pre-, postmask, magic and pointer, it's size is 4*8 = 32-byte or - if properly aligned - the half of one cache line. The additional effort for the right shift 58 to extract the variable shift is likely hidden by the latency of the independent multiplication, improving ipc.

U64 attack_table[...];// ~400K Byte for all rook and (shared) bishop attacksstruct SMagic {
U64* ptr;// pointer to attack_table for each particular square
U64 magic;// magic 64-bit factor, including upper six bits as shift amount
U64 premask;// to mask relevant squares of both lines (no outer squares)
U64 postmask;// attacks on the otherwise empty board};// 32 Byte with 64-bit pointer!
SMagic mBishopTbl[64];
SMagic mRookTbl[64];
U64 bishopAttacks(U64 occ, enumSquare sq){
U64* aptr = mBishopTbl[sq].ptr;int shift = mBishopTbl[sq].magic>>58;
occ &= mBishopTbl[sq].premask;
occ *= mBishopTbl[sq].magic;return mBishopTbl[sq].postmask& aptr[occ >> shift];}
U64 rookAttacks(U64 occ, enumSquare sq){
U64* aptr = mRookTbl[sq].ptr;int shift = mRookTbl[sq].magic>>58;
occ &= mRookTbl[sq].premask;
occ *= mRookTbl[sq].magic;return mRookTbl[sq].postmask& aptr[occ >> shift];}

Incorporating Offset

A similar idea as in Incorporating the Shift in the domain fixed shift magics was proposed by Eugene Kotlov concerning the indirection to the stored attack array, using the 16 lower bits of the magic factor as offset ^{[22]}.

Initalization

Looking for Magics

The main issue is to find 64 magic factors for each rook and bishop square, to ensure they are free of collisions.

You may try to find your own magics - to possibly contribute to the record table for most dense fancy or plain tables. So far, there were no magics found for the expensive rook squares a1 and h1 (0, 7) with less than 12 bits, which would allow to half the plain table size. Is there any prove they don't exist - even with Black Magic?

Home * Board Representation * Bitboards * Sliding Piece Attacks * Magic BitboardsMagic bitboards,a multiply-right-shift perfect hashing algorithm to index an attack bitboard database - which leaves both line-attacks of bishop or rook in one run. Thanks to the fast 64-bit multiplication and fast and huge caches of recent processors, Magic Bitboards has become a de facto standard of modern bitboard engines, as used for instance in Crafty, Arasan, Stockfish and Houdini. While Robert Hyatt reported no immediate speed gain over Rotated Bitboards in Crafty

^{[1]}, Jon Dart mentioned a 20-25% speedup^{[2]}in Arasan over rotated.^{[3]}## Table of Contents

## History

The magic bitboard approach was motivated by Gerd Isenberg's multi-direction hashing technique kindergarten bitboards and probably by Gerd's and Tony van Roon-Werten's early trials to map line-wise occupancies by De Bruijn- or random number multiplication^{[4]}. Lasse Hansen had the idea to hash the up to twelve relevant occupied bits ofboth directionsof a rook- or bishop movement simultaneously^{[5]}.Pradu Kannan's improvements to Lasse Hansen's initial approach was to introduce a Java-like, two-dimensional array with individual size for each square and all it's relevant occupancies

^{[6]}. Big savings in table-size - since many squares on either orthogonal or diagonal lines require less bits than others, especially considering the inner six bits. While center squares are more dense for rooks, it is the opposite for bishops^{[7]}.Recently, Robert Purves coined the names Plain and Fancy Magics

^{[8]}, and found Hansen's initial Plain Magics with 2 MiB table for rooks and 256 KiB for bishops nearly indistinguishable from Fancy (about 800 KiB and 38 KiB) on his Intel i5 with huge L3 smart cache, see plain versus fancy source code. In the same CCC forum thread, Robert Houdart proposed a byte lookup per square for further table reductions.## How it works

A magic move-bitboard generation technique consists of four key steps:The following illustration should give an impression, how magic bitboards work. All masked relevant occupied bits are perfectly hashed to the consecutive occupied state to index the pre-calculated attack-sets. Constructive collisions, where different occupancies map same attack-sets - since different bits are outer redundant bits "behind" the first blocker, are desired and even necessary to apply a perfect hashing with N bits.

The above illustration is correct for the b1 bishop, since it has only one ray and one bit per file and works kindergarten like. In general a one to one mapping of N scattered occupied bits to N consecutive bits is not always possible due to overflows. A perfect mapping of N scattered bits to N consecutive bits is likely not minimal for most squares. It requires one or two gaps inside the consecutive N bits, to avoid collisions, blowing up the table size.

But the purpose is to perfectly hash attack-sets rather than consecutive occupied bits. The number of distinct attack-sets is much smaller than the relevant occupancies. Thus, with the help of constructive collisions, some initial guess how to map the bits, and/or trial and error, using exactly N bits is always possible. If one tries hard enough to maximize constructive collisions - even less. There are some N-1 ones reported at Best Magics so far, which will half the individual table size for some squares in the widespread Fancy Magic Bitboards approach.

## Perfect Hashing

Magic bitboards applies perfect hashing. A surjective function, to map the vector of all relevant occupancies to a range of attack-sets per square. The less bits the attack-set - the closer the blockers, the more those attack-sets are shared by occupancies with different, but redundant outer squares.cardinalityof allrelevant occupanciesis determined by the number of bits to map, varying from five to twelve - thus, the cardinality is the power of two the number of bits, varying from 32 to 4096.cardinalityofdistinct attack-setsis determined by the product of the length of each of the max four direction rays greater than zero (or one). The rook on d4 has 3*4*3*4 = 144 distinct attack-sets, a bishop on a8 has only 7.The

ratioof both cardinalities, that is allrelevant occupanciesversus the alldistinct attack-setsis illustrated below: As a quarter of a board - due to the symmetry, the other squares may deduced by flipping and mirroring. Noticeable is the huge 4096/49 ratio of 2^12 occupied states versus 7 times 7 attack-sets of the edge rooks - 12 bits instead of 6. Those "expensive" squares make constructive collisions very desirable. To become more "minimal" by saving an index bit - to halve down the table for one square or the other.#attset

The idea to implement minimal perfect hashing by an additional 16-bit indirection turned out to be slower (see conditional compiles in Pradu Kannan's source

^{[9]}).Recent table sizes were about 38 KiB for the bishop attacks, but still about 800 KiB for rook attacks (Fancy). That sounds huge, considering L1 and L2 (L3) cache-sizes and number of cachelines and pages needed - we likely fetch distinct cachelines for each different square or occupancy. On the other hand caches and pages become larger in future processors. And occupancy and squares of the lookups don't change that randomly inside a search that we can still expect a lot of L1-hits. Unfortunately changes in occupancy outside the blockers and therefor not affecting the attack-set will introduce some more cache misses.

## Wishing Dreams

Since there are 1428/4900distinctattack sets for the bishop/rook attacks, we can use 11(2048 > 1428)/13(8196 > 4900) index bits for all bishop/rook attack bitboards. The table size will be 16KB for the bishop attacks and 64KB for rook attacks. To make it possible, we must find a group of magics (one per square and dependent each other). All the magics in the group must hash its allrelevant occupanciesinto the same 11/13 bit index without any collision with others (anyhow for each square, different occupancies can hash to the same attack sets).But the idea seems like a wishing dream. Can we find ONE of the

magic groupof magics? Does it exist in theory? If not, can we use more bits to reach the condition? If yes, can we find a good magic group? A good magic group means that all the hash value <=n, so we can reduce the table size to n, until it reach to 1428/4900. Enough? If all hash values are grouped by square, can we think about using the Sharing Attacks method?## Implementations

Despite its huge table size, register usage and code size are important issues as well - and here Magic Bitboards are unbeatable. There are enough variations of space-time tradeoff and implementation details of that theme for all who like to play the optimization game. C-source code with various precompiler options is available from Pradu Kannan's site. MINIMIZE_MAGIC is about Plain versus Fancy^{[10]}, while PERFECT_MAGIC_HASH enables an additional indirection via 16-bit indices. As always, with space-time tradeoffs - it depends on the individual cache- and memory footprint inside a particular chess program and the hardware architecture, which solution is preferable.## Fancy

Fancy Magic Bitboards is the main stream implementation as proposed by Pradu Kannan with individual table sizes for each square. It requires an individual shift, to leave that many index bits remaining as determined by the population of relevant occupancies. As mentioned, if one tries hard enough to maximize constructive collisions while looking for magics, one may half the size for several individual squares, see Best Magics so far.## Plain

Plain Magic Bitboards, Lasse Hansen's initial approach, takes much more memory for homogeneous two-dimensional attack arrays. 32 KiB per rook square, 4 KiB for each bishop square. It does not need a variable shift and therefor needs one processor register less to further reduce register pressure, despite shorter code. Unfortunately, so far, there are two rook corner squares (0 and 7), which prohibit dividing the rook table size by two.## Fixed shift Fancy

A fancy fixed shift variation was introduced by Onno Garms, looking for magics producing factors with appropriate reduced value ranges. That is, leave 12 bit rook indices for all squares, but with upper bit(s) clear for all possible occupancies for all the squares with less than 12 relevant bits, so that it efficiently becomes a less 12 bit index. Onno believes that it is possible to find magic factors with that property for most of those squares, to come close to the fancy table size with the advantage of a fixed shift^{[11]}. Even an "overlapping" of square arrays is feasible, if they contain accordant unused slot gaps on both ends. Volker Annuss came up with a fixed shift solution of only 800 KiB and below^{[12]}^{[13]}, and more recently even innovated with Black Magic Bitboards.## Black Magic Bitboards

Volker Annuss' most recent innovation and further improvement of his already dense 700 KiB fixed shift fancy reference implementation is calledBlack Magic Bitboards, introduced in CCC in August 2017^{[14]}. Instead of intersection with a mask to clear the not relevant occupancy bits, he used the union with the complement of mask to set all not relevant occupancy bits.This trick, which basically adds a huge constant to the occupancy factor, enables to find

Black Magicsyielding in even smaller array sizes per square, determined by maximum and minimum index - the latter no longer zero for the empty occupancy! Unfortunately so far, the individual savings with denser square arrays don't completely advance to the size of the combined overlapping table, since overlapping became harder due to smaller gaps^{[15]}. However, 692 KiB for the complete rook and bishop attack table is the new fixed shift reference. Black Magic Bitboards are now used in Volker's engine Arminius^{[16]}^{[17]}:## Byte Lookup

Since there are only up to 144 different attack sets per square, Robert Houdart proposed to lookup bytes by square and hashed occupancy for a minimal perfect hashing with an additional indirection via an offset per square, which is successfully used in his engine Houdini^{[18]}. Following sample code uses the Plain fixed shift method, while it may also applied for the dense Fancy one.## 32-bit Magics

Optimizations are possible for 32-bit systems, as proposed by Grant Osborne^{[19]}. Instead of letting the compiler split up a 64-bit multiply, two 32-bit multiplications can be manually implemented. Multiply the low bits of the magic with the low bits of the key and add it with the product of the high bits of the magic and the high bits of the key and use the upper bits of the sum to index the move database. For the bishops one may try only one 32-bit multiplication after xoring the masked high and low half.## Sharing Attacks

The development has not finished. Lasse Hansen came up with another stunning idea^{[20]}, to use a final intersection by attacks on an empty board, to share tables by two (rooks) or even four (bishops) squares. Bishop sharing is simple to understand, since there are light and dark colored bishops with disjoint attacks-sets, able to share the pre-calculated union of two square-attacks.The trick is to share even equal colored bishops and rooks where both attack-sets are not disjoint - but all members of the intersection are always set, since they are direct neighbors of both sliders. This is how rooks and bishops may share union-attack-sets by two resp. four squares:

On the cost of one additional and-instruction, post-masking has the potential to almost halve the rook tables - and even better for the already small bishop tables. Additionally, four union attack bitboards can be shared by eight rook- and six bishop-squares.

If the additional post-mask becomes member of a structure, it exceeds 32 byte in 64-bit mode. A structure of arrays may be preferable than an array of structures.

## Incorporating the Shift

Another possible improvement was introduced by Grant Osborne^{[21]}- to don't store the variable right shift inside an own array or structure element, but to use the almost redundant upper six bit of the magic factor instead - incorporating the shift into the magic number. While using a structure for pre-, postmask, magic and pointer, it's size is 4*8 = 32-byte or - if properly aligned - the half of one cache line. The additional effort for the right shift 58 to extract the variable shift is likely hidden by the latency of the independent multiplication, improving ipc.## Incorporating Offset

A similar idea as in Incorporating the Shift in the domain fixed shift magics was proposed by Eugene Kotlov concerning the indirection to the stored attack array, using the 16 lower bits of the magic factor as offset^{[22]}.## Initalization

## Looking for Magics

The main issue is to find 64 magic factors for each rook and bishop square, to ensure they are free of collisions.## Magic Records

You may try to find your own magics - to possibly contribute to the record table for most dense fancy or plain tables. So far, there were no magics found for the expensive rook squares a1 and h1 (0, 7) with less than 12 bits, which would allow to half the plain table size. Is there any prove they don't exist - even with Black Magic?## See also

## Publications

2007).Magic Move-Bitboard Generation in Computer Chess, as pdf2009).New Architectures in Computer Chess, Ph.D. Thesis, Chapter 3, Magic Hash Functions for Bitboards, pdf2011).On perfect hashing of numbers with sparse digit representation via multiplication by a constant. Discrete Applied Mathematics, Vol. 159, No. 11## Forum Posts

## 2006

## 2007

^{[23]}## 2008

## 2009

## 2010 ...

## 2015 ...

^{[24]}Re: understanding fixed shift fancy magic bitboard generation by Volker Annuss, CCC, May 06, 2016

## External Links

^{[25]}## Other Magic Stuff

line-up: NguyÃªn LÃª, Claudio Puntin, Steffen Schorn, Kristjan Randalu, Bodek Janke

Youn Sun Nah, Ulf Wakenius, Lars Danielsson, Vincent Peirani

## References

2007).Magic Move-Bitboard Generation in Computer Chess, as pdf## What links here?

Up one Level