Bitboard+Serialization

toc =Isolating Subsets= The process of isolating subsets is performed by intersection, for single populated subsets with the two's complement.
 * Home * Board Representation * Bitboards * Bitboard Serialization**
 * [[image:Encounter.jpg width="276" height="206" link="http://www.mcescher.com/Gallery/back-bmp/LW331.jpg"]] ||~  || **Bitboard Serialization** refers to the transformation of a bitboard with up to 64 one-bits set into a list of up to 64 bit-indices aka square indices of a 8x8 board - for instance to process move-target sets for move generation. This is done in two phases, isolating none-empty [|subsets] and then transforming those more versatile subsets into lists, either bit by bit, by applying a [|bisection] scheme, where finally words or bytes may act as index of a pre-calculated database, or by perfect hashing of square lists by subsets with a limited maximum popularity, for instance move-target sets of a king or knight even with minimal perfect hashing. ||
 * M. C. Escher, Encounter, 1944 ||~  ||^   ||

Single Bits
This obvious loop approach is similar to Brian Kernighan's way to count the number of one-bits by consecutively isolating and clearing the LS1B: code format="cpp" while ( x ) { U64 ls1b = x & -x; // isolate LS1B ...  x &= x-1; // reset LS1B } code or with likely the same generated assembly: code format="cpp" if ( x ) do { U64 ls1b = x & -x; // isolate LS1B ... } while ( x &= x-1 ); code Of course we may also reset the LS1B by code format="cpp" x ^= ls1b; // reset LS1B code but todays processors like to gain more parallelism to calculate independent expressions.

Multiple Bits
Isolating none-empty subsets with possibly multiple one-bits can be applied by [|divide and conquer], that is to divide the bitboard or quad word recursively into smaller items, two double words, consisting of two words and those again of two bytes. Word and byte-wide isolated subsets may then act as an index of a pre-calculated lookup table to convert those subsets to adequate data-structures, most likely lists with up to sixteen or eight elements.

=Converting Sets to Lists=

Square Index Serialization
For most applications LS1B-isolation alone is not appropriate, but the conversion from the exponential bitboard centric world to the scalar square centric world, also called bit-scanning.

Scanning Forward
code format="cpp" if ( x ) do { int idx = bitScanForward(x); // square index from 0..63 *list++ = foo(idx, ...); } while (x &= x-1); // reset LS1B code Per definition bitScanForward reveals the index of LS1B.

Scanning Reverse
If - for some reason - we like to traverse the sets in reverse or unknown order anyway, we can not (or don't want to) rely on the independent LS1B reset. code format="cpp" if ( x ) do { int idx = bitScanReverse(x); // square index from 0..63 *list++ = foo(idx, ...); } while (x ^= powOf2[idx]) ; // or 1ULL << idx -> reset found bit code

Scanning with Reset
A win of abstraction is to use a combined bitscan with reset found bit routine. This is fine. But probably harder for compilers to generate optimal code in the if-do-while-sense, where reset last bit already sets the zero-flag. If you don't care on such micro-optimizations, this is the preferred control structure. code format="cpp" while ( x ) { int idx = bitScanAndReset(&x); // square index from 0..63 *list++ = foo(idx, ...); } code One may even don't care about the order.

Intrinsic Version
If bitscan is able to properly handle empty sets - leaving an value outside the 0..63 range (like leading or trailing zero count), we may think about to skip the leading while condition and to break on bitscan(x) > 63 for instance. That was not recommend - since the reset leaves the condition en-passant, and the computational cost of an additional bitscan or zero count was higher. If you like to play the optimization game, it might be fine for x86-64 [|Core 2 duo] thought - using bitscan and bittestandreset intrinsics or wrappers - if kept all in registers of course. code format="cpp" while (_BitScanForward64(&idx, x)) { // or reverse *list++ = foo(idx, ...); _bittestandreset64(&x, idx); } code The loop is intended to look like this in x86-64 assembly: code std          ; set direction flag bsf rax, rdx ; scan first to-bit jz  over     ; jump if no more moves loop: btr rdx, rax ; reset found bit or  eax, ecx ; combine to- with from-square stosw        ; store 16-bit move *rdi++ = move bsf rax, rdx ; scan next to-bit jnz loop     ; jump if more moves over: code 
 * input rdx - move target set
 * ecx - move from aspects
 * rdi - pointer to movelist

Black or White
With bitboard serialization one minor problem is the relative order in move generation considering side to move. Bsf scans the board in a1..h1, a2..h2, a8..h8 order, assuming little-endian rank-file mapping, which might be the desired order for an attacking black player. Traversing white pieces and target squares the same way may result in asymmetries and different search behavior of color flipped positions. Despite other features considered in move ordering, the initial order in generation has more or less influence. Therefore, it is desired to traverse the "white" bitboards with priority for the black back-rank as well. This might be done by bitscan reverse, which covers the rank symmetry, but also mirrors the files. Another alternative is to traverse a vertically flipped "white" bitboard, which can be done outside the do-while loop by a "conditional" x86-64 byte swap, and requires one further register and xor per loop cycle, which might be combined with other stuff, f.i. the from square of a move: code format="cpp" if ( x ) { U64 m = (U64)color - 1; // e.g. -1 if white, 0 for black int o = (int)m & 56; x = x ^ ((x ^ flipVertical(x)) & m); // conditional flip do { int idx = bitScanForward(x) ^ o; // square index from 0..63 *list++ = foo(idx, ...); } while (x &= x-1); // reset LS1B } code 

STL Iterator
Rein Halbersma has written a prototype of a generic C++11 bitset template that can be used to traverse a set in [|STL iterator] style, hiding bitscan and reset ... code format="cpp" typedef bit_set bitset;

void testLoop(int* p, const bitset & x) { for (auto it = x.begin; it != x.end; ++it) *p++ = *it; } /* or std::copy */ void testCopy(int* p, const bitset & x) { std::copy(x.begin, x.end, p); } code ... which yields in following X86-64 assembly, almost identical for the iterator loop and std::copy : code test rsi, rsi jne  .L13 .L7: ret .L13: bsf  rax, rsi mov  DWORD PTR [rdi], eax lea  rax, [rsi-1] add  rdi, 4 and  rsi, rax jne  .L13 jmp  .L7 code 
 * testLoop(int*, bit_set):

Hashing Multiple Bits
Similar to the idea to hash occupancies in kindergarten bitboards and magic bitboards, one may hash certain move-target subsets of one piece in one run, to lookup tables with pre-calculated moves-lists. Sounds like doing up to eight bitscans in parallel. The idea is to muliply-shift-lookup a move-target bitboard, to do an almost branch-less move-generation with pre-calculated moves inside move-lists. For instance king- and knight-moves as well as moves of sliding pieces per line: code format="cpp" moveTarget = KingAttacks[sq] & ~(ownPieces | attackedSquares); idx = (moveTarget * kingMagic[sq]) >> kingShift[sq]); movelists = kingMoveLists[sq]; movelist = movelists[idx]; moveCpy(target, movelist, movelist->n); code A king in a corner may have up to three moves. Thus there are 2^3 == 8 possible move-lists. For instance for a king on a1 (number of moves: vector of moves): code 0:{empty} 1:{a1-b1} 1:{a1-a2} 1:{a1-b2} 2:{a1-b1,a1-a2} 2:{a1-b1,a1-b2} 2:{a1-a2,a1-b2} 3:{a1-b1,a1-a2,a1-b2} code Kings on edges have 5 potential target squares, thus there are 32 possible move-lists. All other kings have 8 all the 8 neighbors with up to 256 move-lists. Similar move-list enumeration is possible with knights and others. All possible move-target subsets of kings and knights for all 64 from-squares are perfectly minimal hashtable with a magic factor of four one-bits set. 10016 possible king-move-lists and 5520 knight-move-lists. To reduce memory one may offset the sets to a "normalized" source square per king, knight and sliding piece line, implying some vector arithmetic in the board centric world considering the offset.

The less populated move-target subsets are, the less efficient this hashing technique. This might become a problem since bitboard move-generation is essentially about subsets of moves with certain properties, like most importantly fast winning captures at Cut-nodes.

=See also= > Belle | Hardware Move Generation
 * Move Generation
 * Move Ordering
 * Pieces versus Directions
 * Table-driven Move Generation
 * Traversing Subsets of a Set

=Forum Posts=
 * [|Subject: sliding move generation idea with bitboards] by Gerd Isenberg, CCC, February 19, 2006
 * [|Magic Knight- and King-Move Generation] by Gerd Isenberg, Winboard Forum, Januar 11, 2007
 * [|Re: C vs ASM] by Rein Halbersma, CCC, March 05, 2013
 * [|Symmetric move generation using bitboards] by Lasse Hansen, CCC, December 20, 2014 » BitScan

=External Links=
 * [|Serialization from Wikipedia]

=References= =What links here?= include page="Bitboard Serialization" component="backlinks" limit="40"
 * Up one Level**