Relational operators on bitboards test for equality, that is, whether two sets are the same or not. Greater or less in the arithmetical sense is usually not relevant with bitboards ^{[5]} - instead we often compare two bitboards bit for bit by certain bitwise boolean operations to retrieve bitwise greater, less or equal results.

Equality

In C, C++ or Java, "==" tests for equality and "!=" for inequality. Pascal uses "=" and "<>", and reserves ":=" for assignment to distinguish it from the relational equality operator.

if(a == b)-> both sets are equal
if(a != b)-> both sets are not equal

x86-mnemonics
x86 has a cmp instruction, which internally performs a subtraction to set its processor flags (carry, zero, overflow) accordingly, for instance the zero flag if both sets are equal. Those flags are then used by conditional jump or move instructions.

cmp rax, rbx ; rax == rbx
je equal ; (jz) conditional jump if equal (jne, jnz for not equal)

Programmers often wonder whether -1 may be used as an unsigned constant in C or C++. See The Two's Complement - alternatively one may use ~0 to define the universal set. Since in C or C++ decimal constants without a ULL suffix are typically treated as 32-bit integers, constants outside the integer range need some care concerning sign or zero extension. Const declarations or the C64 macro are recommended:

const U64 universe = 0xffffffffffffffffULL;

To test whether a set is empty or not, one may compare with zero, or in C and C++ use the logical not operator '!' (Java's '!' accepts boolean operands only, so the explicit comparison is needed there):

if(a ==0)-> empty set
if(!a)-> empty set
if(a !=0)-> set is not empty
if(a)-> set is not empty

Testing for the universal set is less common:

if(a == universe)-> universal set
if(a +1==0)-> universal set

Boolean algebra deals with the set operations of intersection, union and complement, their logical equivalents of conjunction, disjunction and negation, and their bitwise boolean operations of AND, OR and NOT to implement combinatorial logic in software. Bitwise boolean operations on 64-bit words are in fact 64 parallel operations on each bit, performing one setwise operation without any "side-effects". The square mapping does not matter as long as all sets use the same one.

Subset
The intersection of two sets is subset of both.

Assume we have the attack set of a queen and would like to know whether the queen attacks opponent pieces it may capture. We then need to 'and' the queen attacks with the set of opponent pieces.

To test whether set 'a' is a subset of another set 'b', we check whether the intersection equals the subset:

bool isASubsetOfB(U64 a, U64 b){return(a & b)== a;}

Disjoint Sets
To test whether two sets are disjoint - that is, their intersection is empty - compilers emit the x86 test instruction instead of and. Test only sets the flags and does not write the result, which preserves the content of the register if the intersection is not otherwise needed:

if((a & b)==0)-> a and b are disjoint sets

In chess the bitboards of white and black pieces are obviously always disjoint, same for sets of different piece-types, such as knights or pawns. Of course this is because one square is occupied by one piece only.

The union or disjunction of two bitboards is applied by bitwise or (binary operator | in C, C++ or Java, or the keyword "OR" in Pascal). The union is superset of the intersection, while the intersection is subset of the union.

union= a | b

Truth Table
Truth table of or for one bit, one set input bit is sufficient to set the output:

| a | b | a or b |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Disjunction acts like bitwise maximum, max(a, b) or as addition with saturation, min(a + b, 1). It can also be interpreted as sum minus product, a + b - a*b, with possible temporary overflow of one binary digit to two - or with modulo 2 arithmetic.

x86-mnemonics
x86 has a general purpose instruction (or) as well as SIMD instructions (por, vpor) for bitwise or.

Since white and black pieces are always disjoint, one may use addition here as well. That fails for union of attack sets, since squares may be attacked or defended by multiple pieces of course.

The complement set (absolute complement set), negation or ones' complement has its equivalent in bitwise not (unary operator '~' in C, C++ or Java, or the keyword "NOT" in Pascal).

For instance to get the set of empty squares, we can complement the union of white and black pieces. Or we can intersect the complements of white and black pieces.
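That both ways agree is De Morgan's law. A minimal sketch in C, assuming a `U64` typedef for `uint64_t`; the function names and the `white` and `black` parameters are hypothetical and only serve the illustration:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* complement of the union of white and black pieces */
U64 emptyByUnion(U64 white, U64 black) { return ~(white | black); }

/* intersection of the complements of white and black pieces */
U64 emptyByComplements(U64 white, U64 black) { return ~white & ~black; }
```

The first form needs one operation less, so the second is mainly of interest if the complements are available anyway.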

The relative complement is the absolute complement restricted to some other set. The relative complement of 'a' inside 'b' is also known as the set theoretic difference of 'b' minus 'a'. It is the set of all elements that belong to 'b' but not to 'a'. Also called 'b' without 'a'. It is the intersection of 'b' with the absolute complement of 'a'.

not_a_in_b = ~a & b
b_without_a = b & ~a

Truth Table
Truth table of relative complement for one bit:

| a | b | b andnot a |
|---|---|------------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

The relative complement of 'a' in 'b' may be interpreted as a bitwise (a < b) relation.

x86-mnemonics
x86 has no general purpose instruction of its own for the relative complement, but the x86-64 expansion BMI1 (andn) and the SIMD instruction sets (pandn) provide one.

Super minus Sub
Presuming subtraction or exclusive or, there are alternatives to calculate the relative complement as superset minus subset. We can take either the union without the complementing set, or the other set without the intersection.
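Both alternatives can be checked against the direct form `b & ~a`. The following is a sketch only, assuming a `U64` typedef for `uint64_t`; the function names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* reference: 'b' without 'a' by intersection with the complement */
U64 without(U64 a, U64 b) { return b & ~a; }

/* superset (the union) minus subset 'a' */
U64 unionMinusA(U64 a, U64 b) { return (a | b) - a; }

/* superset 'b' minus subset (the intersection), here with xor */
U64 bMinusIntersection(U64 a, U64 b) { return b ^ (a & b); }
```

Since the subtrahend is a subset of the minuend in both cases, no borrow can occur, which is why subtraction and xor are interchangeable here.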

Logical Implication or the boolean Material conditional 'a' implies 'b' (if 'a' then 'b') is a derived boolean operation, implemented as union of the absolute complement of 'a' with 'b':

a_implies_b == ~a | b

Truth Table
Truth table of logical implication for one bit:

| a | b | a implies b |
|---|---|-------------|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Implication may be interpreted as a bitwise (a <= b) relation.

Exclusive or, also exclusive disjunction (xor, binary operator '^' in C, C++ or Java, or the keyword "XOR" in Pascal), also called symmetric difference, leaves all elements which are exclusively set in one of the two sets. Xor is really a multi purpose operation with a lot of applications, not only with bitboards.

Truth Table
Truth table of exclusive or for one bit:

| a | b | a xor b |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Xor implements a bitwise (a != b) relation.
It acts like a bitwise addition (modulo 2), since (1 + 1) mod 2 = 0.
It also acts like a bitwise subtraction (modulo 2).

x86-mnemonics
x86 has a general purpose instruction (xor) as well as SIMD instructions (pxor, vpxor) for bitwise exclusive or.

Distributive
Conjunction is distributive over exclusive disjunction - but not vice versa, since conjunction acts like multiplication, while xor acts as addition in the Galois field GF(2):

x &(y ^ z)==(x & y)^(x & z)

Own Inverse
If applied two (or any even number of) times with the same operand, xor restores the original value. It is its own inverse, an involution.

Subset
If one operand is subset of the other, xor (or subtraction) implements the relative complement.

Subtraction
Xor is commutative, unlike subtraction, and is a better replacement for subtracting from power of two minus one values, such as 63.

(2**n - 1) - a == a ^ (2**n - 1), with a being a subset of 2**n - 1

This is because it usually saves one x86 load instruction and an additional register, using opcodes with immediate operands instead - for instance:

 1 - a == a ^ 1
 3 - a == a ^ 3
 7 - a == a ^ 7
15 - a == a ^ 15
31 - a == a ^ 31
63 - a == a ^ 63
...
-1 - a == a ^ -1

Or without And
Xor is the same as a union without the intersection - all the bits different, 0,1 or 1,0. Since the intersection is subset of the union, xor or subtraction can replace the "without" operation & ~:

a ^ b ==(a | b)&~(a & b)
a ^ b ==(a | b)^(a & b)
a ^ b ==(a | b)-(a & b)

Disjoint Sets
The symmetric difference of disjoint sets is equal to the union or arithmetical addition. Since intersection and symmetric difference are disjoint, the union might be defined that way:

a | b =( a & b )^( a ^ b )
a | b =( a & b )^ a ^ b
a | b =( a & b )|( a ^ b )
a | b =( a & b )+( a ^ b )

Assume we have distinct attack sets of pawns in left or right direction. The set of all squares attacked by two pawns is the intersection, the set exclusively attacked by one pawn (either right or left) is the xor-sum, while all squares attacked by any pawn is the union, see pawn attacks.

a ^ b ==(a & ~b)|(b & ~a)
a ^ b ==(a & ~b)^(b & ~a)
a ^ b ==(a & ~b)+(b & ~a)

Toggle
Xor can be used to toggle or flip bits by a mask.

x ^= mask;

Complement
xor with the universal set -1 flips each bit and results in the ones' complement.

a ^-1== ~a

Without
Due to distributive law and since symmetric difference of set and subset is the relative complement of subset in set, there are some equivalent ways to calculate the relative complement by xor. Based on surrounding expressions or whether subexpressions such as union, intersection or symmetric difference may be reused one may prefer the one or other alternative.

a & ~b == a &(-1^ b )
a & ~b == a &( a ^ b )
a & ~b == a ^( a & b )== a -( a & b )
a & ~b == b ^( a | b )==( a | b )- b

Also note that

a & a == a &-1

Clear
Since 'a' xor 'a' is zero, xor offers the shorter opcode to clear a register, because it takes no immediate operand. Optimizing compilers apply this. The same is true for subtraction, by the way.

... 'a' becomes 'b', but only a part of 'b', where mask is one, becomes 'a'.

Bits from two Sources
Getting arbitrary, disjoint bits from two sources by a mask:

// if mask-bit is zero, bit from a, otherwise from b - since a^(a^b) == b
U64 mask = C64(0xFFFF0000FFFF0000);
U64 result = a ^((a ^ b)& mask);

a ^ ((a ^ b) & mask)
== (a & ~mask) | (b & mask)
== (a & ~mask) ^ (b & mask) because both sets of the union are disjoint
== (a & ~mask) + (b & mask) because both sets of the union are disjoint

The majority function or median operator is a function from n inputs to one output. The value of the operation is false when n/2 or more arguments are false, and true otherwise. For two inputs it is the intersection. Three inputs require some more computation:

Truth Table
Truth table of majority for three inputs:

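| a | b | c | maj(a,b,c) |
|---|---|---|------------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |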

major =(a & b)|(a & c)|(b & c);
major =(a & b)|((a ^ b )& c);

x86-mnemonics
AVX-512 VPTERNLOG with imm8 = 0xe8 implements the majority function.

Greater One Sets

Greater One is a function from n inputs to one output. The value of the operation is true if more than one argument is true, false otherwise. Obviously, for two inputs it is the intersection, for three inputs it is the majority function. For more inputs it is the union of all distinct pairwise intersections, which for four sets can be expressed with setwise operators that way:

(a0 & a1)|(a0 & a2)|(a0 & a3)|(a1 & a2)|(a1 & a3)|(a2 & a3)

O(n^2) to O(n)
Due to the distributive law one can factor out common sets ...

(a1 &( a0))|(a2 &( a1|a0))|(a3 &(a2|a1|a0))

... with further reductions of the number of operations, also due to aggregation of the inner or-terms. Each increment of n costs three additional operations, thus the former quadratic increase becomes linear.

In general, as mentioned, the union of all n(n-1)/2 distinct pairwise intersections requires n(n-1) - 1 operations, which can be reduced to 3n - 5 operations.

This O(n^2) to O(n) simplification is helpful to determine for instance knight fork target squares from eight distinct knight-wise direction attack sets of potential targets, like king, queen, rooks and hanging bishops or even pawns - or any other form of at least double attacks from n attack bitboards:
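One way to write the linear form is a loop that aggregates the union of the sets seen so far. This is a sketch under assumptions, not the original sample: `U64` is taken as a typedef for `uint64_t`, and the function name and array interface are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* squares that are members of at least two of the n sets, linear in n */
U64 greaterOne(const U64 *sets, size_t n) {
    U64 any = 0; /* union of all sets seen so far */
    U64 two = 0; /* squares already found in at least two sets */
    for (size_t i = 0; i < n; ++i) {
        two |= sets[i] & any; /* in this set and in some earlier one */
        any |= sets[i];       /* aggregate the inner or-terms */
    }
    return two;
}
```

For knight forks, `sets` would hold the eight direction-wise attack sets of the potential targets, and the result contains the squares from which at least two targets are attacked.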

Well, if you need additionally at least triple attacks, you'll get the idea how this would work as well, see also Odd and Major Digit Counts from the Population Count page.

Shifting Bitboards

In the 8*8 board centric world with one scalar square-coordinate 0..63, each of the max eight neighboring squares can be determined by adding an offset for each direction. For border squares one has to care about overflows and wraps from a-file to h-file or vice versa. Some conditional code is needed to avoid that. Such code is usually part of move generation for particular pieces.

  northwest    north   northeast
  noWe         nort         noEa
          +7    +8    +9
              \  |  /
  west    -1 <-  0 -> +1    east
              /  |  \
          -9    -8    -7
  soWe         sout         soEa
  southwest    south   southeast

In the setwise world of bitboards, where a square as member of a set is determined by an appropriate one-bit 2^square, the operation to apply such movements is shifting. Unfortunately most architectures don't support a "generalized" shift by signed values, but only shift left or shift right. That makes bitboard code less general, as one usually has separate code for each direction, or at least for the positive and negative directions.

Shift left (<<) is arithmetically a multiplication by power of two.

Shift right (>> or >>> in Java^{[10]}) is arithmetically a division by power of two.

Since the square-index is encoded as power of two exponent inside a bitboard, the power of two multiplication or division is adding or subtracting the square-index.

The reason the bitboard type-definition is unsigned in C or C++ is to avoid the so-called arithmetical shift right, as opposed to the logical shift right. An arithmetical shift right fills in one-bits from the MSB direction if the operand is negative, that is, if MSB bit 63 is set. A logical shift right always shifts in zeros - that is what we need. Java has no unsigned types, but a special unsigned shift right operator >>>.

x86-mnemonics
x86 has general purpose instructions (shl, shr, sar) as well as SIMD instructions (psllq, psrlq) for various shifts.

The advantage with bitboards is, that the shift applies to all set bits in parallel, e.g. with all pawns. Vertical shifts by +-8 don't need any under- or overflow conditions since bits simply fall out and disappear.

U64 soutOne (U64 b){return b >>8;}
U64 nortOne (U64 b){return b <<8;}

Wraps from a-file to h-file or vice versa may be considered by only shifting subsets which may not wrap.
Thus we can mask off the a- or h-file before or after a +-1,7,9 shift:
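A sketch of the six one-step shifts that may wrap, assuming little-endian rank-file mapping (a1 = bit 0, h1 = bit 7) as in the compass rose above, and a `U64` typedef for `uint64_t`. The function names follow the naming of soutOne and nortOne, but are assumptions here, as are the two mask constants:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

static const U64 notAFile = 0xfefefefefefefefeULL; /* all squares except the a-file */
static const U64 notHFile = 0x7f7f7f7f7f7f7f7fULL; /* all squares except the h-file */

/* masking after the shift: bits that wrapped onto the a-file are cleared */
U64 eastOne(U64 b) { return (b << 1) & notAFile; }
U64 noEaOne(U64 b) { return (b << 9) & notAFile; }
U64 soEaOne(U64 b) { return (b >> 7) & notAFile; }

/* masking after the shift: bits that wrapped onto the h-file are cleared */
U64 westOne(U64 b) { return (b >> 1) & notHFile; }
U64 soWeOne(U64 b) { return (b >> 9) & notHFile; }
U64 noWeOne(U64 b) { return (b << 7) & notHFile; }
```

Masking before the shift works as well, with the opposite file mask: `(b & notHFile) << 1` is the same set as `(b << 1) & notAFile`.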

SSE2 one step only provides some optimizations according to the wraps on vectors of two bitboards.

The main application of shifts is to get attack sets or move-target sets of appropriate pieces, e.g. one step for pawns and king. Applying one step multiple times may be used to generate attack sets and moves of pieces like knights and sliding pieces.

For instance all push-targets of white pawns can be determined with one shift left plus intersection with empty squares.
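As a sketch, assuming little-endian rank-file mapping (so north is a shift left by eight) and a `U64` typedef for `uint64_t`; the function names and the `rank4` constant are hypothetical. The double push is added as a natural extension and is not taken from the text above:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* single push: one step north, restricted to empty squares */
U64 wSinglePushTargets(U64 wpawns, U64 empty) {
    return (wpawns << 8) & empty;
}

/* double push: a second step north from the single push targets,
   which must end on an empty square of the fourth rank */
U64 wDblPushTargets(U64 wpawns, U64 empty) {
    const U64 rank4 = 0x00000000FF000000ULL;
    return (wSinglePushTargets(wpawns, empty) << 8) & empty & rank4;
}
```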

Square-Mapping is crucial while shifting bitboards. Shifting left inside a computer word may mean shifting right on the board with little-endian file-mapping as used in most sample code here.

Rotate

For the sake of completeness - Rotate is similar to shift but wraps bits around. Rotate does not alter the number of set bits. With x86-64 like shift operand s modulo 64, each bit index i, in the 0 to 63 range, is transposed by

rotateLeft ::= i := (i + s) mod 64
rotateRight::= i := (i - s) mod 64

Additionally, following relations hold:

rotateLeft (s) == rotateRight(64-s)
rotateRight(s) == rotateLeft (64-s)

Most processors have rotate instructions, but languages like C or Java offer no rotate operator. Java provides the library methods Long.rotateLeft and Long.rotateRight, and some compilers provide intrinsic, processor specific functions.

U64 rotateLeft (U64 x, int s){return _rotl64(x, s);}
U64 rotateRight(U64 x, int s){return _rotr64(x, s);}

x86-mnemonics

rol rax, cl
ror rax, cl

Rotate by Shift
Otherwise rotate has to be emulated by shifts, with some chance that an optimizing compiler will emit exactly one rotate instruction.

U64 rotateLeft (U64 x, int s){return(x << s)|(x >>(64-s));}
U64 rotateRight(U64 x, int s){return(x >> s)|(x <<(64-s));}

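Since x86-64 64-bit shifts are implicitly modulo 64 (and 63), one may replace (64-s) by -s. In C itself, a shift count that is negative or equal to the word size (which (64-s) yields for s == 0) is undefined, so portable code should mask the count, for instance with (-s & 63).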

Generalized Shift

A generalized shift shifts left for positive amounts, but right for negative amounts.

U64 genShift(U64 x, int s){return(s >0)?(x << s):(x >>-s);}

If compilers are not able to produce speculative execution of both shifts with a conditional move instruction, one may try an explicit branch-less solution:

/**
* generalized shift
* @author Gerd Isenberg
* @param x any bitboard
* @param s shift amount -64 < s < +64
* left if positive
* right if negative
* @return shifted bitboard
*/
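U64 genShift(U64 x, int s){
   signed char left  = (signed char) s; // explicitly signed, plain char may be unsigned
   signed char right = -((signed char)(s >> 8) & left); // -s if s is negative, otherwise 0
   return (x >> right) << (right + left);
}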

Due to the value range of the shift, one may save the arithmetical shift right in assembly:

; input
; ecx - shift amount,
; left if positive
; right if negative
; rax - bitboard to shift
mov dl, cl
and cl, ch
neg cl
shr rax, cl
add cl, dl
shl rax, cl

One Step
x86-64 rot64 works like a generalized shift with positive or negative shift amount - since it internally applies an unsigned modulo 64 ( & 63) and makes -i = 64-i. We need to clear either the lower or upper bits by intersection with a mask, which might be combined with the wrap-ands for one step. It might be applied to get attacks for both sides with a direction parameter and small lookups for shift amount and wrap-ands - instead of multiple code for eight directions. Of course generalized shift will be a bit slower due to lookups and using cl as the shift amount register.

Since single populated bitboards are always power of two values, shifting 2^0 left implements pow2(square) to convert square-indices to a member of a bitboard.

U64 singleBitset = C64(1)<< square;// or lookup[square]

Shift versus Lookup
While 1 << square sounds cheap, it is rather expensive in 32-bit mode - and is therefore often precalculated in a small lookup table of 64 single-bit bitboards. Also, on x86-64 processors a variable shift is restricted to the byte register cl. Thus, two or more variable shifts are constrained to sequential execution ^{[11]}.

Test
Test a bit of a square-index by intersection-operator 'and'.

if(x & singleBitset)-> bit is set;

Set
Set a bit of a square-index by union-operator 'or'.

Set and toggle (or, xor) might be the faster way to reset a bit inside a register, compared to complement and intersection (not, and).

x |= singleBitset;// set bit
x ^= singleBitset;// resets set bit

If singleBitset needs to be preserved, an extra register is needed for the complement.

x86-Instructions
x86 processors provide a bit-test instruction family (bt, bts, btr, btc) with 32- and 64-bit operands. They may be used implicitly by compiler optimization or explicitly by inline assembler or compiler intrinsics. Take care that they are applied on local variables, likely held in registers, rather than on memory references.

For simplicity we assume piece plus color and captured piece are members or methods of a move-structure/class.

Quiet moves toggle both from- and to-squares of the piece-bitboard, as well for the redundant union-sets:

U64 fromBB = C64(1)<< move->from;
U64 toBB = C64(1)<< move->to;
U64 fromToBB = fromBB ^ toBB;// |+
pieceBB[move->piece]^= fromToBB;// update piece bitboard
pieceBB[move->color]^= fromToBB;// update white or black color bitboard
occupiedBB ^= fromToBB;// update occupied ...
emptyBB ^= fromToBB;// ... and empty bitboard

Captures need to consider the captured piece of course:

U64 fromBB = C64(1)<< move->from;
U64 toBB = C64(1)<< move->to;
U64 fromToBB = fromBB ^ toBB;// |+
pieceBB[move->piece]^= fromToBB;// update piece bitboard
pieceBB[move->color]^= fromToBB;// update white or black color bitboard
pieceBB[move->cPiece]^= toBB;// reset the captured piece
pieceBB[move->cColor]^= toBB;// update color bitboard by captured piece
occupiedBB ^= fromBB;// update occupied, only from becomes empty
emptyBB ^= fromBB;// update empty bitboard

Similar for special moves like castling, promotions and en passant captures.

Upper Squares
To get a set of all upper squares or bits, either shift ~1 or -2 left by square:
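Both variants can be sketched as follows, assuming a `U64` typedef for `uint64_t` and a square index in the 0 to 63 range; the function names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* all bits above sq: ~1 has every bit except bit 0 set */
U64 upperBits(int sq) { return ~(U64)1 << sq; }

/* same set: -2 is ~1 in two's complement */
U64 upperBitsByMinusTwo(int sq) { return (U64)-2 << sq; }
```

The cast to `U64` before the shift matters, since `~1` on its own is a 32-bit int in C.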

Swapping non-overlapping bit-sequences in a bitboard is the base of a lot of permutation tricks.

by Position
Suppose we like to swap n bits from two non-overlapping bit locations of a bitboard. The trick is to set all n least significant bits by subtracting one from the n-th power of two. Both substrings are shifted to bit zero, exclusive ored and masked by the n ones. This sequence is then shifted back twice, to both original places, and the union (xor-union due to disjoint bits) is finally exclusive ored with the original bitboard to swap both sequences.

/**
* swap n non-overlapping bits of bit-index i with j
* @param b any bitboard
* @param i,j positions of bit sequences to swap
* @param n number of consecutive bits to swap
* @return bitboard b with swapped bit-sequences
*/
U64 swapNBits(U64 b, int i, int j, int n){
   U64 m = (C64(1) << n) - 1; // 64-bit one, a plain int 1 overflows for n > 30
   U64 x = ((b >> i) ^ (b >> j)) & m;
   return b ^ (x << i) ^ (x << j);
}

For instance swap 6 bits each, from bit-index 9 (bits named ABCDEF, either 0,1) with bit-index 41 (abcdef):

b                      m = (1 << 6) - 1
. . . . . . . .        . . . . . . . .
* . . . . . . .        . . . . . . . .
*|a b c d e f|*        . . . . . . . .
. . . . . . . .        . . . . . . . .
. . . . . . . .        . . . . . . . .
. . . . . . . .        . . . . . . . .
*|A B C D E F|*        . . . . . . . .
. . . . . . . .        1 1 1 1 1 1 . .

b >> j                 b >> i                  x = xor & m       with
. . . . . . . .        . . . . . . . .         . . . . . . . .
. . . . . . . .        . . . . . . . .         . . . . . . . .
. . . . . . . .        . . . . . . . .         . . . . . . . .   r = a ^ A
. . . . . . . .        a b c d e f * *         . . . . . . . .   s = b ^ B
. . . . . . . .   ^    . . . . . . . *    =>   . . . . . . . .   t = c ^ C
. . . . . . . .        . . . . . . . .         . . . . . . . .   u = d ^ D
. . . . . . . .        . . . . . . . .         . . . . . . . .   v = e ^ E
a b c d e f * *        A B C D E F * .         r s t u v w . .   w = f ^ F

b                      x << i | x << j         swapNBits(9,41,6)
. . . . . . . .        . . . . . . . .         . . . . . . . .
* . . . . . . .        . . . . . . . .         * . . . . . . .
*|a b c d e f|*        . r s t u v w .         *|A B C D E F|*
. . . . . . . .        . . . . . . . .         . . . . . . . .
. . . . . . . .   ^    . . . . . . . .    =>   . . . . . . . .
. . . . . . . .        . . . . . . . .         . . . . . . . .
*|A B C D E F|*        . r s t u v w .         *|a b c d e f|*
. . . . . . . .        . . . . . . . .         . . . . . . . .

Delta Swap
To swap any non-overlapping pairs we can shift by the difference (j-i, with j>i) and supply an explicit mask with a '1' on the least significant position for each pair supposed to be swapped.

/**
* swap any non-overlapping pairs of bits
* that are delta places apart
* @param b any bitboard
* @param mask has a 1 on the least significant position
* for each pair supposed to be swapped
* @param delta of pairwise swapped bits
* @return bitboard b with bits swapped
*/
U64 deltaSwap(U64 b, U64 mask, int delta){
   U64 x = (b ^ (b >> delta)) & mask;
   return x ^ (x << delta) ^ b;
}

To apply the swapping of the swapNBits sample above, we call deltaSwap with a delta of 32 and 0x7E00 as mask. But we may apply any arbitrary and often periodic mask pattern, as long as no overlapping occurs. The intersection of mask with (mask << delta) must therefore be empty. We can also swap odd and even files of a bitboard by calling deltaSwap with a delta of one and a mask of 0x5555555555555555.
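Both calls can be sketched as small wrappers. The block repeats deltaSwap to stay self-contained and assumes a `U64` typedef for `uint64_t`; the wrapper names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* delta swap as defined above, repeated to keep this sketch self-contained */
U64 deltaSwap(U64 b, U64 mask, int delta) {
    U64 x = (b ^ (b >> delta)) & mask;
    return x ^ (x << delta) ^ b;
}

/* the swapNBits(b, 9, 41, 6) sample: bits 9..14 swapped with bits 41..46 */
U64 swapSample(U64 b) { return deltaSwap(b, 0x7E00ULL, 32); }

/* swap each even file with its odd neighbor: a<->b, c<->d, e<->f, g<->h */
U64 swapAdjacentFiles(U64 b) {
    return deltaSwap(b, 0x5555555555555555ULL, 1);
}
```

Applying either wrapper twice restores the original bitboard, since a swap is its own inverse.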

Applications of delta swaps are flipping, mirroring and rotating. In Knuth's The Art of Computer Programming, Vol 4, page 13, bit permutation in general ^{[12]}, he mentions 2^k delta swaps with k = {0,1,2,3,4,5,4,3,2,1,0} to obtain any arbitrary permutation. Special cases might be cheaper.

Unlike bitwise boolean operations on 64-bit words, which are in fact 64 parallel operations on each bit without any interaction between them, arithmetic operations like addition need to propagate possible carries from lower to higher bits. Despite that, add and sub are usually as fast as their bitwise boolean counterparts, because they are implemented in hardware within the ALU of the CPU. A so-called half-adder to add two bits (A, B) requires an And-Gate for the carry (C) and a Xor-Gate for the sum (S):

two_bitsum =(bitA ^ bitB)|((bitA & bitB)<<1);

To get an idea of the "complexity" of a simple addition, and of how to implement a carry-lookahead adder in software with bitwise boolean and shift instructions only, presuming knowledge of parallel prefix algorithms, this is how a 64-bit Kogge-Stone adder would look in C:

U64 koggeStoneAdd(U64 a, U64 b){
   U64 gen = a & b; // carries
   U64 pro = a ^ b; // sum
   gen |= pro & (gen << 1);
   pro  = pro & (pro << 1);
   gen |= pro & (gen << 2);
   pro  = pro & (pro << 2);
   gen |= pro & (gen << 4);
   pro  = pro & (pro << 4);
   gen |= pro & (gen << 8);
   pro  = pro & (pro << 8);
   gen |= pro & (gen << 16);
   pro  = pro & (pro << 16);
   gen |= pro & (gen << 32);
   return a ^ b ^ (gen << 1);
}

Addition

Addition might be used instead of bitwise 'xor' or 'or' for a union of disjoint (intersection zero) sets, which may lead to simplification of the surrounding expression or may take advantage of certain address calculation instructions such as the x86 load effective address (lea).

The enriched algebra with arithmetical and bitwise boolean operations becomes apparent with the following relation - the bitwise overflows are the intersection, while the sum modulo two is the symmetric difference - thus the arithmetical sum is the xor-sum plus the carries shifted left by one:

x + y = (x ^ y) + 2*(x & y)
x ^ y = x + y - 2*(x & y)

This is particularly interesting in SWAR arithmetic, or if we like to compute the average without possible temporary overflows.
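The average follows from halving the relation x + y = (x ^ y) + 2*(x & y): the carries count fully, the xor-sum counts half. A minimal sketch, assuming a `U64` typedef for `uint64_t`; the function name is hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed 64-bit unsigned type */

/* floor((x + y) / 2) without ever forming the possibly overflowing sum */
U64 average(U64 x, U64 y) {
    return (x & y) + ((x ^ y) >> 1);
}
```

The naive `(x + y) / 2` would wrap around modulo 2^64 for large operands, while this form stays within 64 bits.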

A lot of bit-twiddling tricks on bitboards to traverse or isolate subsets rely on two's complement arithmetic. Most recent processors (and compilers or interpreters for these processors) use the two's complement to implement the unary minus operator for signed as well as for unsigned integer types. In C it is guaranteed for unsigned integer types. Java guarantees two's complement for all its implicitly signed integral types byte, short, int and long.

x86-mnemonics

neg rax; rax = -rax; rax *= -1

In this paragraph 2^N denotes the power operator, not xor!

Increment of Complement
The two's complement is defined as a value, we need to add to the original value to get 2^64 which is an "overflowed" zero - since all 64-bit values are implicitly modulo 2^64. Thus, the two's complement is defined as ones' complement plus one:

-x == ~x +1

That fulfills the condition that x + (-x) == 2 ^ bitsize (2 ^ 64) which overflows to zero:

x +(-x)==0
x + ~x + 1 == 0 ==> x + ~x == -1, the universal set

Complement of Decrement
Replacing x by x - 1 in the increment of complement formula, leaves another definition - two's complement or Negation is also the ones' complement of the ones' decrement:

-x == ~(x -1)

Thus, we can reduce subtraction by addition and ones' complement:

~(x - y)== ~x + y
x - y == ~(~x + y)

Bitwise Copy/Invert
The two's complement may also be defined by a bitwise copy-loop from right (LSB) to left (MSB):

Copy bits from source to destination from right to left
- until the first binary "one" is copied.
Then invert each of the remaining higher bits.

Signed-Unsigned
This works independently of whether we interpret 'x' as signed or unsigned. While 0 is the synonym for all bits clear, -1 is the synonym for all bits set in a computer word of any arbitrary bit-size, also for 64-bit words such as bitboards.

The signed-unsigned "independence" of the two's complement is the reason that processors don't need different add or sub instructions for signed or unsigned integers. The binary pattern of the result is the same, only the interpretation differs and processors flag different overflow- or underflow conditions simultaneously.

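Unsigned 64-bit values as used for bitboards range from 0 to 2^64 - 1 (0xffffffffffffffff). Interpreted as signed values, the same bit patterns range from -2^63 to 2^63 - 1.

There is no "negative" zero, which makes the value range of negative values one greater than that of the positive numbers - and implies that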

-0x8000000000000000 == 0x8000000000000000

Least Significant One

At some point bitboards require serialization, thus isolation of single populated sub-sets which are power of two values if interpreted as number. Dependent on the bitboard-api those values need a further log2(powOfTwo) to convert them into the square index range from 0 to 63. Bitwise boolean operations (and, xor, or) with two's complement or ones' decrement can compute relatives of a set x in several useful ways.

Isolation

The intersection of a non-empty bitboard with its two's complement isolates the LS1B.
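As a sketch, assuming a `U64` typedef for `uint64_t`; the function names are hypothetical. The second function only illustrates serialization: it isolates one bit after the other and toggles it off by xor:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* isolate the least significant one bit; an empty set stays empty */
U64 isolateLS1B(U64 x) { return x & -x; }

/* traverse a set bit by bit and count its members */
int countBySerialization(U64 x) {
    int n = 0;
    while (x) {
        U64 ls1b = isolateLS1B(x); /* single populated subset */
        x ^= ls1b;                 /* toggle the isolated bit off */
        ++n;
    }
    return n;
}
```

Unary minus on an unsigned operand is well defined in C, although some compilers warn about it.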

Masks separated by the LS1B are obtained by xor with the two's complement or the ones' decrement. The intersection of the ones' complement with the decrement leaves the below mask excluding the LS1B:

above_LS1B_mask = x ^-x;
below_LS1B_mask_including = x ^(x-1);
below_LS1B_mask = ~x &(x-1);

x86-mnemonics
The x86-64 expansion BMI1 has BLSMSK (Mask Up to Lowest Set Bit = below_LS1B_mask_including), AMD's x86-64 expansion TBM has TZMSK (Mask From Trailing Zeros = below_LS1B_mask).

Dealing with the least significant zero bit (LS0B) or clear bit can be derived from the complement of the LS1B. AMD's x86-64 expansion TBM has six instructions based on boolean operations with the one's increment.

The MS1B is not that simple to isolate as long as we have no reverse arithmetic with carries propagating from left to right. To isolate the MS1B, one needs to set all lower bits below the MS1B, shift the resulting mask right by one and finally add one.
Setting all lower bits in the general case requires 63 times x |= x >> 1 which might be done in parallel prefix manner in log2(64) = 6 steps:

x |= x >>32;
x |= x >>16;
x |= x >>8;
x |= x >>4;
x |= x >>2;
x |= x >>1;
MS1B =(x >>1)+1;

Still quite expensive - better to traverse sets the other way around or rely on intrinsic functions to use special processor instructions like BitScanReverse or LeadingZeroCount, which implicitly performs not only the isolation but also the log2.

Common MS1B
Two sets have a common MS1B, if the intersection is greater than the xor sum:

if((a & b)>(a ^ b))-> a and b have common MS1B

This is because a common MS1B is set in the intersection but cleared in the xor sum. Otherwise, with no common MS1B, the xor sum is greater, or equal in the case of two zero operands.

Multiplication

64-bit Multiplication has become awfully fast on recent processors. Shift left is of course still faster than multiplication by power of two, but if we have more than one bit set in a factor, it already makes sense to replace for instance

y =(x <<8)+(x <<16);

by

y = x *0x00010100;

Fill-Multiplication
In fact, we can replace parallel prefix left shifts like,

x |= x <<32;
x |= x <<16;
x |= x <<8;

where x has max one bit per file, and we can therefore safely replace 'or' by 'add'

x += x <<32;
x += x <<16;
x += x <<8;

by multiplication with 0x0101010101010101 (which is the A-File in little endian mapping):
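A sketch of the replacement, assuming a `U64` typedef for `uint64_t` and an input with at most one bit per file; both function names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* parallel prefix fill towards higher ranks by shifts */
U64 fillByShifts(U64 x) {
    x |= x << 32;
    x |= x << 16;
    x |= x << 8;
    return x;
}

/* same result by one multiplication, valid only for max one bit per file,
   since otherwise the implied additions would produce carries */
U64 fillByMultiplication(U64 x) {
    return x * 0x0101010101010101ULL;
}
```

The factor has one bit set per rank, so the product is the sum of x shifted left by 0, 8, 16 and so on up to 56.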

De Bruijn Multiplication
Another bitboard related application of multiplication is to determine the bit-index of the least significant one bit. An isolated, single bit is multiplied with a De Bruijn sequence to implement a bitscan.
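A sketch of such a bitscan, assuming a `U64` typedef for `uint64_t`. The constant is one commonly used 64-bit De Bruijn sequence, and the lookup table is filled at run time here rather than given as a literal; all names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* a 64-bit De Bruijn sequence: every 6-bit window is unique */
static const U64 debruijn64 = 0x03f79d71b4cb0a89ULL;
static int index64[64];

/* multiplying by a single bit shifts the sequence left, so the
   upper six bits of the product form a unique key per bit-index */
void initBitScan(void) {
    for (int i = 0; i < 64; ++i)
        index64[(((U64)1 << i) * debruijn64) >> 58] = i;
}

/* index of the least significant one bit, bb must not be empty;
   initBitScan has to be called once before */
int bitScanForward(U64 bb) {
    return index64[((bb & -bb) * debruijn64) >> 58];
}
```

A precomputed constant table avoids the initialization call, at the price of 64 literal entries that must match the chosen sequence.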

Division

64-bit Division is still a slow instruction which takes a lot of cycles - it should be avoided at runtime. Division by a power of two is done by right shift.

An interesting application to calculate various masks for delta swaps, e.g. swapping bits, bit-duos, nibbles, bytes, words and double words, is the 2-adic division of the universal set (-1) by 2^(2^i) plus one, which may be done at compile time:

See generalized flipping, mirroring and reversion. Often used masks and factors are the 2-adic division of the universal set (-1) by 2^(2^i) minus one, which results in the lowest bit of SWAR-wise bits set, bit-duos, nibbles, bytes, words and double words:
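Both families of constants can be sketched with ordinary unsigned division, since the divisions are exact. The block assumes a `U64` typedef for `uint64_t` and i in the 0 to 5 range; the function names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed bitboard type */

/* universe / (2^(2^i) + 1), i = 0..5:
   0x5555..., 0x3333..., 0x0f0f..., 0x00ff..., 0x0000ffff..., 0x00000000ffffffff */
U64 deltaSwapMask(int i) {
    return ~(U64)0 / (((U64)1 << (1 << i)) + 1);
}

/* universe / (2^(2^i) - 1), i = 0..5:
   all bits, 0x5555..., 0x1111..., 0x0101..., 0x0001000100010001, 0x0000000100000001 */
U64 swarLowBits(int i) {
    return ~(U64)0 / (((U64)1 << (1 << i)) - 1);
}
```

With constant arguments a compiler can be expected to fold these divisions at compile time, so no division remains at run time.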

A 64-bit compiler will likely optimize modulo (and division) by a constant via its reciprocal, 2^64 div constant, performing a 64*64 = 128-bit fixed point multiplication to get the quotient in the upper 64 bits, and a second multiplication and subtraction to finally get the remainder.

As a reminder, and to close the cycle to bitwise boolean operations, the well known trick is mentioned to replace modulo by a power of two by intersection with that power of two minus one.
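A minimal sketch of the trick, assuming a `U64` typedef for `uint64_t` and an exponent n below 64; the function name is hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64; /* assumed 64-bit unsigned type */

/* a mod 2^n by intersection with 2^n - 1, for 0 <= n < 64 */
U64 modPowerOfTwo(U64 a, unsigned n) {
    return a & (((U64)1 << n) - 1);
}
```

For instance, `square & 7` yields the file and `square & 63` keeps an index inside the board range.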

General Setwise Operations are binary and unary operations, essential in testing and manipulating bitboards within a chess program. Relational operators on bitboards test for equality, bitwise boolean operators perform the intrinsic setwise operations ^{[1]}^{[2]}, such as intersection, union and complement. Shifting bitboards simulates piece movement, while finally arithmetical operations are used in bit-twiddling applications and to calculate various hash-indices.

Operators are denoted with focus on the C, C++, Java and Pascal programming languages, as well as the mnemonics of x86 or x86-64 Assembly language instructions including bit-manipulation (BMI1, BMI2, TBM) and SIMD expansions (MMX, SSE2, AVX, AVX2, AVX-512, XOP), mathematical symbols, some Venn diagrams ^{[3]}, truth tables, and bitboard diagrams where appropriate. ^{[4]}

## Relational

Relational operators on bitboards test for equality, whether two sets are the same or not. Greater or less in the arithmetical sense is usually not relevant with bitboards^{[5]} - instead we often compare two bitboards bit for bit by certain bitwise boolean operations to retrieve bitwise greater, less or equal results.

## Equality

In C, C++ or Java "==" is used to test for equality, "!=" for not equal. Pascal uses "=" and "<>", and has ":=" to distinguish assignment from the relational equal operator.

x86-mnemonics: x86 has a cmp-instruction, which internally performs a subtraction to set its internal processor flags (carry, zero, overflow) accordingly, for instance the zero-flag if both sets are equal. Those flags are then used by conditional jump or move instructions.

## Empty and Universe

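Two important sets are the empty set, with no bit set, and the universal set (universe), with all 64 bits set. Numerically, the empty set is 0 and the universal set is 0xffffffffffffffff, which is -1 in two's complement interpretation.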

Programmers often wonder about using -1 in C or C++ as an unsigned constant. See The Two's Complement - alternatively one may use ~0 to define the universal set. Since in C or C++ integer constants without ULL suffix that fit into an int are treated as (typically 32-bit) integers, constants outside the integer range need some care concerning sign or zero extension. Const declarations or using the C64 Macro is recommended:

To test whether a set is empty or not, one may compare with zero, or use the logical not operator '!' in C or C++ (Java requires the explicit comparison):

Testing for the universal set is less common:
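The constant and the tests described above can be sketched in C as follows. The type name U64 and the constant universe follow the naming of this page, the function names isEmpty and isUniverse are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t U64;

/* Universal set: all 64 bits set. */
static const U64 universe = 0xffffffffffffffffULL;

/* Empty set test, equivalent to !a in C or C++. */
bool isEmpty(U64 a) {
    return a == 0;
}

/* Universal set test, equivalent to a + 1 == 0 due to wrap-around. */
bool isUniverse(U64 a) {
    return a == universe;
}
```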

## Bitwise Boolean

Boolean algebra is an algebraic structure^{[6]}^{[7]} that captures essential properties of both set operations and logical operations. The properties of associativity, commutativity, and absorption, which define an ordered lattice, in conjunction with the distributive and complement laws define a Boolean algebra. The algebra of sets is in fact a Boolean algebra.

Specifically, Boolean algebra deals with the set operations of intersection, union and complement, their equivalents of conjunction, disjunction and negation and their bitwise boolean operations of AND, OR and NOT to implement combinatorial logic in software. Bitwise boolean operations on 64-bit words are in fact 64 parallel operations on each bit, performing one setwise operation without any "side-effects". The square mapping does not matter, as long as all sets use the same one.

## Intersection

In set theory intersection is denoted as: $A \cap B$

In boolean algebra conjunction is denoted as: $a \land b$

Bitboard intersection or conjunction is performed by bitwise and (binary operator & in C, C++ or Java, and the keyword "AND" in Pascal).

Truth Table: Truth table of and for one bit. For a '1' result both inputs need to be '1':

| a | b | a & b |
|---|---|-------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

x86-mnemonics: x86 has a general purpose instruction as well as SIMD instructions for bitwise and.

SSE2-intrinsic _mm_and_si128.

AVX2-intrinsic _mm256_and_si256.

AVX-512 has VPTERNLOG.

Idempotent: Conjunction is idempotent, a & a == a.

Commutative: Conjunction is commutative, a & b == b & a.

Associative: Conjunction is associative, (a & b) & c == a & (b & c).

Subset: The intersection of two sets is a subset of both.

Assume we have an attack set of a queen, and would like to know whether the queen attacks opponent pieces it may capture: we need to 'and' the queen attacks with the set of opponent pieces.

To test whether set 'a' is a subset of another set 'b', we check whether the intersection equals the subset, (a & b) == a.

Disjoint Sets: To test whether two sets are disjoint - that is their intersection is empty - compilers emit the x86 test-instruction instead of and. That preserves the content of a register, if the intersection is not otherwise needed:

In chess the bitboards of white and black pieces are obviously always disjoint, same for sets of different piece-types, such as knights or pawns. Of course this is because one square is occupied by one piece only.
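The three uses of intersection mentioned in this section can be sketched in C like this. The function and parameter names are hypothetical, only the U64 type follows the page's naming:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t U64;

/* Queen attacks restricted to opponent pieces: the possible capture targets. */
U64 captureTargets(U64 queenAttacks, U64 opponentPieces) {
    return queenAttacks & opponentPieces;
}

/* 'a' is a subset of 'b' if the intersection with 'b' leaves 'a' unchanged. */
bool isSubset(U64 a, U64 b) {
    return (a & b) == a;
}

/* Two sets are disjoint if their intersection is empty. */
bool areDisjoint(U64 a, U64 b) {
    return (a & b) == 0;
}
```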

## Union

In set theory union is denoted as: $A \cup B$

In boolean algebra disjunction is denoted as: $a \lor b$

The union or disjunction of two bitboards is applied by bitwise or (binary operator | in C, C++ or Java, or the keyword "OR" in Pascal). The union is a superset of the intersection, while the intersection is a subset of the union.

Truth Table: Truth table of or for one bit. One set input bit is sufficient to set the output:

| a | b | a or b |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

x86-mnemonics: x86 has a general purpose instruction as well as SIMD instructions for bitwise or.

SSE2-intrinsic _mm_or_si128.

AVX2-intrinsic _mm256_or_si256.

AVX-512 has VPTERNLOG.

Idempotent: Disjunction is idempotent, a | a == a.

Commutative: Disjunction is commutative, a | b == b | a.

Associative: Disjunction is associative, (a | b) | c == a | (b | c).

Distributive: Disjunction is distributive over conjunction and vice versa, a | (b & c) == (a | b) & (a | c) and a & (b | c) == (a & b) | (a & c).

Superset: The union of two sets is a superset of both. For instance the union of all white and black pieces is the set of all occupied squares, occupied == white | black.

Since white and black pieces are always disjoint, one may use addition here as well. That fails for union of attack sets, since squares may be attacked or defended by multiple pieces of course.

## Complement Set

In set theory complement set is denoted as: $\overline{A}$

In boolean algebra negation is denoted as: $\lnot a$

The complement set (absolute complement set), negation or ones' complement has its equivalent in bitwise not (unary operator '~' in C, C++ or Java, or the keyword "NOT" in Pascal).

Truth Table: Truth table of not for one bit:

| a | ~a |
|---|----|
| 0 | 1 |
| 1 | 0 |

x86-mnemonics: Available as general purpose instruction.

AVX-512 has VPTERNLOG.

Empty Squares: The set of empty squares for instance is the complement set of all occupied squares and vice versa, empty == ~occupied and occupied == ~empty.
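As a small sketch in C, the empty squares can be derived from the piece sets in two equivalent ways, the second one following from De Morgan's law. The function names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Empty squares as complement of the union of all pieces. */
U64 emptySquares(U64 whitePieces, U64 blackPieces) {
    return ~(whitePieces | blackPieces);
}

/* The same set as intersection of the complements (De Morgan). */
U64 emptySquaresDeMorgan(U64 whitePieces, U64 blackPieces) {
    return ~whitePieces & ~blackPieces;
}
```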

Don't confuse bitwise not with the logical not-operator '!' in C: '!' yields 0 or 1, while '~' flips all 64 bits.

Complement laws: a | ~a == -1 (the universal set), a & ~a == 0 (the empty set), and ~(~a) == a.

De Morgan's laws
- Complement of union (NOR) is the intersection of the complements: ~(a | b) == ~a & ~b
- Complement of intersection (NAND or Sheffer stroke) is the union of the complements: ~(a & b) == ~a | ~b

For instance to get the set of empty squares, we can complement the union of white and black pieces. Or we can intersect the complements of white and black pieces^{[8]}.

## Relative Complement

In set theory relative complement is denoted as: $B \setminus A$

The relative complement is the absolute complement restricted to some other set. The relative complement of 'a' inside 'b' is also known as the set theoretic difference of 'b' minus 'a'. It is the set of all elements that belong to 'b' but not to 'a'. Also called 'b' without 'a'. It is the intersection of 'b' with the absolute complement of 'a', b & ~a.

Truth Table: Truth table of relative complement for one bit:

| a | b | b & ~a |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

x86-mnemonics: x86 has no general purpose instruction of its own for relative complement, but the x86-64 expansion BMI1 and the SIMD instruction sets provide one.

SSE2-intrinsic _mm_andnot_si128.

AVX2-intrinsic _mm256_andnot_si256.

AVX-512 has VPTERNLOG.

Super minus Sub: Anticipating subtraction and exclusive or, there are alternatives to calculate the relative complement as superset minus subset. We can take either the union without the complementing set, or the other set without the intersection: b & ~a == (a | b) - a == b - (a & b).
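The alternatives can be compared side by side in a short C sketch. The function names are hypothetical. The subtractions never borrow, because in both cases a subset is subtracted from its superset:

```c
#include <stdint.h>

typedef uint64_t U64;

/* 'b' without 'a': intersection of 'b' with the complement of 'a'. */
U64 relativeComplement(U64 a, U64 b) {
    return b & ~a;
}

/* Union minus the complementing set 'a'. */
U64 unionMinusSet(U64 a, U64 b) {
    return (a | b) - a;
}

/* 'b' minus the intersection of both sets. */
U64 setMinusIntersection(U64 a, U64 b) {
    return b - (a & b);
}
```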

## Implication

Logical Implication or Entailment is denoted as: $A \Rightarrow B$

The boolean Material conditional is denoted as: $a \rightarrow b$

Logical Implication or the boolean Material conditional 'a' implies 'b' (if 'a' then 'b') is a derived boolean operation, implemented as union of the absolute complement of 'a' with 'b', that is ~a | b.

Truth Table: Truth table of logical implication for one bit:

| a | b | a implies b |
|---|---|-------------|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

x86-mnemonics: AVX-512 has VPTERNLOG.

## Exclusive Or

In set theory symmetric difference is denoted as: $A \,\triangle\, B$

In boolean algebra Exclusive or is denoted as: $a \oplus b$

Exclusive or, also exclusive disjunction (xor, binary operator '^' in C, C++ or Java, or the keyword "XOR" in Pascal), also called symmetric difference, leaves all elements which are exclusively set in one of the two sets. Xor is a multi-purpose operation with a lot of applications, not only with bitboards.

Truth Table: Truth table of exclusive or for one bit:

| a | b | a xor b |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

It acts like a bitwise addition (modulo 2), since (1 + 1) mod 2 = 0.

It also acts like a bitwise subtraction (modulo 2).

x86-mnemonics: x86 has a general purpose instruction as well as SIMD instructions for bitwise exclusive or.

SSE2-intrinsic _mm_xor_si128.

AVX2-intrinsic _mm256_xor_si256.

AVX-512 has VPTERNLOG.

Commutative: Exclusive disjunction is commutative, a ^ b == b ^ a.

Associative: Xor is associative as well, (a ^ b) ^ c == a ^ (b ^ c).

Distributive: Conjunction is distributive over exclusive disjunction, a & (b ^ c) == (a & b) ^ (a & c), but not vice versa, since conjunction acts like multiplication, while xor acts as addition in the Galois field GF(2).

Own Inverse: If applied two (or any even number of) times with the same operand, xor restores the original value, (a ^ b) ^ b == a. It is its own inverse, an involution.

Subset: If one operand is a subset of the other, xor (or subtraction) implements the relative complement.

Subtraction: Although commutative, xor is a good replacement for subtracting from power of two minus one values such as 63, since 63 - i == i ^ 63 for i in 0..63.

This is because it usually saves one x86 load instruction and an additional register, but uses opcodes with immediate operands.

Or without And: Xor is the same as a union without the intersection - all the bits that differ, 0,1 or 1,0. Since the intersection is a subset of the union, xor or subtraction can replace the "without" operation & ~: a ^ b == (a | b) & ~(a & b) == (a | b) ^ (a & b) == (a | b) - (a & b).

Disjoint Sets: The symmetric difference of disjoint sets is equal to the union or arithmetical addition. Since intersection and symmetric difference are disjoint, the union might be defined that way: a | b == (a & b) ^ (a ^ b) == (a & b) + (a ^ b).

Assume we have distinct attack sets of pawns in left or right direction. The set of all squares attacked by two pawns is the intersection, the set exclusively attacked by one pawn (either right or left) is the xor-sum, while all squares attacked by any pawn is the union, see pawn attacks.

Union of Complements: The symmetric difference is equivalent to the union of both relative complements. Since both relative complements are disjoint, bitwise or or add can be replaced by xor itself: a ^ b == (a & ~b) | (b & ~a) == (a & ~b) ^ (b & ~a) == (a & ~b) + (b & ~a).

Toggle: Xor can be used to toggle or flip bits by a mask.

Complement: Xor with the universal set -1 flips each bit and results in the ones' complement, a ^ -1 == ~a.

Without: Due to the distributive law, and since the symmetric difference of a set and its subset is the relative complement of the subset in the set, there are some equivalent ways to calculate the relative complement by xor. Based on surrounding expressions, or on whether subexpressions such as union, intersection or symmetric difference may be reused, one may prefer one or the other alternative: a & ~b == a ^ (a & b) == (a | b) ^ b == (a ^ b) & a.

Also note that

Clear: Since 'a' xor 'a' is zero, xor is the shorter opcode to clear a register, since it takes no immediate operand. It is applied by optimizing compilers. The same is true for subtraction, by the way.

Xor Swap: Three xors on the same registers swap their content. (Note: this only works when a and b are stored at distinct memory addresses!)

If we provide an intersection by a mask, ...

... 'a' becomes 'b', but only a part of 'b', where mask is one, becomes 'a'.

Bits from two Sources: Getting arbitrary, disjoint bits from two sources by a mask:

This takes one instruction less, than the union of relative complement of the mask in 'a' with intersection of mask with 'b'.
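Both tricks can be sketched in C as follows. The function names xorSwap and selectBits are hypothetical. The pointer comparison guards the case mentioned in the note on distinct addresses, where three xors would clear the value instead of swapping it:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Swap two bitboards by three xors; both must live at distinct addresses. */
void xorSwap(U64 *a, U64 *b) {
    if (a == b) return;   /* same address: xor would zero the value */
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Take the bits of 'b' where mask is one, the bits of 'a' elsewhere.
   Three operations instead of four for (a & ~mask) | (b & mask). */
U64 selectBits(U64 a, U64 b, U64 mask) {
    return ((a ^ b) & mask) ^ a;
}
```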

XOR-applications and affairs^{[9]}

## Equivalence

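If and only if is denoted as: $A \Leftrightarrow B$

Logical equivalence is denoted as: $a \leftrightarrow b$

Logical equality, logical equivalence or biconditional (if and only if, XNOR) is the complement of xor, ~(a ^ b).

Truth Table: Truth table of equivalence for one bit:

| a | b | a xnor b |
|---|---|----------|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

x86-mnemonics: AVX-512 has VPTERNLOG.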

## Majority

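The majority function or median operator is a function from n inputs to one output. The value of the operation is false when n/2 or more arguments are false, and true otherwise. For two inputs it is the intersection. Three inputs require some more computation: (a & b) | (a & c) | (b & c)

Truth Table: Truth table of majority for three inputs:

| a | b | c | major |
|---|---|---|-------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

x86-mnemonics: AVX-512 VPTERNLOG with imm8 = 0xe8 implements the majority function.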
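A short C sketch of the three-input majority, with hypothetical function names. The second variant is one possible reduction from five to four operations: where 'a' and 'b' agree they decide the result, where they differ 'c' decides:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Union of all three pairwise intersections: five operations. */
U64 majority(U64 a, U64 b, U64 c) {
    return (a & b) | (a & c) | (b & c);
}

/* Equivalent form with four operations. */
U64 majorityShort(U64 a, U64 b, U64 c) {
    return (a & b) | ((a ^ b) & c);
}
```

Feeding the three columns of the truth table as bytes (0xf0, 0xcc, 0xaa) reproduces the 0xe8 immediate mentioned for VPTERNLOG.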

## Greater One Sets

Greater One is a function from n inputs to one output. The value of the operation is true if more than one argument is true, false otherwise. Obviously, for two inputs it is the intersection, for three inputs it is the majority function. For more inputs it is the union of all distinct pairwise intersections. With four bitboards this is equivalent to:

(a & b) | (a & c) | (a & d) | (b & c) | (b & d) | (c & d)

with n(n-1)/2 intersections and n(n-1)/2 - 1 unions, in total n(n-1) - 1 operations - that is 11 for n == 4.

O(n^2) to O(n): Due to the distributive law one can factor out common sets ...

(a & (b | c | d)) | (b & (c | d)) | (c & d)

... with further reductions of the number of operations, also due to aggregation of the inner or-terms. Three additional operations for an increment of n, thus the former quadratic increase becomes linear.

In general, as mentioned, the union of all distinct pairwise intersections of n sets requires n(n-1) - 1 operations, which can be reduced to 3n - 5 operations.

This O(n^2) to O(n) simplification is helpful to determine for instance knight fork target squares from eight distinct knight-wise direction attack sets of potential targets, like king, queen, rooks and hanging bishops or even pawns - or any other form of at least double attacks from n attack bitboards:
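One way to sketch the linear version in C is to keep two running sets, the squares hit at least once and the squares hit at least twice so far. The function and variable names are hypothetical, and the loop spends a few more operations than the minimal count because it starts from empty sets:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Squares contained in more than one of the n attack sets. */
U64 attackedTwice(const U64 *attacks, int n) {
    U64 atLeastOne = 0;
    U64 atLeastTwo = 0;
    for (int i = 0; i < n; i++) {
        atLeastTwo |= atLeastOne & attacks[i]; /* hit before and now again */
        atLeastOne |= attacks[i];              /* aggregate the or-terms */
    }
    return atLeastTwo;
}
```

Called with the eight direction-wise knight attack sets of potential targets, the result would be the candidate fork squares.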

Well, if you need additionally at least triple attacks, you'll get the idea how this would work as well, see also Odd and Major Digit Counts from the Population Count page.

## Shifting Bitboards

In the 8*8 board centric world with one scalar square-coordinate 0..63, each of the max eight neighboring squares can be determined by adding an offset for each direction. For border squares one has to care about overflows and wraps from a-file to h-file or vice versa. Some conditional code is needed to avoid that. Such code is usually part of move generation for particular pieces.

Code samples and bitboard diagrams rely on Little endian file and rank mapping.

In the setwise world of bitboards, where a square as member of a set is determined by an appropriate one-bit 2^square, the operation to apply such movements is shifting. Unfortunately most architectures don't support a "generalized" shift by signed values but only shift left or shift right. That makes bitboard code less general, as one usually has separate code for each direction, or at least for the positive and negative directions.

Shift left (<<) is arithmetically a multiplication by a power of two, shift right (>>^{[10]}) is arithmetically a division by a power of two.

Since the square-index is encoded as power of two exponent inside a bitboard, the power of two multiplication or division is adding or subtracting the square-index.

The reason the bitboard type definition is unsigned in C, C++ is to avoid the so-called arithmetical shift right, as opposed to logical shift right. Arithmetical shift right implies filling one-bits in from MSB-direction if the operand is negative and has MSB bit 63 set. Logical shift right always shifts in zeros - that is what we need. Java has no unsigned types, but a special unsigned shift right operator >>>.

x86-mnemonics: x86 has general purpose instructions as well as SIMD instructions for various shifts.

SSE2-intrinsics with variable register or constant immediate shift amounts, working on vectors of two bitboards:

XOP has individual, generalized shifts for each of two bitboards and also has byte-wise shifts.

AVX2 has individual shifts for each of four bitboards:

## One Step Only

The advantage with bitboards is that the shift applies to all set bits in parallel, e.g. with all pawns. Vertical shifts by +-8 don't need any under- or overflow conditions, since bits simply fall out and disappear.

Wraps from a-file to h-file or vice versa may be avoided by only shifting subsets which may not wrap.

Thus we can mask off the a- or h-file before or after a +-1,7,9 shift:

Post-shift masks, ...

... and pre-shift, with the mirrored file masks.
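A sketch of such one step routines in C, assuming little-endian rank-file mapping (a1 is bit 0, h1 is bit 7, h8 is bit 63). The function names and the two mask constants are chosen for this illustration:

```c
#include <stdint.h>

typedef uint64_t U64;

static const U64 notAFile = 0xfefefefefefefefeULL; /* all squares but the a-file */
static const U64 notHFile = 0x7f7f7f7f7f7f7f7fULL; /* all squares but the h-file */

/* Vertical steps need no mask, bits simply fall off the board. */
U64 nortOne(U64 b) { return b << 8; }
U64 soutOne(U64 b) { return b >> 8; }

/* Post-shift masks: clear the bits that wrapped to the opposite file. */
U64 eastOne(U64 b) { return (b << 1) & notAFile; }
U64 westOne(U64 b) { return (b >> 1) & notHFile; }
U64 noEaOne(U64 b) { return (b << 9) & notAFile; }
U64 noWeOne(U64 b) { return (b << 7) & notHFile; }

/* Pre-shift variant with the mirrored mask: drop h-file bits before shifting. */
U64 eastOnePre(U64 b) { return (b & notHFile) << 1; }
```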

SSE2 one step only provides some optimizations according to the wraps on vectors of two bitboards.

The main application of shifts is to get attack sets or move-target sets of appropriate pieces, e.g. one step for pawns and king. Applying one step multiple times may be used to generate attack sets and moves of pieces like knights and sliding pieces.

For instance, all push-targets of white pawns can be determined with one shift left plus intersection with empty squares, (whitePawns << 8) & empty.

Square-Mapping is crucial while shifting bitboards. Shifting left inside a computer word may mean shifting right on the board with little-endian file-mapping as used in most sample code here.

## Rotate

For the sake of completeness - Rotate is similar to shift but wraps bits around. Rotate does not alter the number of set bits. With an x86-64 like shift operand s modulo 64, each bit index i, in the 0 to 63 range, is transposed to (i + s) mod 64 by rotate left, and to (i - s) mod 64 by rotate right.

Additionally, the following relations hold: rotateLeft(x, s) == rotateRight(x, 64 - s) and rotateRight(x, s) == rotateLeft(x, 64 - s).

Most processors have rotate instructions, but they are not available as operators in standard programming languages like C or Java. Some compilers provide intrinsic, processor specific functions.

Rotate by Shift: Otherwise rotate has to be emulated by shifts, with some chance that an optimizing compiler will emit exactly one rotate instruction.

Since x86-64 64-bit shifts are implicitly modulo 64 (and 63), one may replace (64-s) by -s.
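In portable C a shift by 64 is undefined, so a rotate emulated by shifts should mask both shift amounts explicitly rather than rely on the processor. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Rotate left by s modulo 64; (0 - s) & 63 replaces 64 - s and is safe for s == 0. */
U64 rotateLeft(U64 x, unsigned s) {
    return (x << (s & 63)) | (x >> ((0u - s) & 63));
}

/* Rotate right by s modulo 64. */
U64 rotateRight(U64 x, unsigned s) {
    return (x >> (s & 63)) | (x << ((0u - s) & 63));
}
```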

## Generalized Shift

A generalized shift shifts left for positive amounts, but right for negative amounts.

If compilers are not able to produce speculative execution of both shifts with a conditional move instruction, one may try an explicit branch-less solution:

Due to the value range of the shift, one may save the arithmetical shift right in assembly:

One Step: x86-64 rot64 works like a generalized shift with positive or negative shift amount - since it internally applies an unsigned modulo 64 (& 63) and makes -i = 64-i. We need to clear either the lower or upper bits by intersection with a mask, which might be combined with the wrap-ands for one step. It might be applied to get attacks for both sides with a direction parameter and small lookups for shift amount and wrap-ands - instead of multiple code for eight directions. Of course generalized shift will be a bit slower due to lookups and using cl as the shift amount register.

The avoidWrap masks by some arbitrary dir8 enumeration and shift amount:
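A possible sketch in C, assuming little-endian rank-file mapping and an arbitrary direction order (noEa, east, soEa, south, soWe, west, noWe, north). The table and function names are hypothetical. Each mask clears the file that bits may wrap into as well as the rank that rotated bits re-enter:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Signed shift amount per direction: noEa, east, soEa, sout, soWe, west, noWe, nort. */
static const int shiftAmount[8] = { 9, 1, -7, -8, -9, -1, 7, 8 };

/* Combined mask per direction against file wraps and rotate wrap-around. */
static const U64 avoidWrap[8] = {
    0xfefefefefefefe00ULL, /* noEa: not a-file, not rank 1 */
    0xfefefefefefefefeULL, /* east: not a-file */
    0x00fefefefefefefeULL, /* soEa: not a-file, not rank 8 */
    0x00ffffffffffffffULL, /* sout: not rank 8 */
    0x007f7f7f7f7f7f7fULL, /* soWe: not h-file, not rank 8 */
    0x7f7f7f7f7f7f7f7fULL, /* west: not h-file */
    0x7f7f7f7f7f7f7f00ULL, /* noWe: not h-file, not rank 1 */
    0xffffffffffffff00ULL  /* nort: not rank 1 */
};

/* Rotate left by a signed amount, taken modulo 64. */
static U64 rotateLeft(U64 x, int s) {
    unsigned l = (unsigned)s & 63;
    unsigned r = (0u - (unsigned)s) & 63;
    return (x << l) | (x >> r);
}

/* One step of all bits of b in direction dir8 (0..7). */
U64 shiftOne(U64 b, int dir8) {
    return rotateLeft(b, shiftAmount[dir8]) & avoidWrap[dir8];
}
```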

## See also

## Bit by Square

Since single populated bitboards are always power of two values, shifting 2^0 left implements pow2(square) to convert square-indices to a member of a bitboard.

The inverse function, square = log2(x), is the topic of bitscan and bitboard serialization.

Shift versus Lookup: While 1 << square sounds cheap, it is rather expensive in 32-bit mode - and therefore often precalculated in a small lookup-table of 64 single-bit bitboards. Also, on x86-64 processors a variable shift is restricted to the byte-register cl. Thus, two or more variable shifts are constrained by sequential execution^{[11]}.

Test: Test a bit of a square-index by intersection-operator 'and'.

Set: Set a bit of a square-index by union-operator 'or'.

Toggle: Toggle a bit of a square-index by xor.

Reset: Reset a bit of a square-index by relative complement of the single bit.

Set and toggle (or, xor) might be the faster way to reset a bit inside a register, compared to complement and intersection (not, and).

If singleBitset needs to be preserved, an extra register is needed for the complement.
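The four single-bit operations, plus the set-and-toggle variant of reset, might look like this in C. All function names are hypothetical, and squares are assumed to be in the 0..63 range:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t U64;

/* Bit by square: single populated bitboard of a square index 0..63. */
U64 bitOf(int square) { return 1ULL << square; }

bool testBit(U64 x, int square)   { return (x & bitOf(square)) != 0; }
U64  setBit(U64 x, int square)    { return x | bitOf(square); }
U64  toggleBit(U64 x, int square) { return x ^ bitOf(square); }
U64  resetBit(U64 x, int square)  { return x & ~bitOf(square); }

/* Reset by set and toggle: needs no complement of the single bit. */
U64 resetBitBySetToggle(U64 x, int square) {
    U64 singleBitset = bitOf(square);
    return (x | singleBitset) ^ singleBitset;
}
```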

x86-Instructions: x86 processors provide a bit-test instruction family (bt, bts, btr, btc) with 32- and 64-bit operands. They may be used implicitly by compiler optimization or explicitly by inline assembler or compiler intrinsics. Take care that they are applied on local variables, likely held in registers, rather than on memory references:

## Update by Move

This technique to toggle bits by square is likely used to initialize or update the bitboard board-definition. While making or unmaking moves, the single bit corresponds either to the from- or to the to-square of the move. Which particular bitboard has to be updated depends on the moving piece or captured piece.

For simplicity we assume piece plus color and captured piece are members or methods of a move-structure/class.

Quiet moves toggle both from- and to-squares of the piece-bitboard, as well as of the redundant union-sets:

Captures need to consider the captured piece of course:

Similar for special moves like castling, promotions and en passant captures.
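The updates for quiet moves and captures can be sketched in C as below. The Board and Move layouts, the enumerations and the function names are assumptions made for this illustration, not a prescribed board definition. Since xor is its own inverse, calling the same routine again unmakes the move:

```c
#include <stdint.h>

typedef uint64_t U64;

enum { WHITE, BLACK };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

/* Hypothetical board and move layout, for illustration only. */
typedef struct {
    U64 pieceBB[2][6];  /* one bitboard per color and piece type */
    U64 colorBB[2];     /* redundant union per color */
    U64 occupiedBB;     /* redundant union of all pieces */
} Board;

typedef struct {
    int from, to;       /* square indices 0..63 */
    int color, piece;   /* moving side and piece type */
    int captured;       /* captured piece type, used by captures only */
} Move;

/* Quiet move: toggle from- and to-square in all affected sets. */
void makeQuiet(Board *b, const Move *m) {
    U64 fromToBB = (1ULL << m->from) ^ (1ULL << m->to);
    b->pieceBB[m->color][m->piece] ^= fromToBB;
    b->colorBB[m->color]           ^= fromToBB;
    b->occupiedBB                  ^= fromToBB;
}

/* Capture: additionally remove the captured piece from the to-square. */
void makeCapture(Board *b, const Move *m) {
    U64 fromBB = 1ULL << m->from;
    U64 toBB   = 1ULL << m->to;
    int them   = m->color ^ 1;
    b->pieceBB[m->color][m->piece] ^= fromBB ^ toBB;
    b->colorBB[m->color]           ^= fromBB ^ toBB;
    b->pieceBB[them][m->captured]  ^= toBB;
    b->colorBB[them]               ^= toBB;
    b->occupiedBB                  ^= fromBB;  /* to-square stays occupied */
}
```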

Upper Squares: To get a set of all upper squares or bits, either shift ~1 or -2 left by square, for instance d4 (27): ~1 << 27 == 0xfffffffff0000000

Lower Squares: Lower squares are simply Bit by Square minus one, for instance d4 (27): (1 << 27) - 1 == 0x0000000007ffffff
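Both masks as a minimal C sketch, with hypothetical function names and square indices assumed to be in the 0..63 range:

```c
#include <stdint.h>

typedef uint64_t U64;

/* All squares with a higher index than the given square. */
U64 upperSquares(int square) {
    return ~1ULL << square;
}

/* All squares with a lower index: bit by square minus one. */
U64 lowerSquares(int square) {
    return (1ULL << square) - 1;
}
```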

## Swapping Bits

Swapping non-overlapping bit-sequences in a bitboard is the base of a lot of permutation tricks.

by Position: Suppose we would like to swap n bits from two non-overlapping bit locations of a bitboard. The trick is to set all n least significant bits by subtracting one from 2 to the power of n. Both substrings are shifted to bit zero, exclusive ored and masked by the n ones. This sequence is then shifted back twice, to both original places, while the union (xor-union due to disjoint bits) is finally exclusive ored with the original bitboard to swap both sequences.

For instance swap 6 bits each, from bit-index 9 (bits named ABCDEF, either 0,1) with bit-index 41 (abcdef):

Delta Swap: To swap any non-overlapping pairs we can shift by the difference (j-i, with j>i) and supply an explicit mask with a '1' on the least significant position for each pair supposed to be swapped.

To apply the swapping of the swapNBits sample above, we call deltaSwap with a delta of 32 and 0x7E00 as mask. But we may apply any arbitrary and often periodic mask pattern, as long as no overlapping occurs. The intersection of mask with (mask << delta) must therefore be empty. We can also swap odd and even files of a bitboard by calling deltaSwap with a delta of one, and a mask of 0x5555555555555555:
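A sketch of both routines in C, following the description above. The names swapNBits and deltaSwap are taken from the text, while the parameter order and the swapFiles wrapper are assumptions of this sketch:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Swap n bits starting at bit i with n bits starting at bit j.
   Both ranges must not overlap, with n in 1..32. */
U64 swapNBits(U64 b, int i, int j, int n) {
    U64 m = (1ULL << n) - 1;               /* n least significant ones */
    U64 x = ((b >> i) ^ (b >> j)) & m;     /* bits that differ */
    return b ^ (x << i) ^ (x << j);        /* toggle them in both places */
}

/* Swap each bit selected by mask with the bit delta positions above it.
   mask & (mask << delta) must be empty. */
U64 deltaSwap(U64 b, U64 mask, int delta) {
    U64 x = (b ^ (b >> delta)) & mask;
    return x ^ (x << delta) ^ b;
}

/* Swap odd and even files. */
U64 swapFiles(U64 b) {
    return deltaSwap(b, 0x5555555555555555ULL, 1);
}
```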

Applications of delta swaps are flipping, mirroring and rotating. In Knuth's The Art of Computer Programming, Vol 4, page 13, bit permutation in general^{[12]}, he mentions 2^k delta swaps with k = {0,1,2,3,4,5,4,3,2,1,0} to obtain any arbitrary permutation. Special cases might be cheaper.

## Arithmetic Operations

At first glance, arithmetic operations, that is addition, subtraction, multiplication and division, don't make much sense with bitboards. Still, there are some bit-twiddling applications related to the least significant one bit (LS1B), to enumerate all subsets of a set, or to sliding attack generation. Multiplication by certain patterns has some applications as well, most likely to calculate hash indices of masked occupancies.

## Derived from Bitwise

Unlike bitwise boolean operations on 64-bit words, which are in fact 64 parallel operations on each bit without any interaction between them, arithmetic operations like addition need to propagate possible carries from lower to higher bits. Nevertheless, add and sub are usually as fast as their bitwise boolean counterparts, because they are implemented in hardware within the ALU of the CPU. A so-called half-adder to add two bits (A, B) requires an And-Gate for the carry (C) and a Xor-Gate for the sum (S): C = A & B, S = A ^ B.

To get an idea of the "complexity" of a simple addition, and how to implement a carry-lookahead adder in software with bitwise boolean and shift instructions only, anticipating parallel prefix algorithms, this is how a 64-bit Kogge-Stone adder would look like in C:
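The following is a sketch of such an adder, written for illustration rather than for speed. The function name and the generate/propagate variable names are assumptions. After six parallel prefix steps, gen holds the carry out of every bit position:

```c
#include <stdint.h>

typedef uint64_t U64;

/* 64-bit Kogge-Stone addition with and, or, xor and shifts only. */
U64 koggeStoneAdd(U64 a, U64 b) {
    U64 gen = a & b;   /* carry generated at this bit (half-adder carry) */
    U64 pro = a ^ b;   /* carry propagated by this bit (half-adder sum) */
    gen |= pro & (gen <<  1);
    pro &=       (pro <<  1);
    gen |= pro & (gen <<  2);
    pro &=       (pro <<  2);
    gen |= pro & (gen <<  4);
    pro &=       (pro <<  4);
    gen |= pro & (gen <<  8);
    pro &=       (pro <<  8);
    gen |= pro & (gen << 16);
    pro &=       (pro << 16);
    gen |= pro & (gen << 32);
    return (a ^ b) ^ (gen << 1);  /* sum bits plus incoming carries */
}
```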

## Addition

Addition might be used instead of bitwise 'xor' or 'or' for a union of disjoint (intersection zero) sets, which may lead to simplification of the surrounding expression or may take advantage of certain address calculation instructions such as x86 load effective address (lea).

The interplay of arithmetical and bitwise-boolean operations becomes apparent with the following relation - the bitwise overflows are the intersection, otherwise the sum modulo two is the symmetric difference - thus the arithmetical sum is the xor-sum plus the carries shifted left one: x + y == (x ^ y) + 2*(x & y)

This is particularly interesting in SWAR-arithmetic, or if we like to compute the average without possible temporary overflows: (x + y) / 2 == (x & y) + ((x ^ y) >> 1)
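Both relations as a small C sketch with hypothetical function names. The average is rounded down and stays correct even where x + y itself would overflow 64 bits:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Sum as xor-sum plus the carries shifted left by one. */
U64 addByXorAndCarries(U64 x, U64 y) {
    return (x ^ y) + ((x & y) << 1);
}

/* Average (rounded down) without temporary overflow. */
U64 average(U64 x, U64 y) {
    return (x & y) + ((x ^ y) >> 1);
}
```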

## Subtraction

Subtraction (like xor) might be used to implement the relative complement of a subset inside its superset. As mentioned, subtraction may be useful in calculating sliding attacks.

## The Two's Complement

A lot of bit-twiddling tricks on bitboards to traverse or isolate subsets rely on two's complement arithmetic. Most recent processors (and compilers or interpreters for these processors) use the two's complement to implement the unary minus operator for signed as well as for unsigned integer types. In C it is guaranteed for unsigned integer types. Java guarantees two's complement for all its implicitly signed integral types byte, short, int, long.

2^N is used as power operator in this paragraph, not xor!

Increment of Complement: The two's complement is defined as the value we need to add to the original value to get 2^64, which is an "overflowed" zero - since all 64-bit values are implicitly modulo 2^64. Thus, the two's complement is defined as ones' complement plus one: -x == ~x + 1

That fulfills the condition that x + (-x) == 2 ^ bitsize (2 ^ 64), which overflows to zero: x + ~x + 1 == -1 + 1 == 0

Complement of Decrement: Replacing x by x - 1 in the increment of complement formula leaves another definition - two's complement or negation is also the ones' complement of the ones' decrement: -x == ~(x - 1)

Thus, we can reduce subtraction to addition and ones' complement: a - b == ~(~a + b)
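These identities can be checked with a few lines of C. The function names are hypothetical. Unsigned arithmetic is used throughout, which is defined modulo 2^64 in C:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Two's complement as increment of the ones' complement. */
U64 negateByIncrement(U64 x) {
    return ~x + 1;
}

/* Two's complement as ones' complement of the decrement. */
U64 negateByDecrement(U64 x) {
    return ~(x - 1);
}

/* Subtraction a - b by addition and ones' complement only. */
U64 subtractByAdd(U64 a, U64 b) {
    return ~(~a + b);
}
```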

Bitwise Copy/Invert: The two's complement may also be defined by a bitwise copy-loop from right (LSB) to left (MSB): copy all bits up to and including the first one-bit, then invert all remaining higher bits.

Signed-Unsigned: This works independently of whether we interpret 'x' as signed or unsigned. While 0 is the synonym for all bits clear, -1 is the synonym for all bits set in a computer word of any arbitrary bit-size, also for 64-bit words such as bitboards.

The signed-unsigned "independence" of the two's complement is the reason that processors don't need different add or sub instructions for signed or unsigned integers. The binary pattern of the result is the same, only the interpretation differs and processors flag different overflow- or underflow conditions simultaneously.

Unsigned 64-bit values as used for bitboards have this value range: 0 to 2^64 - 1

With signed interpretation, the positive numbers are a subset of the unsigned with MSB clear: 0 to 2^63 - 1

Negative numbers have MSB set to one, thus the sign bit interpretation: -2^63 to -1

There is no "negative" zero. That makes the value range of negative values one greater than that of the positive numbers - and implies that -2^63 is its own two's complement: -(-2^63) == -2^63

## Least Significant One

At some point bitboards require serialization, thus isolation of single populated subsets, which are power of two values if interpreted as numbers. Dependent on the bitboard API those values need a further log2(powOfTwo) to convert them into the square index range from 0 to 63. Bitwise boolean operations (and, xor, or) with two's complement or ones' decrement can compute relatives of a set x in several useful ways.

## Isolation

The intersection of a non-empty bitboard with its two's complement isolates the LS1B: x & -x

Some C++ compilers warn that -x is still unsigned - (0-x) may be used to avoid that with no overhead.

x86-mnemonics: The x86-64 expansion BMI1 has LS1B bit isolation.

BMI1-intrinsic _blsi_u32/64.

AMD's x86-64 expansion TBM further has an Isolate Lowest Set Bit and Complement instruction, which applies De Morgan's law to get the complement of the LS1B: ~(x & -x) == ~x | (x - 1)

## Reset

The intersection of a non-empty bitboard with its ones' decrement resets the LS1B^{[13]}: x & (x - 1)

Resetting the LS1B is the relative complement of the isolated LS1B in x, x & ~(x & -x) == x & (x - 1), since we already know two's complement (-x) and ones' decrement (x-1) are complement sets.

x86-mnemonics: The x86-64 expansion BMI1 has LS1B bit reset.

BMI1-intrinsic _blsr_u32/64.
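Isolation and reset of the LS1B as a C sketch, together with a simple loop that traverses a set bit by bit. The function names are hypothetical, and (0 - x) is used in place of -x to avoid the compiler warning mentioned above:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Isolate the least significant one bit; 0 for the empty set. */
U64 isolateLS1B(U64 x) {
    return x & (0 - x);
}

/* Reset the least significant one bit; 0 stays 0. */
U64 resetLS1B(U64 x) {
    return x & (x - 1);
}

/* Traverse a set by repeatedly resetting its LS1B, counting the members. */
int countMembers(U64 x) {
    int n = 0;
    while (x) {
        x = resetLS1B(x);
        n++;
    }
    return n;
}
```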

## Separation

Masks separated by the LS1B are obtained by xor with the two's complement or the ones' decrement. The intersection of the ones' complement with the decrement leaves the below mask excluding the LS1B:

- x ^ -x leaves all bits above the LS1B
- x ^ (x - 1) leaves the LS1B and all bits below it (below_LSB1_mask_including)
- ~x & (x - 1) leaves all bits below the LS1B (below_LSB1_mask)

x86-mnemonics: The x86-64 expansion BMI1 has BLSMSK (Mask Up to Lowest Set Bit = below_LSB1_mask_including), AMD's x86-64 expansion TBM has TZMSK (Mask From Trailing Zeros = below_LSB1_mask).

BMI1-intrinsic _blsmsk_u32/64.

## Smearing

To smear the LS1B up and down, we use the union with two's complement or ones' decrement: x | -x additionally sets all bits above the LS1B, x | (x - 1) additionally sets all bits below the LS1B.

x86-mnemonics: AMD's x86-64 expansion TBM has a Fill From Lowest Set Bit instruction.
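The separation and smearing masks around the LS1B can be sketched in C as follows. The function names are chosen for this illustration only:

```c
#include <stdint.h>

typedef uint64_t U64;

/* All bits above the LS1B. */
U64 aboveLS1BMask(U64 x) { return x ^ (0 - x); }

/* The LS1B and all bits below it. */
U64 belowLS1BMaskIncluding(U64 x) { return x ^ (x - 1); }

/* All bits below the LS1B. */
U64 belowLS1BMask(U64 x) { return ~x & (x - 1); }

/* x with all bits above its LS1B set. */
U64 smearLS1BUp(U64 x) { return x | (0 - x); }

/* x with all bits below its LS1B set. */
U64 smearLS1BDown(U64 x) { return x | (x - 1); }
```

For the sample set 0x58 with its LS1B on bit 3, the masks are 0xfffffffffffffff0, 0x0f and 0x07, the smeared sets 0xfffffffffffffff8 and 0x5f.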

## Least Significant Zero

Dealing with the least significant zero bit (LS0B) or clear bit can be derived from the complement of the LS1B. AMD's x86-64 expansion TBM has six instructions based on boolean operations with the ones' increment:

## Most Significant One

The MS1B is not that simple to isolate, as long as we have no reverse arithmetic with carries propagating from left to right. To isolate the MS1B, one needs to set all lower bits below the MS1B, shift the resulting mask right by one and finally add one.

Setting all lower bits in the general case requires 63 times x |= x >> 1, which might be done in parallel prefix manner in log2(64) = 6 steps:
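A sketch of this parallel prefix fill in C, with a hypothetical function name. It assumes a non-empty set, since for zero the final increment would yield bit 0:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Isolate the most significant one bit of a non-empty bitboard. */
U64 isolateMS1B(U64 x) {
    x |= x >> 1;    /* fill all bits below the MS1B in six steps */
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return (x >> 1) + 1;  /* drop the top bit of the fill, then carry up */
}
```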

Still quite expensive - better to traverse sets the other way around or rely on intrinsic functions to use special processor instructions like BitScanReverse or LeadingZeroCount, which implicitly performs not only the isolation but also the log2.

Common MS1B: Two sets have a common MS1B, if the intersection is greater than the xor sum, (a & b) > (a ^ b).

This is because a common MS1B is set in the intersection but cleared in the xor sum. Otherwise, with no common MS1B, the xor-sum is greater except equal for two zero operands.

## Multiplication

64-bit Multiplication has become awfully fast on recent processors. Shift left is of course still faster than multiplication by power of two, but if we have more than one bit set in a factor, it already makes sense to replace several shifts and additions by one multiplication.

Fill-Multiplication: In fact, we can replace parallel prefix left shifts like x |= x << 8; x |= x << 16; x |= x << 32;

where x has max one bit per file, and we can therefore safely replace 'or' by 'add',

by multiplication with 0x0101010101010101 (which is the A-File in little endian mapping):
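Both variants next to each other as a C sketch, with hypothetical function names. They agree as long as x has at most one bit per file, because then none of the shifted copies overlap and no carries occur:

```c
#include <stdint.h>

typedef uint64_t U64;

/* North fill by three parallel prefix shifts. */
U64 northFillByShifts(U64 x) {
    x |= x << 8;
    x |= x << 16;
    x |= x << 32;
    return x;
}

/* North fill by one multiplication with the a-file;
   assumes at most one bit per file in x. */
U64 northFillByMultiplication(U64 x) {
    return x * 0x0101010101010101ULL;
}
```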

See Kindergarten Bitboards or Magic Bitboards as applications of fill-multiplication.

De Bruijn Multiplication: Another bitboard-related application of multiplication is to determine the bit index of the least significant one bit. An isolated, single bit is multiplied with a De Bruijn sequence to implement a bitscan.

## Division

64-bit Division is still a slow instruction which takes a lot of cycles - it should be avoided at runtime. Division by a power of two is done by right shift.

An interesting application to calculate various masks for delta swaps, e.g. swapping bits, bit-duos, nibbles, bytes, words and double words, is the 2-adic division of the universal set (-1) by 2^(2^i) plus one, which may be done at compile time:

See generalized flipping, mirroring and reversion. Often used masks and factors are the 2-adic division of the universal set (-1) by 2^(2^i) minus one, which results in the lowest bit of SWAR-wise bits set, bit-duos, nibbles, bytes, words and double words:
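Since all these divisors are odd and divide 2^64 - 1 without remainder, the ordinary unsigned division gives the same result as the 2-adic one. A C sketch for i in 0..5, with hypothetical function names:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Delta swap mask: universal set divided by 2^(2^i) + 1, for i in 0..5. */
U64 swapMask(int i) {
    return ~0ULL / ((1ULL << (1 << i)) + 1);
}

/* Lowest bit of each SWAR-wise group: universal set divided by
   2^(2^i) - 1, for i in 0..5. */
U64 lowestBitMask(int i) {
    return ~0ULL / ((1ULL << (1 << i)) - 1);
}
```

With constant arguments a compiler would typically fold these expressions at compile time.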

## Modulo

Modular arithmetic with 64-bit modulo by a constant has applications in Cryptography^{[14]}, Hashing, and with Bitboards in Bit Scanning, Population Count and Congruent Modulo Bitboards for Sliding Piece Attacks.

## Casting out 255

Similar to Casting out nines with decimals, and due to the congruence relation $2^{8} \equiv 1 \pmod{255}$, casting out 255 can be used to add all the eight bytes within a SWAR-wise 64-bit quad word if the sum is less than 255, as mentioned, applicable in Population Count and Congruent Modulo Bitboards - Casting out 255.
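As a minimal C sketch with a hypothetical function name, the byte sum is a single modulo operation. The result is only the true sum while that sum is less than 255:

```c
#include <stdint.h>

typedef uint64_t U64;

/* Sum of the eight bytes of x by casting out 255;
   valid only if the sum is less than 255. */
unsigned sumOfBytes(U64 x) {
    return (unsigned)(x % 255);
}
```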

## Reciprocal Multiplication

A 64-bit compiler will likely optimize modulo (and division) by a constant via the reciprocal, 2^64 div constant: it performs a 64*64 = 128-bit fixed point multiplication to get the quotient in the upper 64 bits, and a second multiplication and subtraction to finally get the remainder. Here is some sample x86-64 assembly:

## Power of Two

As a reminder, and to close the cycle to bitwise boolean operations, the well-known trick is mentioned to replace modulo by a power of two by an intersection with that power of two minus one, x % 2^n == x & (2^n - 1).

## Selected Publications

## 1847 ...

- (1847). The Mathematical Analysis of Logic, Being an Essay towards a Calculus of Deductive Reasoning. Macmillan, Barclay & Macmillan
- (1848). The Calculus of Logic. Cambridge and Dublin Mathematical Journal, Vol. III
- (1860). Syllabus of a Proposed System of Logic. Walton & Malbery
- (1867). On an Improvement in Boole's Calculus of Logic. Proceedings of the American Academy of Arts and Sciences, Series Vol. 7
- (1874). Ueber eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen. Journal für die reine und angewandte Mathematik, No. 77
- (1880). On the Algebra of Logic. American Journal of Mathematics, Vol. 3
- (1880). On the Diagrammatic and Mechanical Representation of Propositions and Reasonings. Philosophical Magazine, Vol. 9, No. 5
- (1881). Symbolic Logic. MacMillan & Co.

## 1950 ...

- (1952). Programming for High-Speed Electronic Computers. (Программирование для электронных счетных машин)
- (1961). Bitwise operations. Communications of the ACM, Vol. 4, No. 3

## 2000 ...

- (2002, 2012). Hacker's Delight. Addison-Wesley
- (2009). The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise tricks & techniques, as Pre-Fascicle 1a postscript

## Forum Posts

## External Links

## Sets

Naive set theory from Wikipedia

Zermelo–Fraenkel set theory from Wikipedia » Ernst Zermelo, Abraham Fraenkel

## Algebra

## Logic

## Operations

## Setwise

Intersection (set theory) from Wikipedia

Union (set theory) from Wikipedia

Complement (set theory) from Wikipedia

## Bitwise

Logical conjunction from Wikipedia

Logical disjunction from Wikipedia

Exclusive or from Wikipedia

Negation from Wikipedia

Bit Shifts from Wikipedia

Circular shift from Wikipedia

## Arithmetic

Addition from Wikipedia

Subtraction from Wikipedia

Two's complement from Wikipedia

Multiplication from Wikipedia

Division from Wikipedia

Modulo operation from Wikipedia

## Modular arithmetic

## Misc

## References

(1980). The Early Development of Programming in the USSR. in Nicholas C. Metropolis (ed.) A History of Computing in the Twentieth Century. Academic Press, preprint pp. 43

(1952). Programming for High-Speed Electronic Computers. (Программирование для электронных счетных машин)

(1880). On the Diagrammatic and Mechanical Representation of Propositions and Reasonings. Philosophical Magazine, Vol. 9, No. 59

(1847). The Mathematical Analysis of Logic, Being an Essay towards a Calculus of Deductive Reasoning. Macmillan, Barclay & Macmillan

(1880). On the Algebra of Logic. American Journal of Mathematics, Vol. 3

(1860). Syllabus of a Proposed System of Logic. Walton & Malbery

(1972). Perceptrons: An Introduction to Computational Geometry. The MIT Press, ISBN 0-262-63022-2

(2009). The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise tricks & techniques, as Pre-Fascicle 1a postscript

(1960). A technique for counting ones in a binary computer. Communications of the ACM, Volume 3, 1960
