Miss-predicted branches causes huge penalties on todays super pipelined processors. While processors become smarter to predict branches with several heuristics, branches on random data should be avoided.
The techniques shown here often use arithmetical shift right (by bit-width - 1, that is 31 for 32-bit double words as integers) to determine a mask of sign-bits, either all bits set (-1) or all bits clear 0. x86 compiler may emit an cdq (Convert Double to Quad) instruction, which sign extends a 32 bit register to two 32 bit registers. Since arithmetical shift right is not strictly specified in C, it might be not portable through all compilers and architectures. Note that in C, a comparison or a boolean expression with the result {false, true} might be treated as numerical value {0, 1}.
The tricks shown here, might be useful if compiler don't support those functions or don't generate the intended branchless assembly and the input is quite random, so that the branch prediction heuristics will fail often.
Absolute value of an Integer
Abs as C intrinsic [1] is likely implemented based on following code snippet ...
intabs(int a){int s = a >>31;// cdq, signed shift, -1 if negative, else 0
a ^= s;// ones' complement if negative
a -= s;// plus one if negative -> two's complement if negativereturn a;}
... by compilers, on x86 a sequence three instructions: {cdq, xor, sub} or {cdq, add, xor}.
Maximum of two Integers
By CRT
Microsoft Visual C Run-Time Library provides a _max macro [2].
By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX:
If a is greater b, a - 0 is returned, otherwise a - (a - b) == +b
int max(int a, int b){int diff = a - b;int dsgn = diff >>31;return a -(diff & dsgn);}
Minimum of two Integers
By CRT
Microsoft Visual C Run-Time Library provides a _min macro [3].
By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX:
If a is greater b, b + 0 is returned, otherwise b + (a - b) == +a
int min(int a, int b){int diff = a - b;int dsgn = diff >>31;return b +(diff & dsgn);}
Conditional Expressions
Conditional Assignment
A conditional assignment in C or C++ may be implemented by compilers as x86 conditional move (cmovCC) instruction.
x =( a > b )? C : D;
otherwise it might be reformulated with conditional increment:
x = D;if( a > b ) x += C - D;
Conditional Increment
If a > b is hard to predict,
if( a > b ) x += C;
it might be reformulated branch-less in C, which likely emits a x86 setCC instruction:
x +=-( a > b )& C;// with any boolean expression
With a reduced value range and INT_MIN <= b - a <= INT_MAX, greater and less relations might be implemented using a sign mask:
x +=(( b - a )>>31)& C;
Conditional Write
During list generation, while conditionally writing data to an array with post-incrementing a pointer or index, one may try to avoid the conditional branch by storing always and to increment the pointer by the condition, which is either 0 or 1 [4][5].
There are two parts to predicting a branch on x86. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?
(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.
Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
Table of Contents
Miss-predicted branches causes huge penalties on todays super pipelined processors. While processors become smarter to predict branches with several heuristics, branches on random data should be avoided.
The techniques shown here often use arithmetical shift right (by bit-width - 1, that is 31 for 32-bit double words as integers) to determine a mask of sign-bits, either all bits set (-1) or all bits clear 0. x86 compiler may emit an cdq (Convert Double to Quad) instruction, which sign extends a 32 bit register to two 32 bit registers. Since arithmetical shift right is not strictly specified in C, it might be not portable through all compilers and architectures. Note that in C, a comparison or a boolean expression with the result {false, true} might be treated as numerical value {0, 1}.
Abs, Max, Min
It is recommend to use functions provided by the programming language. In C or C++ one should use appropriate compiler intrinsics and/or template functions provided by the C Runtime Library or Standard Template Library.The tricks shown here, might be useful if compiler don't support those functions or don't generate the intended branchless assembly and the input is quite random, so that the branch prediction heuristics will fail often.
Absolute value of an Integer
Abs as C intrinsic [1] is likely implemented based on following code snippet ...... by compilers, on x86 a sequence three instructions: {cdq, xor, sub} or {cdq, add, xor}.
Maximum of two Integers
By CRT
Microsoft Visual C Run-Time Library provides a _max macro [2].By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX:If a is greater b, a - 0 is returned, otherwise a - (a - b) == +b
Minimum of two Integers
By CRT
Microsoft Visual C Run-Time Library provides a _min macro [3].By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX:If a is greater b, b + 0 is returned, otherwise b + (a - b) == +a
Conditional Expressions
Conditional Assignment
A conditional assignment in C or C++ may be implemented by compilers as x86 conditional move (cmovCC) instruction.otherwise it might be reformulated with conditional increment:
Conditional Increment
If a > b is hard to predict,it might be reformulated branch-less in C, which likely emits a x86 setCC instruction:
With a reduced value range and INT_MIN <= b - a <= INT_MAX, greater and less relations might be implemented using a sign mask:
Conditional Write
During list generation, while conditionally writing data to an array with post-incrementing a pointer or index, one may try to avoid the conditional branch by storing always and to increment the pointer by the condition, which is either 0 or 1 [4] [5].might be rewritten by
Indirect Branch
Robert Hyatt on x86 Branch predictor, Branch target predictor, and Indirect branch in CCC [6]:(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.
Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
See also
Forum Posts
External Links
lineup: Joe Bowie, Ronny Drayton, Bill Bickford, Kim Clarke, John Mulkerin, Kenny Martin
References
What links here?
Up one Level