Engine Testing

**Engine Testing**,
the process both to eliminate [|bugs] and to measure the performance of a chess engine. New implementations of move generation are tested with Perft, while new features and the tuning of search and evaluation are verified with test-positions and by playing matches against other engines.

[[image:coyote.jpg link="http://sanseverything.wordpress.com/2008/01/16/hope-springs-eternal/"]]
The ever-optimistic [|Wile E. Coyote]

=Bug Hunting=
 * Perft (Perft Results)
 * Debugging
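
To illustrate the Perft idea, the following minimal sketch counts the leaf nodes of a game tree to a fixed depth so that the totals can be compared with known values. To stay self-contained it uses tic-tac-toe as a stand-in for chess, which is an assumption made purely for illustration; `has_winner`, `legal_moves` and `perft` are hypothetical names, not part of any engine. In a chess engine the same loop would run over the engine's own move generator and make/unmake routines.

```python
from typing import List, Optional

Board = List[Optional[str]]  # 9 squares, each None, "X" or "O"

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def has_winner(board: Board) -> bool:
    """True if either side has completed a line."""
    return any(board[a] is not None and board[a] == board[b] == board[c]
               for a, b, c in WIN_LINES)


def legal_moves(board: Board) -> List[int]:
    """Empty squares, or no moves at all once the game is decided."""
    if has_winner(board):
        return []
    return [sq for sq in range(9) if board[sq] is None]


def perft(board: Board, side: str, depth: int) -> int:
    """Count leaf nodes reachable in exactly `depth` plies."""
    if depth == 0:
        return 1
    other = "O" if side == "X" else "X"
    nodes = 0
    for sq in legal_moves(board):
        board[sq] = side                      # make move
        nodes += perft(board, other, depth - 1)
        board[sq] = None                      # unmake move
    return nodes


if __name__ == "__main__":
    for depth in range(1, 4):
        print(depth, perft([None] * 9, "X", depth))
```

For this toy game the counts for depths 1 to 3 are 9, 72 and 504. A mismatch against such reference numbers points to a bug in move generation or in make/unmake, which is exactly how Perft results are used for chess positions.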

=Analyzing=
 * Logging
 * Profiling
 * Search Statistics

=Tuning=
 * Automated Tuning

=Test-Positions=
Running sets of test-positions and counting the number of solutions within a fixed time frame is useful for checking whether things are broken after program changes, or for getting hints about missing knowledge. But one should be careful about tuning engines based on test-position results, since solving (possibly tactical) test-positions does not necessarily correlate with practical playing strength in matches against other opponents.
 * Test-Positions

=Matches=
Most testing involves running different versions of a program in matches and comparing the results.

==Time Controls==
Generally speaking, changes that don't alter the search tree itself but only affect performance (e.g. move generation) can be tested with a fixed number of nodes, fixed time or fixed depth. In all other cases the time management should be left to the engine to simulate real tournament conditions. On the other hand, debugging is much easier under fixed conditions, as the games become deterministic.

Aside from the type of time control, one also has to decide how much time should be spent per game, i.e. what the average quality of the games should be. While one can test more changes in a given amount of time at short time controls, it is also relevant how a certain change scales to different strengths. For example, if one increases the reduction R in null move pruning to 3 at depths > 7, this change can only be tested effectively at time controls where the new condition is triggered frequently enough, i.e. where the average search depth is far greater than seven. It is hard to generalize, but on average changes to the search functions (LMR, null move, futility or similar pruning, reductions and extensions) tend to be more sensitive to the time control than the tuning of evaluation parameters.

==Opening==
During testing the engines should ideally play the same style of openings they would play in a normal tournament, so as not to optimize them for different types of positions. One option is to use the engine's own opening book; another is to use an opening suite, a set of quiet test positions. In the latter case the same opening suite is used for each tournament conducted, and each position is played a second time with colors reversed. With these measures one can try to minimize the disparity between tests caused by different openings.
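
The color-reversal scheme described above can be sketched as a small schedule generator. This is a minimal illustration rather than how any particular tournament manager works; `Game` and `paired_schedule` are hypothetical names, and the opening strings are placeholders for FEN or EPD records from the suite.

```python
from typing import Iterable, List, NamedTuple


class Game(NamedTuple):
    opening: str  # start position, e.g. a FEN or EPD string from the suite
    white: str    # name of the engine playing White
    black: str    # name of the engine playing Black


def paired_schedule(openings: Iterable[str],
                    engine_a: str,
                    engine_b: str) -> List[Game]:
    """Play every opening twice, swapping colors for the second game."""
    schedule: List[Game] = []
    for opening in openings:
        schedule.append(Game(opening, engine_a, engine_b))
        schedule.append(Game(opening, engine_b, engine_a))
    return schedule


if __name__ == "__main__":
    for game in paired_schedule(["position 1", "position 2"], "old", "new"):
        print(game)
```

Because both engines get each position once with either color, an opening that happens to favor one side contributes equally to both scores.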

==Interfaces==
Free graphical user interfaces or command line tools for running engine-engine matches between UCI and Chess Engine Communication Protocol compatible engines are:
 * Amoeba Tournament Manager
 * Arena by Martin Blume
 * Cute Chess by Arto Jonsson and Ilari Pihlajisto
 * Cutechess-cli
 * LittleBlitzer by Nathan Thom

==Frameworks==

 * Fishtest

==Chess Server==
One can also test an engine's performance by comparing it to other programs on the various internet platforms. In this case, differences in hardware and in features such as Endgame Tablebases or Opening Books have to be taken into account.
 * Chess Server
 * Tournaments

==Statistics==
The question of whether certain results actually indicate a strength increase can be answered with:
 * Match Statistics
 * Pawn Advantage, Win Percentage, and Elo
 * LOS Table
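
As a rough illustration of the quantities these pages deal with, the sketch below converts a match result into an Elo difference under the usual logistic model and computes a likelihood of superiority (LOS) from wins and losses. The function names are hypothetical, and the LOS expression is the common normal approximation that ignores draws; it is meant as a sketch under those assumptions, not as the definitive method.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal approximation of LOS; draws carry no information here."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    w, l, d = 40, 30, 30
    print(elo_difference(score_fraction(w, l, d)))
    print(likelihood_of_superiority(w, l))
```

With 40 wins, 30 losses and 30 draws the score is 55%, which corresponds to roughly 35 Elo. The LOS value indicates how likely it is that the first engine really is the stronger one given the decisive games.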

==Ratings==
 * Rating System
 * Engine Rating Lists

=Test Results=
 * Null Move Pruning Test Results
 * Late Move Reduction Test Results

=Notable Bugs=
 * Brute Force (Program), En passant bug, ACM 1977 and ACM 1978
 * Coko - Mate in One?, ACM 1971
 * Chess 2175X vs. Genesis, Promotion bug, 4th Computer Olympiad 1992
 * Nimzo's winning white-black bug, WMCCC 1993
 * Novag Micro Chess - Castling bug, CPWTIPC 1981
 * Proscha capturing its own king versus Daja, First GI Computer Chess Tournament 1975
 * System Tal vs. XXXX, Promotion bug, WMCCC 1995
 * Xinix - Mate in One, DOCCC 2000

=Publications=
 * Tony Marsland, Paul Rushton (**1973**). //[|Mechanisms for Comparing Chess Programs].// ACM Annual Conference, [|pdf]
 * Tim Breitkreutz, Jonathan Schaeffer (**1984**). //Computer vs Computer via Computer//. ICCA Journal, Vol. 7, No. 4
 * John Stanback (**1990**). //Supercomputing '90: Computer-Chess Testing and Programming Session//. ICCA Journal, Vol. 13, No. 4 » ACM 1990
 * Larry Kaufman (**1993**). //How Our PC Chess Programs Are Developed//. Computer Chess Reports 1992-93, Vol. 3, No. 2, pp. 12
 * Thomas Mally (**1993**). //Matt in Wieviel?// PC Schach 3/93 (German)
 * Jeff Rollason (**2007**). //[|Statistical Minefields with Version Testing]//. AI Factory, Winter 2007 » Match Statistics
 * Jónheiður Ísleifsdóttir (**2007**). //GTQL: A Query Language for Game Trees//. M.Sc. thesis, [|Reykjavík University], [|pdf]
 * Jónheiður Ísleifsdóttir, Yngvi Björnsson (**2008**). //[|GTQ: A Language and Tool for Game-Tree Analysis]//. CG 2008, [|pdf]

=Forum Posts=

==1995 ...==

 * [|Testing Chess Programs] by Jan Eric Larsson, rgcc, February 09, 1996
 * [|Self-test and others rating stuffs...] by Christophe Théron, CCC, January 01, 1998
 * [|Proposal: New testing methods for SSDF (1)] by Jeroen Noomen, CCC, April 13, 1998

==2000 ...==

 * [|Using 2 machines for matches (Linux)] by Jon Dart, CCC, June 24, 2001 » XBoard, Linux
 * [|A proposed WAC replacement for testing] by Gian-Carlo Pascutto, CCC, September 18, 2001 » Win at Chess
 * [|Value of playing different versions of a program against each other] by Tom King, CCC, January 06, 2003
 * [|testing of evaluation function] by Steven Chu, CCC, April 17, 2003 » Evaluation
 * [|Testing the reliability of forward pruning] by Russell Reagan, CCC, May 15, 2003 » Pruning
 * [|To programmers: Hints for testing after a partial rewrite] by Federico Corigliano, CCC, December 08, 2003
 * [|Is there a way?] by Ed Schröder, CCC, December 13, 2004

==2005 ...==

 * [|table for detecting significant difference between two engines] by Joseph Ciarrochi, CCC, February 03, 2006
 * [|test methodology] by Giuseppe Cannella, Winboard Forum, November 13, 2006
 * [|Testing and debugging chess engines] by Patrice Duhamel, Winboard Forum, December 03, 2006
 * [|Programmer bug hunt challenge] by Ed Schröder, CCC, May 04, 2007 » Portable Game Notation, En passant
 * [|a beat b,b beat c,c beat a question] by Uri Blass, CCC, May 16, 2007 » Playing Strength
 * [|An objective test process for the rest of us?] by Nicolai Czempin, CCC, September 12, 2007
 * [|My new testing scheme] by Zach Wegner, CCC, November 20, 2007
 * [|New testing thread] by Robert Hyatt, CCC, August 07, 2008
 * [|Cutechess-cli: A command line tool for engine-engine matches] by Ilari Pihlajisto, CCC, March 16, 2009
 * [|Cutechess-cli version 0.1.8 released] by Ilari Pihlajisto, CCC, September 29, 2009
 * [|A reason for testing at fixed number of nodes] by J. Wesley Cleveland, CCC, November 06, 2009
 * [|different kinds of testing] by Don Dailey, CCC, November 09, 2009
 * [|more on fixed nodes] by Robert Hyatt, CCC, November 10, 2009

==2010 ...==

 * [|XBoard and epd tournament] by Vlad Stamate, CCC, January 31, 2010 » Chess Engine Communication Protocol
 * [|Long game vs short game testing] by Vlad Stamate, CCC, April 08, 2010
 * [|Pairings generation based on a big PGN file] by Harun Taner, CCC, July 22, 2010
 * [|hiatus good for bug-finding] by Stuart Cracraft, CCC, June 27, 2010
**2011**
 * [|testing question] by Larry Kaufman, CCC, June 01, 2011
 * [|Debugging regression tests] by Onno Garms, CCC, June 16, 2011
**2012**
 * [|fast game testing] by Jon Dart, CCC, January 08, 2012
 * [|Your best bug ?] by Ed Schröder, CCC, August 06, 2012
 * [|Yet Another Testing Question] by Brian Richardson, CCC, September 15, 2012
 * [|Another testing question] by Larry Kaufman, CCC, September 23, 2012
 * [|A word for casual testers] by Don Dailey, CCC, December 25, 2012
**2013**
 * [|A poor man's testing environment] by Ed Schröder, CCC, January 04, 2013 » Match Statistics
 * [|engine-engine testing isues] by Jens Bæk Nielsen, CCC, January 20, 2013
 * [|Beta for Stockfish distributed testing] by Gary, CCC, March 05, 2013 » Fishtest
 * [|Fishtest Distributed Testing Framework] by Marco Costalba, CCC, May 01, 2013 » Fishtest
 * [|cutechess-cli 0.6.0 released] by Ilari Pihlajisto, CCC, July 12, 2013
 * [|fast testing NIT algorithm] by Don Dailey, CCC, August 22, 2013
 * [|OICS: Computers Only ICS based Chess server for anyone] by Joshua Shriver, CCC, August 26, 2013 » OICS
**2014**
 * [|testing procedure] by Daniel José Queraltó, CCC, February 23, 2014

==2015 ...==
 * [|Bullet vs regular time control, say 40/4m CCRL/CEGT] by Ed Schröder, CCC, August 29, 2015
 * [|Static evaluation test posistions] by Shawn Chidester, CCC, November 25, 2015
 * [|Re: Static evaluation test posistions] by Ferdinand Mosca, CCC, November 26, 2015 » Python
**2016**
 * [|Ordo 1.0.9 (new features for testers)] by Miguel A. Ballicora, CCC, January 25, 2016
 * [|cluster versus single server] by Folkert van Heusden, CCC, April 28, 2016
 * [|Testing using many computers and architectures] by Andrew Grant, CCC, September 14, 2016
 * [|command line engine match?] by Erin Dame, CCC, November 06, 2016 » CLI
 * [|Testing with different EPD suits for search vs eval changes] by Michael Sherwin, CCC, December 23, 2016
**2017**
 * [|sprt tourney manager] by Richard Delorme, CCC, January 24, 2017 » Amoeba Tournament Manager, SPRT
 * [|how to properly test the changes to the engine ?] by Mahmoud Uthman, CCC, February 01, 2017
 * [|How to go about chasing a bug like this?] by Colin Jenkins, CCC, February 09, 2017 » Debugging
 * [|How to find SMP bugs ?] by Lucas Braesch, CCC, March 15, 2017 » Debugging, Lazy SMP
 * [|Testing for Move Ordering Improvements] by Cheney Nattress, CCC, March 25, 2017 » Move Ordering, Search Statistics
 * [|Testing endgame strength] by Álvaro Begué, CCC, June 21, 2017 » Endgame, RuyDos
 * [|Opening testing suites efficiency] by Kai Laskos, CCC, June 21, 2017 » Opening, Match Statistics
 * [|Testing A against B by playing a pool of others] by Andrew Grant, CCC, June 24, 2017 » Match Statistics
 * [|Core behaviour] by Ed Schröder, CCC, June 28, 2017 » Process, Thread
 * [|Engine testing & error margin ?] by Mahmoud Uthman, CCC, July 05, 2017
 * [|Engines for testing (Linux, fast time control)] by Jon Dart, CCC, November 18, 2017 » Linux

=External Links=
 * [|Cute Chess]
 * [|cutechess · GitHub]
 * [|Testing a chess engine from the ground up] from [|Home of the Dutch Rebel] by Ed Schröder » Match Statistics
 * [|Regression testing from Wikipedia]
 * [|SPCC] by Stefan Pohl
 * [|CHESS - Microsoft Research], a tool for finding and reproducing [|Heisenbugs] in concurrent programs
 * [|Engine test stand from Wikipedia]
 * Terje Rypdal Group feat. Palle Mikkelborg, [|Håkon Graf], [|Sveinung Hovensjø] and Jon Christensen - [|Per Ulv], 1978, [|YouTube] Video
