Playing Strength, (Performance, Skill Level)
of a chess player or chess-playing entity, program, or engine reflects the ability to win against other players, expressed as a number or another element of an ordered set, such as an Elo rating.
The ability to solve test positions, that is, to find the specified and usually unique best move, may indicate particular engine skills, but does not necessarily correlate with playing strength. In his lecture Parallelism and Selectivity in Game Tree Search, Tord Romstad introduced the Worst Moves Observation (WMO), which states that practical playing strength is determined not primarily by the quality of a player's best or average moves, but by the quality of the player's worst moves.
Peter W. Frey (1986). Algorithmic Strategies for Improving the Performance of Game-Playing Programs. In Doyne Farmer, Alan Lapedes, Norman Packard, Burton Wendroff (Ed.) (1986). Evolution, Games and Learning: Models for Adaptation in Machines and Nature. Proceedings of the Fifth Annual International Conference of the Center, Elsevier
Measuring
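The match statistics this section refers to can be illustrated with a small calculation. The following is a minimal sketch, assuming the standard logistic Elo model on the 400-point scale and a normal approximation for the error of the mean score. The function names `elo_diff` and `match_elo` are hypothetical and do not reproduce the method of any particular rating tool.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score, logistic model (assumed)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match result.

    z = 1.96 corresponds to a two-sided 95% interval under the
    normal approximation (an assumption of this sketch).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the outcomes 1, 0.5 and 0 around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    return (elo_diff(score),
            elo_diff(score - z * stderr),
            elo_diff(score + z * stderr))


if __name__ == "__main__":
    for games in (100, 1000, 10000):
        # Same 40% wins, 40% draws, 20% losses at three sample sizes.
        w, d, l = games * 4 // 10, games * 4 // 10, games * 2 // 10
        elo, low, high = match_elo(w, d, l)
        print(f"{games:6d} games: {elo:+.1f} Elo [{low:+.1f}, {high:+.1f}]")
```

Because the standard error shrinks with the square root of the number of games, the interval narrows as more games are played, which is one reason many fast games are usually preferred over a few slow ones.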
A statistically valid way to measure playing strength within a defined confidence interval is to play an appropriately large number of games, with both colors, against a wide range of different opponents [2] under symmetric time constraints, and to apply match statistics. Performance is not measured absolutely; it is inferred from wins, losses, and draws against other players or engines. A player's rating depends on the ratings of the opponents and on the results scored against them [3]. Relative playing strength of chess engines does not carry over exactly across different time controls, but the number of games played matters more than their duration. Today's de facto standard for measuring playing strength is therefore to play many fast games in parallel at short or ultra-short time controls, such as blitz, bullet, or even lightning chess, as used for instance in the Fishtest framework of Stockfish [4].
Strength
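One relationship relevant to this section, that search effort grows exponentially with depth through the effective branching factor, can be made concrete with a little arithmetic. This is a rough sketch, assuming the simplified model nodes ≈ EBF^depth and a constant node rate. The function names and the figures in the usage example are hypothetical, and real engines deviate from this model.

```python
import math


def nodes_visited(ebf: float, depth: int) -> float:
    """Approximate node count for a search to `depth` plies: ebf ** depth."""
    return ebf ** depth


def reachable_depth(ebf: float, nodes_per_second: float, seconds: float) -> float:
    """Depth reachable within a time budget under the same simplified model."""
    budget = nodes_per_second * seconds
    return math.log(budget) / math.log(ebf)


def plies_gained_by_doubling_time(ebf: float) -> float:
    """Extra depth from doubling the time: log(2) / log(ebf)."""
    return math.log(2.0) / math.log(ebf)


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    for ebf in (1.6, 2.0, 3.0):
        depth = reachable_depth(ebf, nodes_per_second=1_000_000, seconds=10.0)
        gain = plies_gained_by_doubling_time(ebf)
        print(f"EBF {ebf}: depth ~{depth:.1f}, doubling time adds ~{gain:.2f} plies")
```

Under this model, a lower effective branching factor turns the same time budget into more depth, and doubling the time adds a fixed number of plies rather than doubling the depth.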
The strength of a chess program depends on many things: the quality and efficiency of the algorithms used to determine the best move in a position; the balance of the so-called search versus knowledge tradeoff in evaluating or comparing the leaf nodes of a search tree; how that tree is shaped and how a score is propagated up to the root; and time management, that is, how time is allocated to searching a move under time control requirements. Time used is roughly proportional to the number of nodes visited by the common depth-first search inside an iterative deepening frame, and that number grows exponentially, as the effective branching factor raised to the power of the search depth. Playing strength may also improve over time through learning algorithms.
See also
Publications
1970 ...
1980 ...
1990 ...
2000 ...
2005 ...
2010 ...
2015 ...
Forum Posts
1990 ...
1995 ...
2000 ...
2005 ...
- bayeselo: new Elo-rating tool, applied to CCT7 by Rémi Coulom, CCC, February 13, 2005 » CCT7
- a beat b,b beat c,c beat a question by Uri Blass, CCC, May 16, 2007
- A thought about ratings by Dave Dyer, Computer Go Archive, December 10, 2007 » Go
  - Re: A thought about ratings by Don Dailey, Computer Go Archive, December 10, 2007
  - Re: A thought about ratings by Edward de Grijs, Computer Go Archive, December 10, 2007
  - Re: A thought about ratings by Don Dailey, Computer Go Archive, December 10, 2007
2008
- Glaurung Mac OS X: New GUI, now with adjustable strength. by Tord Romstad, CCC, May 18, 2008 » Glaurung
- Artificial stupidity - making a program play badly by Tord Romstad, CCC, May 20, 2008
- New testing thread by Robert Hyatt, CCC, August 07, 2008 » Engine Testing
- Hardware vs Software by Charles Roberson, CCC, December 02, 2008
2009
2010 ...
- How to reduce playing strenght to play against humans by Fermin Serrano, CCC, March 05, 2010
- Current skill command (Crafty) results by Robert Hyatt, CCC, July 21, 2010 » Crafty
- harware vs software advances by Don Dailey, CCC, September 08, 2010
- hardware advances - a different perspective by Robert Hyatt, CCC, September 09, 2010
- old crafty vs new crafty on new hardware by Robert Hyatt, CCC, September 11, 2010
- Crafty tests show that Software has advanced more by Don Dailey, CCC, September 12, 2010
- Final results - Crafty - hardware vs software by Robert Hyatt, CCC, September 13, 2010
- Deep Blue vs Rybka by Don Dailey, CCC, September 13, 2010 » Deep Blue, Rybka
2011
- Ply vs ELO by Andriy Dzyben, CCC, June 28, 2011
- Increase in Elo ..Question For The Experts by Steve B, CCC, December 05, 2011
2012
- StockFish LS with LimitStrength feature by Alexander Schmidt, CCC, January 01, 2012 » Stockfish
- Depth vs playing strength by David Whitten, CCC, January 09, 2012
- Reducing Strength by Ted Wong, CCC, March 15, 2012
- Human Elo ratings: averages and standard deviations by Jesús Muñoz, CCC, March 18, 2012
- Elo versus speed by Peter Österlund, CCC, April 02, 2012
- Rybka odds matches and the strength of engines by Kai Laskos, CCC, June 09, 2012 » Rybka
- Elo points gain from doubling time by Kai Laskos, CCC, December 10, 2012
- A word for casual testers by Don Dailey, CCC, December 25, 2012
2013
- Noise in ELO estimators: a quantitative approach by Marco Costalba, CCC, January 06, 2013
- Scaling at 2x nodes (or doubling time control) by Kai Laskos, CCC, July 23, 2013 » Doubling TC, Diminishing Returns, Playing Strength, Houdini
- How much elo is pondering worth by Michel Van den Bergh, CCC, August 07, 2013 » Pondering
- Contempt and the ELO model by Michel Van den Bergh, CCC, September 05, 2013 » Contempt Factor
- ELO rating and thinking time by Henk van den Belt, CCC, November 22, 2013
2014
2015 ...
- How to dumb down/weaken/humanize an engine algorithmically? by Dominik Klein, CCC, January 18, 2015
- computing elo of multiple chess engines by Alexandru Mosoi, CCC, August 09, 2015
- Name for elo without draws by Marcel van Kervinck, CCC, September 02, 2015
- The future of chess and elo ratings by Larry Kaufman, CCC, September 20, 2015
- ELO error margin by Fabio Gobbato, CCC, October 17, 2015
- testing multiple versions & elo calculation by Folkert van Heusden, CCC, October 27, 2015
2016
- a direct comparison of FIDE and CCRL rating systems by Erik Varend, CCC, February 22, 2016 » FIDE, CCRL
- Computer Chess Strength by John Fishburn, CCC, February 28, 2016
- How much benefit from opening book? by John Fishburn, CCC, March 06, 2016 » Opening Book
- Computer chess progress over say the last 20 years? by Martin Fierz, CCC, March 10, 2016
- Computer chess progress over the last 20 years! by Martin Fierz, CCC, March 13, 2016
- Computer Chess Progress: Stockfish 7 vs Ruffian 1.0.5 by Martin Fierz, CCC, March 17, 2016 » Stockfish, Ruffian
- skill levels by Alexandru Mosoi, CCC, April 28, 2016
- Strategies for weaker play levels by Evert Glebbeek, CCC, June 28, 2016
- ELO inflation ha ha ha by Henk van den Belt, CCC, September 16, 2016 » Delphil, Stockfish, Match Statistics, TCEC Season 9 [14]
- Perfect play by Patrik Karlsson, CCC, September 28, 2016
- Doubling of time control by Andreas Strangmüller, CCC, October 21, 2016 » Doubling TC, Diminishing Returns, Playing Strength, Komodo
- Stockfish 8 - Double time control vs. 2 threads by Andreas Strangmüller, CCC, November 15, 2016 » Doubling TC, Diminishing Returns, Playing Strength, Stockfish
- L3 cache, RAM and other performance factors by Nimzy, Rybka Forum, December 04, 2016 » Memory
- Absolute ELO scale by Nicu Ionita, CCC, December 17, 2016
- Diminishing returns and hyperthreading by Kai Laskos, CCC, December 27, 2016 » Diminishing Returns, Match Statistics, Thread
- About expected scores and draw ratios by Jesús Muñoz, CCC, September 17, 2016
2017
- Progress in 30 years by four intervals of 7-8 years by Kai Laskos, CCC, January 19, 2017 » Match Statistics
- Crafty Play By Elo ( Crafty v25.3) by Michael B, CCC, January 23, 2017 » Crafty
- 6-men Syzygy from HDD and USB 3.0 by Kai Laskos, CCC, April 04, 2017 » Komodo, Syzygy Bases, USB 3.0
- Scaling of engines from FGRL rating list by Kai Laskos, CCC, April 07, 2017 » FGRL
- Low impact of opening phase in engine play? by Kai Laskos, CCC, April 18, 2017 » Opening
- How to simulate a game outcome given Elo difference? by Nicu Ionita, CCC, April 25, 2017
- Wilo rating properties from FGRL rating lists by Kai Laskos, CCC, May 01, 2017 » FGRL
- RAM speed and engine strength by John Hartmann, CCC, May 03, 2017 » RAM
- Symmetric multiprocessing (SMP) scaling - SF8 and K10.4 by Andreas Strangmüller, CCC, May 05, 2017 » Lazy SMP, Komodo, Stockfish
- Symmetric multiprocessing (SMP) scaling - K10.4 Contempt=0 by Andreas Strangmüller, CCC, May 11, 2017 » SMP, Komodo, Contempt Factor
- Symmetric multiprocessing (SMP) scaling - SF8 Contempt=10 by Andreas Strangmüller, CCC, May 13, 2017 » SMP, Stockfish, Contempt Factor
- Another attempt at comparing Evals ELO-wise by Kai Laskos, CCC, May 22, 2017 » Evaluation
- Testing endgame strength by Álvaro Begué, CCC, June 21, 2017 » Endgame, Engine Testing, RuyDos
- Invariance with time control of rating schemes by Kai Laskos, CCC, July 22, 2017 [15]
- Ways to avoid "Draw Death" in Computer Chess by Kai Laskos, CCC, July 25, 2017
- ELO measurements by Peter Österlund, CCC, August 06, 2017 » Lazy SMP, Parallel Search
- an interesting study from Erik Varend by scandien, Hiarcs Forum, August 13, 2017 [16]
- Wall and Regression by Srdja Matovic, CCC, August 31, 2017 [17] [18]
- Scaling from FGRL results with top 3 engines by Kai Laskos, CCC, September 26, 2017 » FGRL, Houdini, Komodo, Stockfish
- "Intrinsic Chess Ratings" by Regan, Haworth -- seq by Kai Middleton, CCC, November 19, 2017
  - Re: "Intrinsic Chess Ratings" by Regan, Haworth -- by Kenneth Regan, CCC, November 20, 2017 » Who is the Master?
- ELO progression measured by year by Ed Schroder, CCC, December 13, 2017
2018
External Links
Chess Player
Chess Engines
Analysis
Rating Systems
Misc
References