Automated Tuning,
an automated adjustment of evaluation parameters or weights, and less commonly of search parameters [1], with the aim of improving the playing strength of a chess engine or game playing program. Evaluation tuning can be applied through mathematical optimization or machine learning, two fields with substantial overlap. Learning approaches are subdivided into supervised learning using labeled data, and reinforcement learning, which learns from trial and error and faces the dilemma of exploration (of uncharted territory) versus exploitation (of current knowledge). Johannes Fürnkranz gives a comprehensive overview in Machine Learning in Games: A Survey, published in 2000 [2], covering evaluation tuning in chapter 4.
Playing Strength
A difficulty in tuning and automated tuning of engine parameters is measuring playing strength. Using small sets of test positions, which was once quite common for estimating the relative strength of chess programs, lacks adequate diversity for a reliable strength prediction. In particular, solving test positions does not necessarily correlate with practical playing strength in matches against other opponents. Measuring strength therefore requires playing many games against a reference opponent to determine the win rate with a certain confidence. The closer the strength of the two opponents, the more games are necessary to determine whether changed parameters or weights are an improvement - up to several tens of thousands. Playing many games at ultra-short time controls has become the de facto standard with today's strong programs, as applied for instance in Stockfish's Fishtest, which uses the sequential probability ratio test (SPRT) to possibly terminate a match early [4].
Parameter
Quote by Ingo Althöfer [5] [6]:
It is one of the best arts to find the right SMALL set of parameters and to tune them.
Some 12 years ago I had a technical article on this ("On telescoping linear evaluation functions") in the ICCA Journal, Vol. 16, No. 2, pp. 91-94, describing a theorem (of existence) which says that in case of linear evaluation functions with lots of terms there is always a small subset of the terms such that this set with the right parameters is almost as good as the full evaluation function.
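The SPRT-based early stopping mentioned under Playing Strength can be sketched as follows, using the common normal (Wald) approximation of the log-likelihood ratio on a win/draw/loss record. This is a minimal illustration, not Fishtest's actual implementation; the Elo hypotheses and error bounds are example values.

```python
import math

def sprt(wins, draws, losses, elo0=0, elo1=5, alpha=0.05, beta=0.05):
    """Sequential probability ratio test on a running match score.

    Returns 'H1' (accept the improvement hypothesis elo1),
    'H0' (accept the null hypothesis elo0), or 'continue'.
    """
    if wins == 0 or losses == 0:
        return 'continue'  # score variance not yet meaningful

    def expected_score(elo):
        return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of the per-game score under the trinomial model
    var = (wins * 1.0 + draws * 0.25) / n - score ** 2
    if var <= 0:
        return 'continue'
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Log-likelihood ratio via the normal approximation
    llr = (s1 - s0) * (2 * score - s0 - s1) * n / (2 * var)
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    if llr >= upper:
        return 'H1'
    if llr <= lower:
        return 'H0'
    return 'continue'
```

A clearly superior result (e.g. 500 wins, 300 draws, 200 losses) crosses the upper bound long before a fixed-length match would end, which is exactly the point of sequential testing.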
Mathematical Optimization
Mathematical optimization methods in tuning consider the engine as a black box.
Methods
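As an illustration of one such black-box method, simultaneous perturbation stochastic approximation (SPSA) estimates a gradient from only two perturbed parameter sets per iteration. A minimal sketch; the match_score callback and the gain sequences are illustrative assumptions, not any particular framework's implementation.

```python
import random

def spsa_tune(params, match_score, iterations=1000, a=0.1, c=1.0):
    """Minimal SPSA sketch: tune parameters by playing perturbed matches.

    match_score(p_plus, p_minus) is a hypothetical callback that plays a
    (mini-)match between two engine versions and returns the score of the
    first from [0, 1]; the engine itself is treated as a black box.
    """
    n = len(params)
    for k in range(1, iterations + 1):
        ck = c / k ** 0.101   # perturbation size (decaying, standard exponent)
        ak = a / k ** 0.602   # step size (decaying, standard exponent)
        delta = [random.choice((-1, 1)) for _ in range(n)]
        plus = [p + ck * d for p, d in zip(params, delta)]
        minus = [p - ck * d for p, d in zip(params, delta)]
        result = match_score(plus, minus) - 0.5   # > 0: plus side scored better
        # Move every parameter toward the side that scored better
        params = [p + ak * result / (ck * d) for p, d in zip(params, delta)]
    return params
```

Note that a single noisy match result drives the update of all parameters at once, which is what makes SPSA attractive when each function evaluation costs many games.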
Instances
Advantages
Disadvantages
Reinforcement Learning
Reinforcement learning, in particular temporal difference learning, has a long history in tuning evaluation weights in game programming, first seen in the late 50s in Arthur Samuel's checkers player [7]. In self play against a stable copy of itself, the weights of the evaluation function were adjusted after each move so that the score of the root position after a quiescence search came closer to the score of the full search. This TD method was generalized and formalized by Richard Sutton in 1988 [8], who introduced the decay parameter λ, which determines the proportion of the score drawn from the outcomes of Monte Carlo simulated games, tapering between bootstrapping (λ = 0) and pure Monte Carlo (λ = 1). TD-λ was famously applied by Gerald Tesauro in his Backgammon program TD-Gammon [9] [10]; its minimax adaptation TD-Leaf was successfully used in evaluation tuning of chess programs [11], with KnightCap [12] and CilkChess [13] as prominent examples.
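For a linear evaluation, one TD(λ) weight-update pass over a self-play game might be sketched as follows; the feature vectors, search scores, learning rate, and λ are illustrative assumptions.

```python
def td_lambda_update(weights, features_seq, scores_seq, alpha=0.01, lam=0.7):
    """One TD(lambda) pass over a game for a hypothetical linear evaluation.

    features_seq[t]: feature vector (list of floats) of position t
    scores_seq[t]:   search score of position t
    """
    n = len(weights)
    eligibility = [0.0] * n
    for t in range(len(scores_seq) - 1):
        # Temporal difference between successive search scores
        delta = scores_seq[t + 1] - scores_seq[t]
        for i in range(n):
            # For a linear evaluation, the gradient w.r.t. a weight
            # is simply the corresponding feature value.
            eligibility[i] = lam * eligibility[i] + features_seq[t][i]
            weights[i] += alpha * delta * eligibility[i]
    return weights
```

With λ = 0 only the immediately following score matters (bootstrapping); with λ = 1 every weight keeps full credit for all later differences, which sums to the final outcome (Monte Carlo).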
Instances
Engines
Supervised Learning
Move Adaption
One supervised learning method considers desired moves from a set of positions, e.g. from grandmaster games, and tries to adjust the evaluation weights so that, for instance, a one-ply search agrees with the desired move. Move adaption was described by Arthur Samuel in 1967 - already pioneering in reinforcement learning some years before - as used in the second version of his checkers player [15], where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times a feature rated an alternative move higher than the desired move played by an expert [16]. In chess, move adaption was first described by Thomas Nitsche in 1982 [17], and with some extensions by Tony Marsland in 1985 [18]. Eval tuning in Deep Thought, as mentioned by Feng-hsiung Hsu et al. in 1990 [19] and later published by Andreas Nowatzyk, is also based on an extended form of move adaption [20]. Jonathan Schaeffer's and Paul Lu's efforts to make Deep Thought's approach work for Chinook in 1990 failed [21] - nothing seemed to produce results that were as good as their hand-tuned effort [22].
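A single move adaption step in the spirit described above might be sketched as follows for a linear evaluation; the evaluate_moves callback and the learning rate are illustrative assumptions, not Samuel's or Deep Thought's actual procedure.

```python
def move_adaption_update(weights, positions, evaluate_moves, eta=0.01):
    """Nudge weights until a 1-ply search prefers the expert's move.

    positions: list of (position, expert_move)
    evaluate_moves(pos, weights): hypothetical callback returning
        {move: (score, feature_vector)} for all legal moves, where the
        score is the linear evaluation after making the move.
    """
    for pos, expert_move in positions:
        scored = evaluate_moves(pos, weights)
        exp_score, exp_features = scored[expert_move]
        for move, (score, features) in scored.items():
            if move != expert_move and score >= exp_score:
                # A rival move outranks the expert move: shift the weights
                # toward the expert move's features, away from the rival's
                for i in range(len(weights)):
                    weights[i] += eta * (exp_features[i] - features[i])
    return weights
```

Repeating such passes over a large position set gradually raises the rank of the expert moves under the tuned evaluation.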
Value Adaption
A second supervised learning approach used to tune evaluation weights is based on regression toward a desired value, i.e. using the final outcome of huge sets of positions from quality games, or other information supplied by a supervisor, e.g. in the form of annotations with position evaluation symbols. Often, value adaption is reinforced by determining an expected outcome through self play [23].
Advantages
Can modify any number of weights simultaneously - constant time complexity
Disadvantages
Requires a source for the labeled data
Can only be used for evaluation weights or anything else that can be labeled
Regression
Regression analysis is a statistical process, with substantial overlap with machine learning, to predict the value of a dependent variable Y (output) given known value pairs of the variables X and Y. Parameter estimation in regression analysis can be formulated as the minimization of a cost or loss function over a training set [24], such as the mean squared error, or the cross-entropy error function for binary classification [25]. The minimization is implemented by iterative optimization algorithms or metaheuristics such as iterated local search, the Gauss–Newton algorithm, or the conjugate gradient method.
Linear Regression
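Fitting the weights of a linear evaluation to labeled target values can be sketched as plain gradient descent on the mean squared error; the data and hyperparameters below are illustrative.

```python
def linear_regression_fit(xs, ys, eta=0.01, epochs=1000):
    """Fit linear weights by gradient descent on the mean squared error.

    xs: list of feature vectors
    ys: list of target values (e.g. outcomes or scores from a supervisor)
    """
    n = len(xs[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(xs, ys):
            # Prediction error of the current linear model
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(n):
                grad[i] += 2 * err * x[i]
        # Average the gradient over the training set and step downhill
        for i in range(n):
            w[i] -= eta * grad[i] / len(xs)
    return w
```

For small problems the same minimum can of course be obtained in closed form (normal equations); the iterative form is shown because it scales to the huge position sets mentioned above.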
Logistic Regression
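A logistic mapping underlies, for instance, Texel's Tuning Method: engine scores are squashed by a sigmoid into expected game outcomes, and the mean squared error against actual results is minimized. A minimal sketch; the evaluate callback, the scaling constant K, and the single-weight local search are illustrative assumptions, not a particular engine's implementation.

```python
def sigmoid(eval_cp, k=1.0):
    # Map a centipawn score to an expected outcome in [0, 1];
    # K is a scaling constant usually fitted to the data first.
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))

def mse(weights, positions, evaluate, k=1.0):
    """Mean squared error between predicted and actual game outcomes.

    positions: list of (position, result) with result in {0, 0.5, 1}
    evaluate:  hypothetical callback evaluate(position, weights)
    """
    total = 0.0
    for pos, result in positions:
        total += (result - sigmoid(evaluate(pos, weights), k)) ** 2
    return total / len(positions)

def tune_one_weight(weights, i, positions, evaluate, step=1):
    """Local search over a single weight, Texel-style."""
    best = mse(weights, positions, evaluate)
    for delta in (+step, -step):
        weights[i] += delta
        err = mse(weights, positions, evaluate)
        if err < best:
            return weights, err  # keep the improvement
        weights[i] -= delta      # revert
    return weights, best
```

Sweeping tune_one_weight over all weights until no single change lowers the error is the simple coordinate-wise loop many engines use with this objective.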
Instances
See also
Publications
1959
1960 ...
1970 ...
1980 ...
- Donald H. Mitchell (1984). Using Features to Evaluate Positions in Experts' and Novices' Othello Games. Masters thesis, Department of Psychology, Northwestern University, Evanston, IL
1985 ...
- Jens Christensen, Richard Korf (1986). A Unified Theory of Heuristic Evaluation Functions and Its Applications to Learning. Proceedings of AAAI-86, pp. 148-152, pdf
- Dap Hartmann (1987). How to Extract Relevant Knowledge from Grandmaster Games. Part 1: Grandmasters have Insights - the Problem is what to Incorporate into Practical Problems. ICCA Journal, Vol. 10, No. 1
- Bruce Abramson (1988). Learning Expected-Outcome Evaluators in Chess. Proceedings of the 1988 AAAI Spring Symposium Series: Computer Game Playing, pp. 26-28
- Bruce Abramson (1989). On Learning and Testing Evaluation Functions. Proceedings of the Sixth Israeli Conference on Artificial Intelligence, pp. 7-16
1990 ...
- Bruce Abramson (1991). The Expected-Outcome Model of Two-Player Games. Research Notes in Artificial Intelligence, Morgan Kaufmann
1995 ...
2000 ...
Gerald Tesauro (2001). Comparison Training of Chess Evaluation Functions. » SCP, Deep Blue
2005 ...
- Dave Gomboc, Michael Buro, Tony Marsland (2005). Tuning Evaluation Functions by Maximizing Concordance. Theoretical Computer Science, Vol. 349, No. 2, pdf
- Jeff Rollason (2005). Evaluation by Hill-climbing: Getting the right move by solving micro-problems. AI Factory, Autumn 2005
- Levente Kocsis, Csaba Szepesvári, Mark Winands (2005). RSPSA: Enhanced Parameter Optimization in Games. Advances in Computer Games 11, pdf
2006
- Levente Kocsis, Csaba Szepesvári (2006). Universal Parameter Optimisation in Games Based on SPSA. Machine Learning, Special Issue on Machine Learning and Games, Vol. 63, No. 3
- Hallam Nasreddine, Hendra Suhanto Po, Graham Kendall (2006). Using an Evolutionary Algorithm for the Tuning of a Chess Evaluation Function Based on a Dynamic Boundary Strategy. Proceedings of the 2006 IEEE Conference on Cybernetics and Intelligent Systems, pdf
- Makoto Miwa, Daisaku Yokoyama, Takashi Chikayama (2006). Automatic Construction of Static Evaluation Functions for Computer Game Players. ALT ’06
- Borko Bošković, Sašo Greiner, Janez Brest, Viljem Žumer (2006). A Differential Evolution for the Tuning of a Chess Evaluation Function. IEEE Congress on Evolutionary Computation, 2006.
- Kunihito Hoki (2006). Optimal control of minimax search result to learn positional evaluation. 11th Game Programming Workshop (Japanese)
2007
- Shogo Takeuchi, Tomoyuki Kaneko, Kazunori Yamaguchi, Satoru Kawai (2007). Visualization and Adjustment of Evaluation Functions Based on Evaluation Values and Win Probability. AAAI 2007, pdf
- Makoto Miwa, Daisaku Yokoyama, Takashi Chikayama (2007). Automatic Generation of Evaluation Features for Computer Game Players. pdf
- Johannes Fürnkranz (2007). Recent advances in machine learning and game playing. ÖGAI Journal, Vol. 26, No. 2, Computer Game Playing, pdf
2008
- Omid David, Moshe Koppel, Nathan S. Netanyahu (2008). Genetic Algorithms for Mentor-Assisted Evaluation Function Optimization. ACM Genetic and Evolutionary Computation Conference (GECCO '08), pp. 1469-1475, Atlanta, GA, July 2008
- Borko Bošković, Sašo Greiner, Janez Brest, Aleš Zamuda, Viljem Žumer (2008). An Adaptive Differential Evolution Algorithm with Opposition-Based Mechanisms, Applied to the Tuning of a Chess Program. Advances in Differential Evolution, Studies in Computational Intelligence, ISBN: 978-3-540-68827-3
2009
2010 ...
- Amine Bourki, Matthieu Coulm, Philippe Rolet, Olivier Teytaud, Paul Vayssière (2010). Parameter Tuning by Simple Regret Algorithms and Multiple Simultaneous Hypothesis Testing. pdf
- Omid David, Moshe Koppel, Nathan S. Netanyahu (2010). Genetic Algorithms for Automatic Search Tuning. ICGA Journal, Vol. 33, No. 2
- Borko Bošković (2010). Differential evolution for the Tuning of a Chess Evaluation Function. Ph.D. thesis, University of Maribor
2011
- Omid David, Moshe Koppel, Nathan S. Netanyahu (2011). Expert-Driven Genetic Algorithms for Simulating Evaluation Functions. Genetic Programming and Evolvable Machines, Vol. 12, No. 1
- Borko Bošković, Janez Brest (2011). Tuning Chess Evaluation Function Parameters using Differential Evolution Algorithm. Informatica, Vol. 35, No. 2, pdf
- Borko Bošković, Janez Brest, Aleš Zamuda, Sašo Greiner, Viljem Žumer (2011). History mechanism supported differential evolution for chess evaluation function tuning. Soft Computing, Vol. 15, No. 4
- Eduardo Vázquez-Fernández, Carlos Artemio Coello Coello, Feliú Davino Sagols Troncoso (2011). An Evolutionary Algorithm for Tuning a Chess Evaluation Function. CEC 2011, pdf
- Eduardo Vázquez-Fernández, Carlos Artemio Coello Coello, Feliú Davino Sagols Troncoso (2011). An Adaptive Evolutionary Algorithm Based on Typical Chess Problems for Tuning a Chess Evaluation Function. GECCO 2011, pdf
- Rémi Coulom (2011). CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning. Advances in Computer Games 13 [42] [43]
- Kunihito Hoki, Tomoyuki Kaneko (2011). The Global Landscape of Objective Functions for the Optimization of Shogi Piece Values with a Game-Tree Search. Advances in Computer Games 13 » Shogi
2012
- Amir Ban (2012). Automatic Learning of Evaluation, with Applications to Computer Chess. Discussion Paper 613, The Hebrew University of Jerusalem - Center for the Study of Rationality, Givat Ram
- Kanjanapa Thitipong, Komiya Kanako, Yoshiyuki Kotani (2012). Design and Implementation of Bonanza Method for the Evaluation in the Game of Arimaa. IPSJ SIG Technical Report, Vol. 2012-GI-27, No. 4, pdf » Arimaa
2013
- Wen-Jie Tseng, Jr-Chang Chen, I-Chen Wu, Ching-Hua Kuo, Bo-Han Lin (2013). A Supervised Learning Method for Chinese Chess Programs. JSAI2013, pdf
- Akira Ura, Makoto Miwa, Yoshimasa Tsuruoka, Takashi Chikayama (2013). Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves. CG 2013, slides as pdf
- Yoshikuni Sato, Makoto Miwa, Shogo Takeuchi, Daisuke Takahashi (2013). Optimizing Objective Function Parameters for Strength in Computer Game-Playing. AAAI 2013
- Shalabh Bhatnagar, H. L. Prasad, L.A. Prashanth (2013). Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods. Lecture Notes in Control and Information Sciences, Vol. 434, Springer » SPSA
- Tomáš Hřebejk (2013). Arimaa challenge - Static Evaluation Function. Master Thesis, Charles University in Prague, pdf » Arimaa [44]
2014
2015 ...
- Diederik P. Kingma, Jimmy Lei Ba (2015). Adam: A Method for Stochastic Optimization. arXiv:1412.6980v8, ICLR 2015 [47]
- Aryan Mokhtari, Alejandro Ribeiro (2015). Global Convergence of Online Limited Memory BFGS. Journal of Machine Learning Research, Vol. 16, pdf [48] [49]
2017
Forum Posts
1997 ...
2000 ...
2005 ...
Re: Insanity... or Tal style? by Miguel A. Ballicora, CCC, April 02, 2009 [52]
2010 ...
- Revisiting GA's for tuning evaluation weights by Ilari Pihlajisto, CCC, January 03, 2010
- Idea for Automatic Calibration of Evaluation Function... by Steve Maughan, CCC, January 22, 2010
- Re: TEST position TCEC5- Houdini 1.03a-DRybka4 1-0 by Milos Stanisavljevic, CCC, November 30, 2010
- Parameter tuning by Onno Garms, CCC, March 13, 2011 » Onno
- Ahhh... the holy grail of computer chess by Marcel van Kervinck, CCC, August 23, 2011
- CLOP for Noisy Black-Box Parameter Optimization by Rémi Coulom, CCC, September 01, 2011 [53]
- Tuning again by Ed Schroder, CCC, November 01, 2011
- Ban: Automatic Learning of Evaluation [...] by BB+, OpenChess Forum, May 10, 2012 [54]
2014
Re: How Do You Automatically Tune Your Evaluation Tables by Álvaro Begué, CCC, January 08, 2014
The texel evaluation function optimization algorithm by Peter Österlund, CCC, January 31, 2014 » Texel's Tuning Method
Re: The texel evaluation function optimization algorithm by Álvaro Begué, CCC, January 31, 2014 » Cross-entropy
2015 ...
- MMTO for evaluation learning by Jon Dart, CCC, January 25, 2015 [55]
- Experiments with eval tuning by Jon Dart, CCC, March 10, 2015 » Arasan, Texel's Tuning Method
- txt: automated chess engine tuning by Alexandru Mosoi, CCC, March 18, 2015 » Zurichess, Texel's Tuning Method [56]
- Piece weights with regression analysis (in Russian) by Vladimir Medvedev, CCC, April 30, 2015 » Point Value by Regression Analysis
- New Idea For Automated Tuning by Jordan Bray, CCC, May 16, 2015
- Evaluation Tuning by Michael Hoffmann, CCC, August 09, 2015
- Genetical tuning by Stefano Gemma, CCC, August 11, 2015 » Genetic Programming
- Some musings about search by Ed Schroder, CCC, August 14, 2015 » Search
- td-leaf by Alexandru Mosoi, CCC, October 06, 2015
- tensorflow by Alexandru Mosoi, CCC, November 10, 2015 [57]
Re: Piece weights with regression analysis (in Russian) by Fabien Letouzey, CCC, May 04, 2015
Re: Genetical tuning by Ferdinand Mosca, CCC, August 20, 2015
2016
Re: txt: automated chess engine tuning by Sergei S. Markoff, CCC, February 15, 2016 » SmarThink
- pawn hash and eval tuning by J. Wesley Cleveland, CCC, February 21, 2016 » Pawn Hash Table
- Tuning by ppyvabw, OpenChess Forum, June 11, 2016 » Texel's Tuning Method
- GreKo 2015 ML: tuning evaluation (article in Russian) by Vladimir Medvedev, CCC, July 22, 2016 » GreKo, Texel's Tuning Method
- A database for learning evaluation functions by Álvaro Begué, CCC, October 28, 2016 » Evaluation, Learning, Texel's Tuning Method
- CLOP: when to stop? by Erin Dame, CCC, November 07, 2016 » CLOP
- C++ code for tuning evaluation function parameters by Álvaro Begué, CCC, November 10, 2016 » RuyTune [59]
Re: CLOP: when to stop? by Álvaro Begué, CCC, November 08, 2016 [58]
2017
- improved evaluation function by Alexandru Mosoi, CCC, March 11, 2017 » Texel's Tuning Method, Zurichess
- automated tuning by Stuart Cracraft, CCC, March 13, 2017
- Parameter tuning with multi objective optimization by Marco Pampaloni, CCC, May 07, 2017 » Napoleon
- Evaluation Tuning: When To Stop? by Cheney Nattress, CCC, May 29, 2017
- Texel tuning method question by Sander Maassen vd Brink, CCC, June 05, 2017 » Texel's Tuning Method
- Approximating Stockfish's Evaluation by PSQTs by Thomas Dybdahl Ahle, CCC, August 23, 2017 » Regression, Piece-Square Tables, Stockfish
- Ab-initio evaluation tuning by Evert Glebbeek, CCC, August 30, 2017
- ROCK* black-box optimizer for chess by Jon Dart, CCC, August 31, 2017 » ROCK*, Rockstar
- tuning via maximizing likelihood by Daniel Shawul, CCC, October 04, 2017 [60]
- tool to create derivates of a given function by Alexandru Mosoi, CCC, November 07, 2017
- tuning for the uninformed by Folkert van Heusden, CCC, November 23, 2017
Re: Texel tuning method question by Peter Österlund, CCC, June 07, 2017
Re: Texel tuning method question by Ferdinand Mosca, CCC, July 20, 2017 » Python
Re: Texel tuning method question by Jon Dart, CCC, July 23, 2017
Re: tool to create derivates of a given function by Daniel Shawul, CCC, November 07, 2017 [61]
2018
External Links
Engine tuning from Wikipedia
Self-tuning from Wikipedia
Optimization
optimize - Wiktionary
Entropy maximization from Wikipedia
Linear programming from Wikipedia
Simplex algorithm from Wikipedia
Simultaneous perturbation stochastic approximation (SPSA) - Wikipedia
SPSA Algorithm
Stochastic approximation from Wikipedia
Stochastic gradient descent from Wikipedia
Machine Learning
reinforcement - Wiktionary
reinforce - Wiktionary
supervisor - Wiktionary
temporal - Wiktionary
Statistics/Regression Analysis
regression - Wiktionary
regress - Wiktionary
Code
Misc
References