Home * Automated Tuning
320px-Sun_800_tester-001.jpg

Automated Tuning,
an automated adjustment of evaluation parameters or weights, and less commonly, search parameters [1], with the aim to improve the playing strength of a chess engine or game playing program. Evaluation tuning can be applied by mathematical optimization or machine learning, both fields with huge overlaps. Learning approaches are subdivided into supervised learning using labeled data, and reinforcement learning to learn from trying, facing the exploration (of uncharted territory) and exploitation (of current knowledge) dilemma. Johannes Fürnkranz gives a comprehensive overview in Machine Learning in Games: A Survey published in 2000 [2], covering evaluation tuning in chapter 4.
Engine Tuner [3]

Playing Strength

A difficulty in tuning and automated tuning of engine parameters is measuring playing strength. Using small sets of test-positions, which was quite common in former times to estimate relative strength of chess programs, lacks adequate diversity for a reliable strength predication. In particular, solving test-positions does not necessarily correlate with practical playing strength in matches against other opponents. Therefore, measuring strength requires to play many games against a reference opponent to determine the win rate with a certain confidence. The closer the strength of two opponents, the more games are necessary to determine whether changed parameters or weights in one of them are improvements or not, up to several tens of thousands. Playing many games with ultra short time controls has became de facto standard with todays strong programs, as for instance applied in Stockfish's Fishtest, using the sequential probability ratio test (SPRT) to possibly terminate a match early [4].

Parameter

Quote by Ingo Althöfer [5] [6]:
It is one of the best arts to find the right SMALL set of parameters and to tune them.

Some 12 years ago I had a technical article on this ("On telescoping linear evaluation functions") in the ICCA Journal, Vol. 16, No. 2, pp. 91-94, describing a theorem (of existence) which says that in case of linear evaluation functions with lots of terms there is always a small subset of the terms such that this set with the right parameters is almost as good as the full evaluation function.


Mathematical Optimization

Mathematical optimization methods in tuning consider the engine as a black box.

Methods


Instances


Advantages

  • Works with all engine parameters, including search
  • Takes search-eval interaction into account

Disadvantages


Reinforment Learning

Reinforcement learning, in particular temporal difference learning, has a long history in tuning evaluation weights in game programming, first seeen in the late 50s by Arthur Samuel in his Checkers player [7]. In self play against a stable copy of itself, after each move, the weights of the evaluation function were adjusted in a way that the score of the root position after a quiescence search became closer to the score of the full search. This TD method was generalized and formalized by Richard Sutton in 1988 [8], who introduced the decay parameter λ, where proportions of the score came from the outcome of Monte Carlo simulated games, tapering between bootstrapping (λ = 0) and Monte Carlo (λ = 1). TD-λ was famously applied by Gerald Tesauro in his Backgammon program TD-Gammon [9] [10], its minimax adaption TD-Leaf was successful used in eval tuning of chess programs [11], with KnightCap [12] and CilkChess [13] as prominent samples.

Instances


Engines


Supervised Learning

Move Adaption

One supervised learning method considers desired moves from a set of positions, likely from grandmaster games, and tries to adjust their evaluation weights so that for instance a one-ply search agrees with the desired move. Already pioneering in reinforcement learning some years before, move adaption was described by Arthur Samuel in 1967 as used in the second version of his checkers player [15], where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times the feature rated an alternative move higher than the desired move played by an expert [16]. In chess, move adaption was first described by Thomas Nitsche in 1982 [17], and with some extensions by Tony Marsland in 1985 [18]. Eval Tuning in Deep Thought as mentioned by Feng-hsiung Hsu et al. in 1990 [19], and later published by Andreas Nowatzyk, is also based on an extended form of move adaption [20]. Jonathan Schaeffer's and Paul Lu's efforts to make Deep Thought's approach work for Chinook in 1990 failed [21] - nothing seemed to produce results that were as good than their hand-tuned effort [22].

Value Adaption

A second supervised learning approach used to tune evaluation weights is based on regression of the desired value, i.e. using the final outcome from huge sets of positions from quality games, or other information supplied by a supervisor, i.e. in form of annotations from position evaluation symbols. Often, value adaption is reinforced by determining an expected outcome by self play [23].

Advantages

Disadvantages

  • Requires a source for the labeled data
  • Can only be used for evaluation weights or anything else that can be labeled
  • Works not optimal when combined with search

Regression

Regression analysis is a statistical process with a substantial overlap with machine learning to predict the value of an Y variable (output), given known value pairs of the X and Y variables. Parameter estimation in regression analysis can be formulated as the minimization of a cost or loss function over a training set [24], such as mean squared error or cross-entropy error function for binary classification [25]. The minimization is implemented by iterative optimization algorithms or metaheuristics such as Iterated local search, Gauss–Newton algorithm, or conjugate gradient method.

Linear Regression

320px-Linear_regression.svg.png

The supervised problem of regression applied to move adaption was used by Thomas Nitsche in 1982, minimizing the mean squared error of a cost function considering the program’s and a grandmaster’s choice of moves, as mentioned, extended by Tony Marsland in 1985, and later by the Deep Thought team. Regression used to adapt desired values was described by Donald H. Mitchell in his 1984 masters thesis on evaluation features in Othello, cited by Michael Buro [26] [27]. Jens Christensen applied linear regression to chess in 1986 to learn point values in the domain of temporal difference learning [28].
Linear Regression [29]


Logistic Regression

SigmoidTexelTune.gif

Since the relationship between win percentage and pawn advantage is assumed to follow a logistic model, one may treat static evaluation as single-layer perceptron or single neuron ANN with the common logistic activation function, performing the perceptron algorithm to train it [30]. Logistic regression in evaluation tuning was first elaborated by Michael Buro in 1995 [31], and proved successful in the game of Othello in comparison with Fisher's linear discriminant and quadratic discriminant function for normally distributed features, and served as eponym of his Othello program Logistello [32]. In computer chess, logistic regression was proposed by Miguel A. Ballicora in a 2009 CCC post, as applied to Gaviota [33], was independently described by Amir Ban in 2012 for Junior's evaluation learning [34], and explicitly mentioned by Álvaro Begué in a January 2014 CCC discussion [35], when Peter Österlund explained Texel's Tuning Method [36], which subsequently popularized logistic regression tuning in computer chess. Vladimir Medvedev's Point Value by Regression Analysis [37] [38] experiments showed why the logistic function is appropriate, and further used cross-entropy and regularization.
Logistic function [39]


Instances


See also


Publications

1959

1960 ...

1970 ...

1980 ...

1985 ...

1990 ...

1995 ...

2000 ...

2005 ...

2006
2007
2008
2009

2010 ...

2011
2012
2013
2014
2015

Forum Posts

1997 ...

2000 ...

2005 ...

2010 ...

2014

2015 ...

2016

External Links

Optimization

Machine Learning

Regression

Code

Misc


References

  1. ^ Yngvi Björnsson, Tony Marsland (2001). Learning Search Control in Adversary Games. Advances in Computer Games 9, pp. 157-174. pdf
  2. ^ Johannes Fürnkranz (2000). Machine Learning in Games: A Survey. Austrian Research Institute for Artificial Intelligence, OEFAI-TR-2000-3, pdf - Chapter 4, Evaluation Function Tuning
  3. ^ A vintage motor engine tester located at the James Hall museum of Transport, Johannesburg, South Africa - Engine tuning from Wikipedia
  4. ^ Fishtest Distributed Testing Framework by Marco Costalba, CCC, May 01, 2013
  5. ^ Re: Zappa Report by Ingo Althöfer, CCC, December 30, 2005 » Zappa
  6. ^ Ingo Althöfer (1993). On Telescoping Linear Evaluation Functions. ICCA Journal, Vol. 16, No. 2, pp. 91-94
  7. ^ Arthur Samuel (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal July 1959
  8. ^ Richard Sutton (1988). Learning to Predict by the Methods of Temporal Differences. Machine Learning, Vol. 3, No. 1, pdf
  9. ^ Gerald Tesauro (1992). Temporal Difference Learning of Backgammon Strategy. ML 1992
  10. ^ Gerald Tesauro (1994). TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation Vol. 6, No. 2
  11. ^ Don Beal, Martin C. Smith (1999). Learning Piece-Square Values using Temporal Differences. ICCA Journal, Vol. 22, No. 4
  12. ^ Jonathan Baxter, Andrew Tridgell, Lex Weaver (1998). Experiments in Parameter Learning Using Temporal Differences. ICCA Journal, Vol. 21, No. 2, pdf
  13. ^ The Cilkchess Parallel Chess Program
  14. ^ EXchess also uses CLOP
  15. ^ Arthur Samuel (1967). Some Studies in Machine Learning. Using the Game of Checkers. II-Recent Progress. pdf
  16. ^ Johannes Fürnkranz (2000). Machine Learning in Games: A Survey. Austrian Research Institute for Artificial Intelligence, OEFAI-TR-2000-3, pdf
  17. ^ Thomas Nitsche (1982). A Learning Chess Program. Advances in Computer Chess 3
  18. ^ Tony Marsland (1985). Evaluation-Function Factors. ICCA Journal, Vol. 8, No. 2, pdf
  19. ^ Feng-hsiung Hsu, Thomas Anantharaman, Murray Campbell, Andreas Nowatzyk (1990). A Grandmaster Chess Machine. Scientific American, Vol. 263, No. 4, pp. 44-50. ISSN 0036-8733.
  20. ^ see 2.1 Learning from Desired Moves in Chess in Kunihito Hoki, Tomoyuki Kaneko (2014). Large-Scale Optimization for Evaluation Functions with Minimax Search. JAIR Vol. 49
  21. ^ Jonathan Schaeffer, Joe Culberson, Norman Treloar, Brent Knight, Paul Lu, Duane Szafron (1992). A World Championship Caliber Checkers Program. Artificial Intelligence, Vol. 53, Nos. 2-3,ps
  22. ^ Jonathan Schaeffer (1997, 2009). One Jump Ahead. 7. The Case for the Prosecution, pp. 111-114
  23. ^ Bruce Abramson (1990). Expected-Outcome: A General Model of Static Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 2
  24. ^ Loss function - Use in statistics - Wkipedia
  25. ^ "Using cross-entropy error function instead of sum of squares leads to faster training and improved generalization", from Sargur Srihari, Neural Network Training (pdf)
  26. ^ Michael Buro (1995). Statistical Feature Combination for the Evaluation of Game Positions. JAIR, Vol. 3
  27. ^ Donald H. Mitchell (1984). Using Features to Evaluate Positions in Experts' and Novices' Othello Games. Masters thesis, Department of Psychology, Northwestern University, Evanston, IL
  28. ^ Jens Christensen (1986). Learning Static Evaluation Functions by Linear Regression. in Tom Mitchell, Jaime Carbonell, Ryszard Michalski (1986). Machine Learning: A Guide to Current Research. The Kluwer International Series in Engineering and Computer Science, Vol. 12
  29. ^ Random data points and their linear regression. Created with Sage by Sewaqu, November 5, 2010, Wikimedia Commons
  30. ^ Re: Piece weights with regression analysis (in Russian) by Fabien Letouzey, CCC, May 04, 2015
  31. ^ Michael Buro (1995). Statistical Feature Combination for the Evaluation of Game Positions. JAIR, Vol. 3
  32. ^ LOGISTELLO's Homepage
  33. ^ Re: Insanity... or Tal style? by Miguel A. Ballicora, CCC, April 02, 2009
  34. ^ Amir Ban (2012). Automatic Learning of Evaluation, with Applications to Computer Chess. Discussion Paper 613, The Hebrew University of Jerusalem - Center for the Study of Rationality, Givat Ram
  35. ^ Re: How Do You Automatically Tune Your Evaluation Tables by Álvaro Begué, CCC, January 08, 2014
  36. ^ The texel evaluation function optimization algorithm by Peter Österlund, CCC, January 31, 2014
  37. ^ Определяем веса шахматных фигур регрессионным анализом / Хабрахабр by WinPooh, April 27, 2015 (Russian)
  38. ^ Piece weights with regression analysis (in Russian) by Vladimir Medvedev, CCC, April 30, 2015
  39. ^ log-linear 1 / (1 + 10^(-s/4)) , s=-10 to 10 from Wolfram|Alpha
  40. ^ SPSA Tuner for Stockfish Chess Engine by Joona Kiiski
  41. ^ MATLAB from Wikipedia
  42. ^ CLOP for Noisy Black-Box Parameter Optimization by Rémi Coulom, CCC, September 01, 2011
  43. ^ CLOP slides by Rémi Coulom, CCC, November 03, 2011
  44. ^ thesis on eval function learning in Arimaa by Jon Dart, CCC, December 04, 2015
  45. ^ MMTO for evaluation learning by Jon Dart, CCC, January 25, 2015
  46. ^ Broyden–Fletcher–Goldfarb–Shanno algorithm from Wikipedia
  47. ^ Arasan 19.2 by Jon Dart, CCC, November 03, 2016 » Arasan's Tuning
  48. ^ Limited-memory BFGS from Wikipedia
  49. ^ Re: CLOP: when to stop? by Álvaro Begué, CCC, November 08, 2016
  50. ^ Thomas Anantharaman (1997). Evaluation Tuning for Computer Chess: Linear Discriminant Methods. ICCA Journal, Vol. 20, No. 4
  51. ^ The texel evaluation function optimization algorithm by Peter Österlund, CCC, January 31, 2014
  52. ^ Rémi Coulom (2011). CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning. Advances in Computer Games 13
  53. ^ Amir Ban (2012). Automatic Learning of Evaluation, with Applications to Computer Chess. Discussion Paper 613, The Hebrew University of Jerusalem - Center for the Study of Rationality, Givat Ram
  54. ^ Kunihito Hoki, Tomoyuki Kaneko (2014). Large-Scale Optimization for Evaluation Functions with Minimax Search. JAIR Vol. 49, pdf
  55. ^ brtzsnr / txt — Bitbucket by Alexandru Mosoi
  56. ^ Home — TensorFlow
  57. ^ Limited-memory BFGS from Wikipedia
  58. ^ alonamaloh / ruy_tune — Bitbucket by Álvaro Begué
  59. ^ Tool for automatic black-box parameter optimization released by Rémi Coulom, CCC, June 20, 2010
  60. ^ CLOP for Noisy Black-Box Parameter Optimization by Rémi Coulom, CCC, September 01, 2011
  61. ^ Re: CLOP: when to stop? by Álvaro Begué, CCC, November 08, 2016
  62. ^ Re: Eval tuning - any open source engines with GA or PBIL? by Jon Dart, CCC, December 06, 2014
  63. ^ Re: The texel evaluation function optimization algorithm by Jon Dart, CCC, March 12, 2014
  64. ^ Eval tuning - any open source engines with GA or PBIL? by Hrvoje Horvatic, CCC, December 04, 2014

What links here?


Up one Level