Neural Networks,
a series of connected neurons which communicate through neurotransmission. The interface through which neurons interact with their neighbors consists of axon terminals connected via synapses to dendrites on other neurons. If the sum of the input signals into one neuron surpasses a certain threshold, the neuron fires an action potential at the axon hillock and transmits this electrical signal along the axon.

In 1949, Donald O. Hebb introduced his theory in The Organization of Behavior, stating that learning consists of adapting the weight vectors (persistent synaptic plasticity) of a neuron's pre-synaptic inputs, whose dot-product activates or controls the post-synaptic output, which is the basis of neural network learning [1].
Artificial Neural Network [2]


Back in the early 1940s, Warren S. McCulloch and Walter Pitts introduced the artificial neuron as a logical element with multiple analogue inputs and a single digital output with a boolean result. The output fired "true" if the sum of the inputs exceeded a threshold. In their 1943 paper A Logical Calculus of the Ideas Immanent in Nervous Activity [3], they attempted to demonstrate that a Turing machine program could be implemented in a finite network of such neurons, built from the combinatorial logic functions AND, OR and NOT.
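
As a minimal sketch of such a threshold unit, the three logic functions can each be realized by a single neuron. The weights and thresholds below are chosen for illustration and are not taken from the 1943 paper; the function names are hypothetical.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (True) if the weighted sum of the inputs exceeds the threshold."""
    return sum(w * x for w, x in zip(weights, inputs)) > threshold

def AND(a, b):
    # Both inputs must be on: 1 + 1 = 2 exceeds 1.5, a single 1 does not.
    return threshold_neuron([a, b], [1, 1], 1.5)

def OR(a, b):
    # A single active input (sum 1) already exceeds 0.5.
    return threshold_neuron([a, b], [1, 1], 0.5)

def NOT(a):
    # Inhibitory (negative) weight: only a silent input stays above -0.5.
    return threshold_neuron([a], [-1], -0.5)
```

Inputs are expected to be 0 or 1; for example, `AND(1, 1)` fires while `AND(1, 0)` does not.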


Artificial Neural Networks (ANNs) are a family of statistical learning devices or algorithms used in regression, and binary or multiclass classification, implemented in hardware or software inspired by their biological counterparts. The artificial neurons of one or more layers receive one or more inputs (representing dendrites), weight them, and sum them to produce an output (representing a neuron's axon). The sum is passed through a nonlinear function known as an activation function or transfer function. The transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions, or step functions [4]. The weights of the inputs of each layer are tuned to minimize a cost or loss function, which is a task in mathematical optimization and machine learning.
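
The weighted sum and activation described above can be sketched in a few lines of Python. The names `neuron_output`, `sigmoid` and `step` are illustrative, and the explicit `bias` term is an assumption of this sketch.

```python
import math

def sigmoid(x):
    """Logistic activation: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    """Step activation: 1.0 for a positive sum, else 0.0."""
    return 1.0 if x > 0.0 else 0.0

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weight the inputs, sum them with the bias, apply the activation."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(total)
```

With all-zero inputs and zero bias the sigmoid neuron returns 0.5, the midpoint of its output range; passing `activation=step` turns the same neuron into a hard threshold unit.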



The perceptron is an algorithm for supervised learning of binary classifiers. It was the first artificial neural network, introduced in 1957 by Frank Rosenblatt [5], implemented in custom hardware. In its basic form it consists of a single neuron with multiple inputs and associated weights.

Perceptron [6]

Supervised learning is applied using a set D of labeled training data with pairs of feature vectors (x) and given results as desired output (d), usually starting with a cleared or randomly initialized weight vector w. The output is calculated by multiplying all inputs of a sample by their corresponding weights and passing the sum to the activation function f. The difference between the desired and the actual value is then immediately used to modify the weights for all features using a learning rate 0.0 < α <= 1.0:
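   for (j = 0, Σ = 0.0; j < nSamples; ++j) {
     for (i = 0, X = bias; i < nFeatures; ++i)
       X += w[i]*x[j][i];        // weighted sum of the inputs of sample j
     y = f(X);                   // actual output after activation
     Σ += fabs(Δ = d[j] - y);    // accumulate the absolute error
     for (i = 0; i < nFeatures; ++i)
       w[i] += α*Δ*x[j][i];      // adjust each weight by its share of the error
   }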
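
A runnable Python sketch of the same update rule, trained on the linearly separable AND function. Two things are assumptions of this sketch rather than part of the loop above: the step activation, and learning the bias as if it were a weight on a constant input of 1. All names are illustrative.

```python
def step(x):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0

def predict(w, bias, x):
    """Output of the single neuron for one feature vector."""
    return step(bias + sum(wi * xi for wi, xi in zip(w, x)))

def train_perceptron(samples, targets, alpha=1.0, max_epochs=100):
    """Repeat the per-sample update until an epoch produces no error."""
    n_features = len(samples[0])
    w = [0.0] * n_features          # cleared weight vector
    bias = 0.0
    for _ in range(max_epochs):
        total_error = 0.0
        for x, d in zip(samples, targets):
            delta = d - predict(w, bias, x)   # desired minus actual output
            total_error += abs(delta)
            for i in range(n_features):
                w[i] += alpha * delta * x[i]
            bias += alpha * delta             # assumption: bias is learned too
        if total_error == 0:
            break                             # every sample classified correctly
    return w, bias

AND_SAMPLES = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]
```

Calling `train_perceptron(AND_SAMPLES, AND_TARGETS)` yields weights and a bias with which `predict` reproduces the AND targets.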

AI Winter


Although the perceptron initially seemed promising, it was proved that perceptrons could not be trained to recognise many classes of patterns. This led to neural network research stagnating for many years, the AI winter, before it was recognised that a feedforward neural network with two or more layers has far greater processing power than one with a single layer. Single-layer perceptrons are only capable of learning linearly separable patterns. In their 1969 book Perceptrons, Marvin Minsky and Seymour Papert wrote that it was impossible for these classes of network to learn the XOR function. It is often believed that they also conjectured (incorrectly) that a similar result would hold for a multilayer perceptron [7]. However, this is not true, as both Minsky and Papert already knew that multilayer perceptrons were capable of producing an XOR function [8].
Three layer, XOR capable Perceptron [9]
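
To illustrate why an extra layer is enough, here is a small hand-wired network that computes XOR with step neurons. The weights and thresholds are one possible choice made for this sketch and are not necessarily those shown in the figure.

```python
def step(x):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """Two hidden threshold neurons feeding one output neuron."""
    h_or = step(x1 + x2 - 0.5)     # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    # Output: "OR but not AND"; the AND neuron inhibits with weight -2.
    return step(h_or - 2 * h_and - 0.5)
```

Each hidden neuron on its own draws a single line through the input plane; combining the two lets the output neuron separate the XOR pattern, which no single line can.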


In 1974, Paul Werbos began to end the AI winter for neural networks when he first described the mathematical process of training multilayer perceptrons through backpropagation of errors [10], derived in the context of control theory by Henry J. Kelley in 1960 [11] and by Arthur E. Bryson in 1961 [12] using principles of dynamic programming, and simplified by Stuart Dreyfus in 1961 by applying the chain rule [13]. In 1982, Werbos applied an automatic differentiation method described in 1970 by Seppo Linnainmaa [14] to neural networks in the way that is widely used today [15] [16] [17] [18]. Backpropagation is a generalization of the delta rule to multilayered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer. Backpropagation requires that the activation function used by the artificial neurons be differentiable, which is true for the common sigmoid logistic function and for its softmax generalization in multiclass classification. Used together with an optimization method such as gradient descent, it calculates the gradient of a cost or loss function with respect to all the weights in the neural network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function. The choice of loss function depends on the learning type (supervised, unsupervised, reinforcement) and on the activation function; in binary classification, the mean squared error or the cross-entropy error function is used [19]. The gradient is almost always used in a simple stochastic gradient descent algorithm. In 1983, Yurii Nesterov contributed an accelerated version of gradient descent that converges considerably faster than ordinary gradient descent [20] [21] [22] [23].
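
The activation and loss functions named above can be written down directly. The following Python sketch uses illustrative function names and shows the logistic function with its derivative, the softmax, and the two loss functions; it is meant as a reference for the formulas, not as a tuned implementation.

```python
import math

def sigmoid(z):
    """Logistic function, differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)                         # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_squared_error(outputs, targets):
    """Average of the squared differences between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def cross_entropy(probabilities, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])
```

For two classes the softmax reduces to the logistic function: `softmax([z, 0.0])[0]` equals `sigmoid(z)` up to rounding, which is the sense in which softmax generalizes it.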

Backpropagation algorithm for a 3-layer network [24]:
   initialize the weights in the network (often small random values)
   do
      for each example e in the training set
         O = neural-net-output(network, e)  // forward pass
         T = teacher output for e
         compute error (T - O) at the output units
         compute delta_wh for all weights from hidden layer to output layer  // backward pass
         compute delta_wi for all weights from input layer to hidden layer   // backward pass continued
         update the weights in the network
   until all examples classified correctly or stopping criterion satisfied
   return the network
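
A compact Python sketch of one such forward and backward pass for a network with a single hidden layer and one output. It assumes sigmoid activations throughout and the squared error 0.5 * (T - O)^2 as loss; the dictionary layout and all function names are choices of this sketch.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_inputs, n_hidden, seed=0):
    """Small random initial weights; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return {
        "w_ih": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                 for _ in range(n_hidden)],
        "b_h": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_ho": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_o": rng.uniform(-1, 1),
    }

def forward(net, x):
    """Forward pass: return the hidden activations and the output O."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(net["w_ih"], net["b_h"])]
    out = sigmoid(net["b_o"] + sum(w * h for w, h in zip(net["w_ho"], hidden)))
    return hidden, out

def loss(net, x, t):
    return 0.5 * (t - forward(net, x)[1]) ** 2

def gradients(net, x, t):
    """Backward pass: chain rule from the output back to the input weights."""
    hidden, out = forward(net, x)
    delta_o = (out - t) * out * (1.0 - out)            # error at the output unit
    delta_h = [delta_o * w * h * (1.0 - h)             # error at each hidden unit
               for w, h in zip(net["w_ho"], hidden)]
    return {
        "w_ho": [delta_o * h for h in hidden],
        "b_o": delta_o,
        "w_ih": [[d * xi for xi in x] for d in delta_h],
        "b_h": delta_h,
    }

def sgd_step(net, x, t, rate):
    """Update all weights by one gradient descent step on a single example."""
    g = gradients(net, x, t)
    for j, row in enumerate(net["w_ih"]):
        for i in range(len(row)):
            row[i] -= rate * g["w_ih"][j][i]
        net["b_h"][j] -= rate * g["b_h"][j]
        net["w_ho"][j] -= rate * g["w_ho"][j]
    net["b_o"] -= rate * g["b_o"]
```

Calling `sgd_step` for every example, epoch after epoch, corresponds to the loop in the pseudocode. Whether and how fast this converges for a given problem depends on the initial weights and the learning rate.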

Deep Learning

Deep learning has been characterized as a buzzword, or a rebranding of neural networks. A deep neural network (DNN) is an ANN with multiple hidden layers of units between the input and output layers which can be discriminatively trained with the standard backpropagation algorithm. Two common issues with naively trained DNNs are overfitting and computation time.

Convolutional NNs

Convolutional neural networks form a subclass of feedforward neural networks that have special weight constraints: individual neurons are tiled in such a way that they respond to overlapping regions. Convolutional NNs are suited for deep learning and are highly suitable for parallelization on GPUs [25]. They have been a research topic in the game of Go [26].
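
The weight constraint can be illustrated with a plain Python sketch of a single convolutional filter: the same small kernel of weights is applied to every overlapping window of the input. The function name is illustrative, and, as in most neural network libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over all fully overlapping windows."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[r][c] * image[i + r][j + c]
                 for r in range(kh) for c in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

A 3x3 input and a 2x2 kernel give a 2x2 output; every output cell uses the same four weights, which is what keeps the number of parameters small and makes the computation easy to parallelize.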

ANNs in Games

Applications of neural networks in computer games and chess are learning of evaluation and search control. Evaluation topics include feature selection and automated tuning; search control topics include move ordering, selectivity and time management. The perceptron looks like the ideal learning algorithm for automated evaluation tuning.


In the late 1980s, Gerald Tesauro pioneered the application of ANNs to the game of Backgammon. His program Neurogammon won the Gold medal at the 1st Computer Olympiad 1989 - and was further improved by TD-Lambda based Temporal Difference Learning within TD-Gammon [27]. Today all strong backgammon programs rely on heavily trained neural networks.


Logistic regression as applied in Texel's Tuning Method may be interpreted as a supervised learning application of the single-layer perceptron with one neuron. This is also true for reinforcement learning approaches, such as TD-Leaf in KnightCap or Meep's TreeStrap, where the evaluation consists of a weighted linear combination of features. Despite these similarities with the perceptron, these engines are not considered to use ANNs - since they use manually selected chess specific feature construction concepts like material, piece square tables, pawn structure, mobility etc.
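
The single-neuron interpretation can be sketched as follows: the evaluation is a weighted sum of hand-picked features, a logistic function maps it to an expected game result between 0 and 1, and the weights are nudged toward the observed result. This Python sketch does not reproduce Texel's Tuning Method itself; the gradient step on the cross-entropy error, the scaling constant `k`, and all names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def evaluate(weights, features):
    """Linear evaluation: weighted combination of position features."""
    return sum(w * f for w, f in zip(weights, features))

def predicted_score(weights, features, k=1.0):
    """Map the evaluation to an expected result in (0, 1)."""
    return sigmoid(k * evaluate(weights, features))

def tuning_step(weights, features, result, alpha=0.1, k=1.0):
    """One gradient step on the cross-entropy error for a single position."""
    error = result - predicted_score(weights, features, k)
    return [w + alpha * k * error * f for w, f in zip(weights, features)]
```

Here `features` might hold, for example, a material difference and a mobility difference, and `result` is 1.0 for a win, 0.5 for a draw and 0.0 for a loss from the evaluated side's point of view.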

More sophisticated attempts to replace static evaluation by neural networks and perceptrons, feeding in more unaffiliated feature sets such as board representation and attack tables, were not yet as successful as in other games. Chess evaluation seems not that well suited for neural nets, but there are also aspects of too weak models and feature recognizers as addressed by Gian-Carlo Pascutto with Stoofvlees [28], huge training effort, and weak floating point performance - but there is still hope due to progress in hardware and parallelization using SIMD instructions and GPUs, and deeper and more powerful neural network structures and methods successful in other domains.

Move Ordering

Concerning move ordering - there were interesting NN proposals like the Chessmaps Heuristic by Kieran Greer et al. [29], and the Neural MoveMap Heuristic by Levente Kocsis et al. [30].

Giraffe & Zurichess

In 2015, Matthew Lai trained Giraffe's deep neural network by TD-Leaf [31]. Zurichess by Alexandru Moșoi uses the TensorFlow library for automated tuning - in a two-layer neural network, the second layer is responsible for a tapered eval that blends endgame and middlegame scores according to the game phase [32].
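
For readers unfamiliar with the term, a tapered eval is commonly a linear interpolation between a middlegame score and an endgame score. The Python sketch below shows that general idea only; it is not taken from Zurichess, and the function name and the phase scale of 24 are assumptions.

```python
def tapered_eval(middlegame_score, endgame_score, phase, max_phase=24):
    """Blend two scores; phase == max_phase means a full middlegame."""
    return (middlegame_score * phase
            + endgame_score * (max_phase - phase)) / max_phase
```

With `phase` halfway between 0 and `max_phase`, the result is the plain average of the two scores.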


In 2016, Omid E. David, Nathan S. Netanyahu, and Lior Wolf introduced DeepChess obtaining a grandmaster-level chess playing performance using a learning method incorporating two deep neural networks, which are trained using a combination of unsupervised pretraining and supervised training. The unsupervised training extracts high level features from a given chess position, and the supervised training learns to compare two chess positions to select the more favorable one. In order to use DeepChess inside a chess program, a novel version of alpha-beta is used that does not require bounds but positions αpos and βpos [33].


In 2014, two teams independently investigated whether deep convolutional neural networks could be used to directly represent and learn a move evaluation function for the game of Go. Christopher Clark and Amos Storkey trained an 8-layer convolutional neural network by supervised learning from a database of human professional games, which without any search, defeated the traditional search program Gnu Go in 86% of the games [34] [35] [36] [37]. In their paper Move Evaluation in Go Using Deep Convolutional Neural Networks [38], Chris J. Maddison, Aja Huang, Ilya Sutskever, and David Silver report they trained a large 12-layer convolutional neural network in a similar way, to beat Gnu Go in 97% of the games, and matched the performance of a state-of-the-art Monte-Carlo Tree Search that simulates a million positions per move [39].

In 2015, a team affiliated with Google DeepMind around David Silver and Aja Huang, supported by Google researchers John Nham and Ilya Sutskever, built a Go playing program dubbed AlphaGo [40], combining Monte-Carlo tree search with their 12-layer networks [41].

See also

NN Chess Programs

Selected Publications

1940 ...

1950 ...

1960 ...

1970 ...

1980 ...


1990 ...

  • Paul Werbos (1990). Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE, Vol. 78, No. 10, pdf
  • Gordon Goetsch (1990). Maximization of Mutual Information in a Context Sensitive Neural Network. Ph.D. thesis
  • Vadim Anshelevich (1990). Neural Networks. Review. in Multi Component Systems (Russian)
  • Eric B. Baum (1990). Polynomial Time Algorithms for Learning Neural Nets. COLT 1990

2000 ...

  • Daniel Abdi, Simon Levine, Girma T. Bitsuamlak (2009). Application of an Artificial Neural Network Model for Boundary Layer Wind Tunnel Profile Development. 11th Americas conference on wind Engineering, pdf

2010 ...


Blog & Forum Posts

1996 ...

2000 ...

2005 ...

2010 ...

2015 ...


External Links

Activation Functions


  1. ^ Biological neural network - Early study - from Wikipedia
  2. ^ An example artificial neural network with a hidden layer, Image by Colin M.L. Burnett with Inkscape, December 27, 2006, CC BY-SA 3.0, Artificial Neural Networks/Neural Network Basics - Wikibooks, Wikimedia Commons
  3. ^ Warren S. McCulloch, Walter Pitts (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biology, Vol. 5, No. 1, pdf
  4. ^ Artificial neuron from Wikipedia
  5. ^ Frank Rosenblatt (1957). The Perceptron - a Perceiving and Recognizing Automaton. Report 85-460-1, Cornell Aeronautical Laboratory
  6. ^ The appropriate weights are applied to the inputs, and the resulting weighted sum passed to a function that produces the output y, image created by mat_the_w, based on raster image Perceptron.gif by 'Paskari', using Inkscape 0.46 for OSX, Wikimedia Commons, Perceptron from Wikipedia
  7. ^ multilayer perceptron is a misnomer for a more complicated neural network
  8. ^ Perceptron from Wikipedia
  9. ^ A two-layer neural network capable of calculating XOR. The numbers within the neurons represent each neuron's explicit threshold (which can be factored out so that all neurons have the same threshold, usually 1). The numbers that annotate arrows represent the weight of the inputs. This net assumes that if the threshold is not reached, zero (not -1) is output. Note that the bottom layer of inputs is not always considered a real neural network layer, Feedforward neural network from Wikipedia
  10. ^ Paul Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Harvard University
  11. ^ Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths. ARS Journal, Vol. 30, No. 10
  12. ^ Arthur E. Bryson (1961). A gradient method for optimizing multi-stage allocation processes. In Proceedings of the Harvard University Symposium on digital computers and their applications
  13. ^ Stuart Dreyfus (1961). The numerical solution of variational problems. RAND paper P-2374
  14. ^ Seppo Linnainmaa (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's thesis, University of Helsinki
  15. ^ Paul Werbos (1982). Applications of advances in nonlinear sensitivity analysis. System Modeling and Optimization, Springer, pdf
  16. ^ Paul Werbos (1994). The Roots of Backpropagation. From Ordered Derivatives to Neural Networks and Political Forecasting. John Wiley & Sons
  17. ^ Deep Learning - Scholarpedia | Backpropagation by Jürgen Schmidhuber
  18. ^ Who Invented Backpropagation? by Jürgen Schmidhuber (2014, 2015)
  19. ^ "Using cross-entropy error function instead of sum of squares leads to faster training and improved generalization", from Sargur Srihari, Neural Network Training (pdf)
  20. ^ Yurii Nesterov from Wikipedia
  21. ^ ORF523: Nesterov’s Accelerated Gradient Descent by Sébastien Bubeck, I’m a bandit, April 1, 2013
  22. ^ Nesterov’s Accelerated Gradient Descent for Smooth and Strongly Convex Optimization by Sébastien Bubeck, I’m a bandit, March 6, 2014
  23. ^ Revisiting Nesterov’s Acceleration by Sébastien Bubeck, I’m a bandit, June 30, 2015
  24. ^ Backpropagation algorithm from Wikipedia
  25. ^ PARsE | Education | GPU Cluster | Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster
  26. ^ Ilya Sutskever, Vinod Nair (2008). Mimicking Go Experts with Convolutional Neural Networks. ICANN 2008, pdf
  27. ^ Richard Sutton, Andrew Barto (1998). Reinforcement Learning: An Introduction. MIT Press, 11.1 TD-Gammon
  28. ^ Re: Chess program with Artificial Neural Networks (ANN)? by Gian-Carlo Pascutto, CCC, January 07, 2010
  29. ^ Kieran Greer, Piyush Ojha, David A. Bell (1999). A Pattern-Oriented Approach to Move Ordering: the Chessmaps Heuristic. ICCA Journal, Vol. 22, No. 1
  30. ^ Levente Kocsis, Jos Uiterwijk, Eric Postma, Jaap van den Herik (2002). The Neural MoveMap Heuristic in Chess. CG 2002
  31. ^ *First release* Giraffe, a new engine based on deep learning by Matthew Lai, CCC, July 08, 2015
  32. ^ Re: Deep Learning Chess Engine ? by Alexandru Mosoi, CCC, July 21, 2016
  33. ^ Omid E. David, Nathan S. Netanyahu, Lior Wolf (2016). DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. ICANN 2016, Lecture Notes in Computer Science, Vol. 9887, Springer, pdf preprint
  34. ^ Christopher Clark, Amos Storkey (2014). Teaching Deep Convolutional Neural Networks to Play Go. arXiv:1412.3409
  35. ^ Teaching Deep Convolutional Neural Networks to Play Go by Hiroshi Yamashita, The Computer-go Archives, December 14, 2014
  36. ^ Why Neural Networks Look Set to Thrash the Best Human Go Players for the First Time | MIT Technology Review, December 15, 2014
  37. ^ Teaching Deep Convolutional Neural Networks to Play Go by Michel Van den Bergh, CCC, December 16, 2014
  38. ^ Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver (2014). Move Evaluation in Go Using Deep Convolutional Neural Networks. arXiv:1412.6564v1
  39. ^ Move Evaluation in Go Using Deep Convolutional Neural Networks by Aja Huang, The Computer-go Archives, December 19, 2014
  40. ^ AlphaGo | Google DeepMind
  41. ^ David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529
  42. ^ Rosenblatt's Contributions
  43. ^ The abandonment of connectionism in 1969 - Wikipedia
  44. ^ Frank Rosenblatt (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books
  45. ^ Seppo Linnainmaa (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, Vol. 16, No. 2
  46. ^ Backpropagation from Wikipedia
  47. ^ Paul Werbos (1994). The Roots of Backpropagation. From Ordered Derivatives to Neural Networks and Political Forecasting. John Wiley & Sons
  48. ^ Neocognitron - Scholarpedia by Kunihiko Fukushima
  49. ^ Sepp Hochreiter's Fundamental Deep Learning Problem (1991) by Jürgen Schmidhuber, 2013
  50. ^ Nici Schraudolph’s go networks, review by Jay Scott
  51. ^ Re: Evaluation by neural network ? by Jay Scott, CCC, November 10, 1997
  52. ^ Long short term memory from Wikipedia
  53. ^ Tsumego from Wikipedia
  54. ^ Helmholtz machine from Wikipedia
  55. ^ Who introduced the term “deep learning” to the field of Machine Learning by Jürgen Schmidhuber, Google+, March 18, 2015
  56. ^ Presentation for a neural net learning chess program by Dann Corbit, CCC, April 06, 2004
  57. ^ Clément Farabet | Code
  58. ^ Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 21, 2015
  59. ^ Generative adversarial networks from Wikipedia
  60. ^ Teaching Deep Convolutional Neural Networks to Play Go by Hiroshi Yamashita, The Computer-go Archives, December 14, 2014
  61. ^ Teaching Deep Convolutional Neural Networks to Play Go by Michel Van den Bergh, CCC, December 16, 2014
  62. ^ How Facebook’s AI Researchers Built a Game-Changing Go Engine | MIT Technology Review, December 04, 2015
  63. ^ Combining Neural Networks and Search techniques (GO) by Michael Babigian, CCC, December 08, 2015
  64. ^ Arasan 19.2 by Jon Dart, CCC, November 03, 2016 » Arasan's Tuning
  65. ^ GitHub - BarakOshri/ConvChess: Predicting Moves in Chess Using Convolutional Neural Networks
  66. ^ ConvChess CNN by Brian Richardson, CCC, March 15, 2017
  67. ^ Jürgen Schmidhuber (2015) Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436).
  68. ^ DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
  69. ^ ICANN 2016 | Recipients of the best paper awards
  70. ^ Jigsaw puzzle from Wikipedia
  71. ^ Using GAN to play chess by Evgeniy Zheltonozhskiy, CCC, February 23, 2017
  72. ^ Alois Heinz (1994). Efficient Neural Net α-β-Evaluators. pdf
  73. ^ Mathieu Autonès, Aryel Beck, Phillippe Camacho, Nicolas Lassabe, Hervé Luga, François Scharffe (2004). Evaluation of Chess Position by Modular Neural network Generated by Genetic Algorithm. EuroGP 2004
  74. ^ Naive Bayes classifier from Wikipedia
  75. ^ GitHub - pluskid/Mocha.jl: Deep Learning framework for Julia
  76. ^ Rectifier (neural networks) from Wikipedia
  77. ^ Muthuraman Chidambaram, Yanjun Qi (2017). Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently. arXiv:1702.06762v1
  78. ^ erikbern/deep-pink · GitHub
