Parallel Search, also known as Multithreaded Search or SMP Search, is a way to increase search speed by using additional processors. This topic that has been gaining popularity recently with multiprocessor computers becoming widely available. Utilizing these additional processors is an interesting domain of research, as traversing a search tree is inherently serial. Several approaches have been devised, with the most popular today being Young Brothers Wait Concept and Shared Hash Table aka Lazy SMP.

This page gives a brief summary of the different types. SMP algorithms are classified by their scalability (trend in search speed as the number of processors becomes large) and their speedup (change in time to complete a search). Typically, programmers use scaling to mean change in nodes per second (NPS) rates, and speedup to mean change in time to depth. Scaling and scalability are thus two different concepts.

This technique is a very simple approach to SMP. The implementation requires little more than starting additional processors. Processors are simply fed the root position at the beginning of the search, and each searches the same tree with the only communication being the transposition table. The gains come from the effect of nondeterminism. Each processor will finish the various subtrees in varying amounts of time, and as the search continues, these effects grow making the search trees diverge. The speedup is then based on how many nodes the main processor is able to skip from transposition table entries. Many programs used this if a "quick and dirty" approach to SMP is needed. It had the reputation of little speedup on a mere 2 processors, and to scale quite badly after this.

Recent improvements by Dan Homan^{[2]} , Martin Sedlak^{[3]} and others on Lazy SMP indicate that the algorithm scales quite well up to 8 cores and beyond ^{[4]} .

ABDADA, Alpha-Bêta Distribué avec Droit d'Aînesse (Distributed Alpha-Beta Search with Eldest Son Right) is a loosely synchronized, distributed search algorithm by Jean-Christophe Weill^{[5]} . It is based on the Shared Hash Table, and adds the number of processors searching this node inside the hash-table entry for better utilization - considering the Young Brothers Wait Concept.

Parallel Alpha-Beta

These algorithms divide the Alpha-Beta tree, giving different subtrees to different processors. Because alpha-beta is a serial algorithm, this approach is much more complex. However, these techniques also provide for the greatest gains from additional processors.

Principal Variation Splitting (PVS)

In Principal Variation Splitting (PVS), each node is expressed by a thread. A thread will spawn one child thread for each legal move. But data dependency specified by the algorithm exists among these threads: After getting a tighter bound from the thread corresponding to the PV node, the remaining threads are ready to run.

PV Splitting ^{[6]}

YBWC and Jamboree

The idea in Feldmann'sYoung Brothers Wait Concept (YBWC) ^{[7]}^{[8]} as well in Kuszmaul'sJamboree Search^{[9]}^{[10]}^{[11]} , is to search the first sibling node first before spawning the remaining siblings in parallel. This is based on the observations that the first move is either going to produce a cutoff (in which case processing sibling nodes is wasted effort) or return much better bounds. If the first move does not produce a cut-off, then the remaining moves are searched in parallel. This process is recursive.

Since the number of processors is not infinite the process of "spawning" work normally consists in putting it on some kind of "work to be done stack" where processors are free to grab work in FIFO fashion when there is no work to do. In YBW you would not "spawn" or place work on the stack until the first sibling is searched.

In their 1983 paper Improved Speedup Bounds for Parallel Alpha-Beta Search^{[12]} , Raphael Finkel and John Philip Fishburn already gave the theoretical confirmation to the common sense wisdom that parallel resources should first be thrown into searching the first child. Assuming the tree is already in an approximation to best-first order, this establishes a good alpha value that can then be used to parallel search the later children. The algorithm in the 1982 Artificial Intelligence paper ^{[13]} , which Fishburn called the "dumb algorithm" in his 1981 thesis presentation ^{[14]} gives p^0.5 speedup with p processors, while the 1983 PAMI algorithm (the "smart algorithm") gives p^0.8 speedup for lookahead trees with the branching factor of chess.

This algorithm, invented by the Cray Blitz team (including Robert Hyatt^{[15]} ), is the most complex. Though this gives the best known scalability for any SMP algorithm, there are very few programs using it because of its difficulty of implementation.

Other Approaches

Many different approaches have been tried that do not directly split the search tree. These algorithms have not enjoyed popular success due to the fact that they are not scalable. Typical examples include one processor that evaluates positions fed to it by a searching processor, or a tactical search that confirms or refutes a positional search.

During an parallel search, certain areas of memory must be protected to make sure processors do not write simultaneously and corrupt the data. Some type of semaphore system must be used. Semaphores access a piece of shared memory, typically an integer. When a processor wants to access protected memory, it reads the integer. If it is zero, meaning no other process is accessing the memory, then the processor attempts to make the integer nonzero. This whole process must be done atomically, meaning that the read, compare, and write are all done at the same time. Without this atomicity another processor could read the integer at the same time and both would see that they can freely access the memory.

In chess programs that use parallel alpha-beta, usually spinlocks are used. Because the semaphores are held for such short periods of time, processors want to waste as little time as possible after the semaphore is released before acquiring access. To do this, if the semaphore is held when a processor reaches it, the processor continuously reads the semaphore. This technique can waste a lot of processing time in applications with high contention, meaning that many processes try to access the same semaphore simultaneously. In chess programs, however, we want as much processing power as possible.

Spinlocks are sometimes implemented in assembly language because the operating system does not have an API for them.

Threads vs. Processes

There are two ways of utilizing the extra processing power of multiple CPUs, threads and processes. The difference between them is that threads share all memory in the program, but there are multiple threads of execution. In processes, all memory is local to each processor except memory that is explicitly shared. This means that in a threaded program, functions must pass around an extra argument specifying which thread is active, thus which board structure to use. When using processes, a single global board can be used that will be duplicated for each process.

Threads are more common, because they are easier to debug as well as implement, provided the program does not already have lots of global variables. Processes are favored by some because the need to explicitly share memory makes subtle bugs easier to avoid. Also, in processes, the extra argument to most functions is not needed.

Selim Akl, David T. Barnard, R.J. Doran, (1980). Design, analysis and implementation of a parallel alpha-beta algorithm, Department of Computing and Information Science, Queen's University, Kingston, Ontario.

Selim Akl, David T. Barnard, R.J. Doran (1980). Simulation and Analysis in Deriving Time and Storage Requirements for a Parallel Alpha-Beta Pruning Algorithm. IEEE International Conference on Parallel Processing, pp. 231-234.

Selim Akl, David T. Barnard, R.J. Doran (1980). Searching game trees in parallel, Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence, Victoria, B.C.

1981

John Philip Fishburn (1981). Analysis of Speedup in Distributed Algorithms. Ph.D. Thesis, pdf

Selim Akl, R.J. Doran (1981). A comparison of parallel implementations of the alpha-beta and Scout tree search algorithms using the game of checkers, Department of Computing and Information Science, Queen's University, Kingston, Ontario.

Jonathan Schaeffer (1988). Distributed Game-Tree Searching. Journal of Parallel and Distributed Computing, Vol. 6, No. 2, pp. 90-114.

Chris Ferguson, Richard Korf (1988). Distributed Tree Search and its Application to Alpha-Beta Pruning. Proceedings of AAAI-88, Vol. I, pp. 128-132. Saint Paul, MN, pdf

Monroe Newborn (1988). Unsynchronized Iterative Deepening Parallel Alpha-Beta Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI, Vol. 10, No. 5, pp. 687-694. ISSN 0162-8828.

Robert Hyatt, Bruce W. Suter, Harry Nelson (1989). A Parallel Alpha-Beta Tree Searching Algorithm. Parallel Computing, Vol. 10, No. 3, pp. 299-308. ISSN 0167-8191.

Feng-hsiung Hsu (1989). Large Scale Parallelization of Alpha-beta Search: An Algorithmic and Architectural Study with Computer Chess. Ph.D. thesis, Technical report CMU-CS-90-108, Carnegie Mellon University, advisor Hsiang-Tsung Kung

Rainer Feldmann, Peter Mysliwietz, Burkhard Monien (1992). Distributed Game Tree Search on a Massively Parallel System. Data Structures and Efficient Algorithms, B. Monien, Th. Ottmann (eds.), Springer, Lecture Notes in Computer Science, 594, 1992, 270-288

1993

Rainer Feldmann (1993). Game Tree Search on Massively Parallel Systems. Ph.D. Thesis, pdf

Valavan Manohararajah (2001). Parallel Alpha-Beta Search on Shared Memory Multiprocessors. Masters Thesis, pdf

Florian Schintke, Jens Simon, Alexander Reinefeld (2001). A Cache Simulator for Shared Memory Systems. International Conference on Computational Science ICCS 2001, San Francisco, CA, Springer LNCS 2074, vol. 2, pp. 569-578. zipped ps

2002

Akihiro Kishimoto, Jonathan Schaeffer. (2002). Distributed Game-Tree Search Using Transposition Table Driven Work Scheduling, In Proc. of 31st International Conference on Parallel Processing (ICPP'02), pages 323-330, IEEE Computer Society Press. pdf via CiteSeerX

Akihiro Kishimoto, Jonathan Schaeffer. (2002). Transposition Table Driven Work Scheduling in Distributed Game-Tree Search (Best Paper Prize), In Proc. of Fifteenth Canadian Conference on Artificial Intelligence (AI'2002), volume 2338 of Lecture Notes in Artificial Intelligence (LNAI), pages 56-68, Springer

^John Romein, Henri Bal, Jonathan Schaeffer, Aske Plaat (2002). A Performance Analysis of Transposition-Table-Driven Scheduling in Distributed Search. IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 5, pp. 447–459. pdf

Home * Search * Parallel SearchParallel Search, also known asMultithreaded Searchor SMP Search, is a way to increase search speed by using additional processors. This topic that has been gaining popularity recently with multiprocessor computers becoming widely available. Utilizing these additional processors is an interesting domain of research, as traversing a search tree is inherently serial. Several approaches have been devised, with the most popular today being Young Brothers Wait Concept and Shared Hash Table aka Lazy SMP.This page gives a brief summary of the different types. SMP algorithms are classified by their scalability (trend in search speed as the number of processors becomes large) and their speedup (change in time to complete a search). Typically, programmers use

scalingto mean change in nodes per second (NPS) rates, and speedup to mean change in time to depth. Scaling and scalability are thus two different concepts.A subtype of parallel algorithms, distributed algorithms are algorithms designed to work in cluster computing and distributed computing environments, where additional concerns beyond the scope of "classical" parallel algorithms need to be addressed.

^{[1]}## Table of Contents

## Shared Hash Table

see Main page: Shared Hash TableThis technique is a very simple approach to SMP. The implementation requires little more than starting additional processors. Processors are simply fed the root position at the beginning of the search, and each searches the same tree with the only communication being the transposition table. The gains come from the effect of nondeterminism. Each processor will finish the various subtrees in varying amounts of time, and as the search continues, these effects grow making the search trees diverge. The speedup is then based on how many nodes the main processor is able to skip from transposition table entries. Many programs used this if a "quick and dirty" approach to SMP is needed. It had the reputation of little speedup on a mere 2 processors, and to scale quite badly after this.

## Lazy SMP

see Main page: Lazy SMPRecent improvements by Dan Homan

^{[2]}, Martin Sedlak^{[3]}and others onLazySMP indicate that the algorithm scales quite well up to 8 cores and beyond^{[4]}.## ABDADA

see Main page: ABDADAABDADA, Alpha-Bêta Distribué avec Droit d'Aînesse (Distributed Alpha-Beta Search with Eldest Son Right) is a loosely synchronized, distributed search algorithm by Jean-Christophe Weill

^{[5]}. It is based on the Shared Hash Table, and adds the number of processors searching this node inside the hash-table entry for better utilization - considering the Young Brothers Wait Concept.## Parallel Alpha-Beta

These algorithms divide the Alpha-Beta tree, giving different subtrees to different processors. Because alpha-beta is a serial algorithm, this approach is much more complex. However, these techniques also provide for the greatest gains from additional processors.## Principal Variation Splitting (PVS)

In Principal Variation Splitting (PVS), each node is expressed by a thread. A thread will spawn one child thread for each legal move. But data dependency specified by the algorithm exists among these threads: After getting a tighter bound from the thread corresponding to the PV node, the remaining threads are ready to run.^{[6]}## YBWC and Jamboree

The idea in Feldmann's Young Brothers Wait Concept (YBWC)^{[7]}^{[8]}as well in Kuszmaul's Jamboree Search^{[9]}^{[10]}^{[11]}, is to search the first sibling node first before spawning the remaining siblings in parallel. This is based on the observations that the first move is either going to produce a cutoff (in which case processing sibling nodes is wasted effort) or return much better bounds. If the first move does not produce a cut-off, then the remaining moves are searched in parallel. This process is recursive.Since the number of processors is not infinite the process of "spawning" work normally consists in putting it on some kind of "work to be done stack" where processors are free to grab work in FIFO fashion when there is no work to do. In YBW you would not "spawn" or place work on the stack until the first sibling is searched.

In their 1983 paper

Improved Speedup Bounds for Parallel Alpha-Beta Search^{[12]}, Raphael Finkel and John Philip Fishburn already gave the theoretical confirmation to the common sense wisdom that parallel resources should first be thrown into searching the first child. Assuming the tree is already in an approximation to best-first order, this establishes a good alpha value that can then be used to parallel search the later children. The algorithm in the 1982 Artificial Intelligence paper^{[13]}, which Fishburn called the "dumb algorithm" in his 1981 thesis presentation^{[14]}gives p^0.5 speedup with p processors, while the 1983 PAMI algorithm (the "smart algorithm") gives p^0.8 speedup for lookahead trees with the branching factor of chess.## Dynamic Tree Splitting (DTS)

Main page: Dynamic Tree SplittingThis algorithm, invented by the Cray Blitz team (including Robert Hyatt

^{[15]}), is the most complex. Though this gives the best known scalability for any SMP algorithm, there are very few programs using it because of its difficulty of implementation.## Other Approaches

Many different approaches have been tried that do not directly split the search tree. These algorithms have not enjoyed popular success due to the fact that they are not scalable. Typical examples include one processor that evaluates positions fed to it by a searching processor, or a tactical search that confirms or refutes a positional search.## Taxonomy

Overview and taxonomy of parallel algorithms based on alpha-beta, given by Mark Brockington, ICCA Journal, Vol. 19: No. 3 in 1996^{[16]}Described

Control Distribution

At These Nodes

At These Nodes

Leftmost child of 3

John Philip Fishburn

Murray Campbell

Leftmost child of 3

^{[17]}Bruce W. Suter,

Harry Nelson

Steve Otto

with no TT-Entry

& no cutoff

Richard Korf

Tree Expansion

^{[18]}Yaoqing Gao

Jonathan Schaeffer

## Other Considerations

## Memory Design

## Semaphores

During an parallel search, certain areas of memory must be protected to make sure processors do not write simultaneously and corrupt the data. Some type of semaphore system must be used. Semaphores access a piece of shared memory, typically an integer. When a processor wants to access protected memory, it reads the integer. If it is zero, meaning no other process is accessing the memory, then the processor attempts to make the integer nonzero. This whole process must be done atomically, meaning that the read, compare, and write are all done at the same time. Without this atomicity another processor could read the integer at the same time and both would see that they can freely access the memory.In chess programs that use parallel alpha-beta, usually spinlocks are used. Because the semaphores are held for such short periods of time, processors want to waste as little time as possible after the semaphore is released before acquiring access. To do this, if the semaphore is held when a processor reaches it, the processor continuously reads the semaphore. This technique can waste a lot of processing time in applications with high contention, meaning that many processes try to access the same semaphore simultaneously. In chess programs, however, we want as much processing power as possible.

Spinlocks are sometimes implemented in assembly language because the operating system does not have an API for them.

## Threads vs. Processes

There are two ways of utilizing the extra processing power of multiple CPUs, threads and processes. The difference between them is that threads share all memory in the program, but there are multiple threads of execution. In processes, all memory is local to each processor except memory that is explicitly shared. This means that in a threaded program, functions must pass around an extra argument specifying which thread is active, thus which board structure to use. When using processes, a single global board can be used that will be duplicated for each process.Threads are more common, because they are easier to debug as well as implement, provided the program does not already have lots of global variables. Processes are favored by some because the need to explicitly share memory makes subtle bugs easier to avoid. Also, in processes, the extra argument to most functions is not needed.

Some programs that use threads:

^{[19]}^{[20]}Some programs that use processes:

## Didactic Programs

## See also

## Publications

## 1950 ...

1958).Parallel Programming. The Computer Journal, Vol. 1, No. 1## 1970 ...

1978).The Design and Analysis of Algorithms for Asynchronous Multiprocessors. Ph.D. thesis, Carnegie Mellon University, advisor Hsiang-Tsung Kung## 1980 ...

1980).Parallel Search of Game Trees.Technical Report TR 80-7, Computing Science Department, University of Alberta, pdf1980).The Arachne Kernel.Version 1.2 Technical Report 380, University of Wisconsin-Madison, pdf1980).Parallel Alpha-Beta Search on Arachne.IEEE International Conference on Parallel Processing, pp. 235-243.1980).Parallel Execution of a Sequence of tasks on a Asynchronous Multiprocessor. Australian Computer Journal 12(3): 105-112, pdf1980).Design, analysis and implementation of a parallel alpha-beta algorithm, Department of Computing and Information Science, Queen's University, Kingston, Ontario.1980).Simulation and Analysis in Deriving Time and Storage Requirements for a Parallel Alpha-Beta Pruning Algorithm. IEEE International Conference on Parallel Processing, pp. 231-234.1980).Searching game trees in parallel, Proceedings of the Third Biennial Conference of the Canadian Society for Computational Studies of Intelligence, Victoria, B.C.19811981).Analysis of Speedup in Distributed Algorithms. Ph.D. Thesis, pdf1981).A comparison of parallel implementations of the alpha-beta and Scout tree search algorithms using the game of checkers, Department of Computing and Information Science, Queen's University, Kingston, Ontario.1981).Parallel Search of Strongly Ordered Game Trees. Technical Report TR 81-9, Department of Computing Science , University of Alberta, pdf19821982).Parallel Search of Strongly Ordered Game Trees.ACM Computing Surveys, Vol. 14, No. 4, pp. 533-551. pdf1982).A Study of Parallel Tree Search Algorithms. Technical Report TR 82-4, Computing Science Department, University of Alberta, pdf1982).Parallelism in Alpha-Beta Search.Artificial Intelligence, Vol. 19, No. 11982).OSTRICH/P - a parallel search chess program, SOCS-82.3,McGill University, School of Computer Science, Montreal.1982).Design, Analysis, and Implementation of a Parallel Tree Search Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 4, No. 2, pp. 192-203. ISSN 0162-882819831983).Improved Speedup Bounds for Parallel Alpha-Beta Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 1, pp. 89 - 921983).Searching, Merging, and Sorting in Parallel Computation. IEEE Transactions on Computers1983).The Key Node Method: A Highly-Parallel Alpha-Beta Algorithm. Technical Report UUCCS 83-101, University of Utah, pdf19841984).An Experiment in Distributed Game Tree Searching, M.Sc. thesis, University of Waterloo^{[21]}^{[22]}## 1985 ...

1985).Parallel Game-Tree Search.IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 4, pp. 442-452. 1984 pdf (Draft)1985).A New Distributed Depth-First Search Algorithm. Information Processing Letters, Vol. 20, No. 31985).A Parallel Search Chess Program. Proceedings of the ACM Annual Conference, pp. 272-277. Denver, Co.1985).Allocating Independent Subtasks on Parallel Processors. IEEE Transactions on Software Engineering, Vol. 111985).A Hybrid SSS*/Alpha-Beta Algorithm for Parallel Search of Game Trees. IJCAI'85 » SSS* and Dual*19861986).Multiprocessor Tree-Search Experiments. Advances in Computer Chess 41986).Parallel Alpha-Beta Search. 4th NGI-SION Symposium Stimulerende Informatica. Jaarbeurs Utrecht, The Netherlands1986).A Summary of Parallel Alpha-Beta Search Results. ICCA Journal, Vol 9, No. 31986).Improved Parallel Alpha-Beta Searching. Proceedings ACM/IEEE Fall Joint Computer Conference, pp. 519-527.1986).A Local Area Network Used as a Parallel Architecture. Technical Report 31, University of Paderborn19871987).Parallel Searches of Game Tree. Systems and Computer in Japan, Vol. 18, No. 8, pp. 97-109.19881988).Distributed Game-Tree Searching. Journal of Parallel and Distributed Computing, Vol. 6, No. 2, pp. 90-114.1988).Distributed Tree Search and its Application to Alpha-Beta Pruning.Proceedings of AAAI-88, Vol. I, pp. 128-132. Saint Paul, MN, pdf1988).Unsynchronized Iterative Deepening Parallel Alpha-Beta Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI, Vol. 10, No. 5, pp. 687-694. ISSN 0162-8828.1988).Chess on a Hypercube. The Third Conference on Hypercube Concurrent Computers and Applications, Vol. II-Applications1988).A High-Performance Parallel Algorithm to Search Depth-First Game Trees. Ph.D. Thesis, Department of Computer Science, University of Alabama at Birmingham1988).Parallel Conspiracy-Number Search. M.Sc. thesis, Faculty of Mathematics and Computer Science, Vrije Universteit, Amsterdam1988).Alpha-beta search on virtual tree machines. Information Sciences, Vol. 44, No. 119891989).Load Balancing Search Problems on General-Purpose Multi-Computers. Workshop on New Directions in Game-Tree Search1989).Distributed Game-Tree Search. ICCA Journal, Vol. 12, No. 21989).The shared data-object model as a paradigm for programming distributed systems. Ph.D. thesis, Vrije Universiteit1989).A Parallel Alpha-Beta Tree Searching Algorithm. Parallel Computing, Vol. 10, No. 3, pp. 299-308. ISSN 0167-8191.1989).Searching Game Trees in Parallel. Technical report 877, pdf1989).On parallel evaluation of game trees. SPAA '891989).Large Scale Parallelization of Alpha-beta Search: An Algorithmic and Architectural Study with Computer Chess. Ph.D. thesis, Technical report CMU-CS-90-108, Carnegie Mellon University, advisor Hsiang-Tsung Kung## 1990 ...

1991)A Fully Distributed Chess Program. Advances in Computer Chess 6, pdf19921992).A comparison of some parallel game-tree search algorithms, zipped postscript1992).Experiments with a Fully Distributed Chess Program. Heuristic Programming in AI 31992).Distributed Game Tree Search on a Massively Parallel System. Data Structures and Efficient Algorithms, B. Monien, Th. Ottmann (eds.), Springer, Lecture Notes in Computer Science, 594, 1992, 270-28819931993).Game Tree Search on Massively Parallel Systems. Ph.D. Thesis, pdf1993).Algorithmique parallèle sur les arbres de décision et raisonnement en temps contraint. Etude et application au Minimax= Parallel algorithm for heuristic tree searching and real-time reasoning. Study and application to the Minimax, Ph.D. Thesis, École nationale supérieure de l'aéronautique et de l'espace, Toulouse, France1993).Parallel Search of Narrow Game Trees. M.Sc. Thesis, University of Alberta1993).Parallelizations of Game-Tree Search. PARCO 1993, pdf hosted by CiteSeerX19941994).The DTS high-performance parallel tree search algorithm.1994).Synchronized MIMD Computing. Ph. D. Thesis, Massachusetts Institute of Technology, pdf1994).Massively Parallel Chess, pdf1994).Contribution à l'Algorithmique Non Numérique Parallèle : Exploration d'Espaces de Recherche. Ph.D. thesis, University of Paris VI1994).Distributed Searches: a Basis for Comparison.ICCA Journal, Vol. 17, No. 4, zipped postscript1994).An Implementation of the Young Brothers Wait Concept. Internal report, University of Alberta1994).Improvements to Parallel Alpha-Beta Algorithms. Technical report, Department of Computing Science, University of Alberta## 1995 ...

1995).The StarTech Massively Parallel Chess Program. pdf1995).Parallel Retrograde Analysis on a Distributed System. Supercomputing ’95, San Diego, CA.1995).Programmes d'Échecs de Championnat: Architecture Logicielle Synthèse de Fonctions d'Évaluations, Parallélisme de Recherche. Ph.D. Thesis. Université Paris 8, Saint-Denis, zipped ps (French)1995).Parallel Game Tree Search on SIMD Machines. IRREGULAR 1995, from CiteSeerX1995).Speculative Parallelism Improves Search?Technical Report 95-05, Department of Computing Science, University of Alberta, CiteSeerX19961996).The ABDADA Distributed Minimax Search Agorithm. Proceedings of the 1996 ACM Computer Science Conference, pp. 131-138. ACM, New York, N.Y, reprinted ICCA Journal, Vol. 19, No. 1, zipped postscript1996)Parallel Controlled Conspiracy-Number Search- Advances in Computer Chess 81996).Multithreaded Pruned Tree Search in Distributed Systems. Journal of Computing and Information, 2(1), 482-492, pdf1996).The APHID Parallel αβ Search Algorithm. Technical Report 96-07, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada. as pdf from CiteSeerX1996).A Taxonomy of Parallel Game-Tree Search Algorithms. ICCA Journal, Vol. 19, No. 31996).Problems in Sequential and Parallel Game Tree Search. Bielefeld University, zipped ps1996).How to Share Concurrent Wait-Free Variables. Journal of the ACM, Vol. 43, No. 419971997).The Dynamic Tree-Splitting Parallel Search Algorithm, ICCA Journal, Vol. 20, No. 11997).KnightCap — a parallel chess program on the AP1000+. zipped ps » KnightCap1997).APHID Game-Tree Search. Advances in Computer Chess 81997).Nagging: A Distributed, Adversarial Search-Pruning Technique Applied to First-Order Inference. Journal of Automated Reasoning, Vol. 19, No. 3^{[23]}19981998).Asynchronous Parallel Game-Tree Search. Ph.D. Thesis, University of Alberta, zipped postscript1998).Selective Game Tree Search on a Cray T3E. ps1998).Performance Optimization for Distributed-Shared-Data Systems. Ph.D. thesis, University of Waterloo19991999).Using Cilk to Write Multiprocessor Chess Programs, pdf1999).Parallel Alpha-Beta Pruning of Game Decision Trees: A Chess Implementation. CS 584 Fall 1999 Semester Project Report, Brigham Young University1999).APHID: Asynchronous Parallel Game-Tree Search. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada. as pdf from CiteSeerX1999).Transposition Table Driven Work Scheduling in Distributed Search. AAAI-99, pdf^{[24]}^{[25]}## 2000 ...

2000).The Multigame Reference Manual. Vrije Universiteit, pdf20012001).Multigame - An Environment for Distributed Game-Tree Search. Ph.D. thesis, Vrije Universiteit, supervisor Henri Bal, pdf2001).Parallel randomized best-first minimax search. School of Computer Science, Tel-Aviv University, pdf2001).Parallel Alpha-Beta Search on Shared Memory Multiprocessors. Masters Thesis, pdf2001).A Cache Simulator for Shared Memory Systems. International Conference on Computational Science ICCS 2001, San Francisco, CA, Springer LNCS 2074, vol. 2, pp. 569-578. zipped ps20022002).Distributed Game-Tree Search Using Transposition Table Driven Work Scheduling, In Proc. of 31st International Conference on Parallel Processing (ICPP'02), pages 323-330, IEEE Computer Society Press. pdf via CiteSeerX2002).Transposition Table Driven Work Scheduling in Distributed Game-Tree Search(Best Paper Prize), In Proc. of Fifteenth Canadian Conference on Artificial Intelligence (AI'2002), volume 2338 of Lecture Notes in Artificial Intelligence (LNAI), pages 56-68, Springer2002).A Performance Analysis of Transposition-Table-Driven Scheduling in Distributed Search. IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 5, pp. 447–459. pdf » Transposition Table^{[26]}2002).Nagging: A Scalable Fault-Tolerant Paradigm for Distributed Search. Artificial Intelligence 140, pdf, pdf20032003).Parallelizing a Simple Chess Program. pdf20042004).Parallel Chess Searching and Bitboards. Masters Thesis, postscript## 2005 ...

2005).Transposition-Driven Scheduling in Parallel Two-Player State-Space Search. Masters Thesis, pdf20062006).The Problem with Threads. Technical Report No. UCB/EECS-2006-1, University of California, Berkeley, pdf20072007).The Method of the Chess Search Algorithms - Parallelization using Two-Processor distributed System, pdf2007).On the Parallelization of UCT. CGW 2007, pdf » UCT2007).Multiple Parallel Search in Shogi. 12th Game Programming Workshop20082008).Parallel Monte-Carlo Tree Search. CG 2008, pdf2008).A Parallel Monte-Carlo Tree Search Algorithm. CG 2008, pdf2008).The Parallelization of Monte-Carlo Planning - Parallelization of MC-Planning. ICINCO-ICSO 2008: 244-249, pdf, slides as pdf2008).A Twofold Distributed Game-Tree Search Approach Using Interconnected Clusters. Euro-Par 2008: 587-598, abstract from springerlink2008).A Parallel Monte-Carlo Tree Search Algorithm. pdf2008).A Survey of Parallel Search Algorithms over Alpha-Beta Search Trees using Symmetric Multiprocessor Machines. Masters Project, East Carolina University, advisor Ronnie Smith20092009).A lock-free multithreaded Monte-Carlo tree search algorithm, Advances in Computer Games 12, pdf2009).Parallel Heuristic Search. In: C.A. Floudas, P.M. Pardalos (eds.), Encyclopedia of Optimization 2nd ed. pp 2908-29122009).Parallel Nested Monte-Carlo Search. NIDISC 2009, pdf2009).Aggrandizement of Board Games’ Performance on Multi-core Systems: Taking GNU-Chess as a prototype. BMS College of Engineering, Faculty mentor: Professor Ashok Kumar, Intel® Developer Zone » GNU Chess## 2010 ...

2010).Scalability and Parallelization of Monte-Carlo Tree Search. CG 2010, pdf2010).Parallel Alpha-Beta Based Game Tree Search, slides as pdf2010).Implementation of Parallel Game Tree Search on a SIMD System. Huazhong University of Science & Technology, Wuhan, China, ICIE 2010, Vol. 12010).Parallel Depth First Proof Number Search. AAAI 2010 » Proof-Number Search20112011).A Parallel General Game Player. KI Journal, Vol. 25, No. 1, pdf2011).Scalable Distributed Monte Carlo Tree Search. SoCS2011, pdf2011).Parallel Game Tree Search Using GPU. Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, pdf » GPU2011).A Distributed Chess Playing Software System Model Using Dynamic CPU Availability Prediction. SERP 2011, pdf2011).Shared-memory Parallel Programming with Cilk. Rice University, slides as pdf » Cilk2011).UCT-Treesplit - Parallel MCTS on Distributed Memory. MCTS Workshop, Freiburg, Germany, pdf » UCT2011).Parallel Monte-Carlo Tree Search for HPC Systems. Euro-Par 2011, pdf2011).Parallel Alpha-Beta Algorithm on the GPU. CIT. Journal of Computing and Information Technology, Vol. 19, No. 4 » GPU, Reversi20122012).Volunteer Computing System Applied to Computer Games. TCGA 2012 Workshop, pdf2012).A Node-based Parallel Game Tree Algorithm Using GPUs. CLUSTER 2012, pdf » GPU20132013).Parallel Dovetailing and its Application to Depth-First Proof-Number Search. ICGA Journal, Vol. 36, No. 1 » Proof-Number Search^{[27]}2013).Scalable Parallel DFPN Search. CG 2013^{[28]}, Jan Treibig, Gerhard Wellein (2013).The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf20142014).Is Parallel Programming Hard, And, If So, What Can You Do About It?. pdf2014).Parallel Monte-Carlo Tree Search for HPC Systems and its Application to Computer Go. Ph.D. thesis, University of Paderborn, advisors Marco Platzner, Ulf Lorenz, pdf, pdf2014).Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi. CoRR abs/1409.4297 » Go, MCTS, x86-642014).A Study of Software Framework for Parallel Monte Carlo Tree Search. GPW-2014## 2015 ...

2015).Feature Strength and Parallelization of Sibling Conspiracy Number Search. Advances in Computer Games 142015).Parameter-Free Tree Style Pipeline in Asynchronous Parallel Game-Tree Search. Advances in Computer Games 142015).Scaling Monte Carlo Tree Search on Intel Xeon Phi. CoRR abs/1507.04383 » Hex, MCTS, x86-642015).Parallel Monte Carlo Tree Search from Multi-core to Many-core Processors. TrustCom/BigDataSE/|ISPA 2015, pdf2015).Software Development Framework for Job-Level Algorithms. ICGA Journal, Vol. 38, No. 32015).Dynamic Prediction of Minimal Trees in Large-Scale Parallel Game Tree Search. Journal of Information Processing, Vol. 23, No. 12015).A Parallel Algorithm for Game Tree Search Using GPGPU. IEEE Transactions on Parallel and Distributed Systems, Vol. 26, No. 8 » GPU2015).Job-Level Alpha-Beta Search. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 12015).Distributed Monte Carlo Tree Search: A Novel Technique and its Application to Computer Go. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 4^{[29]}## Forum Posts

## 1995 ...

## 2000 ...

## 2005 ...

20082009Results from UCT parallelization by Gian-Carlo Pascutto, CCC, March 11, 2009

## 2010 ...

201120122013^{[30]}^{[31]}Measure of SMP scalability (sub-thread) by Ernest Bonnem, CCC, July 08, 2013

2014## 2015 ...

Explanation for non-expert? by Louis Zulli, CCC, February 16, 2015 » Stockfish

Best Stockfish NPS scaling yet by Louis Zulli, CCC, March 02, 2015

^{[32]}2016^{[33]}Re: Baffling multithreading scaling behavior by Robert Hyatt, CCC, September 07, 2016

2017Approximate ABDADA by Peter Österlund, CCC, August 23, 2017 » ABDADA

^{[34]}* Question about parallel search and race conditions by Michael Sherwin, CCC, September 11, 2017 » Shared Hash Table## External Links

## Parallel Search

^{[35]}## Parallel Computing

## Scalability

## Shared Memory

False sharing from Wikipedia

## Cache

MSI protocol from Wikipedia

MESI protocol from Wikipedia

MOESI protocol from Wikipedia

assembly - The prefetch instruction - Stack Overflow

Data Prefetch Support - GNU Project - Free Software Foundation (FSF)

Software prefetching considered harmful by Linus Torvalds, LWN.net, May 19, 2011

## Concurrency and Synchronization

Cooperating sequential processes (EWD 123)

A challenge to memory designers? (EWD 497)

## Misc

## References

1996).The ABDADA Distributed Minimax Search Agorithm. Proceedings of the 1996 ACM Computer Science Conference, pp. 131-138. ACM, New York, N.Y, reprinted ICCA Journal, Vol. 19, No. 1, zipped postscript1996).Multithreaded Pruned Tree Search in Distributed Systems. Journal of Computing and Information, 2(1), 482-492, pdf1991).A Fully Distributed Chess Program. Advances in Computer Chess 6, pdf1993).Game Tree Search on Massively Parallel Systems. Ph.D. Thesis, pdf1994).Massively Parallel Chess, pdf1994).Synchronized MIMD Computing. Ph. D. Thesis, Massachusetts Institute of Technology, pdf1995).The StarTech Massively Parallel Chess Program. pdf1983).Improved Speedup Bounds for Parallel Alpha-Beta Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 1, pp. 89 - 921982).Parallelism in Alpha-Beta Search. Artificial Intelligence, Vol. 19, No. 11981).Analysis of Speedup in Distributed Algorithms. Ph.D. Thesis, pdf1994).The DTS high-performance parallel tree search algorithm1996).A Taxonomy of Parallel Game-Tree Search Algorithms. ICCA Journal, Vol. 19: No. 31985).Lionel Moser: An Experiment in Distributed Game Tree Searching.ICCA Journal, Vol. 8, No. 2 (Review)2002).A Performance Analysis of Transposition-Table-Driven Scheduling in Distributed Search. IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 5, pp. 447–459. pdf## What links here?

Up one level