Computers have long been able to play chess using a "brute-force" technique, searching to a certain depth and then evaluating the position. The AlphaGo computer, however, only uses an ANN to evaluate positions (it does not do any depth search as far as I know). Is it possible to create a chess engine that plays chess the same way AlphaGo plays Go? Why has no one done this? Would such a program perform better than the top chess engines (and chess players) of today?
-
6See https://arxiv.org/abs/1509.01549 (Giraffe: Using Deep Reinforcement Learning to Play Chess) and a popular article https://www.technologyreview.com/s/541276/deep-learning-machine-teaches-itself-chess-in-72-hours-plays-at-international-master/. Also https://erikbern.com/2014/11/29/deep-learning-for-chess.html – amoeba Oct 19 '17 at 08:07
-
1It was just a matter of time until somebody got around to doing this properly. So one month after you posted your question, here you go: https://arxiv.org/abs/1712.01815. – amoeba Dec 19 '17 at 08:48
2 Answers
EDIT (after reading the paper):
I've read the paper thoroughly. Let's start off with what Google claimed in the paper:
- They defeated Stockfish with Monte Carlo Tree Search + deep neural networks
- The match was absolutely one-sided, many wins for AlphaZero but none for Stockfish
- They were able to do it in just four hours
- AlphaZero played like a human
Unfortunately, I don't think it's a good journal paper. I'm going to explain with supporting links (so you know I'm not making this up):
https://chess.stackexchange.com/questions/19360/how-is-alpha-zero-more-human has my answer on how AlphaZero played like a human
The match was unfair and strongly biased. I quote Tord Romstad, one of the original authors of Stockfish:
https://www.chess.com/news/view/alphazero-reactions-from-top-gms-stockfish-author
The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly).
Stockfish couldn't have played its best chess with only a minute per move; the program was not designed for that.
- Stockfish was running on a regular commercial machine, while AlphaZero ran on a custom TPU machine worth millions of dollars, built for AlphaZero. This is like matching your high-end desktop against a cheap Android phone. Tord wrote:
One is a conventional chess program running on ordinary computers, the other uses fundamentally different techniques and is running on custom designed hardware that is not available for purchase (and would be way out of the budget of ordinary users if it were).
- Google inadvertently ran Stockfish with 64 threads on a 32-core machine. I quote GM Larry Kaufman (a world-class computer chess expert):
http://talkchess.com/forum/viewtopic.php?p=741987&highlight=#741987
I agree that the test was far from fair; another issue that hurt SF was that it was apparently run on 64 threads on a 32 core machine, but it would play much better running just 32 threads on that machine, since there is almost no SMP benefit to offset the roughly 5 to 3 slowdown. Also the cost ratio was more than I said; I was thinking it was a 64 core machine, but a 32 core machine costs about half what I guessed. So maybe all in all 30 to 1 isn't so bad an estimate. On the other hand I think you underestimate how much it could be further improved.
- Stockfish was given only a 1GB hash table. This is a joke... I allocate a larger hash table for my Stockfish iOS app (disclaimer: I'm the author) on my iPhone! Tord wrote:
... way too small hash tables for the number of threads ...
A 1GB hash table is absolutely unacceptable for a match like this. Stockfish would frequently run into hash collisions, and it costs CPU cycles to keep replacing old hash entries (a minimal sketch of such a transposition table follows after this list).
- Stockfish is not designed to run with that many threads. In my iOS chess app, only a few threads are used. Tord wrote:
... was playing with far more search threads than has ever received any significant amount of testing ...
- Stockfish was running without an opening book or the 6-piece Syzygy endgame tablebases. The sample size was insufficient, and the Stockfish version was not the latest. Discussion here.
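To make the hash-table point concrete, here is a minimal Python sketch of a fixed-size transposition table with an always-replace scheme. The entry layout and replacement policy are simplifications I made up for illustration, not Stockfish's actual implementation (which uses Zobrist keys and depth-aware bucket replacement). The point it shows: once the table has far fewer slots than positions searched, useful entries are constantly evicted and have to be re-searched.

```python
# Toy fixed-size transposition table with an always-replace scheme.
# Simplified for illustration; real engines use Zobrist hashing and
# more careful (e.g. depth-preferred) replacement policies.

class TranspositionTable:
    def __init__(self, size_bytes, entry_bytes=16):
        # 1 GB / 16-byte entries is roughly 67M slots; an engine searching
        # ~70M positions per second cycles through that in about a second.
        self.slots = max(1, size_bytes // entry_bytes)
        self.table = [None] * self.slots
        self.evictions = 0

    def store(self, key, depth, score):
        idx = key % self.slots
        old = self.table[idx]
        if old is not None and old[0] != key:
            self.evictions += 1        # a different position's entry is lost
        self.table[idx] = (key, depth, score)

    def probe(self, key):
        entry = self.table[key % self.slots]
        # Verify the full key: different positions can map to the same slot.
        return entry if entry is not None and entry[0] == key else None

# Demo with a deliberately tiny table: store more positions than slots.
tt = TranspositionTable(size_bytes=1024)   # only 64 slots
for fake_key in range(1000):
    tt.store(fake_key, depth=1, score=0)
print(tt.evictions)                        # most stores evict a useful entry
```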
CONCLUSION
Google has not proven beyond doubt that their methods are superior to Stockfish. Their numbers are superficial and strongly biased toward AlphaZero, and their methods are not reproducible by an independent third party. It's still a bit too early to say deep learning is a superior method to traditional chess programming.
EDIT (Dec 2017):
There is a new paper from Google DeepMind (https://arxiv.org/pdf/1712.01815.pdf) on deep reinforcement learning in chess. From the abstract, the world number one Stockfish chess engine was "convincingly" defeated. I think this is the most significant achievement in computer chess since the 1997 Deep Blue match. I'll update my answer once I've read the paper in detail.
Original (before Dec 2017)
Let's clarify your question:
- No, chess engines don't use brute-force.
- AlphaGo does use tree search: it uses Monte Carlo Tree Search. Google "Monte Carlo Tree Search AlphaGo" if you want to be convinced. A minimal sketch of the algorithm follows below.
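Here is a minimal, generic UCT-style MCTS sketch in Python. It is not AlphaGo's exact algorithm (AlphaGo guides the search with policy/value networks instead of uniformly random rollouts); I use the toy game of Nim only so the example runs standalone.

```python
import math
import random

class NimState:
    """Nim: players alternately remove 1-3 stones; taking the last stone wins."""
    def __init__(self, stones=10, player=1):
        self.stones, self.player = stones, player

    def moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, n):
        return NimState(self.stones - n, -self.player)

    def winner(self):
        # The player who just moved took the last stone and wins.
        return -self.player if self.stones == 0 else None

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.untried = {}, state.moves()
        self.visits, self.wins = 0, 0.0

    def ucb1(self, c=1.4):
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, iterations=5000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 until a node with untried moves.
        while not node.untried and node.children:
            node = max(node.children.values(), key=Node.ucb1)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: random playout to the end of the game.
        state = node.state
        while state.winner() is None:
            state = state.play(random.choice(state.moves()))
        # 4. Backpropagation: a node is good for the player who moved into it.
        winner = state.winner()
        while node:
            node.visits += 1
            if winner == -node.state.player:
                node.wins += 1
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(NimState(10)))  # optimal play takes 2, leaving a multiple of 4
```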
ANN can be used for chess engines:
- Giraffe (the link posted by @Tim)
- NeuroChess
Would this program perform better than the top chess-engines (and chess players) of today?
Giraffe plays at about International Master level, which is about FIDE 2400. However, Stockfish, Houdini and Komodo all play at about FIDE 3000. This is a big gap. Why? And why not Monte Carlo Tree Search?
- The material heuristic in chess is simple. Most of the time, a chess position is winning or losing just by counting the material on the board. Recall that material counting doesn't work for Go. Material counting is orders of magnitude faster than running a neural network: with bitboards, each piece type is represented by a 64-bit integer, and on a 64-bit system the count takes only a few machine instructions (see the bitboard sketch after this list). Searching with the traditional algorithm is much faster than machine learning, and higher nodes per second translate into deeper search.
- Similarly, there are very useful and cheap techniques such as null move pruning, late move reductions, killer moves, etc. They are cheap to run and far more efficient than the approach used in AlphaGo.
- Static evaluation in chess is fast and useful.
- Machine learning is useful for optimizing parameters, but we also have SPSA and CLOP for chess.
- There are lots of useful metrics for tree reduction in chess. Much less so for Go.
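To make the bitboard point concrete, here is a minimal Python sketch of material counting. Each piece type is a 64-bit integer with one bit per occupied square, so counting material is just a population count per bitboard, which on modern CPUs compiles down to a single POPCNT instruction. The piece values and bitboard constants below are illustrative, not a full position encoding.

```python
# Minimal bitboard material counting sketch (squares a1=bit 0 ... h8=bit 63).

PIECE_VALUES = {"pawn": 100, "knight": 320, "bishop": 330,
                "rook": 500, "queen": 900}  # common centipawn values

def material(bitboards):
    """Sum piece values over all set bits; bitboards maps piece -> 64-bit int."""
    return sum(PIECE_VALUES[p] * bin(bb).count("1")  # int.bit_count() in 3.10+
               for p, bb in bitboards.items())

# White's starting pieces as bitboards.
white = {
    "pawn":   0x000000000000FF00,  # rank 2
    "knight": 0x0000000000000042,  # b1, g1
    "bishop": 0x0000000000000024,  # c1, f1
    "rook":   0x0000000000000081,  # a1, h1
    "queen":  0x0000000000000008,  # d1
}
print(material(white))  # 8*100 + 2*320 + 2*330 + 2*500 + 900 = 4000
```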
There has been research suggesting that Monte Carlo Tree Search doesn't scale well for chess. Go is a different game from chess. The chess algorithms don't work for Go, because chess relies on brutal tactics; tactics are arguably more important in chess.
Now, we've established that MCTS works well for AlphaGo but less so for chess. Deep learning would be more useful if:
- The tuned NN evaluation were better than the traditional algorithms. However... deep learning is not magic; you as the programmer would still need to do the programming. As mentioned, we already have techniques like SPSA for tuning parameters via self-play in chess (a minimal SPSA sketch follows after this list).
- Investment, money! There's not much money for machine learning in chess. Stockfish is free and open source, but strong enough to defeat all human players. Why would Google spend millions if anybody can just download Stockfish for free? Who's going to pay for the CPU clusters? Who's going to pay for the talent? Nobody wants to do it, because chess is considered a "solved" game.
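For illustration, here is a minimal SPSA (Simultaneous Perturbation Stochastic Approximation) sketch, the kind of tuner chess developers use to optimize evaluation parameters from noisy match results. The objective function below is a made-up stand-in for "score of an engine match played with these parameters"; the target values are hypothetical, just to make the example self-contained.

```python
import random

def noisy_match_score(params):
    # Pretend the best parameters are (50, -20); add noise like real matches.
    target = [50.0, -20.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target)) + random.gauss(0, 10)

def spsa(params, iterations=2000, a=0.05, c=1.0):
    params = list(params)
    for k in range(1, iterations + 1):
        ak, ck = a / k ** 0.602, c / k ** 0.101    # standard SPSA gain schedules
        delta = [random.choice((-1, 1)) for _ in params]  # random +/-1 perturbation
        plus = [p + ck * d for p, d in zip(params, delta)]
        minus = [p - ck * d for p, d in zip(params, delta)]
        # One gradient estimate from just two noisy evaluations,
        # regardless of how many parameters we are tuning.
        g = (noisy_match_score(plus) - noisy_match_score(minus)) / (2 * ck)
        params = [p + ak * g * d for p, d in zip(params, delta)]
    return params

print(spsa([0.0, 0.0]))  # should drift toward roughly [50, -20]
```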
If deep learning can achieve the following, it'll beat the traditional algorithm:
- Given a chess position, "feel" it like a human grandmaster does. For example, a human grandmaster won't go into lines that are bad, by experience. Neither the traditional algorithms nor deep learning can achieve that. Your NN model might give you a probability in [0..1] for your position, but that's not good enough (a sketch of the usual evaluation-to-probability mapping follows below).
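A sketch of the usual bridge between the two worlds: a logistic mapping between a centipawn evaluation and an expected score in [0..1], analogous to the Elo expected-score formula. Treating 100 centipawns as roughly 100 Elo-like points on the 400-point scale is a rough assumption here, not a calibrated model.

```python
# Map a centipawn evaluation to an expected score (win probability proxy),
# using the Elo logistic curve with the conventional 400-point scale.

def win_probability(centipawns):
    return 1.0 / (1.0 + 10.0 ** (-centipawns / 400.0))

print(win_probability(0))    # 0.5   (balanced position)
print(win_probability(100))  # ~0.64 (up roughly a pawn)
print(win_probability(900))  # ~0.99 (up a queen)
```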
Let me point out:
No. Giraffe (the link posted by @Tim) doesn't use Monte Carlo Tree Search. It uses the regular negamax algorithm. All it does is replace the regular evaluation function with a NN, and that makes it very slow.
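Here is a minimal negamax sketch with a pluggable evaluation function, on a toy subtraction game so it runs standalone. The game and `static_eval` are hypothetical stand-ins; the point is that a Giraffe-style engine keeps this same search skeleton and swaps the cheap handcrafted `evaluate` for a neural network, which is exactly where the slowdown comes from. Real engines add alpha-beta pruning and much more on top of this skeleton.

```python
class ToyPosition:
    """Stand-in game: players alternately subtract 1 or 2 from a counter;
    the player to move at 0 has no moves and loses. Purely illustrative."""
    def __init__(self, n):
        self.n = n
    def moves(self):
        return [m for m in (1, 2) if m <= self.n]
    def play(self, m):
        return ToyPosition(self.n - m)

def static_eval(pos):
    # A handcrafted heuristic (cheap, like material counting in chess):
    # positions with n % 3 == 0 are lost for the side to move.
    return -1000 if pos.n % 3 == 0 else 1000

def negamax(pos, depth, evaluate):
    """Best score for the side to move, searching `depth` plies."""
    moves = pos.moves()
    if depth == 0 or not moves:
        return evaluate(pos)  # the hot spot: a NN here is orders of
                              # magnitude slower than a static evaluation
    # Each child's score, seen from the opponent's side, is negated.
    return max(-negamax(pos.play(m), depth - 1, evaluate) for m in moves)

print(negamax(ToyPosition(7), 4, static_eval))  # 1000: side to move wins
```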
One more point: although Kasparov was beaten by Deep Blue in the 1997 match, "humanity" was really lost around 2003-2005, when Kramnik lost a match to Deep Fritz without a single win and Michael Adams lost to a cluster machine in a one-sided match. Around that time, Rybka proved too strong for even the best players in the world.
Reference:
I quote:
In chess we have the concept of material, which already gives a reasonable estimation of how well an engine is doing and can be computed quickly. Furthermore, there are a lot of other aspects of the game that can be encoded in a static evaluation function, which couldn't be done in Go. Due to the many heuristics and the good evaluation, the EBF (effective branching factor) is quite small. Using a neural network as a replacement for the static evaluation function would definitely slow the engine down by quite a lot.
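For context on the EBF mentioned in the quote: it can be estimated as nodes ** (1 / depth), where N is the number of nodes searched and d the depth reached. The numbers in this quick sketch are illustrative assumptions, not measurements.

```python
# Estimate the effective branching factor from nodes searched and depth.

def ebf(nodes, depth):
    return nodes ** (1.0 / depth)

# A selective alpha-beta engine reaching depth 20 in a billion nodes:
print(round(ebf(1e9, 20), 2))   # ~2.82
# Plain minimax at depth 20 with ~35 legal moves per position would need
# about 35**20 nodes; the pruning heuristics are what make depth 20 feasible.
```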
-
1Thank you. Some questions: Chess engines use the alpha-beta algorithm; is this not a "brute-force" algorithm? Does "Monte Carlo Tree Search" mean that one looks a number of moves ahead of the current position? – lijas Oct 19 '17 at 08:43
-
1@lijas "brute-force" is generally defined as searching all possibilities. Chess engines don't do that. – SmallChess Oct 19 '17 at 08:46
-
@lijas Monte Carlo Tree Search is an algorithm in which the computer samples many different game paths. It's like sampling from an unknown distribution. Wikipedia has a description. – SmallChess Oct 19 '17 at 08:48
-
Why is the NN evaluation function used by Giraffe slow? Isn't a NN just a bunch of quick matrix multiplications? Are regular chess engines' evaluations faster due to the bitboards etc. you mentioned in your post? – lijas Oct 19 '17 at 09:03
-
7@lijas You just answered the question. Matrix multiplication is a slow operation. – SmallChess Oct 19 '17 at 09:06
-
@lijas: Alpha-Beta pruning is just a tiny part of what Chess Engines do. If that was all they did, and they just used it to scan all possibilities to a certain depth, it could be considered brute force. But they don't, and it can't. Modern chess software is "lossy" - it aggressively prunes possibilities that potentially could be relevant, but statistically are unlikely to be so, in order to reach such high depth that the tactics become akin to strategy. – Meni Rosenfeld Oct 19 '17 at 12:54
-
3Alpha Beta search sure is "brute forcish". Hans Berliner on AI trends: "I consider the most important trend was that computers got considerably faster in these last 50 years. In this process, we found that many things for which we had at best anthropomorphic solutions, which in many cases failed to capture the real gist of a human's method, could be done by more brute-forcish methods that merely enumerated until a satisfactory solution was found. If this is heresy, so be it." (see http://ieeexplore.ieee.org/document/820322/?reload=true) – Daniel Lidström Oct 19 '17 at 13:25
-
(+1) Really fitting someone appropriately named @SmallChess has the accepted answer in this question haha – Firebug Oct 19 '17 at 13:52
-
1@smallchess Alpha-beta is the de facto search algorithm, and even its variants like NegaScout are just incremental improvements. What else could he be referring to? This was written well before deep learning systems came to be. – Daniel Lidström Oct 19 '17 at 16:56
-
_"Stockfish, Houdini and Komodo all play at about FIDE 3000."_ sounds like a strong understatement. That would mean that the top 10 humans (who are around 2800) would achieve around 0.24 expected score. Assuming they can't win, that would mean they have almost a 50 % chance of drawing, and that sounds way too big to be reasonable. And giving any significant chance of winning seems even more unreasonable. – JiK Oct 19 '17 at 20:01
-
@JiK It's an approximate number. They don't have a human FIDE rating. – SmallChess Oct 19 '17 at 23:37
-
Thanks for the update. When would you say it's a fair game? When both AZ and SF use the same amount of energy? According to the paper, AZ searched 80K pos/s, while SF searched 70M pos/s. It would be interesting to see at which point SF starts winning: at 80M pos/s, or 100M pos/s...? – lijas Dec 13 '17 at 06:26
-
@lijas https://groups.google.com/forum/#!topic/fishcooking/ExSnY8xy7sY – SmallChess Dec 13 '17 at 06:32
-
1" "brute-force" is generally defined as searching all possibilities" -- alphabeta search does exactly that, down to positions that are deemed quiescent. "That doesn’t say anything like alpha beta?" -- Perhaps you aren't familiar with alphabeta ... it's just an opimization technique that prunes branches that *couldn't possibly* have better evaluations. It doesn't change a brute force search into one that isn't. – Jim Balter Jan 08 '18 at 10:22
-
@MeniRosenfeld "Modern chess software is "lossy" - it aggressively prunes possibilities that potentially could be relevant, but statistically are unlikely to be so" -- this simply isn't true. This approach was abandoned decades ago. – Jim Balter Jan 08 '18 at 10:32
-
@JimBalter Could you please write a new answer for your points? I'm curious. Please. – SmallChess Jan 08 '18 at 10:35
-
@JimBalter: That's ridiculous. Can you provide some sort of reference? What you just said is so obviously false that I can't even begin to imagine what you meant. – Meni Rosenfeld Jan 08 '18 at 21:23
-
@MeniRosenfeld No, it's not ridiculous, and if you can't imagine what I meant then you would have no basis on which to call it ridiculous or deem it false. For a reference, see Daniel Lidström's comment above ... but I doubt that you have any idea who Hans Berliner is. Now, as you are just trolling, this is the last time I will respond to you, ever. – Jim Balter Jan 09 '18 at 01:48
-
Please refer to http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=747138&t=66298. As I said, modern chess software is obviously very lossy, aka "selective". Apparently Jim's comment might have been true in 2000 when the article he mentioned was published. But things have changed since then and selective methods have made a resolute comeback. I did not tag Jim per his request to discontinue communications with me, which is fine since I have little interest in communicating with someone who can hold on so strongly to wrong beliefs and accuse critics of trolling. – Meni Rosenfeld Jan 11 '18 at 04:09
-
@MeniRosenfeld Well. I asked him for a new answer, which is the point of this open forum but he denied... – SmallChess Jan 11 '18 at 04:10
Deep Blue has already beaten Kasparov, so this problem was solved with a much simpler approach. This was possible because the number of possible moves in chess is much smaller than in Go, making it a much simpler problem. Moreover, notice that both NNs and brute force need huge computing resources (here you can find a photo of the computer behind AlphaGo; notice that it uses not even GPUs, but TPUs, for computation). The whole fuss with Go was that when Deep Blue beat Kasparov, the Go community argued that this would not be possible in Go (for lots of different reasons, but to summarize the arguments I'd need to give a detailed introduction to the game of Go). Yes, you can teach a NN to play chess, or Mario, or try teaching it to play StarCraft...
I guess the reason is that you simply don't often hear in mainstream media about people solving problems that have already been solved.
Moreover, your premise is wrong: deep learning is used to play chess, e.g. as described in Deep Learning Machine Teaches Itself Chess in 72 Hours, Plays at International Master Level. See also the corresponding paper, Giraffe: Using Deep Reinforcement Learning to Play Chess.
-
3Despite there obviously being *some* chess programs trained with deep reinforcement learning, the fact remains that nobody built one that would beat "traditional" chess engines. I assume this is because this problem (beating traditional engines) is simply not interesting/motivating enough to invest a lot of effort needed to develop something of the AlphaGo level. – amoeba Oct 19 '17 at 08:14
-
1@amoeba The widely available Go-playing software also does not use deep learning, and it is usually weaker than amateur 1-dan players, so *much* worse than AlphaGo. AlphaGo is a proof of concept. – Tim Oct 19 '17 at 08:16
-
1@rus9384 It is not easy, but we already "solved" it: Deep Blue has beaten Kasparov; we have our black swan that has passed the chess Turing test. – Tim Oct 19 '17 at 08:18
-
5A solved game is another thing: we don't know if perfect play guarantees a win for White/Black or ends in a draw. – rus9384 Oct 19 '17 at 08:19
-
@rus9384 Do you know a *human* player who has a 100% chance of winning against a professional human chess player..? – Tim Oct 19 '17 at 08:21
-
I just mean that no chess engine really uses full brute force. It analyses a few moves and uses some criteria to decide which next move will be better. – rus9384 Oct 19 '17 at 08:23
-
I guess the last link in your post answers most of my questions: there are chess engines that use deep neural networks, but they can't beat current chess engines and players. In my opinion this should be the motivation for investing more effort in creating a deep-learning chess engine. Chess is also internationally a bigger game than Go. – lijas Oct 19 '17 at 08:33
-
1@rus9384: It would be fun to start a game against a perfect chess AI and see "White wins. Checkmate in 97 moves". – Eric Duminil Oct 20 '17 at 09:33
-
@EricDuminil, 97 moves is not a win for White. I think all 97-move wins have already been brute-forced. There are positions where White, playing perfectly, always wins in at least 500+ moves. – rus9384 Oct 20 '17 at 10:42
-
@rus9384: Thanks for the comment. 97 was just a random integer. I had no idea how high it should be. – Eric Duminil Oct 20 '17 at 11:33
-
Exactly. Chess is a done problem, so there isn't as much interest. There is still lots of work though. – Ricardo Magalhães Cruz Oct 24 '17 at 19:13
-
@EricDuminil His comment is utter nonsense. Endgame tablebases, which are brute-forced, have only been computed up to 7 pieces. – Jim Balter Jan 08 '18 at 10:40