
I have been writing a chess engine with a friend, and the engine itself is already quite strong (2700+ CCRL). We had the idea to use a neural network for a better evaluation of positions.

Input to the network

Because the output of the network depends greatly on which side has to move, we use the first half of the inputs to parse the position of the side to move and the second half for the opponent. We have an input for each piece and each square, which would result in 12x64 inputs. We had the idea to also include the opponent king position, so each side gets 6x64 inputs, once for each square the opponent king can stand on -> 6x64x64. In total, this results in 12x64x64 binary input values, of which at most 32 are set.
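For clarity, the index of one input feature could be computed like this (a small sketch; the ordering of king square, piece type and piece square is just one possible convention, any consistent one works):

# One half of the input has 6 x 64 x 64 = 24576 binary features:
# (piece type, piece square), repeated for every square the opponent king can be on.
def feature_index(opp_king_sq, piece_type, piece_sq):
    # 0 <= opp_king_sq, piece_sq < 64 and 0 <= piece_type < 6
    return opp_king_sq * 6 * 64 + piece_type * 64 + piece_sq

def encode_half(pieces, opp_king_sq):
    # pieces: list of (piece_type, square) pairs for one side
    # returns the indices of the bits that are set in this half
    return [feature_index(opp_king_sq, pt, sq) for pt, sq in pieces]

# example: a knight (piece type 1, say) on e4 (square 28) with the opponent king
# on g8 (square 62) sets exactly one of the 24576 inputs of this half
print(encode_half([(1, 28)], 62))  # -> [23900]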

Layers

The next layer consists of 64 neurons, where the first 32 neurons only accept inputs from the first half of the input features and the last 32 only accept inputs from the second half.

This is followed by a fully connected layer with 32 neurons, and the output layer has a single output.

Activation function

We use LeakyReLU at both hidden layers and a linear activation function at the output.

Training

Initially, I wanted to train the network on about 1 million positions, yet this is taking ages. Each position has a target value in the range of -20 to 20. I am using stochastic gradient descent with Adam, a learning rate of 0.0001, and MSE as the loss function.
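For reference, the same optimizer settings look roughly like this in Keras (a minimal sketch with a tiny stand-in model, not my actual C++ code; note that Keras' default Adam learning rate is 0.001, ten times larger than the 0.0001 I use):

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# stand-in model just to show the optimizer/loss configuration described above
inp = Input(shape=(8,))
hidden = Dense(32, activation="relu")(inp)
out = Dense(1, activation="linear")(hidden)
model = Model(inp, out)

# Adam with lr = 0.0001 and MSE, as in the C++ trainer
model.compile(loss="mse", optimizer=Adam(learning_rate=0.0001))

# dummy data with targets in the same -20..20 range as the engine evaluations
x = np.random.random((1024, 8)).astype("float32")
y = np.random.uniform(-20, 20, size=(1024, 1)).astype("float32")
model.fit(x, y, batch_size=256, epochs=2)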

The problem I have is that even these 1 million positions are taking a very, very long time to train. The goal is to later train on 300M positions.

I am not sure where I could improve the training process.

Below are graphs showing the training progress over 1000 iterations:

[plot: training loss over 1000 iterations]

The change for each iteration looks like this:

[plot: loss change per iteration]

I hope someone could give me one or two hints on what I could improve in order to train the network faster. I would be very happy for any advice!

Greetings, Finn

Edit 1

As suggested, I am converting my network to Keras. I am having problems getting the sparse input to run.

import keras
from keras.layers import Input, Concatenate, Dense, LeakyReLU
from keras.models import Model
import numpy as np

# the sparse input I would like to use instead of the dense arrays below:
# trainX1 = tf.SparseTensor(indices=[[0,0], [0,1]], values=[1, 2], dense_shape=[1,24576])
# trainX2 = tf.SparseTensor(indices=[[0,0], [0,1]], values=[1, 2], dense_shape=[1,24576])
# trainY = np.random.rand(1)

# dense dummy data for now
trainX1 = np.random.random((10000, 24576))
trainX2 = np.random.random((10000, 24576))
trainY = np.zeros((10000, 1))

# input for the player to move and for the opponent
activeInput = Input((64*64*6,))
inactiveInput = Input((64*64*6,))

# one 64-neuron block per input half
denseActive = Dense(64)(activeInput)
denseInactive = Dense(64)(inactiveInput)

act1 = LeakyReLU(alpha=0.1)(denseActive)
act2 = LeakyReLU(alpha=0.1)(denseInactive)

# merge the two halves and reduce to a single evaluation output
concat_layer = Concatenate()([act1, act2])
dense1 = Dense(32)(concat_layer)

act3 = LeakyReLU(alpha=0.1)(dense1)

output = Dense(1, activation="linear")(act3)

model = Model(inputs=[activeInput, inactiveInput], outputs=output)
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

# print(model.summary())

print(model.fit([trainX1, trainX2], trainY, epochs=1))

If I use sparse=True for the Dense layer, it throws some exceptions. I would be happy if someone could help me create sparse input vectors.
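For reference, this is the kind of sparse setup I am aiming for (only a sketch: it assumes sparse=True belongs on the Input layers rather than on the Dense layers, and that a reasonably recent tf.keras accepts scipy CSR matrices in fit(); I have not verified this across versions):

import numpy as np
import scipy.sparse as sp
from tensorflow.keras.layers import Input, Concatenate, Dense, LeakyReLU
from tensorflow.keras.models import Model

N_FEATURES = 64 * 64 * 6  # 24576 binary features per side

# mark the Input layers (not the Dense layers) as sparse
activeInput = Input(shape=(N_FEATURES,), sparse=True)
inactiveInput = Input(shape=(N_FEATURES,), sparse=True)

act1 = LeakyReLU(alpha=0.1)(Dense(64)(activeInput))
act2 = LeakyReLU(alpha=0.1)(Dense(64)(inactiveInput))

act3 = LeakyReLU(alpha=0.1)(Dense(32)(Concatenate()([act1, act2])))
output = Dense(1, activation="linear")(act3)

model = Model(inputs=[activeInput, inactiveInput], outputs=output)
model.compile(loss="mse", optimizer="adam")

# dummy sparse data: roughly 32 set bits per position, stored as scipy CSR matrices
trainX1 = sp.random(10000, N_FEATURES, density=32 / N_FEATURES, format="csr", dtype="float32")
trainX2 = sp.random(10000, N_FEATURES, density=32 / N_FEATURES, format="csr", dtype="float32")
trainY = np.zeros((10000, 1), dtype="float32")

model.fit([trainX1, trainX2], trainY, epochs=1)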

  • What hardware is it running on and what do you mean by a "long time"? – Robert Long Jul 26 '20 at 07:37
  • I implemented the code myself in C++. 1M forward iterations take about 1 sec, whereas 1M training iterations take about 15 sec. I train on a single core of my AMD Ryzen 3950X – Finn Eggers Jul 26 '20 at 07:39
  • So no GPU? Are you using a C++ deep learning framework or have you coded it from scratch? I would look at comparing it with a modest GPU – Robert Long Jul 26 '20 at 07:50
  • Maybe just try it on Colab with GPU? Also, unless you know what you're doing and have enough time for optimizing the code, using ready software like TensorFlow or PyTorch would be more efficient. – Tim Jul 26 '20 at 07:50
  • I don't use a GPU. I wrote everything from scratch, but I sort of know what I am doing. I implemented full AVX2 support. I mean the speed itself really isn't the problem, I think. The problem is more that it's simply not converging as fast as it should – Finn Eggers Jul 26 '20 at 07:52
  • https://github.com/Luecx/Koivisto/tree/nnSearch/src_files/nn This is the code, btw. The backprop has eta in it, but it's actually not used. The training stuff is inside data->Trainer.cpp – Finn Eggers Jul 26 '20 at 07:53
  • How fast **should** it converge on a single processor ? – Robert Long Jul 26 '20 at 07:57
  • That is a good question. Looking at some graphs of Adam for other scenarios, it looks like 200 or 300 iterations should be enough. I mean, obviously you cannot compare them like that – Finn Eggers Jul 26 '20 at 07:59
  • Yeah it's hard to compare like that, which is why I think you should give GPU a go - just to compare first. – Robert Long Jul 26 '20 at 08:01
  • @RobertLong: one cannot compare with GPU if the code is not implemented in GPU languages like CUDA... – Michael M Jul 26 '20 at 10:37
  • @MichaelM I'm saying they should implement it in Torch or similar. – Robert Long Jul 26 '20 at 10:44
  • It looks like you may be able to increase the learning rate... `0.0001` seems awfully slow considering that your network is making significant downward progress... that should at least slightly improve your training speed. – vikarjramun Jul 26 '20 at 15:42
  • What output training data are you using? Are you simulating full games? – Federico Poloni Jul 26 '20 at 19:43
  • I used my engine to generate a small test set with about 750k positions, which has been evaluated with a depth 8 search. There was some discussion, and the result is that a search with a larger depth would create too much noise; basically, the network is so small that it wouldn't understand those hidden things – Finn Eggers Jul 26 '20 at 19:44
  • So I am only training on a single position -> eval depth 8 from the previous engine – Finn Eggers Jul 26 '20 at 19:44
  • "we use the first half of the inputs to parse the position" You aren't using the word "parse" correctly. You seem to mean "represent". Parsing means extracting meaning from a representation, not creating one. – Acccumulation Jul 27 '20 at 00:37
  • @Acccumulation Yes, you are right. It was the first word that came to my mind. My bad :) – Finn Eggers Jul 27 '20 at 08:08

2 Answers


I think you need to consider running it on a GPU. Google Colab is free and Amazon AWS is very cheap. You seem to know what you are doing, so you can probably get up and running with PyTorch very quickly. Once you compare the performance of the same network implemented on a GPU vs. your single-processor setup, you will be in a better position to know where to go next.

Robert Long
  • The problem is that simply putting it onto the GPU most likely won't help. The biggest problem here is the immense size of the input. I managed to handle the sparse input on the CPU and effectively only use about 1000 weights per iteration in the first layer. If I used TensorFlow or PyTorch or something like that, it wouldn't be able to understand that sparse input – Finn Eggers Jul 26 '20 at 08:03
  • Torch supports sparse tensors – Robert Long Jul 26 '20 at 08:13
  • Mhm, I will have a look at it. Thank you – Finn Eggers Jul 26 '20 at 08:22
  • No worries. Good luck, it's an interesting project! – Robert Long Jul 26 '20 at 08:24
  • I got Keras running with GPU support and was wondering how exactly this sparse type can be used in Keras/TF. If you don't mind, have a look at the edit I made :) – Finn Eggers Jul 26 '20 at 15:36
  • I was talking about Torch not TensorFlow but it should be available in TF. I will take a look when I get home – Robert Long Jul 26 '20 at 15:45
  • Thank you very much :). The thing is that I probably want to use some sort of data loader to eventually train on 300M datapoints, which creates sparse inputs... I don't even know where I should start. I will probably try to get PyTorch running – Finn Eggers Jul 26 '20 at 15:47
  • Regarding your edit, did you specify sparse for the Dense and Input layers? Unfortunately, programming questions are off topic here! – Robert Long Jul 26 '20 at 16:12
  • Yes I did, but I don't know how to set the matrices to sparse. I will probably transfer this specific question to Stack Overflow. Thank you for your help :) – Finn Eggers Jul 26 '20 at 16:21
  • No problem. Also try the tf discussion forum – Robert Long Jul 26 '20 at 16:29
  • I converted the data to Keras... it took about 300 iterations for the loss to go below 5 :) – Finn Eggers Jul 26 '20 at 17:30
  • Great!!! That's more than a 3-fold improvement. If it's exactly the same network architecture with the same loss, activations, learning rate etc., then you must have a bug in your C++ code. On the other hand, this kind of demonstrates why it's probably not a good idea to roll your own (though I'm very sure it's a great learning experience) – Robert Long Jul 26 '20 at 17:52
  • Well, it's not clear what the reason is. I mean, Keras etc. use a lot of tricks to speed up training. Simply the fact that I don't need to give the learning rate says a lot... – Finn Eggers Jul 26 '20 at 17:55
  • I simply hope that when I transfer the weights, the output will be the same :) – Finn Eggers Jul 26 '20 at 17:57
  • I assume you tried increasing the learning rate in your code? – Robert Long Jul 26 '20 at 18:21
  • Yes I did. The error started going up, both for SGD and Adam – Finn Eggers Jul 26 '20 at 18:21

You could also try the CPU-friendly NNUE alternative. It is currently being developed for chess by the Stockfish team and seems to give good results. The networks are easy to use and train, and it should be much easier than doing it the hard way. I've been working on the Stockfish team, and I think I could also help you with your engine if you wish (I'm also working on my own chess engine). Regards and good luck!

player1