I know the details of AlphaZero, and in particular that it improves through a "policy iteration" mechanism. I found an answer proving that tabular policy iteration eventually converges to the optimal policy. But does that convergence guarantee still hold when policy iteration is combined with a neural network function approximator, as in AlphaZero?
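To be concrete about what I mean by "policy iteration converges": here is a minimal sketch of classical tabular policy iteration on an invented toy 2-state MDP (the transition tables below are made up for illustration; AlphaZero replaces these exact tables with a neural network and MCTS-based improvement).

```python
import numpy as np

# Hypothetical toy MDP: P[s][a] = list of (prob, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
n_states, n_actions = 2, 2

def policy_evaluation(policy, tol=1e-8):
    """Iteratively solve for V^pi of a fixed deterministic policy."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * n_states
    while True:
        V = policy_evaluation(policy)
        stable = True
        for s in range(n_states):
            best = max(range(n_actions),
                       key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

policy, V = policy_iteration()
print(policy)  # converges to the greedy-optimal action in every state
```

In the tabular setting, each improvement step is exactly greedy with respect to the exact value of the current policy, which is what the convergence proof relies on; my question is whether that argument survives when both evaluation and improvement are only approximated by a network.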
Besides, we know AlphaZero starts from scratch, knowing essentially nothing, and improves on data generated by its own self-play. But what if we let AlphaZero play against chessMaster first? Would that setting make it improve faster?