reinforcement learning - Berkeley Pac-Man Project: why are the features divided by 10?


I have been busy coding reinforcement learning agents for the game Pac-Man and came across Berkeley's CS course's Pac-Man projects, specifically the reinforcement learning section.

For the approximate Q-learning agent, feature approximation is used, and a simple extractor is implemented in the project code. I am curious why the features are scaled down by 10 before they are returned. If you run the solution without the factor of 10, you can notice that Pac-Man does significantly worse. Why?
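From memory, the tail end of the project's SimpleExtractor does roughly the following (paraphrased; a plain dict stands in for the project's util.Counter, and the raw values are passed in as arguments so the snippet runs on its own):

    # Minimal paraphrase of the end of Berkeley's SimpleExtractor.
    # In the project this lives in featureExtractors.py inside getFeatures.
    def get_features(num_ghosts_one_step_away, eats_food, closest_food_dist,
                     walls_width, walls_height):
        features = {}
        features["bias"] = 1.0
        features["#-of-ghosts-1-step-away"] = float(num_ghosts_one_step_away)
        if eats_food:
            features["eats-food"] = 1.0
        if closest_food_dist is not None:
            # the distance is normalized by the board area so it stays below 1.0
            features["closest-food"] = float(closest_food_dist) / (walls_width * walls_height)
        # the scaling step in question: every feature is divided by 10
        for key in features:
            features[key] /= 10.0
        return features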

After running multiple tests, it turns out that without the scaling the optimal Q-value can diverge wildly. In fact, the weights can become negative, even the one that should incline Pac-Man to eat pills. He then just stands there, tries to run from ghosts, and never tries to finish a level.

I speculate that this happens when he loses during training: the large negative reward is propagated through the system, and since the #-of-ghosts-one-step-away feature can be greater than one, it has a heavy bearing on the weights, causing them to go negative so that the system can't "recover" from this.
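For concreteness, the approximate Q-learning update is w_i <- w_i + alpha * difference * f_i(s, a), with difference = (r + gamma * max_a' Q(s', a')) - Q(s, a). Below is a minimal sketch of that update; the -500 loss reward and the other numbers are only illustrative:

    # Sketch of the approximate Q-learning update, showing why feature
    # scale matters. alpha and gamma values here are arbitrary.
    def q_value(weights, features):
        # Q(s, a) = sum_i w_i * f_i(s, a)
        return sum(weights.get(f, 0.0) * v for f, v in features.items())

    def update(weights, features, reward, max_next_q, alpha=0.2, gamma=0.8):
        # difference = (r + gamma * max_a' Q(s', a')) - Q(s, a)
        difference = (reward + gamma * max_next_q) - q_value(weights, features)
        # w_i <- w_i + alpha * difference * f_i(s, a)
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + alpha * difference * v
        return weights

    # One losing step (illustrative reward of -500) with an unscaled
    # ghost feature of 2.0 yields a huge negative step on that weight:
    w = {"#-of-ghosts-1-step-away": 0.0, "bias": 0.0}
    update(w, {"#-of-ghosts-1-step-away": 2.0, "bias": 1.0},
           reward=-500, max_next_q=0.0)
    print(w)  # ghost weight drops by alpha * 500 * 2 = 200 in a single update

With the features divided by 10, the same loss moves the ghost weight by only 20, so one bad episode cannot swing the weights nearly as violently.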

I confirmed this by adjusting the feature extractor to scale down only the #-of-ghosts-one-step-away feature, and Pac-Man manages a better result.
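Roughly, the change I tried looks like this (simplified to a standalone helper with a hypothetical name; in the project it would be an edit inside getFeatures, replacing the divideAll-style scaling):

    # Hypothetical variant: instead of dividing every feature by 10,
    # scale down only the ghost-count feature.
    def scale_ghost_feature(features, factor=10.0):
        scaled = dict(features)
        if "#-of-ghosts-1-step-away" in scaled:
            scaled["#-of-ghosts-1-step-away"] /= factor
        return scaled

    print(scale_ghost_feature({"#-of-ghosts-1-step-away": 2.0, "eats-food": 1.0}))
    # {'#-of-ghosts-1-step-away': 0.2, 'eats-food': 1.0}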

In retrospect, this question is more mathsy and might fit better on Stack Exchange.

