reinforcement learning - Berkeley Pac-Man Project: why are the features divided by 10?


I'm busy coding reinforcement learning agents for the game Pac-Man, and I came across Berkeley's CS course's Pac-Man projects, specifically the reinforcement learning section.

For the approximate Q-learning agent, feature approximation is used, and a simple feature extractor is implemented in the provided code. I'm curious why the features are scaled down by 10 before being returned. If you run the solution without the factor of 10, you can notice that Pac-Man does significantly worse. Why?
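For context, here is a minimal sketch of what the extractor computes, paraphrased from the project's SimpleExtractor; the exact helper names in the real code differ, so treat this as an approximation:

    def simple_features(num_adjacent_ghosts, food_at_next_pos, dist_to_food,
                        layout_width, layout_height):
        """Sketch of a SimpleExtractor-style feature map (names approximate)."""
        features = {"bias": 1.0}
        # Count of ghosts one step away from Pac-Man's next position; note
        # that unlike the other features, this one can exceed 1.0.
        features["#-of-ghosts-1-step-away"] = float(num_adjacent_ghosts)
        # Only flag eating food when no ghost is adjacent.
        if not num_adjacent_ghosts and food_at_next_pos:
            features["eats-food"] = 1.0
        # Maze distance to the nearest food, normalized by the board area.
        features["closest-food"] = dist_to_food / (layout_width * layout_height)
        # The step in question: every feature is divided by 10 before returning.
        return {name: value / 10.0 for name, value in features.items()}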

After running multiple tests, it turns out that without the scaling the Q-values can diverge wildly. In fact, the weights can become negative, even the one that should incline Pac-Man to eat pills. He just stands there trying to run from ghosts and never tries to finish the level.

I speculate that this happens when Pac-Man loses during training: the large negative reward is propagated through the system, and since the number of adjacent ghosts can be greater than one, that feature has a heavy bearing on the weights, causing them to go negative, and the system can't "recover" from this.
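For what it's worth, the standard approximate Q-learning update makes the sensitivity to feature scale explicit: Q(s,a) = sum_i w_i * f_i(s,a), and each weight moves by alpha * difference * f_i(s,a), where difference = (r + gamma * max_a' Q(s',a')) - Q(s,a). The features enter twice, once in the difference (through Q) and once as the multiplier on the update, so dividing all features by 10 damps the effective step size in Q-space by roughly a factor of 100. A minimal sketch, with illustrative names of my own rather than project code:

    def q_value(weights, features):
        # Q(s, a) = sum_i w_i * f_i(s, a)
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())

    def update(weights, features, reward, max_next_q, alpha=0.2, gamma=0.8):
        # difference = (r + gamma * max_a' Q(s', a')) - Q(s, a)
        difference = (reward + gamma * max_next_q) - q_value(weights, features)
        # Each weight moves in proportion to its feature value, so a large
        # feature (e.g. two adjacent ghosts) takes a disproportionate step.
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) + alpha * difference * value
        return weights

    # A losing step with two adjacent ghosts: the -500 reward hits the
    # ghost weight with twice the force of any 0/1 feature.
    weights = {}
    features = {"bias": 1.0, "#-of-ghosts-1-step-away": 2.0, "eats-food": 1.0}
    update(weights, features, reward=-500.0, max_next_q=0.0)
    print(weights)  # the ghost weight ends up twice as negative as the others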

I confirmed this by adjusting the feature extractor to scale down only the #-of-ghosts-1-step-away feature; with that change, Pac-Man manages a better result.
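For completeness, the tweak I tried looks roughly like this (a hypothetical helper wrapped around the sketch above, not project code):

    def rescale_ghost_feature(features, ghost_scale=0.1):
        # Shrink only the ghost-count feature so a single big loss can't
        # swamp the weights on the food-related features.
        adjusted = dict(features)
        adjusted["#-of-ghosts-1-step-away"] *= ghost_scale
        return adjusted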

In retrospect, this question is more mathsy and might fit better on StackExchange.

