reinforcement learning - Berkeley Pac-Man Project: why are features divided by 10?


I am busy coding reinforcement learning agents for the game Pac-Man and came across Berkeley's CS course's Pac-Man Projects, specifically the reinforcement learning section.

For the approximate Q-learning agent, feature approximation is used. A simple extractor is implemented in this code. I am curious why, before the features are returned, they are scaled down by 10. If you run the solution without the factor of 10, you can notice that Pac-Man does significantly worse. Why?
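For reference, here is a minimal sketch of the linear approximate Q-learning update the agent performs; the helper names (`qvalue`, `update`) and the hyperparameter values are my own, not the project's API:

```python
def qvalue(weights, features):
    """Q(s, a) = sum_i w_i * f_i(s, a), a dot product over named features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update(weights, features, reward, next_qmax, alpha=0.2, gamma=0.8):
    """One TD update: w_i <- w_i + alpha * delta * f_i(s, a),
    where delta = r + gamma * max_a' Q(s', a') - Q(s, a)."""
    delta = reward + gamma * next_qmax - qvalue(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * delta * value
```

Note that each feature's magnitude enters twice: once inside the TD error through Q(s, a), and once as the multiplier on the correction term. Scaling every feature by 1/10 therefore shrinks each step's change to Q(s, a) by roughly a factor of 100, so the division acts like a much smaller effective learning rate.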

After running multiple tests, it turns out the Q-values can diverge wildly. In fact, the weights can become negative, including the one that inclines Pac-Man to eat pills. He just stands there, tries to run from the ghosts, and never tries to finish a level.

I speculate that this happens when Pac-Man loses during training: the large negative reward is propagated through the system, and since the potential number of ghosts one step away can be greater than one, that feature has a heavy bearing on the weights, causing them to become negative, and the system can't "recover" from this.
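To make that concrete, here is one hypothetical update on a losing step, using the `update` helper sketched above. The -500 terminal reward matches the Pac-Man games; the weights, alpha, and feature values are made up:

```python
weights = {"bias": 1.0, "#-of-ghosts-1-step-away": -1.0, "eats-food": 2.0}
features = {"bias": 1.0, "#-of-ghosts-1-step-away": 2.0, "eats-food": 1.0}  # unscaled: two ghosts adjacent

# Terminal loss: reward -500, no successor state, so next_qmax = 0.
update(weights, features, reward=-500.0, next_qmax=0.0)
# delta = -500 - Q(s, a) = -501
# eats-food:                2.0 + 0.2 * (-501) * 1.0 = -98.2
# #-of-ghosts-1-step-away: -1.0 + 0.2 * (-501) * 2.0 = -201.4
```

A single loss drags the eats-food weight deep into the negative. With every feature divided by 10, the same transition would move each weight only a tenth as far, leaving room to recover.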

I confirmed this by adjusting the feature extractor to scale down only the #-of-ghosts-1-step-away feature; Pac-Man then manages a better result.
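The adjustment was along these lines (a minimal sketch, not the exact code; the wrapper name and the 0.1 damping factor are illustrative):

```python
def scaled_features(state, action, base_extractor):
    """Damp only the ghost-proximity feature instead of dividing
    every feature by 10. base_extractor is any callable returning
    a dict of feature name -> value, like the project's SimpleExtractor."""
    features = dict(base_extractor(state, action))
    if "#-of-ghosts-1-step-away" in features:
        features["#-of-ghosts-1-step-away"] *= 0.1  # illustrative factor
    return features
```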

In retrospect the question is more mathsy and might fit better on Stack Exchange.

