reinforcement learning - Berkeley Pac-Man Project: why are the features divided through by 10?
I am busy coding reinforcement learning agents for the game Pac-Man, and I came across Berkeley's CS course's Pac-Man projects, specifically the reinforcement learning section.
For the approximate Q-learning agent, feature approximation is used, and a simple extractor is implemented in the provided code. I am curious why, before the features are returned, they are all scaled down by 10. If I run the solution without the factor of 10, I notice that Pac-Man does noticeably worse. Why is that?
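For context, here is a minimal, self-contained sketch of the kind of extractor I mean (the feature names follow the project; the bodies are simplified stand-ins, not the project's actual code, and the inputs are assumed to be precomputed from the game state):

```python
# Sketch of a SimpleExtractor-style feature map. The real extractor
# derives the ghost count, food flag, and food distance from the state.

def get_features(num_ghosts_one_step_away, eats_food,
                 dist_to_closest_food, maze_area):
    features = {
        "bias": 1.0,
        "#-of-ghosts-one-step-away": float(num_ghosts_one_step_away),
        "eats-food": 1.0 if eats_food else 0.0,
        # The food distance is already normalized to be below 1.
        "closest-food": float(dist_to_closest_food) / maze_area,
    }
    # The step the question is about: every feature is divided by 10
    # just before being returned.
    return {name: value / 10.0 for name, value in features.items()}
```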
After running multiple tests, it turns out that without the scaling the Q-values can diverge wildly. In fact, the learned weights can become negative, including the one that should incline Pac-Man to eat the pills. He then just stands there, trying to run from the ghosts, and never tries to finish the level.
I speculate that this happens when Pac-Man loses during training: the large negative reward is propagated through the system, and since the number of ghosts one step away can be greater than one, that feature has a heavy bearing on the update, causing the weights to become negative, and the system can't "recover" from this.
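For intuition, the approximate Q-learning update is w_i <- w_i + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)] * f_i(s, a), so every weight change is proportional to the feature value f_i. A feature worth up to 2 (two ghosts one step away), combined with a large negative reward, can make a single update overshoot badly. Here is a toy sketch of just the step-size effect, using a made-up learning rate of 0.6 and a fixed target instead of the real bootstrapped one:

```python
# Toy illustration (not the project's training loop): repeatedly apply
# the linear update  w <- w + alpha * (target - w * f) * f  for a single
# feature of magnitude f. The error shrinks or grows by a factor of
# |1 - alpha * f**2| per step, so dividing f by 10 divides the
# effective step size by 100.

def run(f, alpha=0.6, target=1.0, steps=200):
    w = 0.0
    for _ in range(steps):
        w += alpha * (target - w * f) * f
    return w

print(run(f=2.0))   # unscaled ghost feature: |1 - 2.4| > 1, diverges
print(run(f=0.2))   # same feature divided by 10: converges toward target / f = 5
```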
I confirmed this by adjusting the feature extractor so that only the #-of-ghosts-one-step-away feature is scaled down, and Pac-Man then manages a better result (see the sketch below).
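Roughly, the adjustment looks like this, sketched against the simplified extractor above rather than my exact code:

```python
# Variant: drop the global division by 10 and shrink only the
# ghost-count feature, so a big negative reward can't swing the
# weights as hard.

def get_features_ghosts_scaled(num_ghosts_one_step_away, eats_food,
                               dist_to_closest_food, maze_area):
    return {
        "bias": 1.0,
        "#-of-ghosts-one-step-away": num_ghosts_one_step_away / 10.0,
        "eats-food": 1.0 if eats_food else 0.0,
        "closest-food": float(dist_to_closest_food) / maze_area,
    }
```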
In retrospect, this question is more mathsy and might have fit better on StackExchange.