machine learning - Scikit-learn: BernoulliNB, v0.10 vs v0.13: very different results
This is a follow-up to this thread, where I was getting erroneous results from a GaussianNB classifier. That turned out to be because I had scikit-learn v0.10 on the Linux VM I was doing the experiments on. I ended up using the Bernoulli and Multinomial NB classifiers instead, but when I (finally) got scipy installed on my MacBook, the scikit-learn version I grabbed was 0.13, the latest as of this writing. That presented a new problem:
- On v0.10, I'm getting over 90% accuracy from the BernoulliNB classifier on one of my feature sets, a notable improvement on what I've gotten so far.
- On v0.13, it's coming in at 67% using the same code.
Does anyone know what changed between the versions? I had a look at the repo history but didn't see anything that would account for this kind of change in accuracy. Since I'm getting good results with BernoulliNB on v0.10, I'd like to use them, but I'm hesitant to do so without a little more understanding of the conflicting results between versions.
I've tried setting the (newer) class_prior property, but that didn't change the results for 0.13.
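Edit: Short of coming up with a worked example (which I will, well, work on), the 0.13 outcomes are heavily biased, which is not what I'd expect from a Bayesian classifier, and which leads me to believe there may have been a regression in the class prior calculations, though I haven't tracked it down yet. For example (rows are true labels, columns are predicted labels):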
0.10:
T\P     F     M
F     120    18
M      19   175

0.13:
T\P     F     M
F     119    19
M      59   135
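To make the bias in those two matrices easier to see, here is a small sketch that computes overall accuracy and per-class recall from the numbers above. It uses plain Python only; the helper names `accuracy` and `recall` are mine, not anything from scikit-learn.

```python
# Confusion matrices from the post: outer key is the true label,
# inner key is the predicted label.
matrices = {
    "0.10": {"F": {"F": 120, "M": 18}, "M": {"F": 19, "M": 175}},
    "0.13": {"F": {"F": 119, "M": 19}, "M": {"F": 59, "M": 135}},
}


def accuracy(cm):
    """Fraction of all documents whose predicted label matches the true label."""
    correct = sum(cm[label][label] for label in cm)
    total = sum(sum(row.values()) for row in cm.values())
    return correct / total


def recall(cm, label):
    """Fraction of documents with true label `label` that were predicted as such."""
    return cm[label][label] / sum(cm[label].values())


for version, cm in matrices.items():
    print(
        version,
        "accuracy=%.3f" % accuracy(cm),
        "recall(F)=%.3f" % recall(cm, "F"),
        "recall(M)=%.3f" % recall(cm, "M"),
    )
```

On this example the F row barely changes between versions, while roughly 40 more M documents are predicted as F under 0.13, so the drop is concentrated almost entirely in one class.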
Edit 2:
I worked through a few examples by hand. The 0.13 version is correct and the 0.10 version is not, which is what I both suspected and feared. The error in 0.10 appears to be in the class prior calculation. The _count function is bugged: on this line of the file, the class counts are simply wrong. Compare it to the 0.13 branch, ignoring the fact that the two branches pull in the smoothing factors at different places.
I have to think about this more, in particular why botched feature counts are resulting in such good performance on my data, and I'm still a little unsure why setting the class priors didn't work. Perhaps it is penalizing against a male bias already present in the source documents?
Edit 3:
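I believe that is exactly what it is doing. The _count function, and consequently the calculation of the feature priors within fit, does not take the class_prior parameter into account. So while the class priors are taken into account within predict, they are not used to build the model during training. I'm not sure if this is intentional: would you want to ignore, at testing time, the priors that were used to build the model?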
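The separation described above can be illustrated with a toy Bernoulli naive Bayes written from scratch. This is a minimal sketch of the general technique, not scikit-learn's implementation: the function names, the Laplace smoothing with `alpha=1.0`, and the four-document dataset are all assumptions made for illustration. The point is only that the per-feature probabilities are estimated from counts alone, while a user-supplied class prior enters at prediction time.

```python
import math


def fit_feature_probs(X, y, alpha=1.0):
    """Estimate P(x_j = 1 | class) from counts with Laplace smoothing.

    No class prior is used here: only per-class feature counts.
    """
    n_features = len(X[0])
    probs = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        probs[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return probs


def predict(x, feature_probs, class_prior):
    """Pick the class maximizing log prior + Bernoulli log likelihood."""
    scores = {}
    for c, p in feature_probs.items():
        score = math.log(class_prior[c])
        for xj, pj in zip(x, p):
            score += math.log(pj if xj else 1.0 - pj)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two binary features, two documents per class.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = ["F", "F", "M", "M"]

feature_probs = fit_feature_probs(X, y)

# Same fitted feature probabilities, two different priors at prediction time.
print(predict([1, 0], feature_probs, {"F": 0.5, "M": 0.5}))
print(predict([1, 0], feature_probs, {"F": 0.1, "M": 0.9}))
```

In this sketch, changing the prior can flip a prediction but never changes `feature_probs`, so a prior supplied by the user cannot compensate for feature probabilities that were estimated from skewed counts.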
To sum up the results: there was a bug in the 0.10 version of the BernoulliNB classifier that skewed the class counts when calculating the feature priors, apparently biasing the resulting model in a way that happened to yield superior results on my data. I managed to adapt pieces of what it was doing and got equivalent performance from the (correct) MultinomialNB in version 0.13.