python - Why does re.findall() find more matches than re.sub()?
Consider the following:

>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'

re.sub()'s behavior makes more sense initially, but I can understand re.findall()'s behavior too. After all, you can match an empty string between first and : that consists only of non-colon characters (exactly zero of them), so why isn't re.sub() behaving the same way?

Shouldn't the result of the last command be (first)():(second)()?
You use *, which allows empty matches:

'first' -> matched
':' -> not in the character class, but the pattern can be empty due to *, so an empty string is matched --> ''
'second' -> matched
'$' -> can contain an empty string before it, so an empty string is matched --> ''

Quoting the documentation for re.findall():
Empty matches are included in the result unless they touch the beginning of another match.
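To see where those empty matches sit, the positions can be listed with re.finditer(). This is only an illustrative sketch; the variable name spans is made up for the example.

```python
import re

a = "first:second"

# Record (start, end, text) for every match, including the empty ones.
spans = [(m.start(), m.end(), m.group(0)) for m in re.finditer("[^:]*", a)]

for start, end, text in spans:
    print(start, end, repr(text))
```

The two empty matches show up as zero-width spans: one just before the colon and one at the end of the string.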
The reason you don't see the empty matches in the sub results is explained in the documentation for re.sub():
Empty matches for the pattern are replaced only when not adjacent to a previous match.
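The rule quoted above can be applied by hand on top of re.finditer(), which makes it easier to see which matches get dropped. This is a minimal sketch: the helper name sub_skipping_adjacent_empty is hypothetical, not part of the re module, and it only mimics the documented rule rather than showing how re.sub() is implemented. One caveat: according to the Python 3.7 release notes, re.sub() now also replaces empty matches adjacent to a previous non-empty match, so on 3.7 and later the built-in call should be expected to give (first)():(second)() instead.

```python
import re


def sub_skipping_adjacent_empty(pattern, repl, string):
    """Hypothetical helper: replace matches, but skip an empty match
    that starts exactly where the previous match ended."""
    out = []
    pos = 0          # end of the text already copied to the output
    prev_end = None  # end position of the previous replaced match
    for m in re.finditer(pattern, string):
        is_empty = m.start() == m.end()
        if is_empty and m.start() == prev_end:
            continue  # adjacent to the previous match: leave it alone
        out.append(string[pos:m.start()])  # text between matches
        out.append(m.expand(repl))         # expand \g<0> etc. in the template
        pos = prev_end = m.end()
    out.append(string[pos:])
    return "".join(out)


a = "first:second"
result = sub_skipping_adjacent_empty("[^:]*", r"(\g<0>)", a)
print(result)
```

With this rule the empty matches after first and after second are skipped, because each one begins where the preceding match ended.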
Try this:

re.sub('(?:choucroute garnie)*', '#', 'ornithorynque')

and this:

print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))

There are no consecutive #.
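Both calls can be run side by side to compare them. This is a small sketch with made-up variable names; the check for consecutive # is printed rather than assumed, because the handling of an empty match right after a non-empty one changed in Python 3.7, so the second result depends on the interpreter in use.

```python
import re
import sys

word = "ornithorynque"

# The group never matches inside the word, so every match is empty
# and none of them is adjacent to a non-empty match.
only_empty = re.sub("(?:choucroute garnie)*", "#", word)

# Here the group matches "nithorynque", which leaves one empty match
# at the very end, directly after the non-empty match.
with_real_match = re.sub("(?:nithorynque)*", "#", word)

print(only_empty)
print(with_real_match)
print("consecutive #:", "##" in with_real_match, "on Python", sys.version_info[:2])
```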