python - Why does re.findall() find more matches than re.sub()?
Consider the following:
>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'
re.sub()'s behavior makes more sense to me at first, but I can also understand re.findall()'s behavior. After all, you can match an empty string between first and :, and that string consists of non-colon characters (exactly 0 of them), so why isn't re.sub() behaving the same way?
Shouldn't the result of the last command be (first)():(second)()?
You use *, which allows empty matches:
'first'  -> matched
':'      -> not in the character class, but because * lets the pattern match zero characters, an empty string is matched just before it -> ''
'second' -> matched
'$'      -> (end of string) an empty string can still be matched there -> ''
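To see where these matches sit, re.finditer() returns match objects whose spans are visible, which findall() does not show. A minimal sketch using the string from the question:

```python
import re

a = "first:second"

# finditer() yields match objects, so each match's (start, end) span is visible.
spans = [(m.span(), m.group()) for m in re.finditer(r"[^:]*", a)]

for span, text in spans:
    print(span, repr(text))
```

The two empty matches sit at positions 5 and 12, exactly where the preceding non-empty matches (first and second) end.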
Quoting the documentation for re.findall() (Python versions before 3.7):
Empty matches are included in the result unless they touch the beginning of another match.
The reason you don't see those empty matches in the sub() result is explained in the documentation for re.sub() (Python versions before 3.7):
Empty matches for the pattern are replaced only when not adjacent to a previous match.
Try this:
print(re.sub('(?:choucroute garnie)*', '#', 'ornithorynque'))
And this:
print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))
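On Python versions before 3.7 there are no consecutive # characters in either result. Since 3.7, an empty match adjacent to a previous non-empty match is replaced as well, so the second example prints #o#r##.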
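Listing the match spans shows what findall() sees for these two examples. In the second one, the final empty match at position 13 starts exactly where the non-empty match nithorynque ends; that adjacent empty match is the one the pre-3.7 re.sub() rule skipped. The sub() calls at the end assume Python 3.7 or later.

```python
import re

text = "ornithorynque"

# This pattern can never match "choucroute garnie", so every match is empty:
# one before each of the 13 characters and one at the end (14 in total).
spans1 = [m.span() for m in re.finditer(r"(?:choucroute garnie)*", text)]

# Empty matches at 0 and 1, the non-empty match "nithorynque" at 2..13,
# then an empty match at 13, right where that match ended.
spans2 = [m.span() for m in re.finditer(r"(?:nithorynque)*", text)]

print(spans1)
print(spans2)

# On Python 3.7+ the empty match at 13 is replaced too, so the second
# result ends with "##".
print(re.sub(r"(?:choucroute garnie)*", "#", text))
print(re.sub(r"(?:nithorynque)*", "#", text))
```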