python - Why does re.findall() find more matches than re.sub()?


Consider the following:

>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'

re.sub()'s behavior makes more sense initially, but I can also understand re.findall()'s behavior. After all, you can match an empty string between first and : that consists only of non-colon characters (exactly 0 of them), so why isn't re.sub() behaving the same way?

Shouldn't the result of the last command be (first)():(second)()?

You use *, which allows empty matches:

'first'   -> matched
':'       -> not in the character class, but the pattern can match empty because of *, so an empty string is matched --> ''
'second'  -> matched
'$'       -> the end of the string can have an empty string before it, so an empty string is matched --> ''
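To see where those empty matches sit, here is a minimal sketch (not part of the original answer) that uses re.finditer() to print the span of every match. The variable names are only for illustration.

```python
import re

text = "first:second"

# Collect (start, end, matched text) for every match of the pattern.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer("[^:]*", text)]

for start, end, matched in spans:
    kind = "empty" if start == end else "non-empty"
    print(f"{start:2d}-{end:2d}  {matched!r:10} {kind}")
```

The two empty matches are at index 5, just before the colon, and at index 12, the end of the string.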

Quoting the documentation for re.findall():

Empty matches are included in the result unless they touch the beginning of another match.

The reason you don't see empty matches in the sub results is explained in the documentation for re.sub():

Empty matches for the pattern are replaced only when not adjacent to a previous match.
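The quoted rule can be sketched by hand. The helper below, sub_skipping_adjacent_empty, is a hypothetical name and not a function of the re module. It walks the matches from re.finditer() and skips any empty match that starts exactly where the previous match ended. It does not rely on re.sub()'s own handling of empty matches, which changed in Python 3.7.

```python
import re

def sub_skipping_adjacent_empty(pattern, repl, string):
    """Hypothetical helper: replace every match, except empty matches
    that start exactly where the previous match ended."""
    pieces = []
    pos = 0          # end of the text copied into pieces so far
    prev_end = None  # end position of the previous match, if any
    for m in re.finditer(pattern, string):
        if m.start() == m.end() and m.start() == prev_end:
            # Empty match adjacent to the previous match: leave it alone.
            prev_end = m.end()
            continue
        pieces.append(string[pos:m.start()])  # unmatched text before the match
        pieces.append(m.expand(repl))         # replacement, with \g<0> expanded
        pos = m.end()
        prev_end = m.end()
    pieces.append(string[pos:])               # unmatched tail
    return "".join(pieces)

print(sub_skipping_adjacent_empty("[^:]*", r"(\g<0>)", "first:second"))
```

With the question's input this gives (first):(second): the empty matches at index 5 and at the end of the string both touch the end of a previous match, so they are skipped.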

Try this:

print(re.sub('(?:choucroute garnie)*', '#', 'ornithorynque'))

And this:

print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))

Under the rule quoted above, neither result contains consecutive # characters.
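The outcome of the second call depends on the interpreter version. Since Python 3.7, re.sub() also replaces an empty match that is adjacent to a previous non-empty match, so a current interpreter returns '#o#r##' for it, while older versions return '#o#r#'. The sketch below runs both calls and reports which behavior is in effect. The variable names are illustrative.

```python
import re
import sys

word = "ornithorynque"

# The group never matches, so every match is empty and none follows a non-empty match.
only_empty = re.sub("(?:choucroute garnie)*", "#", word)

# The group matches "nithorynque" at index 2, followed by an empty match at the end.
with_suffix = re.sub("(?:nithorynque)*", "#", word)

print(only_empty)
print(with_suffix)

# The trailing empty match is replaced only on Python 3.7 and later.
print("consecutive #:", "##" in with_suffix, "| Python >= 3.7:", sys.version_info >= (3, 7))
```

The first result is the same on every version, because it consists only of empty matches that are separated by one character each.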

