python - Why does re.findall() find more matches than re.sub()?

- January 15, 2010

Consider the following:

>>> import re
>>> a = "first:second"
>>> re.findall("[^:]*", a)
['first', '', 'second', '']
>>> re.sub("[^:]*", r"(\g<0>)", a)
'(first):(second)'

re.sub()'s behavior makes more sense to me initially, but I can also understand re.findall()'s behavior. After all, you can match an empty string between first and : that consists only of non-colon characters (exactly zero of them). So why isn't re.sub() behaving the same way?

Shouldn't the result of the last command be (first)():(second)()?

You use *, which allows empty matches:

'first'   -> matched
':'       -> not in the character class, but since the pattern can be empty
             due to *, an empty string is matched --> ''
'second'  -> matched
'$'       -> can contain an empty string before it, so
             an empty string is matched --> ''
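To make the positions of those matches visible, one option is to list them with re.finditer(). This is only an illustrative sketch: the variable names are mine and are not part of the original session.

```python
import re

a = "first:second"

# Collect (start, end, text) for every match of the pattern.
matches = [(m.start(), m.end(), m.group(0)) for m in re.finditer("[^:]*", a)]

for start, end, text in matches:
    # A match whose start equals its end consumed no characters.
    kind = "empty" if start == end else "non-empty"
    print(start, end, repr(text), kind)
```

The two empty matches sit at index 5 (just before the colon) and at index 12 (the end of the string), each one immediately after a non-empty match.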

Quoting the documentation for re.findall():

Empty matches are included in the result unless they touch the beginning of another match.

The reason you don't see the empty matches in the sub results is explained in the documentation for re.sub():

Empty matches for the pattern are replaced only when not adjacent to a previous match.
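The rule quoted above can be sketched by hand on top of re.finditer(). The helper name sub_skipping_adjacent_empty is hypothetical, and this is not how the re module is implemented; it only skips an empty match that starts exactly where the previous match ended. As far as I know, Python 3.7 and later changed the built-in re.sub() to replace empty matches adjacent to a previous non-empty match, so the built-in may give a different result there than in the session above.

```python
import re

def sub_skipping_adjacent_empty(pattern, repl, text):
    """Hypothetical helper: replace matches, but ignore an empty match
    that starts exactly where the previous match ended."""
    out = []
    last_end = 0           # end of the text already copied to `out`
    prev_match_end = None  # end position of the previous accepted match
    for m in re.finditer(pattern, text):
        if m.start() == m.end() and m.start() == prev_match_end:
            continue  # empty match adjacent to the previous match: skip it
        out.append(text[last_end:m.start()])  # text between matches
        out.append(repl(m))                   # `repl` is a callable here
        last_end = m.end()
        prev_match_end = m.end()
    out.append(text[last_end:])  # trailing text after the last match
    return "".join(out)

def wrap(m):
    # Same effect as the template r"(\g<0>)" in the question.
    return "(" + m.group(0) + ")"

result = sub_skipping_adjacent_empty("[^:]*", wrap, "first:second")
print(result)
```

With the skipping rule applied, the empty matches after first and after second are dropped, which is why no empty () pairs appear.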

Try this:

re.sub('(?:choucroute garnie)*', '#', 'ornithorynque')

And this:

print(re.sub('(?:nithorynque)*', '#', 'ornithorynque'))

Under the rule quoted above, the output contains no consecutive # characters.
