amccormack.net

Things I've learned and suspect I'll forget.

Spot The Python Bug 2016-03-19

Let's play a game.

Let's suppose you have the following simple program:

from __future__ import print_function
import collections

Color = collections.namedtuple('Color',['name', 'hex_value'])

class User(object):
    def __init__(self):
        self.pallet = []
        self.pallet.append(Color('red','#ff0000'))
        self.pallet.append(Color('green','#00ff00'))
        self.pallet.append(Color('blue','#0000ff'))

    def has_color(self, color):
        find_color = filter(lambda x: x.name == color.name, [ color for color in self.pallet])
        found_colors = [x for x in find_color]
        if len(found_colors) > 0:
            return True
        return False

if __name__ == '__main__':
    user = User()
    print('User has red?', user.has_color(Color('red','#ff0000')) )
    print('User has white?', user.has_color(Color('white','#ffffff')) )

What do the two print lines output? Looking only at the __init__ method and the print calls, we would expect:

User has red? True
User has white? False

So is that what we get? It depends on which version of Python you use, as you can see here:

$ python2 example1.py 
User has red? True
User has white? True

$ python3 example1.py 
User has red? True
User has white? False

List Comprehension Leakage

Under Python 2, the problem with our code is in the has_color method, on this line:

find_color = filter(lambda x: x.name == color.name, [ color for color in self.pallet])

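In Python 2, the loop variable of a list comprehension such as [x for x in iterable] is not confined to the comprehension. It is bound in the enclosing scope and keeps the last value it was assigned. So [color for color in self.pallet] rebinds the method's color argument to the last entry in the pallet, Color('blue','#0000ff'). By the time filter calls the lambda, color.name is 'blue' no matter what the caller passed in, and since blue is in the pallet, has_color returns True for every color.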
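Python 3 no longer lets a comprehension do this, so to watch the bug happen on a modern interpreter it helps to unroll the comprehension into the plain for loop it roughly behaved like in Python 2. An ordinary for loop still binds its variable in the enclosing scope in both versions. This is a minimal sketch; the function has_color_py2_style and the module-level PALLET are hypothetical stand-ins for the User class above.

```python
from __future__ import print_function
import collections

Color = collections.namedtuple('Color', ['name', 'hex_value'])

# Hypothetical module-level palette, mirroring User.__init__ above.
PALLET = [Color('red', '#ff0000'), Color('green', '#00ff00'), Color('blue', '#0000ff')]


def has_color_py2_style(color, pallet=PALLET):
    # Rough unrolled equivalent of Python 2's [color for color in pallet]:
    # the loop writes to the function's own `color`, rebinding the argument.
    copied = []
    for color in pallet:
        copied.append(color)
    # `color` is now pallet[-1] (blue), whatever the caller passed in,
    # so the lambda compares every entry against 'blue' and always matches.
    find_color = filter(lambda x: x.name == color.name, copied)
    return len(list(find_color)) > 0


print(has_color_py2_style(Color('red', '#ff0000')))    # True
print(has_color_py2_style(Color('white', '#ffffff')))  # True, the bug
```

Any color at all comes back True, because the loop leaves color pointing at blue before the lambda ever runs.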

When I finally tracked down a bug caused by similar code, I couldn't believe it. Behavior like this is not very pythonic. The form of a list comprehension implies a limited scope, and whatever benefit comes from being able to grab the last value of the iteration does not outweigh the risk of accidentally trashing a local variable.

It turns out I was right to be suspicious of this behavior, as many in the Python community didn't like it either. In a 2010 blog post, Guido van Rossum says this about the leak:

This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. [...]

However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.
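A quick check of the behavior Guido describes, runnable on Python 3. The function name shadow_demo is just for illustration.

```python
def shadow_demo():
    x = 'before'
    # On Python 3 the comprehension has its own x, which shadows the
    # outer one only while the comprehension runs.
    squares = [x * x for x in range(1, 4)]
    # Generator expressions have always had their own scope,
    # so this one never leaked, even on Python 2.
    total = sum(x for x in range(10))
    return x, squares, total


print(shadow_demo())  # ('before', [1, 4, 9], 45) on Python 3
```

On Python 2 the first element would be 3 instead, the last value the list comprehension assigned to x; the generator expression leaves x alone in both versions.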

Suggestion For List Comprehensions with Python 2

To avoid this mistake, I suggest prefixing list comprehension variables with tmp_. Our line above would then become:

find_color = filter(lambda x: x.name == color.name, [ tmp_color for tmp_color in self.pallet])
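Here is the whole class with that change applied, as a sketch. I've also added a has_color_any method (my own name, not part of the original program) showing a different way around the problem: drop the copying list comprehension and use a generator expression with any(), since generator expressions never leaked their loop variable, even on Python 2.

```python
from __future__ import print_function
import collections

Color = collections.namedtuple('Color', ['name', 'hex_value'])


class User(object):
    def __init__(self):
        self.pallet = [
            Color('red', '#ff0000'),
            Color('green', '#00ff00'),
            Color('blue', '#0000ff'),
        ]

    def has_color(self, color):
        # tmp_color can't collide with the `color` argument, so even if
        # Python 2 leaks it into this scope, `color` is left untouched.
        find_color = filter(lambda x: x.name == color.name,
                            [tmp_color for tmp_color in self.pallet])
        return len(list(find_color)) > 0

    def has_color_any(self, color):
        # Alternative: a generator expression has its own scope on both
        # Python 2 and 3, and any() stops at the first match.
        return any(c.name == color.name for c in self.pallet)


user = User()
print('User has red?', user.has_color(Color('red', '#ff0000')))      # True
print('User has white?', user.has_color(Color('white', '#ffffff')))  # False
```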

This won't help if you already have a variable called tmp_something, but chances are someone will ask you why you always prefix the variable with tmp_, and you'll get an opportunity to tell them about this little caveat before it bites them.

published on 2016-03-19 by alex