I'm trying to find a Python built-in that will return the first item i in a sequence that given a function f returns i when f(i) == True. I need this about every 300 lines of code or so (that's actually quite a bit of code in Python). The general use-case is running through a list looking for an item matching some criteria and then returning it. This is more commonly dictionary-land (i.e. the items should be stored in a dict keyed by the criteria instead of a list) but that's not always practical/needed.
So here's a quick find function:
def find(f, seq):
"""Return first item in sequence where f(item) == True."""
for item in seq:
if f(item):
return item
To illustrate usage, let's say we had a list of People
(peeps) and one of them has the name Fred
. We want
to get the Fred People instance. I would write this as:
for person in peeps:
if person.name == 'Fred':
fred = person
While I admit this isn't a ton of code and is really very descriptive, I find that quickly nesting code structures like this bother me (especially in Python). Here we have three lines of code and three blocks. I'm unaware of any reason why this might be bad other than that I think it looks ugly, but there it is.. Anyway, this three line nastiness is reduced down to a single line using find:
fred = find(lambda person: person.name == 'Fred', peeps)
I should note that the same can be achieved with the filter or reduce built-ins but they both require full list traversal where find requires traversal only until a True result is encountered. You will also need to have some kind of fallback for when none of the items match.
# using filter:
fred = filter(lambda person: person.name == 'Fred', peeps)[0]
# using reduce:
f = lambda x, person: person.name == 'Fred' and person or x
fred = reduce(f, peeps)
A downside is that using find is less efficient than the traditional triple-nesting traversal approach due to the additional function call overhead of each iteration. However, this pattern seems to show up frequently in areas of code that don't get called very often. If lookup speed is a concern, you will likely have a dict available anyway. This seems to come up in fringe case lookups that are too rare to justify the extra space and insertion time of a dict.
Another critique might be that while the number of lines have been reduced, readability has not increased because of the nested lambda form. My personal opinion is that this is probably bordering on minor-obsfucation but is worth being able to remove the triple nesting structure.
Discuss
been a few years since you wrote this post for sure, but here’s a technique using generator expressions that works pretty well: http://dev.ionous.net/2009/01/python-find-item-in-list.html
— ionous on Friday, January 09, 2009 at 10:16 AM #
I was inspired by ionous’s page to propose the following. First, here’s a faster, two-line replacement for the three-line loop you dislike:
By the way, your version is missing a
break, so it checks the whole sequence (taking time) and keeps storing matches infred(giving you the last matching item).Even better, Python 2.6 lets you simplify to just one line:
There, no more function call overhead!
If you’re not using Python 2.6 yet, and still prefer a function, here’s a drop-in replacement that uses the above technique:
Personally, I prefer the greater flexibility of getting the index of the item, rather than the item itself (as in
list.index()). This makes it easy to modify a list; for example, to delete the found item.This approach also lets you efficiently find the last occurence of a matching item, by searching backwards from the end:
And here’s a corresponding function. (I'm taking the liberty of swapping the arguments to match pythonic built-in functions like
max()andsorted(), rather than functional-style ones likefilter().)An
rindex()function for searching in reverse may be accomplished as above.— Lee on Thursday, July 02, 2009 at 06:40 PM #
Sorry, comments are temporarily closed due to a broken spam filter. bbl.