ncg-l
developer field notes
Often in my job, I receive a list that is supposed to contain exactly one item. My code makes this assumption and so does the business logic; if there is not exactly one item, it indicates a big problem. I assert this constraint in my code so that we know about a violation, and fail on it, immediately.
When do I find this useful?
- asserting a query has only retrieved one item
- only run logic if multiple functions agree/return the same value, e.g.
  try:
      [agreement] = {choice1(), choice2(), choice3()}
  except ValueError:
      ...
- simply unpack things that are lists of one item (you don’t always make the data model)
- for me, my_list[0] is reserved for explicitly grabbing the first item of a list, and I only use it when I’ve sorted my list by some criterion. my_list[0] does not indicate that the list contains one item.
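The "multiple functions agree" case can be sketched as below. The three choice functions and the agreed_value helper are hypothetical stand-ins, not code from any real project; the only requirement is that the returned values are hashable so they can go into a set.

```python
def choice1():
    # Hypothetical source of truth number one.
    return "eu-west-1"


def choice2():
    # Hypothetical source of truth number two.
    return "eu-west-1"


def choice3():
    # Hypothetical source of truth number three.
    return "eu-west-1"


def agreed_value(*functions):
    """Return the one value every function returns, or raise ValueError."""
    # A set collapses duplicates, so exactly one element means full agreement.
    [agreement] = {function() for function in functions}
    return agreement


try:
    region = agreed_value(choice1, choice2, choice3)
except ValueError:
    # The functions disagreed (or no functions were passed in).
    region = None

print(region)
```

If any one function returns a different value, the set has two or more elements and the unpacking raises ValueError, so the logic after it never runs.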
Options
The common way to determine the length of an in-memory typing.Sequence in Python is len(my_sequence). This is one of the first built-in functions you learn, and it has received years of attention and optimization, which makes it much better than something you could roll yourself. But it has one main drawback: you can’t call len(some_generator), because generators do not define __len__ and the call raises TypeError.
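A quick way to see the drawback for yourself, using a throwaway generator function named here only for illustration:

```python
def some_generator():
    # A generator that yields a single item.
    yield "only item"


length_error = None
try:
    len(some_generator())
except TypeError as error:
    # Generators have no __len__, so len() refuses them.
    length_error = error

print(type(length_error).__name__)

# Sequences such as lists work as expected.
list_length = len(["only item"])
print(list_length)
```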
I’d like to go over some different patterns for catching this case in Python and discuss their tradeoffs.
Option 1: Our starting point
This option reads the entire generator into a list (storing it in memory) and then uses len to check its length. I do not like this option, because a large generator is fully materialized just to confirm that it held one item.
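# Assumed definition of the custom exception used throughout these examples.
class ExpectedOneItem(Exception):
    """Raised when an iterable does not contain exactly one item."""


def option_1(some_iterator):
    # Materialize everything so that len() can be used.
    as_list = list(some_iterator)
    len_ = len(as_list)
    if len_ == 0:
        raise ExpectedOneItem(f"{some_iterator} contained 0 elements, expected 1!")
    elif len_ > 1:
        raise ExpectedOneItem(f"{some_iterator} contained {len_} elements, expected 1!")
    else:
        return as_list[0]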
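To make the cost concrete, here is a small sketch that counts how many items Option 1 pulls out of a generator before it complains. The counting_generator helper and the consumed list are made up for this demonstration, and option_1 is condensed so the block runs on its own.

```python
class ExpectedOneItem(Exception):
    """Stand-in for the custom exception used in this post."""


def option_1(some_iterator):
    # Condensed version of Option 1: materialize, then check the length.
    as_list = list(some_iterator)
    if len(as_list) != 1:
        raise ExpectedOneItem(f"contained {len(as_list)} elements, expected 1!")
    return as_list[0]


consumed = []


def counting_generator(total):
    # Record every item handed out so we can see how far the caller read.
    for number in range(total):
        consumed.append(number)
        yield number


try:
    option_1(counting_generator(1_000))
except ExpectedOneItem:
    pass

print(len(consumed))  # 1000: every item was read before the error was raised
```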
Option 2a:
This approach is my preference, as it allows you to both unpack the iterable and name the internal variable in one line. If the provided argument is a generator, it will consume at most 2 items before raising the error, i.e. it doesn’t dump the entire generator into memory and then raise the error.
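def option_2a(some_iterator):
    try:
        # Unpacking into a one-element target fails unless there is exactly one item.
        [singular_item] = some_iterator
    except ValueError as error:
        # The two failure modes are worded differently, so inspect the message.
        if "too many" in str(error):
            raise ExpectedOneItem(f"{some_iterator} contained more than 1 element!") from error
        elif "not enough" in str(error):
            raise ExpectedOneItem(f"{some_iterator} contained 0 elements, expected 1!") from error
        else:
            # A ValueError raised by the iterator itself: re-raise it unchanged.
            raise
    else:
        return singular_item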
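One way to convince yourself that the unpacking stops early is to hand it an iterator that never ends. The sketch below uses itertools.count() from the standard library; list() on it would never return, while the unpacking fails after reading two items.

```python
import itertools

# An infinite iterator: 0, 1, 2, ...
counter = itertools.count()

try:
    [singular_item] = counter
except ValueError as error:
    # Raised as soon as a second item shows up.
    print(f"ValueError: {error}")

# Only 0 and 1 were consumed, so the next value is 2.
next_value = next(counter)
print(next_value)  # 2
```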
Option 2b:
This approach is slightly different in that, if provided a generator, it will consume every remaining item and collect them in a list assigned to the _ variable. In some cases, this could be better, as it allows you to provide more context in the error message about what was additionally returned.
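def option_2b(some_iterator):
    try:
        # The starred target soaks up everything after the first item.
        singular_item, *_ = some_iterator
    except ValueError as error:
        # Only raised by the unpacking when there is no first item at all.
        raise ExpectedOneItem(f"{some_iterator} contained 0 elements, expected 1!") from error
    else:
        if _:
            raise ExpectedOneItem(f"{some_iterator} contained {len(_) + 1} elements, expected 1!")
        else:
            return singular_item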
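As a sketch of the "more context" idea, the variant below puts the unexpected extras into the error message. The name option_2b_with_context and the message wording are my own assumptions, and the exception class is a stand-in so the block runs by itself.

```python
class ExpectedOneItem(Exception):
    """Stand-in for the custom exception used in this post."""


def option_2b_with_context(some_iterator):
    try:
        # Everything after the first item lands in the extras list.
        singular_item, *extras = some_iterator
    except ValueError as error:
        raise ExpectedOneItem("expected 1 element, got 0") from error
    if extras:
        raise ExpectedOneItem(
            f"expected 1 element, got {len(extras) + 1}; unexpected extras: {extras!r}"
        )
    return singular_item


try:
    option_2b_with_context(iter(["a", "b", "c"]))
except ExpectedOneItem as error:
    message = str(error)

print(message)  # expected 1 element, got 3; unexpected extras: ['b', 'c']
```

The tradeoff is the same as in Option 2b itself: the extras list holds every remaining item, so this is only a good fit when the iterable is known to be small.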
I find that Option 2a serves most of my use-cases. It’s safer and more pleasing to read than my_list[0], and additionally more memory efficient than list(my_generator)[0]. Give it a try.