Numpy Array Not In
While working on a small task recently I came across an annoying little issue. I wanted to build a collection of unique NumPy array objects, so I resorted to using a list and only inserting an item if it was not already in the list. (Performance wasn't key, so it seemed a reasonable start.)
To check whether an array was already in my list, I happily reached for the in operator that I am familiar with, but I came across an issue:
>>> import numpy as np
>>> a=np.array([1, 2, 3, 4, 5])
>>> b=a
>>> b is a
True
>>> c = [a]
>>> b in c
True
>>> d=np.array([1, 2, 3, 4, 5])
>>> e = [d]
>>> a in e
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Why does this happen?
As is described in the Python docs:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
Python evaluates this lazily: when x is e returns True, both the or expression and the any() call short-circuit, so the == comparison between the two arrays is never evaluated and no error is raised.
This is why the b in c case worked. I had forgotten about the or x == e part: for NumPy arrays, == compares element by element and returns an array, and taking the truth value of that array is what raises the ValueError.
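To make the failure concrete, here is a minimal sketch that reproduces each step by hand. The helper name contains_like_in is hypothetical; it only mirrors the equivalence quoted from the docs and is not how the list type is actually implemented internally.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
d = np.array([1, 2, 3, 4, 5])  # same values, different object

# == between two arrays compares element by element and returns an array
elementwise = a == d

# Asking for the truth value of a multi-element array is what raises
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True


def contains_like_in(x, container):
    """Hypothetical helper mirroring the documented equivalence for `x in y`."""
    return any(x is e or x == e for e in container)


# Identity match: `or` short-circuits, so == is never evaluated
same_object_found = contains_like_in(a, [a])
```

Calling contains_like_in(a, [d]) fails with the same ValueError as a in e, because the identity check is False and any() then has to take the truth value of the array produced by ==.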
How to work around it
I needed something very similar, but without the equality check:
from typing import Iterable, TypeVar

T = TypeVar('T')

def object_is_in(a: T, l: Iterable[T]) -> bool:
    return any(a is x for x in l)
Now I can happily do:
>>> not object_is_in(a, e)
True
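Putting it back into the original task, the following is a rough sketch of the "insert only if not already in the list" step. The helper add_if_new and the variable names are my own illustrative choices, not part of any library.

```python
from typing import Iterable, List, TypeVar

import numpy as np

T = TypeVar("T")


def object_is_in(a: T, l: Iterable[T]) -> bool:
    # Identity only: == is never called, so arrays are safe to test
    return any(a is x for x in l)


def add_if_new(item: T, seen: List[T]) -> bool:
    """Hypothetical helper: append item unless that exact object is stored."""
    if object_is_in(item, seen):
        return False
    seen.append(item)
    return True


a = np.array([1, 2, 3, 4, 5])
d = np.array([1, 2, 3, 4, 5])  # equal values, but a different object

unique: list = []
results = [add_if_new(x, unique) for x in (a, d, a)]
```

Note that a and d are both kept, since the check is on identity rather than on contents. If uniqueness by value were wanted instead, something like np.array_equal could replace the is test, but that is a different requirement from the one described here.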