Numpy Array Not In

Posted: 2021-06-15

While working on a small task recently I came across an annoying little issue. I wanted to keep a collection of unique numpy array objects. Arrays are not hashable, so they cannot go into a real set; instead I resorted to using a list and only inserting an item if it was not already in the list. (Performance wasn’t key, so it seemed a reasonable start.)

So now I needed to check whether an array was already in my list. I happily reached for the in operator that I am familiar with, but I came across an issue:

>>> import numpy as np
>>> a=np.array([1, 2, 3, 4, 5])
>>> b=a
>>> b is a
True
>>> c = [a]
>>> b in c
True
>>> d=np.array([1, 2, 3, 4, 5])
>>> e = [d]
>>> a in e
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Why does this happen?

As is described in the Python docs:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

Python short-circuits here: when x is e returns True, or never evaluates x == e, and any stops at the first true result. This is why it was working ok for the b in c case. I was forgetting about the or x == e part. For numpy arrays, == does not return a single True or False; it returns an element-wise boolean array, and the ValueError is raised when that array has to be reduced to one truth value.
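To make the two cases concrete, here is a small sketch that spells out the expression from the docs by hand. The variable names (elementwise, raised, same_object, different_raised) are mine, chosen only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
d = np.array([1, 2, 3, 4, 5])

# == between two arrays does not raise; it compares element by element
# and returns a boolean array of the same shape.
elementwise = a == d

# The error only appears when that array is coerced to a single bool.
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True

# Same object: `a is e` is True, so `or` never evaluates `a == e`.
same_object = any(a is e or a == e for e in [a])

# Different object with equal values: `a is e` is False, so `a == e`
# is evaluated and any() has to take the truth value of an array.
try:
    any(a is e or a == e for e in [d])
    different_raised = False
except ValueError:
    different_raised = True

print(raised, same_object, different_raised)
```

All three flags should come out True: the coercion fails, the identity case succeeds without ever comparing values, and the equal-but-distinct case hits the same ValueError as a in e.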

How to work around it

I needed something very similar, but without the equality check:

from typing import Iterable, TypeVar

T = TypeVar('T')


def object_is_in(a: T, items: Iterable[T]) -> bool:
    # Identity-only membership test: == is never called, so arrays are safe.
    return any(a is x for x in items)

Now I can happily do:

>>> not object_is_in(a, e)
True
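Putting it together, the original goal (a list that only ever holds distinct array objects) might look roughly like the sketch below. add_if_new_object and array_value_is_in are hypothetical helpers I am introducing here, not part of numpy. The second one is a different technique: it compares by value with np.array_equal, in case equal contents should count as duplicates too.

```python
from typing import Iterable, List, TypeVar

import numpy as np

T = TypeVar("T")


def object_is_in(a: T, items: Iterable[T]) -> bool:
    # Identity-only membership test, as defined above.
    return any(a is x for x in items)


def add_if_new_object(item: T, seen: List[T]) -> bool:
    """Hypothetical helper: append item unless that exact object is present."""
    if object_is_in(item, seen):
        return False
    seen.append(item)
    return True


def array_value_is_in(a: np.ndarray, arrays: Iterable[np.ndarray]) -> bool:
    """Hypothetical value-based variant: np.array_equal returns one bool."""
    return any(np.array_equal(a, x) for x in arrays)


a = np.array([1, 2, 3, 4, 5])
b = a                            # same object as a
d = np.array([1, 2, 3, 4, 5])    # equal values, different object

unique_objects: List[np.ndarray] = []
results = [add_if_new_object(x, unique_objects) for x in (a, b, d)]

print(results)                     # whether each of a, b, d was inserted
print(len(unique_objects))         # how many distinct objects were kept
print(array_value_is_in(d, [a]))   # value-based check, for comparison
```

With identity-based checks, b is rejected because it is the same object as a, while d is kept even though its contents match. If that is not what you want, the value-based helper is the one to use.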