Categories
identity int operators python python-internals

“is” operator behaves unexpectedly with integers

599

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.

Based on the above, I can hypothesize that Python is internally implemented such that “small” integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don’t know in advance whether they are numbers or not?

3

  • 2

    Take a look here > The current implementation keeps an array of integer objects for all > integers between -5 and 256, when you create an int in that range you > actually just get back a reference to the existing object.

    Feb 18, 2016 at 12:46

  • 3

    This is a CPython-specific implementation detail and a undefined behavior, use with cautions

    – ospider

    Jun 5, 2019 at 2:38

  • Does this answer your question? Is there a difference between “==” and “is”?

    Sep 15, 2021 at 12:13

438

Take a look at this:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

Here’s what I found in the Python 2 documentation, “Plain Integer Objects” (It’s the same for Python 3):

The current implementation keeps an
array of integer objects for all
integers between -5 and 256, when you
create an int in that range you
actually just get back a reference to
the existing object. So it should be
possible to change the value of 1. I
suspect the behaviour of Python in
this case is undefined. 🙂

9

  • 62

    does anyone know how that range (-5, 256) was chosen? i wouldn’t be too surprised if it were (0, 255) or even (-255, 255), but a range of 262 numbers starting at -5 seems surprisingly arbitrary.

    Aug 24, 2017 at 18:58


  • 7

    @WoodrowBarlow: The -5 is just a heuristic to capture common negative placeholders, I think. 0..255 covers arrays of single byte values. It’s 256 that’s mysterious, but I guess it’s for (dis)assembling integers into/from bytes.

    Jan 27, 2018 at 1:56


  • 4

    From what I understand the range was chosen by looking at the commonly used values across multiple projects (and multiple languages).

    Jul 1, 2018 at 19:35

  • 10

    According to reddit.com/r/Python/comments/18leav/…, the range used to be [-5,100]. It was expanded to include the full range of byte values – plus 256, because that’s presumably a common number.

    Jul 7, 2018 at 22:47

  • 1

    The note about changing the value of 1 is obviously a joke, because integers in Python are immutable. That phrase isn’t in the latest documentation.

    Oct 9, 2021 at 1:43

146

Python’s “is” operator behaves unexpectedly with integers?

In summary – let me emphasize: Do not use is to compare integers.

This isn’t behavior you should have any expectations about.

Instead, use == and != to compare for equality and inequality, respectively. For example:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

Explanation

To know this, you need to know the following.

First, what does is do? It is a comparison operator. From the documentation:

The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.

And so the following are equivalent.

>>> a is b
>>> id(a) == id(b)

From the documentation:

id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or
is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let’s hope we’re not wasting a lot of it. We’re probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What is does

is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use is then?

This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here’s an example (will work in Python 2 and 3) e.g.

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is – do not use it to test for equality of integers, strings, tuples, or other things like these.

10

  • “These are the primary use-cases for is – do not use it to test for equality of integers, strings, tuples, or other things like these.” However, i am trying to integrate a simple state machine into my class, and since the states are opaque values whose only observable property is that of being identical or different, it looks quite natural for them to be comparable with is. I plan to use interned strings as states. I would have preferred plain integers, but unfortunately Python cannot intern integers (0 is 0 is an implementation detail).

    – Alexey

    May 3, 2019 at 12:28


  • 2

    @Alexey sounds like you need enums? stackoverflow.com/questions/37601644/…

    May 3, 2019 at 12:47

  • Maybe, thanks, didn’t know of them. This could be an appropriate addition to you answer IMO.

    – Alexey

    May 3, 2019 at 13:05

  • Maybe using a number of dumb objects like the sentinel in your answer would be a more lightweight solution…

    – Alexey

    May 3, 2019 at 13:10

  • 1

    @MarkRansom Don’t use is with floats, even if you’re checking for NaN: float('nan') is float('nan') returns False, for the same reason that int('257') is int('257') does: they’re different instances, even though they are otherwise indistinguishable.

    – Jasmijn

    Jan 16 at 16:35

71

I’m late but, you want some source with your answer? I’ll try and word this in an introductory manner so more folks can follow along.


A good thing about CPython is that you can actually see the source for this. I’m going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. 🙂

(My italics)

Don’t know about you but I see this and think: Let’s find that array!

If you haven’t fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.

PyLong_FromLong deals with long objects so it shouldn’t be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we’re looking for is chilling at line 230 waiting for us to check it out. It’s a smallish function so the main body (excluding declarations) is easily pasted here:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

Now, we’re no C master-code-haxxorz but we’re also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let’s check it out:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

So it’s a macro that calls function get_small_int if the value ival satisfies the condition:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

So our condition is if (-5 <= ival && ival < 257) call get_small_int.

Next let’s look at get_small_int in all its glory (well, we’ll just look at its body because that’s where the interesting things are):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints looks a lot like that array we’ve been searching for, and it is! We could’ve just read the damn documentation and we would’ve know all along!:

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you’ll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.

But, when are they allocated??

During initialization in _PyLong_Init Python will gladly enter in a for loop to do this for you:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

Check out the source to read the loop body!

I hope my explanation has made you C things clearly now (pun obviously intented).


But, 257 is 257? What’s up?

This is actually easier to explain, and I have attempted to do so already; it’s due to the fact that Python will execute this interactive statement as a single block:

>>> 257 is 257

During complilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

When CPython does the operation, it’s now just going to load the exact same object:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

So is will return True.

0