Categories

# Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?

It is my understanding that the `range()` function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

``````1_000_000_000_000_000 in range(1_000_000_000_000_001)
``````

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

``````# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)
``````

If I try to implement my own range function, the result is not so nice!

``````def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
``````

What is the `range()` object doing under the hood that makes it so fast?

Martijn Pieters’s answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for `range` to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for `__contains__` function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of `xrange` in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

• Note that this is the case only if the item we are checking is a `bool` or `long` type, with other object types it will go crazy. Try with: `100000000000000.0 in range(1000000000000001)`

May 6, 2015 at 15:44

• One last thing: Does Python 3 actually guarantee this behavior? I know every version of CPython at least 3.1+ and PyPy3 from the first beta on provided it, but I think it would be perfectly valid if, say, IronPython 3.4 came out tomorrow and had an O(N) `__contains__` method.

May 6, 2015 at 16:19

• @AshwiniChaudhary isn’t Python2 `xrange` the same as Python3 `range`?

May 6, 2015 at 20:16

• @Superbest `xrange()` objects have no `__contains__` method, so the item check has to loop through all the items. Plus there are few other changes in `range()`, like it supports slicing(which again returns a `range` object) and now also has `count` and `index` methods to make it compatible with `collections.Sequence` ABC.

May 6, 2015 at 21:42

• – jfs

Feb 23, 2018 at 8:49

The Python 3 `range()` object doesn’t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the `object.__contains__` hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

The advantage of the `range` type over a regular `list` or `tuple` is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the `start`, `stop` and `step` values, calculating individual items and subranges as needed).

So at a minimum, your `range()` object would do:

``````class my_range:
def __init__(self, start, stop=None, step=1, /):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step

def __len__(self):
return self.length

def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('my_range object index out of range')

def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0
``````

This is still missing several things that a real `range()` supports (such as the `.index()` or `.count()` methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the `__contains__` implementation to only focus on integer tests; if you give a real `range()` object a non-integer value (including subclasses of `int`), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

• Fun fact: because you have a working implementation of `__getitem__` and `__len__`, the `__iter__` implementation is actually unnecessary.

May 6, 2015 at 20:55

• @Lucretiel: In Python 2.3, a special `xrangeiterator` was added specifically because that wasn’t fast enough. And then somewhere in 3.x (I’m not sure if it was 3.0 or 3.2) it was tossed and they use the same `listiterator` type that `list` uses.

May 6, 2015 at 22:01

• I would define the constructor as `def __init__(self, *start_stop_step)` and parse it out from there; the way the arguments are labelled now are now are kind of confusing. Nevertheless, +1; you still definitely explained the behavior.

May 8, 2015 at 15:15

• @CodyPiersall: Actually, here’s a quote from Guido the `argclinic` discussion, when Nick Coghlan came up with a way to allow defining `range` unambiguously: “Please don’t make it easier for people to copy my worst design decision.” So, I’m pretty sure he agrees that `range` is confusing as written.

May 16, 2015 at 9:25

• @KarlKnechtel you can’t predict how other types behave, full stop. There is no guarantee that range was passed an actual numeric type. It is not enough to just convert the argument to `int` because why bother with a custom type then? It is up to the developer to make the call on whether or not to use `int(custom_type) in range(....)`.

Jul 13, 2019 at 14:12

The fundamental misunderstanding here is in thinking that `range` is a generator. It’s not. In fact, it’s not any kind of iterator.

You can tell this pretty easily:

``````>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]
``````

If it were a generator, iterating it once would exhaust it:

``````>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]
``````

What `range` actually is, is a sequence, just like a list. You can even test this:

``````>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True
``````

This means it has to follow all the rules of being a sequence:

``````>>> a         # indexable
3
>>> len(a)       # sized
5
>>> 3 in a       # membership
True
>>> reversed(a)  # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3)   # implements 'index'
3
>>> a.count(3)   # implements 'count'
1
``````

The difference between a `range` and a `list` is that a `range` is a lazy or dynamic sequence; it doesn’t remember all of its values, it just remembers its `start`, `stop`, and `step`, and creates the values on demand on `__getitem__`.

(As a side note, if you `print(iter(a))`, you’ll notice that `range` uses the same `listiterator` type as `list`. How does that work? A `listiterator` doesn’t use anything special about `list` except for the fact that it provides a C implementation of `__getitem__`, so it works fine for `range` too.)

Now, there’s nothing that says that `Sequence.__contains__` has to be constant time—in fact, for obvious examples of sequences like `list`, it isn’t. But there’s nothing that says it can’t be. And it’s easier to implement `range.__contains__` to just check it mathematically (`(val - start) % step`, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn’t it do it the better way?

But there doesn’t seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it’s an obvious good idea and easy to do, there’s no reason that IronPython or NewKickAssPython 3.x couldn’t leave it out. (And in fact, CPython 3.0-3.1 didn’t include it.)

If `range` actually were a generator, like `my_crappy_range`, then it wouldn’t make sense to test `__contains__` this way, or at least the way it makes sense wouldn’t be obvious. If you’d already iterated the first 3 values, is `1` still `in` the generator? Should testing for `1` cause it to iterate and consume all the values up to `1` (or up to the first value `>= 1`)?

• This is a pretty important thing to get straight. I suppose the differences between Python 2 and 3 may have lead to my confusion on this point. In any case, I should have realized since `range` is listed (along with `list` and `tuple`) as a sequence type.

May 6, 2015 at 16:05

• @RickTeachey: Actually, in 2.6+ (I think; maybe 2.5+), `xrange` is a sequence too. See 2.7 docs. In fact, it was always an almost-sequence.

May 6, 2015 at 16:17

• @RickTeachey: Actually, I was wrong; in 2.6-2.7 (and 3.0-3.1), it claims to be a sequence, but it’s still just an almost-sequence. See my other answer.

May 6, 2015 at 22:03

• It’s not an iterator, it’s a sequence (Iterable in terms of Java, IEnumerable of C#) – something with an `.__iter__()` method that will return an iterator. It in its turn can be used only once.

Jun 18, 2016 at 19:40

• @ThomasAhle: Because `range` isn’t checking types when it’s not an integer, since it’s always possible a type has a `__eq__` that is compatible with `int`. Sure, `str` obviously won’t work, but they didn’t want to slow things down by explicitly checking all the types that can’t be in there (and after all, a `str` subclass could override `__eq__` and be contained in the `range`).

Oct 17, 2016 at 12:47

Use the source, Luke!

In CPython, `range(...).__contains__` (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we’re using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

1. Check that the number is between `start` and `stop`, and
2. Check that the stride value doesn’t “step over” our number.

For example, `994` is in `range(4, 1000, 2)` because:

1. `4 <= 994 < 1000`, and
2. `(994 - 4) % 2 == 0`.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

``````static int
range_contains_long(rangeobject *r, PyObject *ob)
{
int cmp1, cmp2, cmp3;
PyObject *tmp1 = NULL;
PyObject *tmp2 = NULL;
PyObject *zero = NULL;
int result = -1;

zero = PyLong_FromLong(0);
if (zero == NULL) /* MemoryError in int(0) */
goto end;

/* Check if the value can possibly be in the range. */

cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
if (cmp1 == -1)
goto end;
if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}
else { /* negative steps: stop < ob <= start */
cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
}

if (cmp2 == -1 || cmp3 == -1) /* TypeError */
goto end;
if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}

/* Check that the stride does not invalidate ob's membership. */
tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
end:
Py_XDECREF(tmp1);
Py_XDECREF(tmp2);
Py_XDECREF(zero);
return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);

return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
``````

The “meat” of the idea is mentioned in the line:

``````/* result = ((int(ob) - start) % step) == 0 */
``````

As a final note – look at the `range_contains` function at the bottom of the code snippet. If the exact type check fails then we don’t use the clever algorithm described, instead falling back to a dumb iteration search of the range using `_PySequence_IterSearch`! You can check this behaviour in the interpreter (I’m using v3.5.0 here):

``````>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
...
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :)
True
>>> x_ in r  # iterates for ages.. :(
^\Quit (core dumped)
``````