Categories
algorithm big-o complexity-theory optimization performance

Big O, how do you calculate/approximate it?

955

Most people with a degree in CS will certainly know what Big O stands for.
It helps us to measure how well an algorithm scales.

But I’m curious, how do you calculate or approximate the complexity of your algorithms?

10

  • 5

    Maybe you don’t actually need to improve your algorithm’s complexity, but you should at least be able to compute it to decide…

    Sep 23, 2008 at 20:58

  • 5

    I found this a very clear explanation of Big O, Big Omega, and Big Theta: xoax.net/comp/sci/algorithms/Lesson6.php

    Dec 3, 2010 at 12:21

  • 36

    -1: Sigh, another abuse of BigOh. BigOh is just an asymptotic upper bound and could be used for anything, not just CS-related things. Talking about BigOh as if there were one unique BigOh is meaningless (a linear-time algorithm is also O(n^2), O(n^3), etc.). Saying it helps us measure efficiency is misleading too. Also, what is with the link to the complexity classes? If all you are interested in is techniques to compute the running times of algorithms, how is that relevant?

    – Aryabhatta

    May 23, 2011 at 17:35


  • 42

    Big-O does not measure efficiency; it measures how well an algorithm scales with size (it could apply to things other than size too, but that is probably what we are interested in here), and only asymptotically; so if you are out of luck, an algorithm with a “smaller” Big-O may be slower (if the Big-O applies to cycles) than a different one until you reach extremely large numbers.

    Sep 5, 2011 at 17:05


  • 7

    Choosing an algorithm on the basis of its Big-O complexity is usually an essential part of program design. It is most definitely not a case of ‘premature optimization’, which in any case is a much-abused selective quotation.

    Jan 31, 2016 at 23:52

1533

I’ll do my best to explain it here in simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information in Chapter 2 of the Data Structures and Algorithms in Java book.


There is no mechanical procedure that can be used to get the BigOh.

As a “cookbook”, to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many computation steps get executed given an input of some size.

The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The fewer the steps, the faster the algorithm.

For example, let’s say you have this piece of code:

int sum(int* data, int N) {
    int result = 0;               // 1

    for (int i = 0; i < N; i++) { // 2
        result += data[i];        // 3
    }

    return result;                // 4
}

This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:

Number_Of_Steps = f(N)

So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. That means this function is called like this:

Number_Of_Steps = f(data.length)

The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.

There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn’t depend on the size of the input data takes a constant number C of computational steps.

We are going to add up the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array.

That means that lines 1 and 4 take C steps each, and the function is somewhat like this:

f(N) = C + ??? + C

The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That’s the same as adding C, N times:

f(N) = C + (C + C + ... + C) + C = C + N * C + C

There is no mechanical rule to count how many times the body of the for gets executed; you need to count it by looking at what the code does. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.

To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:

  1. Take away all the constants C.
  2. From f() get the polynomial in its standard form.
  3. Take the terms of the polynomial and sort them by their rate of growth.
  4. Keep the one that grows bigger when N approaches infinity.

Our f() has two terms:

f(N) = 2 * C * N ^ 0 + 1 * C * N ^ 1

Taking away all the C constants and redundant parts:

f(N) = 1 + N ^ 1

Since the last term is the one which grows bigger when N approaches infinity (think of limits), this is the BigOh argument, and the sum() function has a BigOh of:

O(N)
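To see the step counting in practice, here is a small sketch (my own addition, not part of the original derivation) that instruments sum() with a global counter, charging one step (C = 1) to each of the “interesting” lines; the names sum_counted and steps are made up for illustration:

#include <stdio.h>

long steps = 0;                       // global step counter (C = 1 per step)

int sum_counted(int* data, int N) {
    steps++;                          // line 1: local variable declaration
    int result = 0;

    for (int i = 0; i < N; i++) {
        steps++;                      // line 3: loop body, executed N times
        result += data[i];
    }

    steps++;                          // line 4: return statement
    return result;
}

int main(void) {
    int data[1000];
    for (int i = 0; i < 1000; i++) data[i] = i;

    for (int N = 10; N <= 1000; N *= 10) {
        steps = 0;
        sum_counted(data, N);
        printf("N = %4d  steps = %ld\n", N, steps);  // prints N + 2: linear growth
    }
    return 0;
}

Running it shows the counter growing as N + 2, matching f(N) = C + N * C + C, which is exactly why the constants wash out and only the O(N) term matters.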

There are a few tricks to solve some tricky ones: use summations whenever you can.

As an example, this code can be easily solved using summations:

for (i = 0; i < 2*n; i += 2) {  // 1
    for (j=n; j > i; j--) {     // 2
        foo();                  // 3
    }
}

The first thing you need to ask is the order of execution of foo(). While it is usually O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.

The for statement in sentence number one is tricky: while the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N times, and we need to divide the count by two.

f(N) = Summation(i from 1 to 2 * N / 2)( ... ) = 
     = Summation(i from 1 to N)( ... )

Sentence number two is even trickier since it depends on the value of i. Take a look: the index i takes the values 0, 2, 4, 6, 8, …, 2 * N, and the second for gets executed N times on the first pass, N - 2 times on the second, N - 4 times on the third… up to the N / 2 stage, at which point the second for never gets executed.

On formula, that means:

f(N) = Summation(i from 1 to N)( Summation(j = ???)(  ) )

Again, we are counting the number of steps. And by definition, every summation should always start at one and end at a number greater than or equal to one.

f(N) = Summation(i from 1 to N)( Summation(j from 1 to (N - (i - 1) * 2))( C ) )

(We are assuming that foo() is O(1) and takes C steps.)

We have a problem here: when i takes the value N / 2 + 1 upwards, the inner Summation ends at zero or a negative number! That’s impossible and wrong. We need to split the summation in two, the pivotal point being the moment i takes N / 2 + 1.

f(N) = Summation(i from 1 to N / 2)( Summation(j from 1 to (N - (i - 1) * 2))( C ) ) + Summation(i from 1 to N / 2)( C )

Past the pivotal moment, i > N / 2, the inner for doesn’t get executed at all, and we assume a constant C execution cost for each of those outer iterations.

Now the summations can be simplified using some identity rules (a quick numeric check of the two main ones follows the list):

  1. Summation(w from 1 to N)( C ) = N * C
  2. Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
  3. Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of w)
  4. Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2
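Here is that tiny numeric check (my own addition, not in the original answer) of identities 1 and 4, the two that do most of the work below; for instance, Summation(i from 1 to N / 2)( N ) is just identity 1 with N / 2 terms of the constant N, which is where the N ^ 2 / 2 below comes from:

#include <stdio.h>

/* Verifies identity 1 (summing a constant C over N terms gives N * C)
   and identity 4 (summing w from 1 to N gives N * (N + 1) / 2). */
int main(void) {
    const long C = 3;
    for (long N = 4; N <= 64; N *= 2) {
        long sum_const = 0, sum_w = 0;
        for (long w = 1; w <= N; w++) {
            sum_const += C;   // identity 1: adds C, N times
            sum_w += w;       // identity 4: 1 + 2 + ... + N
        }
        printf("N = %2ld  sum(C) = %3ld (N*C = %3ld)  sum(w) = %4ld (N(N+1)/2 = %4ld)\n",
               N, sum_const, N * C, sum_w, N * (N + 1) / 2);
    }
    return 0;
}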

Applying some algebra:

f(N) = Summation(i from 1 to N / 2)( (N - (i - 1) * 2) * ( C ) ) + (N / 2)( C )

f(N) = C * Summation(i from 1 to N / 2)( (N - (i - 1) * 2)) + (N / 2)( C )

f(N) = C * (Summation(i from 1 to N / 2)( N ) - Summation(i from 1 to N / 2)( (i - 1) * 2)) + (N / 2)( C )

f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2)( i - 1 )) + (N / 2)( C )

=> Summation(i from 1 to N / 2)( i - 1 ) = Summation(i from 1 to N / 2 - 1)( i )

f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2 - 1)( i )) + (N / 2)( C )

f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N / 2 - 1) * (N / 2 - 1 + 1) / 2) ) + (N / 2)( C )

=> (N / 2 - 1) * (N / 2 - 1 + 1) / 2 = 

   (N / 2 - 1) * (N / 2) / 2 = 

   ((N ^ 2 / 4) - (N / 2)) / 2 = 

   (N ^ 2 / 8) - (N / 4)

f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N ^ 2 / 8) - (N / 4) )) + (N / 2)( C )

f(N) = C * (( N ^ 2 / 2 ) - ( (N ^ 2 / 4) - (N / 2) )) + (N / 2)( C )

f(N) = C * (( N ^ 2 / 2 ) - (N ^ 2 / 4) + (N / 2)) + (N / 2)( C )

f(N) = C * ( N ^ 2 / 4 ) + C * (N / 2) + C * (N / 2)

f(N) = C * ( N ^ 2 / 4 ) + 2 * C * (N / 2)

f(N) = C * ( N ^ 2 / 4 ) + C * N

f(N) = C * 1/4 * N ^ 2 + C * N

And the BigOh is:

O(N²)
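If you want to sanity-check the algebra, here is a small sketch (my own addition) that simply counts how many times foo() runs in the nested loop above. For even n the count comes out to n^2/4 + n/2; the f(N) derived above also charges one constant step to each of the N/2 outer iterations whose inner loop never runs, which only changes the lower-order term, so either way the growth is O(N²):

#include <stdio.h>

static long calls = 0;

void foo(void) { calls++; }           // stand-in O(1) body, just counts invocations

int main(void) {
    for (int n = 8; n <= 64; n *= 2) {
        calls = 0;
        for (int i = 0; i < 2 * n; i += 2) {   // same loops as the example above
            for (int j = n; j > i; j--) {
                foo();
            }
        }
        printf("n = %2d  foo() calls = %5ld  n^2/4 + n/2 = %5ld\n",
               n, calls, (long)n * n / 4 + n / 2);
    }
    return 0;
}

The two printed columns agree, and doubling n roughly quadruples the count, which is the quadratic behaviour the BigOh predicts.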

8

  • 6

    @arthur That would be O(N^2) because you would require one loop to read through all the columns and one to read all rows of a particular column.

    Nov 14, 2014 at 20:34

  • 1

    @arthur: It depends. It’s O(n) where n is the number of elements, or O(x*y) where x and y are the dimensions of the array. Big-oh is “relative to the input”, so it depends on what is your input.

    Apr 6, 2015 at 22:55


  • 1

    Great answer, but I am really stuck. How does Summation(i from 1 to N / 2)( N ) turn into ( N ^ 2 / 2 )?

    – Parsa

    Nov 19, 2015 at 11:35


  • 2

    @ParsaAkbari As a general rule, sum(i from 1 to a) (b) is a * b. This is just another way of saying b+b+…(a times)+b = a * b (by definition for some definitions of integer multiplication).

    Nov 24, 2015 at 14:26


  • 1

    @Franva those are free variables for the “summation identities” (Google term). Check out here for a better formatted math: courses.cs.washington.edu/courses/cse373/19sp/resources/math/…

    – vz0

    Jun 21, 2021 at 9:53

212

Big O gives the upper bound for time complexity of an algorithm. It is usually used in conjunction with processing data sets (lists) but can be used elsewhere.

A few examples of how it’s used in C code.

Say we have an array of n elements

int array[n];

If we wanted to access the first element of the array this would be O(1), since it doesn’t matter how big the array is; it always takes the same constant time to get the first item.

x = array[0];

If we wanted to find a number in the list:

for(int i = 0; i < n; i++){
    if(array[i] == numToFind){ return i; }
}

This would be O(n), since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number on the first try and run through the loop only once, because Big-O describes the upper bound for an algorithm (omega is for the lower bound and theta is for the tight bound).

When we get to nested loops:

for(int i = 0; i < n; i++){
    for(int j = i; j < n; j++){
        array[j] += 2;
    }
}

This is O(n^2) since for each pass of the outer loop ( O(n) ) we have to go through the list again (the inner loop starts at i, but on average that is still about n/2 iterations), so the n’s multiply, leaving us with n squared.

This is barely scratching the surface, but when you get to analyzing more complex algorithms, complex math involving proofs comes into play. I hope this familiarizes you with the basics, at least.

3

  • Great explanation! So if someone says his algorithm has an O(n^2) complexity, does it mean he will be using nested loops?

    Jul 31, 2009 at 4:29

  • 2

    Not really; any aspect that leads to n-squared running time will be considered as n^2

    – asyncwait

    Oct 9, 2009 at 11:21

  • 1

    @NavaneethKN: You won’t always see the nested loop, as function calls can do > O(1) work themselves. In the C standard APIs for instance, bsearch is inherently O(log n), strlen is O(n), and qsort is O(n log n) (technically it has no guarantees, and quicksort itself has a worst case complexity of O(n²), but assuming your libc author isn’t a moron, its average case complexity is O(n log n) and it uses a pivot selection strategy that reduces the odds of hitting the O(n²) case). And both bsearch and qsort can be worse if the comparator function is pathological.

    Mar 21, 2019 at 14:36

108

While knowing how to figure out the Big O time for your particular problem is useful, knowing some general cases can go a long way in helping you make decisions in your algorithm.

Here are some of the most common cases, lifted from http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions:

O(1) – Determining if a number is even or odd; using a constant-size lookup table or hash table

O(log n) – Finding an item in a sorted array with a binary search (see the code sketch after this list)

O(n) – Finding an item in an unsorted list; adding two n-digit numbers

O(n^2) – Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort or insertion sort

O(n^3) – Multiplying two n×n matrices by the simple algorithm

O(c^n) – Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force

O(n!) – Solving the traveling salesman problem via brute-force search

O(n^n) – Often used instead of O(n!) to derive simpler formulas for asymptotic complexity
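As a concrete illustration of the O(log n) entry, here is a minimal binary search in C (a sketch of my own; the function name binary_search is not from any particular library): every iteration halves the remaining range, so a sorted array of n elements needs at most about log2(n) + 1 iterations.

#include <stdio.h>

/* Returns the index of target in the sorted array a[0..n-1], or -1 if absent.
   Each iteration halves the search range, hence O(log n) comparisons. */
int binary_search(const int *a, int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi)
        if (a[mid] == target)
            return mid;
        else if (a[mid] < target)
            lo = mid + 1;               // discard the lower half
        else
            hi = mid - 1;               // discard the upper half
    }
    return -1;
}

int main(void) {
    int a[] = {1, 3, 5, 7, 9, 11};
    printf("%d\n", binary_search(a, 6, 7));   // prints 3
    printf("%d\n", binary_search(a, 6, 4));   // prints -1
    return 0;
}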

3

  • Why not use x&1==1 to check for oddness?

    Dec 11, 2017 at 6:15

  • 3

    @SamyBencherif: That would be a typical way to check (actually, just testing x & 1 would be sufficient, no need to check == 1; in C, x&1==1 is evaluated as x&(1==1) thanks to operator precedence, so it’s actually the same as testing x&1). I think you’re misreading the answer though; there is a semi-colon there, not a comma. It’s not saying you’d need a lookup table for even/odd testing, it’s saying both even/odd testing and checking a lookup table are O(1) operations.

    Mar 21, 2019 at 14:42

  • 1

    I don’t know about the claim on usage in the last sentence, but whoever does that is replacing a class by another that is not equivalent. The class O(n^n) contains, but is strictly larger than, O(n!). The actual equivalence would be O(n!) = O(n^n e^{-n} sqrt(n)).

    Sep 26, 2019 at 17:52