Cory Althoff

The Self-Taught Computer Scientist


Скачать книгу

= 1 + 2 × 5. Computer scientists call the variable n in an equation that describes the number of steps in an algorithm the size of the problem. In this case, you could say the time it takes to solve a problem of size n is 1 + 2n, or in mathematical notation, T(n) = 1 + 2n.

      An equation describing the number of steps in an algorithm is not very helpful, however, because, among other things, you can't always reliably count the number of steps in an algorithm. For example, if an algorithm has many conditional statements, you have no way of knowing which of them will execute in advance. The good news is, as a computer scientist, you don't care about the exact number of steps in an algorithm. What you want to know is how an algorithm performs as n gets bigger. Most algorithms perform fine on a small data set but may be a disaster with larger data sets. Even the most inefficient algorithm will perform well if n is 1. In the real world, however, n will probably not be 1. It may be several hundred thousand, a million, or more.

      The important thing to know about an algorithm is not the exact number of steps it will take but rather an approximation of the number of steps it will take as n gets bigger. As n gets larger, one part of the equation will overshadow the rest of the equation to the point that everything else becomes irrelevant. Take a look at this Python code:

      What part of this program is most important for determining how many steps your algorithm takes to complete? You may think both parts of the function (the first loop and the second loop containing other loops) are important. After all, if n is 10,000, your computer will print many numbers in both loops.

      It turns out that the following code is irrelevant when you are talking about your algorithm's efficiency:

      # loop 1 for i in range(n): print(i)

      To understand why, you need to look at what happens as n gets bigger.

      Here is the equation for the number of steps in your algorithm:

      T(n) = n + n**3

      When you have two nested for loops that take n steps, it translates to n**2 (n to the second power) because if n is 10, you have to do 10 steps twice, or 10**2. Three nested for loops are always n**3 for the same reason. In this equation, when n is 10, the first loop in your program takes 10 steps, and the second loop takes 103 steps, which is 1,000. When n is 1,000, the first loop takes 1,000 steps, and the second loop takes 1,0003, which is 1 billion.

      See what happened? As n gets bigger, the second part of your algorithm grows so much more quickly that the first part becomes irrelevant. For example, if you needed this program to work for 100,000,000 database records, you wouldn't care about how many steps the first part of the equation takes because the second part will take exponentially more steps. With 100,000,000 records, the second part of the algorithm would take more than a septillion steps, which is 1 followed by 24 zeros, so it is not a reasonable algorithm to use. The first 100,000,000 steps aren't relevant to your decision.

      Because the important part of an algorithm is the part that grows the fastest as n gets bigger, computer scientists use big O notation to express an algorithm's efficiency instead of a T(n) equation. Big O notation is a mathematical notation that describes how an algorithm's time or space requirements (you will learn about space requirements later) increase as the size of n increases.

       Constant time

       Logarithmic time

       Linear time

       Log-linear time

       Quadratic time

       Cubic time

       Exponential time

      Each order of magnitude describes an algorithm's time complexity. Time complexity is the maximum number of steps an algorithm takes to complete as n gets larger.

      Let's take a look at each order of magnitude.

      The most efficient order of magnitude is called constant time complexity. An algorithm runs in constant time when it requires the same number of steps regardless of the problem's size. The big O notation for constant complexity is O(1).

      Say you run an online bookstore and give a free book to your first customer each day. You store your customers in a list called customers . Your algorithm might look like this:

      free_books = customers[0]

      Your T(n) equation looks like this:

      T(n) = 1

      Your algorithm requires one step no matter how many customers you have. If you have 1,000 customers, your algorithm takes one step. If you have 10,000 customers, your algorithm takes one step, and if you have a trillion customers, your algorithm takes only one step.

      As you can see, the number of steps your algorithm takes to complete does not get larger as the problem's size increases. Therefore, it is the most efficient algorithm you can write because your algorithm's run time does not change as your data sets grow larger.

Graph depicts constant complexity

      The required number of steps grows more slowly in a logarithmic algorithm as the data set gets larger.

Graph depicts logarithmic complexity