
P v NP: A Modern Mystery

(Image: Marco Tamma on Flickr)

Programming may seem magical, but at some point almost every programmer runs up against one big limitation: time. You can “solve” any problem in the world, but if your code takes years to finish, it’s not really a solution anyone can use.

Programmers and computer scientists like to talk about the complexity of programs, both in space and in time. Space complexity is about how much memory a program uses or, in other words, how much data it needs to keep around. We’re not going to talk about that much here, focusing instead on time complexity.

We don’t tend to measure time complexity in actual units of time but rather in how many steps something takes. Steps are the better measurement: the number of steps per second a processor can execute goes up every year, but the number of steps an algorithm takes stays the same. Further, we don’t even need to be particularly exact in counting those steps. We just need a rough measurement in terms of the “size” of the problem we’re working on.

For example:

  • the length of the list we’re sorting
  • the number of accounts in the database we’re searching
  • the number of computers in the network when we plan how to efficiently send data over the internet
  • the number of variables in the system of equations we’re solving
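To make “counting steps” concrete, here’s a minimal Python sketch (the function and names are just for illustration, not from any particular library) that counts how many steps a simple linear search takes as the list grows:

```python
def linear_search(items, target):
    """Find target in items, counting one step per comparison."""
    steps = 0
    for index, item in enumerate(items):
        steps += 1
        if item == target:
            return index, steps
    return None, steps

# The worst case (target missing) takes exactly n steps,
# so doubling the list doubles the step count.
for n in [1000, 2000, 4000]:
    _, steps = linear_search(list(range(n)), -1)  # -1 is never in the list
    print(f"n={n}: {steps} steps")
```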

To be specific, let’s consider a list of length n that we need to sort.

If we’re trying to figure out whether the sorting algorithm we want to use is feasible, in other words whether it can finish in a reasonable time, we don’t actually care whether the number of steps it takes is 3*n or n + 100 or 20*n. What we care about is how fast the number of steps grows. Whether the count is 3*n or 20*n, making the list ten times larger only makes the number of steps grow by a factor of ten. On the other hand, if your algorithm grows like 3*n^2 or 2*n^2 and you make the list ten times longer, it will take one hundred times longer to finish.
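A quick sketch makes the point with plain arithmetic; the constants 3, 20, and 2 are just the ones from the paragraph above:

```python
# Compare step counts when the problem gets ten times larger.
for n in [1000, 10000]:
    print(f"n={n:>6}: 3n={3*n:>9,}  20n={20*n:>9,}  2n^2={2*n*n:>12,}")

# Both linear counts grow by a factor of ten,
# while the quadratic count grows by a factor of one hundred.
```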

This is why computer scientists use what they call the “big-O” notation. We can say O(n^2) to mean that the number of steps grows like n^2, even though it might actually be 5*n^2 + 30 or 2*n^2.

Algorithms that are polynomial in the size of the problem, in other words ones that grow like O(n^2) or O(n^3) or even O(n^10), are algorithms we at least have a fighting chance of running in our lifetimes.

If an algorithm grows more like O(2^n), we simply can’t run the program for any problem of significant size. Going back to sorting, imagine a pretty slow sorting algorithm like bubble sort, which grows like O(n^2). If sorting a list of 1000 elements with bubble sort takes five seconds, sorting 2000 elements would take twenty seconds.
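For reference, here’s what bubble sort looks like in Python, a minimal sketch rather than production code; the two nested loops over the list are exactly where the O(n^2) growth comes from:

```python
def bubble_sort(items):
    """Sort a list in place by swapping adjacent out-of-order pairs."""
    n = len(items)
    for i in range(n):
        # Each pass bubbles the largest remaining element to the end,
        # for roughly n*n/2 comparisons in total.
        for j in range(n - i - 1):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort([5, 2, 9, 1, 3]))  # [1, 2, 3, 5, 9]
```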

What if instead we have an algorithm that grows like O(2^n) and sorting a list of 1000 elements takes five seconds? Then sorting 2000 elements takes 5*2^1000 seconds! This is roughly 10^301 seconds (within sight of a centillion, which is 10^303), or about 10^284 times the lifespan of the universe to date. We doubled the problem size and the program went from taking a few seconds to being absolutely, positively impossible to ever run. That’s scary!
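You don’t have to take those numbers on faith; Python’s arbitrary-precision integers make them easy to check (the universe’s age below is the usual rough figure of 13.8 billion years):

```python
UNIVERSE_AGE_SECONDS = 13_800_000_000 * 365 * 24 * 3600  # roughly 4.35 * 10^17

seconds = 5 * 2**1000                    # time for 2000 elements at O(2^n)
print(len(str(seconds)) - 1)             # 301 -- so about 10^301 seconds
print(len(str(seconds // UNIVERSE_AGE_SECONDS)) - 1)  # 284 -- universe lifetimes
```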

All this brings us to the question of two classes of problems: P and NP. P is the set of problems we can solve with polynomial growth or faster on an ordinary computer. NP is the set of problems that can be solved with polynomial growth or faster, but by a very special machine: a non-deterministic computer.

Non-deterministic computers are hypothetical devices that differ from our normal computers in one important respect: they can try an unlimited number of possibilities at once. They’re imaginary machines that are a lot like multicore processors or graphics card GPUs, but taken to an extreme. So, for example, a non-deterministic computer could plan out the fastest delivery route for packages by simultaneously calculating the distances for every possible route over every possible combination of roads and then returning the shortest one.

That particular problem is called the travelling salesman problem. The fastest known exact solution on an ordinary computer grows at about O(2^n), so if you double the number of places you’re trying to deliver to, the solution time can balloon from seconds to billions of years.
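Here’s what an ordinary computer is reduced to: trying routes one at a time. This brute-force sketch (with made-up distances between four hypothetical cities) actually grows like n!, even worse than the O(2^n) of the best known exact methods, and it’s what a non-deterministic machine could in effect do all at once:

```python
from itertools import permutations

# Made-up distances between four cities, stored one direction only.
distances = {
    ("A", "B"): 10, ("A", "C"): 15, ("A", "D"): 20,
    ("B", "C"): 35, ("B", "D"): 25, ("C", "D"): 30,
}

def dist(x, y):
    return distances.get((x, y)) or distances.get((y, x))

def shortest_tour(cities):
    """Try every ordering of the stops: (n-1)! routes to check."""
    start, *rest = cities
    best = None
    for route in permutations(rest):
        tour = (start, *route, start)  # leave home and come back
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if best is None or length < best[0]:
            best = (length, tour)
    return best

print(shortest_tour(["A", "B", "C", "D"]))  # (80, ('A', 'B', 'D', 'C', 'A'))
```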

This distinction also has a huge impact on cryptography. Many of our techniques for securely encrypting data rely on the fact that it’s really hard to break large numbers apart into their prime factors, in other words the prime numbers that give you back the original number when multiplied together, like splitting 15 into 3 and 5. For really huge numbers this takes a very long time, possibly many years, on an ordinary computer, but would take only seconds on a non-deterministic machine.
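As a sketch of why this is hard, here’s the most naive factoring method, trial division. Real factoring algorithms are far cleverer, but for numbers hundreds of digits long even they are hopelessly slow on ordinary computers, which is exactly what encryption schemes count on:

```python
def prime_factors(n):
    """Split n into prime factors by trying every divisor up to sqrt(n)."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself prime
    return factors

print(prime_factors(15))                 # [3, 5]
print(prime_factors(1000003 * 1000033))  # two 7-digit primes: ~a million steps
```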

So P and NP might seem very different from each other, and most computer scientists think they probably are, but the really strange thing is that no one has been able to show it. In other words, we want to know whether every problem solvable with polynomial growth by a non-deterministic computer can also be solved with polynomial growth by an ordinary computer, and no one has been able to prove or disprove this. If it turned out that P and NP are the same set of problems, then almost everything about what computers do could change, including having to rebuild cryptography from the ground up.

This is one of the big odd mysteries of computer science. While most people think P and NP aren’t the same, as long as we don’t know there’s still room for some big surprises that could drastically change the world!

Learn More

NP-Completeness

NP-complete problems are special problems that, if you could solve any one of them in polynomial time, could be used to prove that P=NP.
https://en.wikipedia.org/wiki/NP-completeness

Millennium Problem

Proving or disproving that P and NP are the same is one of the “millennium problems”; solving it can nab you a million dollars.
http://www.claymath.org/millennium-problems/