Can you really think rationally?

HAL 9000

We know that people are not rational. Humans have emotions and limited little brains. But is it only a technological challenge that is preventing us from building a completely rational machine? I think it is not just a technological issue, and that it is in the very nature of thinking that we need irrationality:

When a system achieves 100% rationality, its output becomes practically useless.

Imagine a supercomputer so powerful that it would make the current TOP500 list look like the toy department. The supercomputer understands written language and has access to all of the world’s knowledge. It can make deductions of unlimited depth from known facts. For any given question “Is X true?” it will give the definitive answer “It follows from Y, which follows from Z and Q.” It may also say “I can’t conclude based on the known facts, but please investigate S; I will know after that input.” Intelligence so advanced that it is never insecure and never makes errors is such a powerful image that it is exploited in a lot of science fiction. A nice example is HAL 9000, the spaceship computer from the movie 2001: A Space Odyssey. To put it in HAL’s own words: “The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”

Is such a computer really possible? While reading the book Artificial Intelligence: A Modern Approach[ref]http://www.amazon.com/dp/0136042597[/ref] I noticed that, chapter after chapter, different fields of AI hit the same issue: combinatorial explosion. A classic example is the game of chess. Chess programs are given the facts (the game rules) and the state of the world (the current position on the board). Our supercomputer should be able to “solve the game”, or in other words, play every move perfectly. However, in practice, a strange thing happens. Every future move you need to plan multiplies the number of game combinations. If you want to perfectly plan 50 moves ahead, you have to examine around 10^46 combinations[ref]http://en.wikipedia.org/wiki/Shannon_number[/ref]. Even the simple game of chess is unsolvable by contemporary supercomputers, and will remain so for many years to come.
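
To get a feel for the numbers, here is a back-of-the-envelope sketch in Python. It assumes an average branching factor of about 35 legal moves per position, a commonly cited figure; the exact value is an assumption, but only the exponential shape matters:

```python
import math

BRANCHING_FACTOR = 35  # commonly cited average number of legal moves per chess position

def combinations(plies):
    """Move sequences a brute-force search must examine `plies` half-moves ahead."""
    return BRANCHING_FACTOR ** plies

for plies in (10, 20, 30, 50):
    print(f"{plies:>2} plies ahead: ~10^{math.log10(combinations(plies)):.0f} combinations")
```

The exact exponents depend on the assumptions; the point is that every extra ply multiplies the work by another factor of 35, so no hardware upgrade buys you more than a few additional plies.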

Of course, the real world has many orders of magnitude more rules than any board game. However, that is not the real problem. We can efficiently index and search any number of facts; Google, Wolfram Alpha, and Siri already answer simple trivia questions. The problem is that if you try to do valid reasoning over a database of facts, you also get a combinatorial explosion. The flight of a butterfly can contribute to the formation of a hurricane a few months later[ref]http://en.wikipedia.org/wiki/Butterfly_effect[/ref]. And it is not only about weather. How many factors influence the probability of a startup’s success, or of a presidential reelection? How many of those factors are chained, meaning that the outcome of one decision changes the decisions that can be made after it? The point I am trying to make is that if you have:

  1. a lot of decision options, and
  2. the outcome of one decision influences the decisions that can be taken after it,

then you have the same combinatorial explosion as in the game of chess, as the toy sketch below illustrates. Any real-world planning activity or reasoning in a series of steps has the potential for combinatorial explosion.
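
The sketch, with made-up numbers purely for illustration:

```python
def count_plans(options_per_step, steps):
    """Each outcome opens a fresh set of options, so complete plans multiply."""
    if steps == 0:
        return 1  # one way to do nothing
    return options_per_step * count_plans(options_per_step, steps - 1)

# A modest 10 options per decision, chained 20 decisions deep, already
# yields 10^20 distinct plans -- far too many to evaluate one by one.
print(count_plans(10, 20))  # 100000000000000000000
```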

How do humans cope with that? Our brains use the marvelous inventions of generalization and intuition. Generalization allows us to simplify the outside world by presuming that all instances of the same class have the same properties or behavior. Intuition allows us to feel that something is right or wrong without really thinking about it. Generalization reduces the number of facts we need to remember. Intuition reduces the number of combinations, as we only explore paths that feel like they are going in the right direction. Although that makes us much more efficient than computers, it is important to notice that:

Generalization and intuition are both completely irrational.

Generalization example: most of us carry the generalization that life expectancy is better in rich countries than in poor ones. We all learned in school that rich countries have better food and better medical care. Did you know that life expectancy at birth is actually higher in Costa Rica than in the USA[ref]http://en.wikipedia.org/w/index.php?title=List_of_countries_by_life_expectancy&oldid=542690698[/ref], although Costa Rica’s GDP per capita is four times smaller?[ref]http://en.wikipedia.org/w/index.php?title=List_of_countries_by_GDP_(PPP)_per_capita&oldid=542608974[/ref]

Intuition example: people intuitively feel that heavier objects fall faster than lighter ones. It is so intuitive that nobody bothered to experiment for the nineteen centuries between Aristotle and Galileo Galilei. Of course, it is wrong, and interestingly, most undergraduates still fall for the same trick[ref]http://www.newyorker.com/online/blogs/frontal-cortex/2012/06/brain-experiments-why-we-dont-believe-science.html[/ref].

However, that irrationality enables us to avoid analysis paralysis. If you started your day thinking through everything that could happen and how to react to it, you would never leave the house. “Do I need an umbrella or not? Will I need warmer clothes in the evening? Let’s check the weather forecast. No, let’s check hourly forecasts from two websites and compare; that is safer.” When you go out, are you sure you didn’t leave your iron on? You get the point. Usually, I just look through the window, and if it seems like a nice sunny day, I put on shorts. Generalization (it’s sunny) and intuition (it is similar to yesterday). Then I get out, realize it is freezing cold, and feel like an idiot because I need to go back and change.

The same thing applies to any thinking process. Mathematicians try to prove theorems using strategies they “feel” are going to produce a solution. Engineers solve problems by comparing them to similar problems from the past. That is called a heuristic. To show how efficient heuristics are, let’s explore the fascinating case of two chess systems.

First is the famous Deep Blue chess computer. Developed by IBM with a lot of fanfare and money, it was a big black box with 30 state-of-the-art processors and 480 special-purpose chess chips. That monster was able to evaluate 200 million positions per second, which was enough to defeat Garry Kasparov in the historic 1997 match[ref]http://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov[/ref]. That seemed really impressive. IBM was satisfied with the media coverage but decided to pull the plug on further financing of such expensive machines.

Second is Deep Fritz (I don’t know why, but people in the chess community love the word “Deep”). Developed by two programmers, it is not even a dedicated computer; it is a downloadable program you can run on your home PC. It doesn’t require multiple processors or special-purpose chess chips. Yet in November 2006, Deep Fritz defeated world champion Vladimir Kramnik 4–2, running on a PC that today would pass for your granny’s computer. Because Deep Fritz runs on commodity hardware, it was able to analyze only 8 million positions per second, 25 times fewer than Deep Blue. It gets even better: the Deep Blue prototype lost a direct match in 1995 to Fritz running on a 90 MHz Pentium[ref]http://www.grappa.univ-lille3.fr/icga/tournament.php?id=29[/ref]! Only after IBM seriously upgraded Deep Blue with hundreds of processors was it able to make history and defeat the human grandmaster.

As a programmer, I get angry at that course of history. If IBM had given half of that money to the Deep Fritz team, they would probably have done a better job. But it would have been bad public relations to show that outside programmers are better than IBM’s, wouldn’t it? So why was Deep Fritz better? Because of heuristics.

Chess programs reduce the number of paths they need to explore using an evaluation function: a fast method of judging how good the situation on the board is. A very simple evaluation function just sums the relative values of all pieces; most chess books say a pawn is worth 1 point and a queen 9 points[ref]http://en.wikipedia.org/wiki/Chess_piece_relative_value[/ref]. What is the point of that scoring? Chess is not Scrabble; you win by checkmate, not by points! And notice that this evaluation (sketched in code after the list below) is both:

  1. Irrational – you can have more points than an opponent who is going to checkmate you on the next move.
  2. Intuition based on experience – if it were really mathematically derived, you would get decimal values, not round integers.
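
In code, that prejudice can be as small as this minimal sketch (the board here is just a list of piece letters, uppercase for your pieces and lowercase for your opponent’s; the values are the standard textbook ones):

```python
# Standard textbook piece values; the king is priceless and left out.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(board):
    """Sum material: positive when you have the more valuable pieces.

    Note what this ignores -- threats, position, checkmate. It is a fast,
    irrational prejudice, not a proof of who is winning.
    """
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# A queen and a pawn against a rook and two pawns: you are "winning" by +3.
print(evaluate(["Q", "P", "r", "p", "p"]))  # 9 + 1 - 5 - 1 - 1 = 3
```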

However, that heuristic enables one thing: you can quickly decide which moves to discard. If a series of moves would decrease your points much more than your opponent’s, you ignore that path and focus on a more promising one. When a modern chess program examines 20 moves in advance, that doesn’t mean all possible combinations of 20 moves. The computer can miss a checkmate in 9 moves simply because the evaluation function cut off the search at that point.
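
Here is a minimal sketch of that cutoff, a plain depth-limited search; `legal_moves` and `apply_move` are hypothetical helpers standing in for real chess logic, and `evaluate` is the material counter sketched above:

```python
def negamax(board, depth):
    """Search `depth` plies ahead, then trust the evaluation function.

    Anything beyond the horizon -- including a forced checkmate one ply
    deeper -- is invisible: the search stops and returns the heuristic
    score instead of the truth.
    """
    if depth == 0:
        return evaluate(board)           # horizon reached: fall back to the prejudice
    best = float("-inf")
    for move in legal_moves(board):      # hypothetical move generator
        child = apply_move(board, move)  # hypothetical board update
        best = max(best, -negamax(child, depth - 1))
    return best                          # (a real engine would also handle mate/stalemate)
```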

I find it fascinating that introducing an irrational prejudice into a system (hey, little pawn, you are worthless compared to the queen) makes the entire system more efficient. And that prejudice needs to be simple and fast; otherwise you are not going to achieve millions of evaluations per second. If you start calculating whom a particular pawn could threaten within a few moves, the evaluation function becomes too slow to be useful.

Humans are many orders of magnitude slower than computers. But using generalization and pattern recognition, we can get to suboptimal solutions fast. Humans are still beating the best computer programs at the simple game of Go[ref]http://en.wikipedia.org/wiki/Computer_Go[/ref].

It is intriguing how many different problems hit the same barrier of combinatorial explosion. I think AI needs to focus more on the science of approximation than on the science of conclusions. You would be surprised how simple, obviously wrong approximation methods can outperform “right” approaches.

Take machine translation, for example. For decades, researchers tried to fill machines with vocabulary and grammar rules in order to translate from one language to another. For decades, they got miserable results. Then somebody noticed that simple statistical models give pretty good results. Statistical translators just calculate the probability that one phrase translates to another, based on the words surrounding it. It is obviously a “wrong” approach: the program is just counting words and doing statistics, with no understanding of words, grammar, or syntax. It produces some funny translation errors. However, it works remarkably well. Google Translate is a statistical machine translator, trained on approximately 200 billion words from United Nations documents[ref]http://en.wikipedia.org/wiki/Machine_translation#Approaches[/ref].
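
A caricature of the statistical approach fits in a few lines. This sketch uses a made-up four-pair “corpus” and simple phrase counting; real systems align phrases across millions of sentence pairs, but the spirit is the same:

```python
from collections import Counter

# Toy "parallel corpus": (source phrase, observed translation) pairs, made up for illustration.
corpus = [
    ("casa blanca", "white house"),
    ("casa blanca", "white house"),
    ("casa blanca", "blank house"),  # noisy alignments happen in real corpora too
    ("casa grande", "big house"),
]

# Count how often each source phrase was seen translated as each target phrase.
translations = {}
for source, target in corpus:
    translations.setdefault(source, Counter())[target] += 1

def translate(phrase):
    """Return the statistically most frequent translation -- no grammar, no meaning."""
    return translations[phrase].most_common(1)[0][0]

print(translate("casa blanca"))  # "white house" wins with 2 votes out of 3
```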

Let me end with an old computer joke. In the nineties, Intel released the Pentium P5, the fastest processor to date. However, it had a floating-point division bug: the Pentium P5 gave the wrong decimals for some calculations[ref]http://en.wikipedia.org/wiki/Pentium_FDIV_bug[/ref]. The joke goes:

Q: Why is Pentium faster than other chips?
A: Because it guesses the result.

I think the same will be true for any general-purpose AI system: generalization and experience-based intuition are an integral part of reasoning about a complex world. Otherwise, you just end up doing calculations forever. Computers that are always correct will stay in the beautiful world of science fiction.