The Singularity Isn’t Near
by Paul Allen
MIT Technology Review
The Singularity Summit approaches this weekend in New York. But the Microsoft cofounder and a colleague say the singularity itself is a long way off.
https://www.technologyreview.com/s/425733/paul-allen-the-singularity-isnt-near/
Futurists like Vernor Vinge and Ray Kurzweil have argued that the world is rapidly approaching a tipping point, where the accelerating pace of smarter and smarter machines will soon outrun all human capabilities. They call this tipping point the singularity, because they believe it is impossible to predict how the human future might unfold after this point. Once these machines exist, Kurzweil and Vinge claim, they’ll possess a superhuman intelligence that is so incomprehensible to us that we cannot even rationally guess how our life experiences would be altered. Vinge asks us to ponder the role of humans in a world where machines are as much smarter than us as we are smarter than our pet dogs and cats. Kurzweil, who is a bit more optimistic, envisions a future in which developments in medical nanotechnology will allow us to download a copy of our individual brains into these superhuman machines, leave our bodies behind, and, in a sense, live forever. It’s heady stuff.
While we suppose this kind of singularity might one day occur, we don’t think it is near. In fact, we think it will be a very long time coming. Kurzweil disagrees, based on his extrapolations about the rate of relevant scientific and technical progress. He reasons that the rate of progress toward the singularity isn’t just a progression of steadily increasing capability, but is in fact exponentially accelerating, a pattern that Kurzweil calls the “Law of Accelerating Returns.”
By working through a set of models and historical data, Kurzweil famously calculates that the singularity will arrive around 2045.
This prediction seems to us quite far-fetched. Of course, we are aware that the history of science and technology is littered with people who confidently assert that some event can’t happen, only to be later proven wrong—often in spectacular fashion. We acknowledge that it is possible but highly unlikely that Kurzweil will eventually be vindicated. An adult brain is a finite thing, so its basic workings can ultimately be known through sustained human effort. But if the singularity is to arrive by 2045, it will take unforeseeable and fundamentally unpredictable breakthroughs, and not because the Law of Accelerating Returns made it the inevitable result of a specific exponential rate of progress.
Kurzweil’s reasoning rests on the Law of Accelerating Returns and its siblings, but these are not physical laws. They are assertions about how past rates of scientific and technical progress can predict the future rate. Therefore, like other attempts to forecast the future from the past, these “laws” will work until they don’t. More problematically for the singularity, these kinds of extrapolations derive much of their overall exponential shape from supposing that there will be a constant supply of increasingly more powerful computing capabilities. For the Law to apply and the singularity to occur circa 2045, the advances in capability have to occur not only in a computer’s hardware technologies (memory, processing power, bus speed, etc.) but also in the software we create to run on these more capable computers. To achieve the singularity, it isn’t enough to just run today’s software faster. We would also need to build smarter and more capable software programs. Creating this kind of advanced software requires a prior scientific understanding of the foundations of human cognition, and we are just scraping the surface of this.
This prior need to understand the basic science of cognition is where the “singularity is near” arguments fail to persuade us. It is true that computer hardware technology can develop amazingly quickly once we have a solid scientific framework and adequate economic incentives. However, creating the software for a real singularity-level computer intelligence will require fundamental scientific progress beyond where we are today. This kind of progress is very different than the Moore’s Law-style evolution of computer hardware capabilities that inspired Kurzweil and Vinge. Building the complex software that would allow the singularity to happen requires us to first have a detailed scientific understanding of how the human brain works that we can use as an architectural guide, or else create it all de novo. This means not just knowing the physical structure of the brain, but also how the brain reacts and changes, and how billions of parallel neuron interactions can result in human consciousness and original thought. Getting this kind of comprehensive understanding of the brain is not impossible. If the singularity is going to occur on anything like Kurzweil’s timeline, though, then we absolutely require a massive acceleration of our scientific progress in understanding every facet of the human brain.
But history tells us that the process of original scientific discovery just doesn’t behave this way, especially in complex areas like neuroscience, nuclear fusion, or cancer research. Overall scientific progress in understanding the brain rarely resembles an orderly, inexorable march to the truth, let alone an exponentially accelerating one. Instead, scientific advances are often irregular, with unpredictable flashes of insight punctuating the slow grind-it-out lab work of creating and testing theories that can fit with experimental observations. Truly significant conceptual breakthroughs don’t arrive when predicted, and every so often new scientific paradigms sweep through the field and cause scientists to reëvaluate portions of what they thought they had settled. We see this in neuroscience with the discovery of long-term potentiation, the columnar organization of cortical areas, and neuroplasticity. These kinds of fundamental shifts don’t support the overall Moore’s Law-style acceleration needed to get to the singularity on Kurzweil’s schedule.
The Complexity Brake
The foregoing points at a basic issue with how quickly a scientifically adequate account of human intelligence can be developed. We call this issue the complexity brake. As we go deeper and deeper in our understanding of natural systems, we typically find that we require more and more specialized knowledge to characterize them, and we are forced to continuously expand our scientific theories in more and more complex ways. Understanding the detailed mechanisms of human cognition is a task that is subject to this complexity brake. Just think about what is required to thoroughly understand the human brain at a micro level. The complexity of the brain is simply awesome. Every structure has been precisely shaped by millions of years of evolution to do a particular thing, whatever it might be. It is not like a computer, with billions of identical transistors in regular memory arrays that are controlled by a CPU with a few different elements. In the brain every individual structure and neural circuit has been individually refined by evolution and environmental factors. The closer we look at the brain, the greater the degree of neural variation we find. Understanding the neural structure of the human brain is getting harder as we learn more. Put another way, the more we learn, the more we realize there is to know, and the more we have to go back and revise our earlier understandings. We believe that one day this steady increase in complexity will end—the brain is, after all, a finite set of neurons and operates according to physical principles. But for the foreseeable future, it is the complexity brake and arrival of powerful new theories, rather than the Law of Accelerating Returns, that will govern the pace of scientific progress required to achieve the singularity.
So, while we think a fine-grained understanding of the neural structure of the brain is ultimately achievable, it has not shown itself to be the kind of area in which we can make exponentially accelerating progress. But suppose scientists make some brilliant new advance in brain scanning technology. Singularity proponents often claim that we can achieve computer intelligence just by numerically simulating the brain “bottom up” from a detailed neural-level picture. For example, Kurzweil predicts the development of nondestructive brain scanners that will allow us to precisely take a snapshot of a person’s living brain at the subneuron level. He suggests that these scanners would most likely operate from inside the brain via millions of injectable medical nanobots. But, regardless of whether nanobot-based scanning succeeds (and we aren’t even close to knowing if this is possible), Kurzweil essentially argues that this is the needed scientific advance that will gate the singularity: computers could exhibit human-level intelligence simply by loading the state and connectivity of each of a brain’s neurons inside a massive digital brain simulator, hooking up inputs and outputs, and pressing “start.”
However, the difficulty of building human-level software goes deeper than computationally modeling the structural connections and biology of each of our neurons. “Brain duplication” strategies like these presuppose that there is no fundamental issue in getting to human cognition other than having sufficient computer power and neuron structure maps to do the simulation. While this may be true theoretically, it has not worked out that way in practice, because it doesn’t address everything that is actually needed to build the software. For example, if we wanted to build software to simulate a bird’s ability to fly in various conditions, simply having a complete diagram of bird anatomy isn’t sufficient. To fully simulate the flight of an actual bird, we also need to know how everything functions together. In neuroscience, there is a parallel situation. Hundreds of attempts have been made (using many different organisms) to chain together simulations of different neurons along with their chemical environment. The uniform result of these attempts is that in order to create an adequate simulation of the real ongoing neural activity of an organism, you also need a vast amount of knowledge about the functional role that these neurons play, how their connection patterns evolve, how they are structured into groups to turn raw stimuli into information, and how neural information processing ultimately affects an organism’s behavior. Without this information, it has proven impossible to construct effective computer-based simulation models. Especially for the cognitive neuroscience of humans, we are not close to the requisite level of functional knowledge. Brain simulation projects underway today model only a small fraction of what neurons do and lack the detail to fully simulate what occurs in a brain. The pace of research in this area, while encouraging, hardly seems to be exponential. 
Again, as we learn more and more about the actual complexity of how the brain functions, the main thing we find is that the problem is actually getting harder.
The AI Approach
Singularity proponents occasionally appeal to developments in artificial intelligence (AI) as a way to get around the slow rate of overall scientific progress in bottom-up, neuroscience-based approaches to cognition. It is true that AI has had great successes in duplicating certain isolated cognitive tasks, most recently with IBM’s Watson system for Jeopardy! question answering. But when we step back, we can see that overall AI-based capabilities haven’t been exponentially increasing either, at least when measured against the creation of a fully general human intelligence. While we have learned a great deal about how to build individual AI systems that do seemingly intelligent things, our systems have always remained brittle—their performance boundaries are rigidly set by their internal assumptions and defining algorithms, they cannot generalize, and they frequently give nonsensical answers outside of their specific focus areas. A computer program that plays excellent chess can’t leverage its skill to play other games. The best medical diagnosis programs contain immensely detailed knowledge of the human body but can’t deduce that a tightrope walker would have a great sense of balance.
Why has it proven so difficult for AI researchers to build human-like intelligence, even at a small scale? One answer involves the basic scientific framework that AI researchers use. As humans grow from infants to adults, they begin by acquiring a general knowledge about the world, and then continuously augment and refine this general knowledge with specific knowledge about different areas and contexts. AI researchers have typically tried to do the opposite: they have built systems with deep knowledge of narrow areas, and tried to create a more general capability by combining these systems. This strategy has not generally been successful, although Watson’s performance on Jeopardy! indicates paths like this may yet have promise. The few attempts that have been made to directly create a large amount of general knowledge of the world, and then add the specialized knowledge of a domain (for example, the work of Cycorp), have also met with only limited success. And in any case, AI researchers are only just beginning to theorize about how to effectively model the complex phenomena that give human cognition its unique flexibility: uncertainty, contextual sensitivity, rules of thumb, self-reflection, and the flashes of insight that are essential to higher-level thought. Just as in neuroscience, the AI-based route to achieving singularity-level computer intelligence seems to require many more discoveries, some new Nobel-quality theories, and probably even whole new research approaches that are incommensurate with what we believe now. This kind of basic scientific progress doesn’t happen on a reliable exponential growth curve. So although developments in AI might ultimately end up being the route to the singularity, again the complexity brake slows our rate of progress, and pushes the singularity considerably into the future.
The amazing intricacy of human cognition should serve as a caution to those who claim the singularity is close. Without having a scientifically deep understanding of cognition, we can’t create the software that could spark the singularity. Rather than the ever-accelerating advancement predicted by Kurzweil, we believe that progress toward this understanding is fundamentally slowed by the complexity brake. Our ability to achieve this understanding, via either the AI or the neuroscience approaches, is itself a human cognitive act, arising from the unpredictable nature of human ingenuity and discovery. Progress here is deeply affected by the ways in which our brains absorb and process new information, and by the creativity of researchers in dreaming up new theories. It is also governed by the ways that we socially organize research work in these fields, and disseminate the knowledge that results. At Vulcan and at the Allen Institute for Brain Science, we are working on advanced tools to help researchers deal with this daunting complexity, and speed them in their research. Gaining a comprehensive scientific understanding of human cognition is one of the hardest problems there is. We continue to make encouraging progress. But by the end of the century, we believe, we will still be wondering if the singularity is near.
Kurzweil Responds: Don’t Underestimate the Singularity
MIT Technology Review
Last week, Paul Allen and a colleague challenged the prediction that computers will soon exceed human intelligence. Now Ray Kurzweil, the leading proponent of the “Singularity,” offers a rebuttal.
https://www.technologyreview.com/s/425818/kurzweil-responds-dont-underestimate-the-singularity/
Although Paul Allen paraphrases my 2005 book, The Singularity Is Near, in the title of his essay (cowritten with his colleague Mark Greaves), it appears that he has not actually read the book. His only citation is to an essay I wrote in 2001 (“The Law of Accelerating Returns”) and his article does not acknowledge or respond to arguments I actually make in the book.
When my 1999 book, The Age of Spiritual Machines, was published, and augmented a couple of years later by the 2001 essay, it generated several lines of criticism, such as Moore’s law will come to an end, hardware capability may be expanding exponentially but software is stuck in the mud, the brain is too complicated, there are capabilities in the brain that inherently cannot be replicated in software, and several others. I specifically wrote The Singularity Is Near to respond to those critiques.
I cannot say that Allen would necessarily be convinced by the arguments I make in the book, but at least he could have responded to what I actually wrote. Instead, he offers de novo arguments as if nothing has ever been written to respond to these issues. Allen’s descriptions of my own positions appear to be drawn from my 10-year-old essay. While I continue to stand by that essay, Allen does not summarize my positions correctly even from that essay.
Allen writes that “the Law of Accelerating Returns (LOAR)… is not a physical law.” I would point out that most scientific laws are not physical laws, but result from the emergent properties of a large number of events at a finer level. A classical example is the laws of thermodynamics (LOT). If you look at the mathematics underlying the LOT, they model each particle as following a random walk. So by definition, we cannot predict where any particular particle will be at any future time. Yet the overall properties of a gas are predictable to a high degree of precision according to the laws of thermodynamics. So it is with the law of accelerating returns. Each technology project and contributor is unpredictable, yet the overall trajectory as quantified by basic measures of price-performance and capacity nonetheless follows a remarkably predictable path.
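The statistical point in the thermodynamics analogy can be illustrated with a small simulation. The sketch below is not a model of a real gas and is not taken from either essay: it assumes a simple one-dimensional random walk in which each particle steps left or right with equal probability, and the function name, particle count, step count, and seed are all illustrative choices. It shows individual endpoints scattering unpredictably while the mean squared displacement stays close to the number of steps.

```python
import random
import statistics


def final_positions(num_particles: int, num_steps: int, seed: int = 0) -> list[int]:
    """Return the end position of each particle after an unbiased +/-1 random walk.

    Assumption: a one-dimensional toy walk, chosen only to illustrate the
    "unpredictable parts, predictable aggregate" argument.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.choice((-1, 1)) for _ in range(num_steps))
        for _ in range(num_particles)
    ]


NUM_STEPS = 200
positions = final_positions(num_particles=2000, num_steps=NUM_STEPS)

# Individual particles end up in widely scattered places.
print("first five endpoints:", positions[:5])

# The aggregate is regular: for this walk the expected squared displacement
# equals the number of steps, so this ratio should come out near 1.
mean_square = statistics.fmean(p * p for p in positions)
ratio = mean_square / NUM_STEPS
print("mean squared displacement / steps:", round(ratio, 3))
```

Whether technology trends behave like this toy ensemble is exactly what the two essays dispute; the code only demonstrates that aggregate regularity can coexist with individual unpredictability.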
If computer technology were being pursued by only a handful of researchers, it would indeed be unpredictable. But it’s being pursued by a sufficiently dynamic system of competitive projects that a basic measure such as instructions per second per constant dollar follows a very smooth exponential path going back to the 1890 American census. I discuss the theoretical basis for the LOAR extensively in my book, but the strongest case is made by the extensive empirical evidence that I and others present.
Allen writes that “these ‘laws’ work until they don’t.” Here, Allen is confusing paradigms with the ongoing trajectory of a basic area of information technology. If we were examining the trend of creating ever-smaller vacuum tubes, the paradigm for improving computation in the 1950s, it’s true that this specific trend continued until it didn’t. But as the end of this particular paradigm became clear, research pressure grew for the next paradigm. The technology of transistors kept the underlying trend of the exponential growth of price-performance going, and that led to the fifth paradigm (Moore’s law) and the continual compression of features on integrated circuits. There have been regular predictions that Moore’s law will come to an end. The semiconductor industry’s roadmap projects seven-nanometer features by the early 2020s. At that point, key features will be the width of 35 carbon atoms, and it will be difficult to continue shrinking them. However, Intel and other chip makers are already taking the first steps toward the sixth paradigm, which is computing in three dimensions to continue exponential improvement in price performance. Intel projects that three-dimensional chips will be mainstream by the teen years. Already three-dimensional transistors and three-dimensional memory chips have been introduced.
This sixth paradigm will keep the LOAR going with regard to computer price performance to the point, later in this century, where a thousand dollars of computation will be trillions of times more powerful than the human brain. And it appears that Allen and I are at least in agreement on what level of computation is required to functionally simulate the human brain.
Allen then goes on to give the standard argument that software is not progressing in the same exponential manner of hardware. In The Singularity Is Near, I address this issue at length, citing different methods of measuring complexity and capability in software that demonstrate a similar exponential growth. One recent study (“Report to the President and Congress, Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology” by the President’s Council of Advisors on Science and Technology) states the following:
“Even more remarkable—and even less widely understood—is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed. The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade … Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later—in 2003—this same model could be solved in roughly one minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008. The design and analysis of algorithms, and the study of the inherent computational complexity of problems, are fundamental subfields of computer science.”
I cite many other examples like this in the book.
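The arithmetic in the Grötschel example quoted above can be checked in a few lines. This is a minimal sketch that uses only the figures given in the quotation (82 years, roughly one minute, and factors of roughly 1,000 and 43,000); the variable names and the 365.25-day year are assumptions made for the calculation.

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60

solve_time_1988_minutes = 82 * MINUTES_PER_YEAR  # "82 years" in 1988
solve_time_2003_minutes = 1                      # "roughly one minute" in 2003

# Overall speedup implied by the two solve times.
total_speedup = solve_time_1988_minutes / solve_time_2003_minutes

# The quoted breakdown: hardware gains times algorithmic gains.
hardware_factor = 1_000
algorithm_factor = 43_000
combined_factor = hardware_factor * algorithm_factor

print(f"speedup from solve times: {total_speedup:,.0f}")  # 43,128,720
print(f"hardware x algorithms:    {combined_factor:,}")   # 43,000,000
```

The two figures agree to within about 0.3 percent, which is consistent with the report's "roughly 43 million."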
Regarding AI, Allen is quick to dismiss IBM’s Watson as narrow, rigid, and brittle. I get the sense that Allen would dismiss any demonstration short of a valid passing of the Turing test. I would point out that Watson is not so narrow. It deals with a vast range of human knowledge and is capable of dealing with subtle forms of language, including puns, similes, and metaphors. It’s not perfect, but neither are humans, and it was good enough to get a higher score than the best two human Jeopardy! players put together.
Allen writes that Watson was put together by the scientists themselves, building each link of narrow knowledge in specific areas. Although some areas of Watson’s knowledge were programmed directly, according to IBM, Watson acquired most of its knowledge on its own by reading natural language documents such as encyclopedias. That represents its key strength. It not only is able to understand the convoluted language in Jeopardy! queries (answers in search of a question), but it acquired its knowledge by reading vast amounts of natural-language documents. IBM is now working with Nuance (a company I originally founded as Kurzweil Computer Products) to have Watson read tens of thousands of medical articles to create a medical diagnostician.
A word on the nature of Watson’s “understanding” is in order here. A lot has been written that Watson works through statistical knowledge rather than “true” understanding. Many readers interpret this to mean that Watson is merely gathering statistics on word sequences. The term “statistical information” in the case of Watson refers to distributed coefficients in self-organizing methods such as Markov models. One could just as easily refer to the distributed neurotransmitter concentrations in the human cortex as “statistical information.” Indeed, we resolve ambiguities in much the same way that Watson does by considering the likelihood of different interpretations of a phrase.
Allen writes: “Every structure [in the brain] has been precisely shaped by millions of years of evolution to do a particular thing, whatever it might be. It is not like a computer, with billions of identical transistors in regular memory arrays that are controlled by a CPU with a few different elements. In the brain, every individual structure and neural circuit has been individually refined by evolution and environmental factors.”
Allen’s statement that every structure and neural circuit is unique is simply impossible. That would mean that the design of the brain would require hundreds of trillions of bytes of information. Yet the design of the brain (like the rest of the body) is contained in the genome. And while the translation of the genome into a brain is not straightforward, the brain cannot have more design information than the genome. Note that epigenetic information (such as the peptides controlling gene expression) does not appreciably add to the amount of information in the genome. Experience and learning do add significantly to the amount of information, but the same can be said of AI systems. I show in The Singularity Is Near that after lossless compression (due to massive redundancy in the genome), the amount of design information in the genome is about 50 million bytes, roughly half of which pertains to the brain. That’s not simple, but it is a level of complexity we can deal with and represents less complexity than many software systems in the modern world.
How do we get on the order of 100 trillion connections in the brain from only tens of millions of bytes of design information? Obviously, the answer is through redundancy. There are on the order of a billion pattern-recognition mechanisms in the cortex. They are interconnected in intricate ways, but even in the connections there is massive redundancy. The cerebellum also has billions of repeated patterns of neurons. It is true that the massively repeated structures in the brain learn different items of information as we learn and gain experience, but the same thing is true of artificially intelligent systems such as Watson.
Dharmendra S. Modha, manager of cognitive computing for IBM Research, writes: “…neuroanatomists have not found a hopelessly tangled, arbitrarily connected network, completely idiosyncratic to the brain of each individual, but instead a great deal of repeating structure within an individual brain and a great deal of homology across species … The astonishing natural reconfigurability gives hope that the core algorithms of neurocomputation are independent of the specific sensory or motor modalities and that much of the observed variation in cortical structure across areas represents a refinement of a canonical circuit; it is indeed this canonical circuit we wish to reverse engineer.”
Allen articulates what I describe in my book as the “scientist’s pessimism.” Scientists working on the next generation are invariably struggling with that next set of challenges, so if someone describes what the technology will look like in 10 generations, their eyes glaze over. One of the pioneers of integrated circuits was describing to me recently the struggles to go from 10 micron (10,000-nanometer) feature sizes to five-micron (5,000 nanometers) features over 30 years ago. They were cautiously confident of this goal, but when people predicted that someday we would actually have circuitry with feature sizes under one micron (1,000 nanometers), most of the scientists struggling to get to five microns thought that was too wild to contemplate. Objections were made on the fragility of circuitry at that level of precision, thermal effects, and so on. Well, today, Intel is starting to use chips with 22-nanometer gate lengths.
We saw the same pessimism with the genome project. Halfway through the 15-year project, only 1 percent of the genome had been collected, and critics were proposing basic limits on how quickly the genome could be sequenced without destroying the delicate genetic structures. But the exponential growth in both capacity and price performance continued (both roughly doubling every year), and the project was finished seven years later. The project to reverse-engineer the human brain is making similar progress. It is only recently, for example, that we have reached a threshold with noninvasive scanning techniques that we can see individual interneuronal connections forming and firing in real time.
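The doubling arithmetic behind the genome example can be sketched as follows. The simplification here is an assumption, not something the essay states: it treats the completed fraction of the genome itself as doubling once per year, starting from the 1 percent figure in the text. The function name and its parameters are hypothetical.

```python
def years_to_complete(initial_fraction: float, growth_per_year: float = 2.0) -> int:
    """Count whole years of compounding until the completed fraction reaches 1.0.

    Assumption: the completed fraction is multiplied by growth_per_year once
    per year, a deliberately simplified reading of "doubling every year".
    """
    years = 0
    fraction = initial_fraction
    while fraction < 1.0:
        fraction *= growth_per_year
        years += 1
    return years


# Starting from 1 percent complete with annual doubling:
# 1% -> 2% -> 4% -> 8% -> 16% -> 32% -> 64% -> 128%
print(years_to_complete(0.01))  # 7
```

Under that assumption, 1 percent reaches completion after seven doublings, which matches the "seven years later" in the text.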
Allen’s “complexity brake” confuses the forest with the trees. If you want to understand, model, simulate, and re-create a pancreas, you don’t need to re-create or simulate every organelle in every pancreatic Islet cell. You would want, instead, to fully understand one Islet cell, then abstract its basic functionality, and then extend that to a large group of such cells. This algorithm is well understood with regard to Islet cells. There are now artificial pancreases that utilize this functional model being tested. Although there is certainly far more intricacy and variation in the brain than in the massively repeated Islet cells of the pancreas, there is nonetheless massive repetition of functions.
Allen mischaracterizes my proposal to learn about the brain from scanning the brain to understand its fine structure. It is not my proposal to simulate an entire brain “bottom up” without understanding the information processing functions. We do need to understand in detail how individual types of neurons work, and then gather information about how functional modules are connected. The functional methods that are derived from this type of analysis can then guide the development of intelligent systems. Basically, we are looking for biologically inspired methods that can accelerate work in AI, much of which has progressed without significant insight as to how the brain performs similar functions. From my own work in speech recognition, I know that our work was greatly accelerated when we gained insights as to how the brain prepares and transforms auditory information.
The way that these massively redundant structures in the brain differentiate is through learning and experience. The current state of the art in AI does, however, enable systems to also learn from their own experience. The Google self-driving cars (which have driven over 140,000 miles through California cities and towns) learn from their own driving experience as well as from Google cars driven by human drivers. As I mentioned, Watson learned most of its knowledge by reading on its own.
It is true that Watson is not quite at human levels in its ability to understand human language (if it were, we would be at the Turing test level now), yet it was able to defeat the best humans. This is because of the inherent speed and reliability of memory that computers have. So when a computer does reach human levels, which I believe will happen by the end of the 2020s, it will be able to go out on the Web and read billions of pages as well as have experiences in online virtual worlds. Combining human-level pattern recognition with the inherent speed and accuracy of computers will be very powerful. But this is not an alien invasion of intelligent machines; we create these tools to make ourselves smarter. I think Allen will agree with me that this is what is unique about the human species: we build these tools to extend our own reach.