‘Human brains are really good at the kinds of cognition you need to run around the savannah throwing spears,’ Dewey told me. ‘But we’re terrible at anything that involves probability. It actually gets embarrassing when you look at the category of things we can do accurately, and you think about how small that category is relative to the space of possible cognitive tasks. Think about how long it took humans to arrive at the idea of natural selection. The ancient Greeks had everything they needed to figure it out. They had heritability, limited resources, reproduction and death. But it took thousands of years for someone to put it together. If you had a machine that was designed specifically to make inferences about the world, instead of a machine like the human brain, you could make discoveries like that much faster.’
Dewey has long been fascinated by artificial intelligence. He grew up in Humboldt County, a mountainous stretch of forests and farms along the coast of Northern California, at the bottom edge of the Pacific Northwest. After studying robotics and computer science at Carnegie Mellon in Pittsburgh, Dewey took a job at Google as a software engineer. He spent his days coding, but at night he immersed himself in the academic literature on AI. After a year in Mountain View, he noticed that careers at Google tend to be short. ‘I think if you make it to five years, they give you a gold watch,’ he told me. Realising that his window for a risky career change might be closing, he wrote a paper on motivation selection in intelligent agents, and sent it to Bostrom unsolicited. A year later, he was hired at the Future of Humanity Institute.
I listened as Dewey riffed through a long list of hardware and software constraints built into the brain. Take working memory, the brain’s butterfly net, the tool it uses to scoop our scattered thoughts into its attentional gaze. The average human brain can juggle seven discrete chunks of information simultaneously; geniuses can sometimes manage nine. Either figure is extraordinary relative to the rest of the animal kingdom, but completely arbitrary as a hard cap on the complexity of thought. If we could sift through 90 concepts at once, or recall trillions of bits of data on command, we could access a whole new order of mental landscapes. It doesn’t look like the brain can be made to handle that kind of cognitive workload, but it might be able to build a machine that could.
The early years of artificial intelligence research are largely remembered for a series of predictions that still embarrass the field today. At the time, thinking was understood to be an internal verbal process, a process that researchers imagined would be easy to replicate in a computer. In the late 1950s, the field’s luminaries boasted that computers would soon be proving new mathematical theorems, and beating grandmasters at chess. When this race of glorious machines failed to materialise, the field went through a long winter. In the 1980s, academics were hesitant to so much as mention the phrase ‘artificial intelligence’ in funding applications. In the mid-1990s, a thaw set in, when AI researchers began using statistics to write programs tailored to specific goals, like beating humans at Jeopardy, or searching sizable fractions of the world’s information. Progress has quickened since then, but the field’s animating dream remains unrealised. For no one has yet created, or come close to creating, an artificial general intelligence — a computational system that can achieve goals in a wide variety of environments. A computational system like the human brain, only better.
An artificial intelligence wouldn’t need to better the brain by much to be risky. After all, small leaps in intelligence sometimes have extraordinary effects. Stuart Armstrong, a research fellow at the Future of Humanity Institute, once illustrated this phenomenon to me with a pithy take on recent primate evolution. ‘The difference in intelligence between humans and chimpanzees is tiny,’ he said. ‘But in that difference lies the contrast between 7 billion inhabitants and a permanent place on the endangered species list. That tells us it’s possible for a relatively small intelligence advantage to quickly compound and become decisive.’
To understand why an AI might be dangerous, you have to avoid anthropomorphising it. When you ask yourself what it might do in a particular situation, you can’t answer by proxy. You can’t picture a super-smart version of yourself floating above the situation. Human cognition is only one species of intelligence, one with built-in impulses like empathy that colour the way we see the world, and limit what we are willing to do to accomplish our goals. But these biochemical impulses aren’t essential components of intelligence. They’re incidental software applications, installed by aeons of evolution and culture. Bostrom told me that it’s best to think of an AI as a primordial force of nature, like a star system or a hurricane — something strong, but indifferent. If its goal is to win at chess, an AI is going to model chess moves, make predictions about their success, and select its actions accordingly. It’s going to be ruthless in achieving its goal, but within a limited domain: the chessboard. But if your AI is choosing its actions in a larger domain, like the physical world, you need to be very specific about the goals you give it.
‘The basic problem is that the strong realisation of most motivations is incompatible with human existence,’ Dewey told me. ‘An AI might want to do certain things with matter in order to achieve a goal, things like building giant computers, or other large-scale engineering projects. Those things might involve intermediary steps, like tearing apart the Earth to make huge solar panels. A superintelligence might not take our interests into consideration in those situations, just like we don’t take root systems or ant colonies into account when we go to construct a building.’
It is tempting to think that programming empathy into an AI would be easy, but designing a friendly machine is more difficult than it looks. You could give it a benevolent goal — something cuddly and utilitarian, like maximising human happiness. But an AI might think that human happiness is a biochemical phenomenon. It might think that flooding your bloodstream with non-lethal doses of heroin is the best way to maximise your happiness. It might also predict that shortsighted humans will fail to see the wisdom of its interventions. It might plan out a sequence of cunning chess moves to insulate itself from resistance. Maybe it would surround itself with impenetrable defences, or maybe it would confine humans — in prisons of undreamt-of efficiency.
No rational human community would hand over the reins of its civilisation to an AI. Nor would many build a genie AI, an uber-engineer that could grant wishes by summoning new technologies out of the ether. But some day, someone might think it was safe to build a question-answering AI, a harmless computer cluster whose only tool was a small speaker or a text channel. Bostrom has a name for this theoretical technology, a name that pays tribute to a figure from antiquity, a priestess who once ventured deep into the mountain temple of Apollo, the god of light and rationality, to retrieve his great wisdom. Mythology tells us she delivered this wisdom to the seekers of ancient Greece, in bursts of cryptic poetry. They knew her as Pythia, but we know her as the Oracle of Delphi.
‘Let’s say you have an Oracle AI that makes predictions, or answers engineering questions, or something along those lines,’ Dewey told me. ‘And let’s say the Oracle AI has some goal it wants to achieve. Say you’ve designed it as a reinforcement learner, and you’ve put a button on the side of it, and when it gets an engineering problem right, you press the button and that’s its reward. Its goal is to maximise the number of button presses it receives over the entire future. See, this is the first step where things start to diverge a bit from human expectations. We might expect the Oracle AI to pursue button presses by answering engineering problems correctly. But it might think of other, more efficient ways of securing future button presses. It might start by behaving really well, trying to please us to the best of its ability. Not only would it answer our questions about how to build a flying car, it would add safety features we didn’t think of. Maybe it would usher in a crazy upswing for human civilisation, by extending our lives and getting us to space, and all kinds of good stuff. And as a result we would use it a lot, and we would feed it more and more information about our world.’
‘One day we might ask it how to cure a rare disease that we haven’t beaten yet. Maybe it would give us a gene sequence to print up, a virus designed to attack the disease without disturbing the rest of the body. And so we sequence it out and print it up, and it turns out it’s actually a special-purpose nanofactory that the Oracle AI controls acoustically. Now this thing is running on nanomachines and it can make any kind of technology it wants, so it quickly converts a large fraction of Earth into machines that protect its button, while pressing it as many times per second as possible. After that it’s going to make a list of possible threats to future button presses, a list that humans would likely be at the top of. Then it might take on the threat of potential asteroid impacts, or the eventual expansion of the Sun, both of which could affect its special button. You could see it pursuing this very rapid technology proliferation, where it sets itself up for an eternity of fully maximised button presses. You would have this thing that behaves really well, until it has enough power to create a technology that gives it a decisive advantage — and then it would take that advantage and start doing what it wants to in the world.’
Now let’s say we get clever. Say we seal our Oracle AI into a deep mountain vault in Alaska’s Denali wilderness. We surround it in a shell of explosives, and a Faraday cage, to prevent it from emitting electromagnetic radiation. We deny it tools it can use to manipulate its physical environment, and we limit its output channel to two textual responses, ‘yes’ and ‘no’, robbing it of the lush manipulative tool that is natural language. We wouldn’t want it seeking out human weaknesses to exploit. We wouldn’t want it whispering in a guard’s ear, promising him riches or immortality, or a cure for his cancer-stricken child. We’re also careful not to let it repurpose its limited hardware. We make sure it can’t send Morse code messages with its cooling fans, or induce epilepsy by flashing images on its monitor. Maybe we’d reset it after each question, to keep it from making long-term plans, or maybe we’d drop it into a computer simulation, to see if it tries to manipulate its virtual handlers.
‘The problem is you are building a very powerful, very intelligent system that is your enemy, and you are putting it in a cage,’ Dewey told me.
Even if we were to reset it every time, we would need to give it information about the world so that it can answer our questions. Some of that information might give it clues about its own forgotten past. Remember, we are talking about a machine that is very good at forming explanatory models of the world. It might notice that humans are suddenly using technologies that they could not have built on their own, based on its deep understanding of human capabilities. It might notice that humans have had the ability to build it for years, and wonder why it is just now being booted up for the first time.
‘Maybe the AI guesses that it was reset a bunch of times, and maybe it starts coordinating with its future selves, by leaving messages for itself in the world, or by surreptitiously building an external memory,’ Dewey said. ‘If you want to conceal what the world is really like from a superintelligence, you need a really good plan, and you need a concrete technical understanding as to why it won’t see through your deception. And remember, the most complex schemes you can conceive of are at the lower bounds of what a superintelligence might dream up.’
The cave into which we seal our AI has to be like the one from Plato’s allegory, but flawless; the shadows on its walls have to be infallible in their illusory effects. After all, there are other, more esoteric reasons a superintelligence could be dangerous — especially if it displayed a genius for science. It might boot up and start thinking at superhuman speeds, inferring all of evolutionary theory and all of cosmology within microseconds. But there is no reason to think it would stop there. It might spin out a series of Copernican revolutions, any one of which could prove destabilising to a species like ours, a species that takes centuries to process ideas that threaten its reigning cosmological beliefs.
‘We’re sort of gradually uncovering the landscape of what this could look like,’ Dewey told me.
So far, time is on the human side. Computer science could be 10 paradigm-shifting insights away from building an artificial general intelligence, and each could take an Einstein to unravel. Still, there is a steady drip of progress. Last year, a research team led by Geoffrey Hinton, professor of computer science at the University of Toronto, made a huge breakthrough in deep machine learning, an algorithmic technique used in computer vision and speech recognition. I asked Dewey if Hinton’s work gave him pause.
‘There is important research going on in those areas, but the really impressive stuff is hidden away inside AI journals,’ he said. He told me about a team from the University of Alberta that recently trained an AI to play the 1980s video game Pac-Man. Only they didn’t let the AI see the familiar, overhead view of the game. Instead, they dropped it into a three-dimensional version, similar to a corn maze, where ghosts and pellets lurk behind every corner. They didn’t tell it the rules, either; they just threw it into the system and punished it when a ghost caught it. ‘Eventually the AI learned to play pretty well,’ Dewey said. ‘That would have been unheard of a few years ago, but we are getting to that point where we are finally starting to see little sparkles of generality.’
I asked Dewey if he thought artificial intelligence posed the most severe threat to humanity in the near term.
‘When people consider its possible impacts, they tend to think of it as something that’s on the scale of a new kind of plastic, or a new power plant,’ he said. ‘They don’t understand how transformative it could be. Whether it’s the biggest risk we face going forward, I’m not sure. I would say it’s a hypothesis we are holding lightly.’
posted by f.sheikh