Everyone seems to think that the self-driving car is just a few years away. And we have already tried them on the streets of San Francisco—although some experiments, like Uber’s, have had a human sitting in the front seat, monitoring the car’s progress, ready to take control if something should go wrong.1 Experiments with self-driving cars are also going on in other cities.
I am not convinced that this is the future. Not that I am a Luddite. Far from it! In the distant future—which I postulate as thirty to fifty years from now, and getting closer all the time—machine minds with mechanical senses will perform many non-repetitive tasks that require the system to evaluate novel situations and make independent judgments. And these minds will still be subject to the same errors of fact and interpretation that plague human beings today. Driving will certainly be one of these difficult tasks—but first, a whole lot of development and testing needs to take place.
What happens when a person drives? The process is much more than visually analyzing the twists and turns of the road and tracking the wheels to follow them. Or measuring the speed of the car ahead and adjusting your own speed to maintain pace. Or perceiving that the brake lights on the car ahead have come on and applying your own. Or detecting an obstacle in the road and swerving, slowing, or turning to avoid it. Those tasks will move you forward and keep you safe for about two car lengths. After that, everything else is a matter of conjecture.
A good driver goes through a sophisticated mental process called “SIPRE,” which stands for seeing, interpreting, predicting, reacting, and executing.2 The driver is not simply reacting to visual cues, but interpreting them as part of a dynamic situation and predicting the course of that situation. Essential to those two acts is the prior development of, and continuous reference to, learned experience. The human mind interprets a situation based on what it has seen in the past and recalls in the present.3 It makes predictions based on past outcomes. This is the difference between a new driver, who is nervously dealing with the unknowns on the road, and an experienced driver, who knows what to expect and how he or she will react in many different situations.
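To make the sequence concrete, here is a rough sketch, purely illustrative and with invented names, of SIPRE treated as a repeating software loop. A real driving stack would look nothing like this toy; the point is only that interpretation and prediction lean on a store of remembered experience.

```python
# Illustrative sketch only: SIPRE as a repeating software loop.
# All names here (SipreAgent, step, etc.) are hypothetical, not from any real system.
from dataclasses import dataclass, field


@dataclass
class SipreAgent:
    """A toy model of the see-interpret-predict-react-execute cycle."""
    experience: list = field(default_factory=list)  # remembered past situations

    def see(self, raw_frame):
        # Reduce raw sensor data to a set of observed objects.
        return {"objects": raw_frame}

    def interpret(self, scene):
        # Compare what is seen against remembered situations.
        precedents = [m for m in self.experience if m["objects"] == scene["objects"]]
        return {"scene": scene, "precedents": precedents}

    def predict(self, interpretation):
        # Guess how the situation will unfold, based on past outcomes.
        if interpretation["precedents"]:
            return interpretation["precedents"][-1]["outcome"]
        return "unknown"

    def react(self, prediction):
        # Choose an action; an unfamiliar situation earns caution.
        return "slow_down" if prediction == "unknown" else "maintain_speed"

    def step(self, raw_frame):
        scene = self.see(raw_frame)
        prediction = self.predict(self.interpret(scene))
        action = self.react(prediction)
        # Store the experience so future interpretations can draw on it.
        self.experience.append({"objects": scene["objects"], "outcome": prediction})
        return action
```

The nervous new driver is the agent with an empty experience list; the experienced driver is the one whose list already covers most of what the road presents.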
A good driver also maintains an awareness of the entire flow of traffic, not just the lane and the car immediately ahead. Traffic patterns are a collective reaction, like fish schooling or birds flocking. Each car reacts individually based on the movements of those around it. The person in the car ahead may not immediately react to a change in the pattern; so an alert driver usually watches further ahead—and to the sides in multi-lane and freeway driving—to detect potential slowdowns, lane changes, and other early signs of a breakup in the pattern. And even on a single-lane or two-way street, the alert driver monitors parked cars, objects at the curb, and events on the sidewalk. The driver’s awareness extends beyond the immediate situation and his or her own immediate intentions.
A good driver also understands human responses and reactions behind the wheel. Often, in merging around an on- or off-ramp, two cars will be on a converging course. First one slows, to allow the other into the lane. Then the merging car slows, to avoid hitting the first car. Then the first one slows some more, to give the second car another chance to merge, and so on right down to a dangerous, slow-speed encounter.4 Human minds will sometimes get into this kind of “After you, Alphonse” routine—until one of them wakes up, realizes that the other driver really does mean to defer, and so takes the initiative and speeds ahead. Similarly, when two cars arrive at the same time on the adjoining corners of a four-way stop, they may pause while trying to be polite and defer to the other driver, who might just have arrived a second earlier. They will then wait until one of them either gestures for the other to go or takes the initiative and starts forward into the crossing.5 Human beings are aware of other minds and try to read human intentions.
Awareness of the flow of traffic and of the intentions and motivations of other drivers represents additional layers of programming that would run in parallel with, and in coordination with, the main steering-and-braking program that guides a self-driving vehicle. These are the “P” part of the SIPRE sequence. These additional algorithms could certainly be programmed in—but not simply as a set of rules. These types of awareness represent a human driver’s judgment based on experience. Certainly, the intelligence that drives an automated car could be loaded with samples from the internal databases of cars that preceded it in production and service. But each car would also need a set of subroutines and memory spaces that allow it to learn from its own unique experiences under local driving conditions and then correct errors of impression and interpretation as it develops “street smarts.”
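As a hedged illustration of that layered arrangement, the sketch below shows a primary lane-keeping loop whose output is adjusted by separate awareness layers running alongside it. Every name and number here is invented for the example; no actual autonomous-driving code is being quoted.

```python
# Hypothetical sketch of parallel awareness layers feeding one control loop.
# Nothing here reflects a real autonomous-driving codebase.

def lane_keeping_layer(frame):
    """Main steering-and-braking logic: follow the lane, track the car ahead."""
    return {"steer": 0.0, "brake": 0.0}

def traffic_flow_layer(frame):
    """Watch several cars ahead and to the sides for early signs of a slowdown."""
    return {"brake": 0.1} if frame.get("flow_breaking_up") else {}

def driver_intent_layer(frame):
    """Estimate what nearby human drivers intend to do (merge, defer, dart)."""
    return {"brake": 0.2} if frame.get("ambiguous_merge") else {}

def combine(frame):
    """Run the layers side by side and merge their advice conservatively."""
    command = lane_keeping_layer(frame)
    for layer in (traffic_flow_layer, driver_intent_layer):
        advice = layer(frame)
        # Take the most cautious braking suggestion offered by any layer.
        command["brake"] = max(command["brake"], advice.get("brake", 0.0))
    return command

print(combine({"flow_breaking_up": True}))  # {'steer': 0.0, 'brake': 0.1}
```

The design point is only that the cautious layers can override the optimistic one, which is roughly what a human driver’s broader awareness does, and that those layers would need to be tuned by each car’s own accumulated experience rather than by fixed rules.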
This is because much of the human driver’s experience and many potential sources of error have to do with the “I” part of SIPRE. People look and compare what they see with things they have seen before. Software and hardware engineers are currently working hard on digital camera technology and the interpretation of pixelated images in a programming environment. For example, some computer systems are now able to identify faces in a crowded scene. Even the digital camera in a smartphone, which must compete with so many other functions packed into its limited operating system, can detect and focus on human faces in its pre-snap image. But human faces have the advantage of a predictable set of reference points (two eyes, a nose, a mouth, a chin, and so forth) and a predictable shape.6
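For a sense of how little code that kind of reference-point matching takes today, here is a minimal sketch using OpenCV’s bundled Haar-cascade frontal-face detector. The input file name is a placeholder and the parameters are stock values, not anything tuned for driving.

```python
# Minimal face-detection sketch using OpenCV's stock Haar cascade.
# "photo.jpg" is a placeholder input; the parameters are typical defaults.
import cv2

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The cascade encodes the "predictable reference points" of a frontal face.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

print(f"found {len(faces)} face(s)")
```

A detector trained on frontal reference points would likely miss the rear three-quarter view imagined in note 6 below, which is exactly the kind of gap between trained expectation and real scenery that matters on the road.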
Out on the road, you might think cars and trucks have a similarly predictable set of referents: wheels, taillights, headlights, bumpers, fenders, and so forth. But the automated driving system will have to recognize much more than the components of surrounding traffic. It will need to recognize all sorts of obstacles: fallen trees, boulders, and even something as subtle as the dark spots in the road that represent wheel-damaging potholes and tire-blowing debris. It will have to interpret pedestrians not just as upright, adult figures—like those stick figures on the warning signs—but as people in wheelchairs and people involved with pets, children, and wheeled encumbrances like shopping carts, strollers, and walkers. The system will have to interpret—correctly and safely—images that are obscured by rain, dust, the glare from bright surfaces, and shadows from low-angled sunlight at dawn and dusk.
And the system will have to extend its awareness beyond the road’s shoulder to the general environment as well. A dog on the side of the road might be about to dart into traffic. A dog or a ball bouncing into traffic might be followed by a child. The driving system will have to judge these ranges and speeds just from its camera imagery. For another example, a motorcycle and rider stopped at the side of the road some distance away might present an unfamiliar image and be difficult for a mechanical system to interpret. From certain angles, the image has the same general shape as a cow. But a cow standing beside the road creates an entirely different set of predictions from a motorcycle pausing beside the road. And if either one suddenly moves out into the traffic lane, it will generate its own unique reaction sequence.
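The difficulty of that judgment is easy to show with arithmetic. Under a simple pinhole-camera model, a single frame gives range only as distance equals focal length times the object’s assumed real height divided by its height in pixels, so the estimate is only as good as the system’s guess about what it is looking at. A hedged sketch, with every number invented for illustration:

```python
# Hedged sketch: pinhole-model range estimate from a single camera frame.
# All numbers below are invented for illustration.

def estimate_distance_m(focal_length_px, assumed_height_m, observed_height_px):
    """Similar-triangles estimate: distance = f * H / h."""
    return focal_length_px * assumed_height_m / observed_height_px

f_px = 1000          # hypothetical focal length expressed in pixels
observed_px = 80     # how tall the object appears in the image

# The answer depends entirely on what the system *thinks* it is looking at.
print(estimate_distance_m(f_px, 1.2, observed_px))  # motorcycle-and-rider guess: 15.0 m
print(estimate_distance_m(f_px, 1.5, observed_px))  # cow guess: 18.75 m
```

Mistake the motorcycle for the cow and the range estimate shifts by meters, and the prediction built on top of it shifts even more.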
All of these conditions and situations will be difficult to test in the software lab before a system is sent out on the road. Traditional software testing tends to involve a regimen of known inputs, and the system passes if it generates the correct—or expected—outputs. This works fine if you feed a math program a number problem and look for it to give the correct—and unique—answer. But it will be impossible to test a driving system against all the strange things a digital camera might pick up while traveling the open road with the scene changing at sixty-five miles an hour and twenty-four frames per second, amid distractions like blowing trees, tumbling debris, flying dust, and angular distortions confusing the image. In short, you don’t know what you’ll find out on the highway until you actually see it.
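To make the contrast plain, conventional testing looks something like the sketch below, where a fixed input has one known correct answer. The names are placeholders; the point is that no comparable fixed answer key exists for every frame an open road will throw at a camera.

```python
# Sketch of conventional known-input testing; the function names are placeholders.

def add(a, b):
    return a + b

def test_add():
    # A known input with a unique correct answer: easy to verify automatically.
    assert add(2, 3) == 5

# A driving system has no such fixed oracle. The nearest substitute is replaying
# recorded road footage and checking the system's decisions against human
# judgment, frame by frame, and no library of recordings can be exhaustive.
```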
Driving is more than simply following the road. As a skill set, the level of awareness and prediction it requires can tax even the most alert and experienced human mind. I am sure that, eventually, artificial minds with human-scale awareness will be developed and made small enough to fit into the dashboard of a car. And, as these machines become more ubiquitous, the traffic system will also change to accommodate them. For example, signs will have a digitized radio component for the robot driver as well as visual components for the human driver. Special situations, like construction zones and roadblocks, will generate their own emergency broadcasts. And other users of the roads, like pedestrians and bicyclists, will be advised to wear armband transponders to help the driving machines recognize them and accurately interpret their actions.
All of this will come one day. But my sense of the technology is that we are not there yet. We won’t get there in a future defined by the next couple of years. And the last thing we all want is a robot car running down a pedestrian with a shopping cart because it thought she was a funny-looking bicycle.
1. That experiment ended in December 2016 over a regulatory dispute: the California Department of Motor Vehicles wanted the Uber experiments to have a special permit required for fully autonomous cars, while Uber insisted it didn’t need one because its cars still had a human behind the wheel who could take over. Also, the Uber cars were not clearly labeled as test vehicles.
2. See “SIPRE as a Way of Life” from March 13, 2011. And yes, I learned about this in traffic school while expunging a ticket.
3. And note that in this discussion I am addressing mostly visual imagery or measurements that can be taken by radar or laser rangefinding. Auditory awareness and recall are a whole other matter, and not just to hear the squeal of tires and shouts of “Dumb ass!” from other drivers.
4. This, of course, is in an ideal world where people are polite, pay attention to others, and care about their intentions and convenience—rather than honking, gesturing, and barreling past them.
5. In the California Driver Handbook, this situation is supposedly solved by the rule that the driver to the right has the right of way. But not everyone knows their right from their left, and the general rule is to yield your own right of way to avoid collisions.
6. I haven’t tried it, but I wonder whether a smartphone would frame a human head from a rear three-quarter view that included only the nape of the neck, an ear, a cheekbone, and a partial eye socket. A human being could certainly tell that this was a person’s head and not, say, a cabbage.