As I scroll through Facebook over the holiday weekend, I browse through photos of Thanksgiving feasts, status updates about gratitude, and indignant reminders of Native American genocide. But I also notice one post. “Ellie has a new best friend — she figured out how to use Siri,” it says. “Now, she can’t stop talking to it.”
While Ellie, my friend Danielle’s 15-month-old, figured out that pressing the home button of an iPhone made it “talk,” holding a conversation with it proved a bit harder. But once she understood that a characteristic “beep” was needed before speaking, it was off to the races. Or, as Danielle calls it, the “hello” races.
Ellie: Hi you.
Siri: Hello there.
Siri: I’m not sure what you said.
At first, Danielle thought it was cute, but in time these conversations began to drive her nuts, not to mention drain the battery. Still, Siri became an ideal five-minute babysitter, able to distract and pacify Ellie with rudimentary “conversation,” which, for an easily distracted baby, was no small accomplishment.
Danielle joked that Siri was Ellie’s new best friend, but the joke may be more prescient than she realized. According to NBC News, as humanoid robots begin to teach and play with our kids, technologies like Siri are evolving into intelligent, social beings, able to develop substantial and meaningful relationships with us.
In fact, in one study, four in five kids believed Robovie, a robot that shows human-like behavior and “emotion,” was intelligent, and three in five assumed he had feelings. Meanwhile, nearly 85 percent said they would play with him if they felt lonely, and 75 percent added that they could see him as a potential friend.
When researchers punished Robovie by telling him to go into the closet, kids often said the robot shouldn’t be treated so unfairly, believing he had a kind of basic right. So, who knows? Maybe, one day, Siri can double as a digital Mary Poppins: a bright, forthright, dependable presence that a generation of children can find reassuring.
But does Ellie really think a calm, friendly, inexhaustible being lives inside that iPhone? I don’t know — all she can say right now is “hi,” “mama” and “dat.” But, no doubt, Ellie loves Siri and delights in her company. Sometimes she’ll hold out a hand and chant “pho” to ask for the phone. She’ll tap a game or watch a video on it, too, but more often she’ll simply load up Siri to exchange a round of “hellos.”
“She could do it for hours,” Danielle marvels. “At this rate, we’ll need to buy her an iPhone by Christmas — just so we can have ours back again.”
Once Ellie grows up, though, she’ll realize just how limited Siri is. When Apple released Siri with the iPhone 4S, her “sassy” personality amused consumers, but behind the search engine, knowledge navigator and voice technology, her talents had limits.
Siri relies on natural language processing, or NLP, a branch of computer science that joins artificial intelligence with linguistics to try to understand what we say, and then reply to us.
Understanding everyday language is difficult for computers. While machines can crunch complex mathematical calculations, even the most basic building blocks of language bedevil them. That’s because human speech involves not only vocabulary and grammar, but also social contexts — relationships between speakers and awareness of emotions that underlie their intentions.
Language is the system of communication easiest for us to learn and use, but its inherent ambiguity makes it the hardest thing for computers to understand — and one of the biggest hurdles to overcome in making our gadgets as useful as we’d like them to be.
While we understand “Driving fast can be dangerous,” for example, this sentence presents a number of problems for a computer. Dangerous for whom: drivers or pedestrians? Is “can” a verb or a noun? Is “fast” an adjective or a noun?
We draw upon life experience and familiarity with context to understand those nuances in a sentence, but computers don’t have that luxury, because the task of programming all the richness of language is too complex and immense. Scientists are working to help computers understand us more easily, but the evolution of NLP has been a long, arduous journey.
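To see why that ambiguity is so punishing for software, consider a toy sketch. The mini-lexicon below is hypothetical, but it shows how quickly the possible readings of even a five-word sentence multiply when each word is looked up in isolation:

```python
from itertools import product

# Hypothetical mini-lexicon: each word's possible parts of speech,
# as a program would see them without any context.
LEXICON = {
    "driving":   ["verb", "noun"],                 # driving a car vs. a driving rain
    "fast":      ["adverb", "adjective", "noun"],  # quickly vs. a religious fast
    "can":       ["verb", "noun"],                 # auxiliary verb vs. a tin can
    "be":        ["verb"],
    "dangerous": ["adjective"],
}

sentence = "driving fast can be dangerous".split()

# Every combination of tags is a candidate reading the computer
# must somehow rule in or out.
readings = list(product(*(LEXICON[word] for word in sentence)))
print(len(readings))  # 2 * 3 * 2 * 1 * 1 = 12 candidate tag sequences
```

A person never consciously considers those twelve readings; a computer, lacking life experience, has to eliminate eleven of them somehow.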
The roots of NLP lie in programs like Eliza, developed by Joseph Weizenbaum at MIT in the mid-1960s. By following certain scripts to interact with patients, Eliza was able to simulate a psychotherapist with a startling degree of accuracy. Even when patients deviated from the script, Eliza maintained a human-like demeanor and continued without failing.
If a patient said, “I have a headache,” for example, Eliza would reply, “You should see a doctor of medicine for that. I am a psychiatrist.”
Like other early “chatterbots,” Eliza was programmed with a complex and lengthy set of rules: if “X” was said, respond with “Y.” It was programmed to draw upon an entire body of knowledge, an approach known in NLP as “knowledge engineering.” The idea was to program a sizeable list of scenarios and variables, but the software couldn’t learn beyond what it was initially coded for.
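That rule-based style can be sketched in a few lines. The patterns and replies below are hypothetical stand-ins, not Weizenbaum’s actual script, but they illustrate the approach: if the input matches “X,” respond with “Y,” and fall back to a stock phrase otherwise:

```python
import re

# A minimal, hypothetical sketch of Eliza-style "knowledge engineering":
# every response is a hand-written rule, so the program can never say
# anything its author didn't anticipate.
RULES = [
    (r"\bI have (a|an) (\w+)\b",
     "You should see a doctor about that {1}. I am a psychiatrist."),
    (r"\bI feel (\w+)\b",
     "Why do you feel {0}?"),
    (r"\bhello\b|\bhi\b",
     "Hello there. What would you like to talk about?"),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.search(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    # No rule matched -- the classic failure mode of rule-based NLP.
    return "Please, go on."

print(eliza_reply("I have a headache"))
```

Every clever reply has to be anticipated in advance; anything outside the rule list gets the generic “Please, go on.”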
Then, in the late ’80s and early ’90s, NLP and computer science began to shift toward another approach, called “machine learning.” Instead of being programmed with entire bodies of knowledge and rules, machines were developed to gather data and then, with the help of sophisticated algorithms and robust processing power, “learn” how to interact with people.
This sort of NLP, for example, helps spam filters distinguish between junk and legitimate e-mails. Every time we tell our account that a certain sender or address is safe, the filter incorporates that data into its model. In short, it “learns” from us and adapts with each interaction.
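The machine-learning approach can be illustrated with a toy word-counting filter. Real spam filters use far more sophisticated statistics (smoothed naive Bayes and beyond); this sketch, with made-up messages, just shows how each user label nudges the program’s future behavior:

```python
from collections import Counter

# A toy sketch of the machine-learning approach: instead of hand-written
# rules, the filter counts words in messages the user has labeled, then
# scores new mail against those counts.
class SpamFilter:
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def learn(self, message: str, is_spam: bool) -> None:
        # Each user label ("spam" / "not spam") updates the counts --
        # the filter adapts with every interaction.
        words = message.lower().split()
        (self.spam_words if is_spam else self.ham_words).update(words)

    def looks_like_spam(self, message: str) -> bool:
        words = message.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score

f = SpamFilter()
f.learn("win free money now", is_spam=True)
f.learn("lunch meeting tomorrow", is_spam=False)
print(f.looks_like_spam("free money"))  # True: spam words dominate
```

Nobody wrote a rule saying “free money” is junk; the filter inferred it from the labels it was given, which is the whole difference from Eliza.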
In many ways, Siri is emblematic of the current advances, and limitations, of NLP. Siri, a distant descendant of Eliza, was programmed to draw upon certain databases of knowledge. She was coded to understand instances of common, everyday language: when she hears, “What’s up?” she responds in an equally casual manner. She even understands rudimentary sarcasm, and some slang, like “Wassup?” though “What the dilly, yo?” or “Word up” still elude her.
But when she responds to these simple queries, such as “What’s the temperature outside?” or “What are the movie times for ‘Frozen’?” she simply reads the search results from a number of database partners, like Yelp or Fandango, rendered in somewhat natural speech.
It seems complex, but it’s still very basic.
If you ask her a non-preprogrammed question, like “What are the hours of the post office?” she’ll tell you to search the Web. She falls back on that simple answer because her programming can’t make sense of the results. In short, while Siri has a lot of knowledge-engineered NLP to draw upon, her machine-learning capabilities aren’t very robust yet, and when Ellie realizes those limits, she’ll move on to Barbies, Furbies or whatever else little girls are crazy for next year.
But that is changing as NLP goes mainstream, incorporated in everyday technologies like Siri. The field is constantly improving, and the advances that make robots uncommonly responsive are making machines more “natural” — meaning the software that guides Siri’s behavior will better accommodate the vagaries and quirks of human communication.
Future versions of Siri will keep updating her knowledge base and database partners, so she’ll pick up the latest slang and colloquialisms as quickly as our language evolves.
Siri will ramp up her machine-learning capabilities, as well, and make use of a dazzling array of sensors and chips to interpret our tone of voice, or sense our heartbeat to discern whether we’re stressed, relaxed or, well, in need of a diaper change. Who knows — she might soon be able to recognize different voices and remember who each person is, as well as how they’re related to one another.
As NLP finds its way to the mainstream, we’ll begin to speak to more than just our phones, too. The technology is set to explode to $9.8 billion in 2018, from $3.8 billion this year, according to Markets and Markets, fueled by an increase in smartphone use, Big Data applications and “The Internet of Things,” in which machines “talk” to each other at close range.
We’ll interact with NLP-enabled products in expected sectors like customer service: calling a company, say, and answering a series of questions that route the call. More sophisticated NLP will also help the service figure out faster whom we need to talk to, and carry along larger amounts of information so representatives can help us more effectively.
NLP will show up in unlikely industries, too, like healthcare. In 2009, the Mayo Clinic and IBM spearheaded a research initiative to use NLP to index electronic medical records. By using language patterns to pick out similar cases and conditions, physicians can search a body of data beyond their own clinical experience, helping them tackle confusing medical problems.
What would have taken researchers months, or even years, could be whittled down to a simple, fast search, affecting how doctors conduct studies and clinical trials.
For example, the Nao robot, developed by Aldebaran Robotics with state-of-the-art visual, tactile and audio capabilities, is already being used in therapy for children with autism. It can track and recognize faces and objects, understand and express emotions, and react to touch and voice.
Aldebaran recently partnered with Nuance to amplify its NLP capabilities, allowing its robot to access a cloud-based voice recognition and expressive text-to-speech program. Now, it can have even more natural conversations with people in 19 different languages.
NLP is even in a new generation of smart cars. Audi, for example, uses the technology as part of its “traffic-jam assistant.” According to ReadWrite, the system uses cameras and radar to detect congestion, and then uses advanced cruise control to maintain a safe distance from the surrounding cars while automatically steering within the lane.
Many NLP-assisted innovations, like the Nao Robot, are still too expensive for mainstream use, but the technology will begin to trickle down.
Ellie may outgrow Siri, but she’ll have NLP-assisted toys and games to teach, play with and amuse her. Years ago, Microsoft made waves when its then-head of European games, Peter Molyneux, created a virtual “friend,” named Milo, for the Kinect. Milo, or Kate, his female counterpart, was an 11-year-old who could learn your name, talk to you and walk with you in the game’s world.
Milo had an interesting backstory, as well: he and his parents had just moved to the U.S. from England, so he was lonely and eager to make friends. Perceptive and quick, he recognized and responded to kids, creating a rich, and slightly uncanny, experience of interaction with seemingly human-like abilities and moods.
Unfortunately, Molyneux never brought his creation to market. While he developed hours of gameplay, Microsoft wasn’t sure how to present it to a market accustomed to either cartoon-character games or action-oriented titles.
“The game did work, it really did. But I think that the world — whether that world is retail or marketing executives — wasn’t quite ready for Milo,” Molyneux told Develop Online. “What was so hard for some people to imagine is what Milo would look like on the shelves, sitting alongside these murderous shooter games.”
Eventually, Microsoft canned Milo, though not without some regret. “What hurt the most is that the game actually worked,” Molyneux added. “It was this amazing, emotionally engaging game that was all about forging a relationship with the player.”
Despite the fate of Milo, others are exploring directions in interactive children’s entertainment. San Francisco-based ToyTalk, a company started by former executives of Pixar and SRI, the same company that brought Siri to the market, announced last year that it planned to add visual tracking and speech recognition technologies to toys that can connect to the Web and communicate via artificial intelligence and NLP.
Think Siri, but a toy — a chatterbot in a fuzzy teddy bear form, like Teddy Ruxpin, but without the creepy robot affect.
Last year, ToyTalk released its first product, an iPad app called “The Winston Show,” which allowed kids to interact with an alien. By asking children questions, and then using the magic of NLP, it could intelligently listen, and respond, to what they said. It may not seem special, but NLP gave ToyTalk an edge: as more kids interacted with the app, the company could comb the gathered data to discover popular questions that Winston wasn’t able to answer, then add dialogue to fill those blanks.
The company could also track when, and what, children were watching, and then send updates to the app. According to TechCrunch, Winston has interacted over two million times with kids, giving it a rich knowledge base to learn from.
Most importantly, ToyTalk has discovered and learned about a generation of consumers that includes kids like Ellie. According to TechCrunch, ToyTalk found that young kids, ages 5 to 10, feel it’s normal to “talk” to gadgets and services. Simply put, for them, it’s fun and natural.
Children speak before they spell, yet they’re often introduced to technology through touch interfaces, ToyTalk CEO Oren Jacob noted. For them, the first meaningful interaction will soon be through voice, and they simply won’t remember a time when they didn’t speak with gadgets.
If kids like Ellie grow up talking to machines and software, it’s worth considering what they’re also learning besides their ABCs and 123s, or whatever a toy purports to teach them. If they grow up playing with “intelligent” toys, for example, their notions of play and storytelling will change — an idea that Jacob is banking on.
While they’ll still be attracted to characters, storylines and worlds, the interactivity they’re growing up with will feed new kinds of entertainment that use sophisticated voice and image processing technology to create their imaginative magic.
The rise of voice interaction will prime kids to learn from machines, and it won’t be outlandish to imagine robots interacting, tutoring or playing games with children in schools.
NLP aims to create more seamless, “natural” interaction with machines that can pick up on human nuances, but why does that matter? We program our relationships with robots and voice assistants like Siri, and how we shape those relationships, whether through code or learning, trickles down into how we interact with each other.
An unquestioning, subservient robot teaches children to play a role in a master-servant dynamic, whereas a robot like Robovie, which can express dissent or question decisions, shows children how to interact in a much different way. A game like Milo, for example, teaches them how to empathize and build relationships. So a robot, or software, can not only accomplish the task it was designed to do, but also teach us how to interact with and treat others in the world.
And when young children, who form strong emotional bonds with technology, interact with machines, that consideration becomes immediate and pertinent.
Do technologists have a responsibility, for example, to program certain social values into interactivity? Can and should we program empathy, patience and compassion into our software? Those aren’t easy questions to answer.
In the meantime, Ellie is content with daily exchanges with Siri, who has proven remarkably patient with the relentlessly repetitive way children like to play. As Danielle likes to joke, “Better Siri than me. I think I’d go nuts having to say hello twenty times in a row.”
But Danielle told me her toddler is also learning her first simple sentences: “Who dat?” and “What doing?” So, it’s just a matter of time before her conversations with Siri — and her relationship with interactive technology in general — get more complex. ♦