Talking to your digital assistant is often a one-sided affair. A digital assistant's primary purpose is to listen, but its responses can seem cold.
If a human spoke to you the same way, you would think they were being passive-aggressive and trying to leave as soon as possible. While that terseness makes sense from a utilitarian perspective, it runs counter to the stated goal of digital assistants like Siri, Google Assistant and the rest: conversational language.
For these 'virtual friends' to become truly conversational, they have to understand more than just words. They have to take into account context, and even the mood or personality of the speaker.
Most digital assistants use AI to understand the context to some degree, so they can respond to follow-up queries. Google voice search has been able to respond to follow-up questions since at least 2013. You can ask, "Who is the prime minister of the UK?" and, when Google tells you it's Theresa May, you can follow up with, "When was she born?" to see that May was born on Oct. 1, 1956.
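To make the idea of a follow-up query concrete, here is a minimal Python sketch of how an assistant might carry context from one question to the next. It is a toy under stated assumptions, not how Google or any other assistant actually works: the `Conversation` class, the `ROLES` and `BIRTHDAYS` lookup tables, and the pattern matching are all hypothetical, and the only facts in it are the ones from the example above.

```python
import re
from typing import Optional

# Hypothetical toy knowledge base; the facts come from the example in the text.
ROLES = {"prime minister of the uk": "Theresa May"}
BIRTHDAYS = {"theresa may": "Oct. 1, 1956"}
PRONOUNS = {"she", "he", "they"}
FALLBACK = "I do not understand"


class Conversation:
    """Remembers the last person mentioned so a pronoun can be resolved."""

    def __init__(self) -> None:
        self.last_entity: Optional[str] = None

    def ask(self, query: str) -> str:
        q = query.lower().rstrip("?")

        # "Who is the <role>?" answers and stores the person as context.
        if q.startswith("who is the "):
            answer = ROLES.get(q[len("who is the "):])
            if answer is None:
                return FALLBACK
            self.last_entity = answer
            return answer

        # "When was <subject> born?" resolves a pronoun from stored context.
        match = re.fullmatch(r"when was (.+) born", q)
        if match:
            subject = match.group(1)
            if subject in PRONOUNS:
                if self.last_entity is None:
                    return FALLBACK  # nothing for the pronoun to refer to
                subject = self.last_entity.lower()
            return BIRTHDAYS.get(subject, FALLBACK)

        return FALLBACK


chat = Conversation()
print(chat.ask("Who is the prime minister of the UK?"))
print(chat.ask("When was she born?"))
```

Asked on its own, with no earlier question to supply context, "When was she born?" falls through to the fallback reply, which is the gap that context tracking is meant to close.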
However, real human conversations often involve both parties listening and speaking at the same time. If you're responding to someone, you're constantly reading their expressions and adjusting. You may even interject from time to time, so it's essential that the other party is also observing, ready to respond to such moments.
Today's digital assistants are bad at the subtle back-and-forth that happens in a conversation. Microsoft saw the opportunity and quietly made a recent announcement that gives hope for the future of conversations with technology. The company claims it has taken a big leap in conversational AI by introducing "full duplexing," which is the ability to have a verbal conversation with an AI-driven assistant that can speak and listen simultaneously.
"It’s the biggest deal," said Shane Mac, CEO of Assist, a company that builds custom chatbots and assistants. "This is the solution for the ‘I do not understand’ problem. This is how the act of listening becomes understanding. When [digital assistants] start doing this stuff, people will say, ‘Holy shit, that’s what I want.’"
This can be seen as a convenience, though it would mean re-orienting our expectations about when an assistant is actually listening to the content of what is being said and not just buffering audio to scan for a wake word. It also changes the privacy equation: users would need to clearly opt in to being listened to – and being recorded in some form – more often.
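The difference between taking turns and speaking and listening at once can be illustrated with a toy model. The Python sketch below is an assumption-laden simplification, not Microsoft's implementation: the `speak` function, its one-word-per-tick timing and the `interjection` tuple are all hypothetical. In turn-taking mode the assistant's microphone is effectively off while it talks, so an interruption is lost. In full-duplex mode it hears the interruption and stops.

```python
from typing import List, Optional, Tuple


def speak(
    reply_words: List[str],
    interjection: Optional[Tuple[int, str]],
    full_duplex: bool,
) -> Tuple[List[str], Optional[str]]:
    """Simulate an assistant speaking one word per tick.

    interjection is (tick, text): the user speaks at that tick.
    Returns the words actually spoken and the interjection that was
    heard while speaking (None if nothing was heard).
    """
    spoken: List[str] = []
    for tick, word in enumerate(reply_words):
        user_is_speaking = interjection is not None and interjection[0] == tick
        if user_is_speaking and full_duplex:
            # Listening while speaking: stop and hand back what was heard.
            return spoken, interjection[1]
        # Turn-taking mode never checks the microphone mid-reply.
        spoken.append(word)
    return spoken, None


reply = "The prime minister is Theresa May".split()
print(speak(reply, (2, "wait"), full_duplex=False))
print(speak(reply, (2, "wait"), full_duplex=True))
```

In the first call the assistant finishes its whole sentence and never registers the interruption. In the second it stops after two words and can respond to "wait". The second call is also the mode in which the microphone is processing speech far more of the time.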