In this insightful episode of the Tech Talks Daily Podcast, I sat down with Tobias Dengel, CEO of WillowTree, a leader in the digital product industry. Our conversation revolved around the transformative potential of voice technology and its role in reshaping our interaction with devices and machines. Tobias brought a wealth of knowledge and foresight on how the latest advancements in AI and voice technology are set to revolutionize our daily lives and the business landscape.
At the outset, Tobias and I explored the concept of the complete reinvention of human-device interaction. We delved into how voice technology, powered by cutting-edge AI, is not just an emerging trend but a pivotal shift towards a more intuitive, efficient, and human-centric way of engaging with technology. Tobias emphasized the evolution of digital experiences towards multimodal interfaces, which integrate spoken words, screens, keyboards, sounds, and haptics, creating a seamless connection between humans and machines.
A crucial part of our discussion focused on crafting exceptional user experiences in voice technology. Tobias highlighted the need for a multimodal approach, where voice is used for input and other forms, like text or graphics, are used for output. This approach, he believes, is the breakthrough we need to enhance our interaction with technology, making it more natural and effective.
We then tackled the challenges inherent in creating relatable and compelling voice interactions. Tobias and I discussed the crucial balance between making voice technology relatable and being transparent about its non-human nature. We also touched upon the sensitive issue of data collection and usage, underscoring the need for clear communication and ethical practices in this domain.
Diversity in voice technology development was another significant topic we covered. Tobias highlighted the progress made in training systems to recognize a variety of accents and languages, underlining the importance of inclusivity in this field. He also advised businesses on implementing voice technology, advocating for a step-by-step approach that focuses on solving bounded problems and gradually building trust with users.
An exciting part of our conversation was when Tobias discussed his book, which delves into the convergence of generative AI and conversational AI in the voice technology sector. He shared his prediction about the future of Siri and other voice assistants, envisioning a scenario where they become integrated layers within various applications, facilitating a more fluid and natural voice interaction experience within apps.
This episode is a must-listen for anyone interested in understanding the future trajectory of voice technology and its implications for both personal and professional realms. Join us as we unravel the complexities and explore the immense possibilities of voice technology with Tobias Dengel, a visionary in the digital product space.
[00:00:01] Welcome back once again to the Tech Talks Daily Podcast. I'm your host, Neil C. Hughes, and today we're going to be diving into the groundbreaking area of technology that's reshaping our interaction with the digital world. Yes, I'm talking about voice technology. And joining me today is the president of a company called WillowTree, which is a leader in digital product design and development. But my guest, Tobias Dengel, he's an expert in this field. He's got a Wall Street Journal bestselling.
[00:00:30] book out at the moment on voice technology, too. But I want to talk about how voice technology is not just a trend, but a transformative force in the tech landscape. From simple commands on Alexa or Siri to business applications, I think voice technology is making digital experiences more intuitive, efficient and human centric. But there is one challenge that still remains, building trust with users. So we'll talk about the principles of creating trustworthy
[00:00:59] voice tech tools today, the importance of transparency and how to integrate voice technology into existing business systems. But enough from me, let's dive into this fascinating conversation to discover how voice technology is not just changing the way we interact with our devices, but how it's also reshaping industries and entire user experiences. So buckle up and hold on tight as I beam your ears all the way stateside, where today's guest is waiting to share his story.
[00:01:29] So a massive warm welcome to the show. Can you tell everyone listening a little about who you are and what you do? Yeah, so my name's Tobias Dengel. I'm an engineer by education, I should say.
[00:01:42] I started my tech career at AOL. And what I learned there was really how important it was that we create incredible user experiences. I think AOL won the first online war in the US because it had the best user experience and everyone was super focused on that.
[00:02:03] Obviously, Apple's another story on that in that same way in mobile. And I followed that. So in 2010, I started a company called WillowTree, which was focused on helping, originally helping clients use this new ecosystem of apps on first Apple, then Android.
[00:02:21] And then we turned it into a full digital product agency where we really helped clients with anything they need to build in digital that has to be super performant and a very high level of user experience.
[00:02:35] And what a great story and a great career you've had there. And as you were talking about it, I could still hear that sound of the US Robotics modem connecting, hearing the words, you've got email from the AOL days. It was a magical time, wasn't it?
[00:02:51] It was super fun. And what's interesting about it for those of us who lived through it is an incredible majority of the experiences we have today were already present at AOL. Now, they weren't as good. They weren't as fast. They weren't as polished. But a lot of what we do today and that we think is innovative was already there in some way, including voice experiences.
[00:03:15] And I'm glad you mentioned voice experiences. That's one of the reasons I was excited to get you on here. And as businesses increasingly look to integrate voice technology into their operations, I'm curious from your standpoint, what should be the key considerations they should keep in mind to ensure these tools are not just innovative, but also trusted and seamlessly integrated into that existing ecosystem?
[00:03:38] Yeah. So, you know, we started thinking about voice really 10, 12 years ago when Siri came on the market. And then obviously Amazon launched Alexa, which until ChatGPT was the fastest adoption of any tech in history. And so at WillowTree, we're sitting there saying, why do people love voice, A, but B, why is it not changing the world the way mobile did or the way the internet did? I mean, we have these voice experiences, we use them for this and that. They're in our kitchen.
[00:04:06] We turn on the music, but it's not life-changing per se. And when we really started to break down the problem, the fact is we as the industry, we as consumers are using voice all wrong. And I think that's what's going to change in the coming years. And really the way we all experience gen AI is going to be primarily through voice, through conversational AI first.
[00:04:31] And the simplest way I can break this down is we love voice because it's so fast. We can speak three to five times as fast as we can type. We hate to listen to voice because it's so slow to listen to versus text, et cetera. And we see this every day, right? When I, if I were to send you a text, Neil, I would probably dictate it to my phone because it's so fast. But if you send me one back, I don't want to listen to it. I want to read it.
[00:04:57] And so humans are already adopting the technology, even though it's not really designed that way. And the giant mistake the industry has made is we created these voice assistants to simulate humans. And that just doesn't work. What instead we want to do is ask Siri, Alexa, whatever, or our app, what movies are playing tonight? And then we see it on a screen and then we read it and then we say, all right, two tickets at 8 PM for Star Wars, right?
[00:05:23] And so that experience, that multimodal experience is going to be the breakthrough where voice is a way humans transmit to machines, but the machines either do something or respond to us with text or graphics. And that we are going to start seeing that everywhere here very, very quickly. I'm glad you mentioned that about the SMS message there, the voice notes that a lot of people use. It's great for the sender, as you said, but not so good for the person receiving it.
[00:05:51] And one of the things that attracted me to you, where you stand out to me, one of the reasons I want you on this podcast is when you said that we're doing voice tech wrong. And you advise against trying to make voice tools seem fully human. So can you elaborate on the balance between creating relatable voice interactions and also maintaining transparency about the non-human nature of these technologies too?
[00:06:15] Yeah. So it all comes down to trust and humans form trust in two ways. One is affective trust or emotional trust, which we really do in a couple seconds. It has very negative impacts on bias, et cetera. It's just the snap decisions we make and we have all maybe experienced in our lives. Like we fall in love with someone that we meet for the first time or in the subway or whatever, or the Tube, as you would say. But then, and that's based on emotion.
[00:06:46] And we have very little way as a machine to impact that. But what we do know and what we've studied heavily is this concept of the uncanny valley, which came out of Japan in the 1970s, which posited, and it has been proven, that the more human-like you make something, when it's not human and we know it's not human, the less humans trust it. And so we really have this negative emotional trust. And then the second piece is cognitive trust.
[00:07:15] Over time, humans decide to trust each other based on data. And ultimately it is, does this person do what they say they are going to do? And we apply the same thing to machines. And so voice has failed both tests, right? The uncanny valley makes it weird and odd and not really fully trustworthy. And then over time, it hasn't been highly performant. It screws up our dictation. And then it's slow in responding because we got to listen to it. Both of those things are going to get fixed.
[00:07:45] But I do firmly believe that making these humanoid voice assistants is going to be counterproductive. What we want to do is we want to speak to machines and then have them do something. And by the way, Neil, all the nomenclature in the business is backwards. The things in our houses, they're called smart speakers. We should be calling them smart mics, right? The whole rollout of voice has been backwards of, I think, where it's going to end up being.
[00:08:13] And in the context then of earning user trust, how should a company approach the explanation of data collection and usage by voice tools or those smart mics? And what level of transparency is needed? How can this effectively be better communicated to users too? Because it's such a fine balancing act, isn't it? Yeah. I think, again, one of the interface complexities or mistakes that got made is this concept of the wake word.
[00:08:42] And so when you need a wake word to tell a voice system to do something, by definition, the system has to be listening all the time. Yeah. And so, yeah, Amazon, I think, is well known that they had to stop all the recording, et cetera. And I think they were hopefully doing it for good reasons. Who knows kind of all the ways that they were using it. But basically, the system was listening to you all the time.
[00:09:09] And I think the disclosure around, A, the fact that it was, because I think a lot of users just didn't understand that. And B, what happens with that data? Is it kept at all? Is it erased immediately? Does it kind of, you know, how it's dealt with has to be clear as day. What I think is even more important is that, and this is happening now in the industry, is this concept of the wake word needs to go away. We as humans need to be able to tell the machine, now it is an appropriate time for you to be listening to me.
[00:09:39] And now is not. That's one of the reasons, many reasons, is we think most of the voice experiences are actually not going to be through Siri and Alexa, which we think are more like AOL. Like they're the training wheels that get us to the next level. That really most of our voice experiences are going to be directly to an app. So you're going to take your banking app and you're going to hit a giant mic button and you're going to tell it what to do.
[00:10:03] Give me my balance, move money, order me new checks, transfer money to my brokerage account, whatever it is. That's going to be the experience and you're going to be able to turn it on and off as you will. But you'll be having a voice experience with the brand directly versus trying to go through Alexa and Siri. If HSBC was your bank, how would you even know what you can do with Siri or Alexa to communicate with that bank?
[00:10:30] It's just the discoverability problem is real. And of course, another big topic right now is diversity and voice tech development. So, again, I'm curious, how does incorporating more diverse sources of information and talent influence the development and acceptance of voice technology tools? And I don't know if you do, but can you provide any examples where this kind of diversity has led to more successful outcomes as a result?
[00:10:59] Yeah, so there's a couple of different angles to that question that I'd like to explore. So one is most of the early assistants were all female. And the reason they were was because they tested much better than male assistants, both in male and female audiences.
[00:11:19] But ultimately, when you study that, the reason is bias: men, and to a large extent women, felt more comfortable telling a female bot what to do than a male voice bot what to do. So just because it tested better doesn't mean it's the right thing to do.
[00:11:35] And so I think all the companies have recognized, all the major companies have recognized that and are now coming up with either voice assistants that are indistinguishable, whether they're quote unquote traditionally male or female or users can choose. So that was a very early issue in voice. And it manifests itself in our whole, my 11-year-old is always telling Google Assistant what to do.
[00:12:02] And it drives my wife crazy because my wife feels like he's learning how to yell at women, basically, because he's like, Google it wrong. And so there's something going on there that I think we have to be very conscious of. The second thing in terms of diversity is like a lot of systems, and Gen AI ran into this too. The early testing was done basically by the technologists themselves, which happened to be predominantly white male or Asian male.
[00:12:29] So the accents or languages especially, but accents that were not common were really slow to pick up. And so one of the things that's happened over the last two or three years is the major voice platforms have all gotten much more diverse in their training and that's shown up. And so now Google says with confidence that, you know, their transcription is at 98%. And that's across all, that's in English across all English speakers.
[00:12:58] But taking accents into account, which took quite a while. I think the interesting thing about voice now is languages is, you know, obviously the large languages that have 20, 30 million or more speakers are early adopters of this technology because you just have scale.
[00:13:16] But the real magic of voice in terms of diversity is, for example, India has over a thousand languages, is allowing some of these smaller language sets to, for the first time, have access to the digital world. Because, you know, many are illiterate as well. And now they can have, for the first time, full exposure to the internet and to digital via this technology.
[00:13:41] And I was also reading that you suggest applying voice tools only to purposes that suit their current capabilities. But if we look into the future, how do you foresee the evolution of these capabilities? And what would you say are the current limitations that businesses should be aware of too?
[00:13:59] Yeah, I think, you know, back to the trust thing, when a new technology comes, it's really important to take it step by step and not try to jump too far because you end up with implementations that ultimately set the technology back and set your brand back as a company. So you have to be careful and take this step by step and really focus on the bounded problems. And we talk a lot in the book about kind of bounded versus unbounded problems.
[00:14:27] One of the problems that the assistants have, like Siri or Alexa, is they kind of, if they're supposed to be human, they can do anything. They can tell you the weather. They can tell you kind of what's going on in the Middle East. They can tell you what the history of Russia is. I mean, they could do all those things. That's really hard to do. But what's much easier to do is to train a system to do everything that you might do with your financial company or bank.
[00:14:54] And then as soon as you get out of that and you start asking it something like, what's the weather? You just say, you know what? That's not what I'm here to talk to you about. And those bounded problems are where the technology is today. And so focusing on those bounded problems supported now by Gen AI, which is really a breakthrough in helping voice get much better. That's where the magic is today. Now, five, 10 years from now, I think the unbounded problems become solvable.
[00:15:23] But we have to take it step by step. And if you're a major brand, the last thing you want is the launch of a bot experience for American Express that starts falling in love with its users, right? Like, you can't have that. And so bounding the problems, kind of working with partners that help you figure out how to launch high, you know, high fidelity, low risk bounded problems is step one.
[00:15:50] And on behalf of any business leaders listening to our conversation today, what would you say are the most significant barriers to trusting voice technology? And what strategies do you think could be most effective in overcoming some of these hurdles that we're mentioning today?
[00:16:06] Yeah. So I think going in a multimodal way where the humans who are using it aren't equating it to like an assistant or another human being, but they're equating it to is this a system that just helps me get done what I want to get done, right? So it's changing the mindset, right? Right now, the mindset of both the users, but of large companies is let's say you're British Airways or Delta Airlines, right?
[00:16:35] You try to do something in the app, like your flight gets canceled, you try to reschedule and it's not working. What's your response? Your response is to call likely into a call center, likely with a long wait time, likely with a very frustrating authentication experience, etc. The optimal use of voice is that in your app, you hit the mic button and you just say, rebook me to the next flight, what are my options? And then they show up on your screen.
[00:17:04] And now the system is using voice to do something that is very, very important and powerful and faster than the alternative, but in a very high trust way, because you're not in your mind thinking like, is this a human being? Is this not a human being? You know, it's not a human being. You're just telling the app what you want to do. That's the future. And I think that's going to hit us real quick. People keep asking me the great question. How do we know this is happening? And how will we know voice is here?
[00:17:33] I always say, when you start seeing mic buttons in every app and it starts to be the primary way you interface, instead of trying to find what to do and enter numbers or anything, and you're just talking to your apps, that's when we know that this is arriving. Right. And a huge congrats to you because The Sound of the Future, your book, is a bestseller on so many of the bestseller lists. You've got to be incredibly proud.
[00:17:59] Can you tell anyone listening about that book and what they can expect from it? Yeah. Yeah. So the book is the long form of what we're talking about today, right? And really diving into it, but diving into it, both the technology, lots and lots of stories and examples.
[00:18:17] And then really a how-to at the end of how to get started in voice and how to really start thinking about it, really aimed at business leaders, but also general interest readers who just kind of want to know what is going on in voice tech, which in the industry is often called conversational AI. And it's funny, I have an incredible amount of respect. I've always had respect for people who wrote books. Now I have an extraordinary amount of respect.
[00:18:43] It took us several years to put this together, lots and lots of interviews, lots of fact-checking. I have lots of respect for books now because I realized that anything that you actually publish through a publisher gets heavily fact-checked. But, you know, as we were going through the process, all of a sudden Gen AI showed up and we're saying, oh my gosh, are we talking about the wrong AI here when we're talking about voice and Gen AI? And then it very quickly became apparent that the magic is going to be the merging of Gen AI and conversational AI.
[00:19:13] And by pure chance, our book launched October 10th. The week before, OpenAI announced that their biggest innovation now is that it's going to be a voice-first experience. And so we just got really lucky with the timing. But Gen AI, we talk about it in the book, even before it was as big as it is now, because it's what powers voice.
[00:19:33] Because we're never going to be 100% perfect on transcription, but by applying Gen AI to it, you get a much better idea of what the intent of the speaker was, even if it didn't get all the words right. And then as another example, you know, I get asked all the time, like, yeah, I can dictate everything, but then I got to go back and edit it and get all the paragraphs right and blah, blah, blah. But Gen AI is going to do a lot of that. So you're going to speak whatever you want, and then Gen AI is going to create a draft, and then you might speak again to tell it what you want to change.
[00:20:03] And then at the very end, you might do a couple things using your keyboard. But I've already talked to a lot of authors who are approaching working that way. And it's super interesting. My kids are all voice-first, right? They tell their remote control what they want to watch on their television. They ask the Google Assistant every morning about the weather and sports scores from last night. I, you know, recently told a story. I put my son to bed because he wanted to watch a Champions League game, and I was like, you got to go to bed.
[00:20:33] And he told me he wanted to go to boarding school, and he walked over, leave the house, go to boarding school. He asked Google, he's like, what boarding schools accept sixth graders? And then Worcester Academy came up, and he asked Google, he said, Worcester Academy, that's what I need an application to. Google, make me an application to Worcester Academy. And, you know, that's how he thinks. Wow, man, that would be scary for business leaders listening all over the world.
[00:21:01] It would be so exciting at the same time. And we are at that time of the year where futurist technologists are making predictions for what trends might dominate the year ahead. So what are some of the most exciting or maybe even unexpected trends that you might anticipate that could shape the future of voice technology? I'll ask you to stare into my virtual crystal ball. Would you like to predict anything?
[00:21:26] Well, I think Siri, as an example, is going to open up to be used in all applications as a layer and as a part of the software system versus a standalone experience. And I think that's going to be the big breakthrough. One of my favorite interviews I got to do in the book was with Adam Cheyer, who's the founder of Siri. And he talks about how they were acquired by Steve Jobs, how that whole process happened.
[00:21:51] But also he talks about how Steve Jobs had a bit of a different vision for Siri than how it's manifested itself over the last 10, 12 years. And I think as Siri opens up and allows us as developers to use it in our apps versus having to go through Siri to go to something else. That's really the vision that Steve Jobs largely had and that the Siri developers had even before being acquired by Apple.
[00:22:15] So I think the big change next year is instead of going through Alexa or Siri or Google Assistant to get to a voice experience, you're going to open your app and the voice experience is just going to be there. Whoa. Oh, man. What an interesting point to finish on. But before I do, I think we should bring WillowTree into this because I know the book has consumed much of your life over the last few years.
[00:22:38] But as the president of WillowTree, how has your company approached the design and development of voice interfaces to help ensure they are trustworthy and add tangible value to your client's business? Because this is much more than a book. This is your work, isn't it? Yeah. And that's why we got so interested in it. I think over the next couple of years, virtually every website and every app especially is going to have to be completely redesigned to be voice first.
[00:23:05] WillowTree was acquired this year by Telus, which is one of the large telcos in Canada and who also has a subsidiary, Telus International, that WillowTree is now part of, which is one of the largest BPOs or call center companies in addition to being an AI services company and a content moderation company.
[00:23:29] The vision there is that as voice becomes the pervasive way that we interact with our apps, it fundamentally changes how customer service happens. Because a lot of the customers, what we consider customer service today is really going to merge into the app experience. So it's going to be one experience. Now, there will be situations. And I know you had someone on recently talking about this exact same thing of the merging of humans into the customer service center. There will be situations where you start talking to the app and it doesn't know the answer.
[00:23:58] And then you want to hit a button and get to a human. The great thing we're going to be able to do is when you talk to that human, you're coming from the app. So we already know who you are. We already know you're authenticated. You don't have to re-authenticate. We know what you've been doing in the app. And so the human experience is going to get so much better than it is today. And that's really why we did the partnership with Telus to bring this combined end-to-end customer experience to our clients in one place, which I think is the next big thing that's going to happen for most of our clients.
[00:24:29] So in times ahead and before I let you go, I always like to have a bit of fun with my guests. I'm now going to ask you to leave either a song that means something to you that you enjoy that we can add to our Spotify playlist or a book that you'd recommend that everyone reads. And I'll add that to our Amazon wishlist. All I'm going to ask is what parting gift would you like to leave everyone with and why? Yeah. Well, so I'll say this. The best book I've read in the last five years is The Progress Principle by Teresa Amabile.
[00:24:59] She's a researcher from Boston. I don't know if you've read it, Neil. But we spend so much time thinking about strategy and kind of even execution. But what really I think most of business success comes down to is the team you build around you, wherever you are in the org. Like you could be the CEO of your team or the next team or the next team or the next team. And she's done a ton of research of what makes for great teams. It's the progress principle. I love it. I will say this. I was born in 1970. The first album I ever bought was Combat Rock by The Clash.
[00:25:29] Straight to Hell, I think, is my favorite song of all time. But I can't leave you without a shout out to The Clash. Oh, man. Now you've mentioned The Clash. We've got to – I'm going to give you both there. We'll put the book on the wishlist and the song on the Spotify playlist. You cannot go wrong with The Clash. For everyone listening wanting to find out more information about your book or look at the work you're doing at WillowTree, could you point everyone listening in the direction of a few websites where they can get everything they need? Yeah.
[00:25:58] So willowtreeapps.com is our company website. You can find out more about the book at tobiasdengel.com. So it's T-O-B-I-A-S-D-E-N-G-E-L.com. Or just find me on LinkedIn, Tobias Dengel. I'm the only one. So easy to find. Well, it really does feel like we're on the precipice of a complete reinvention of how we interact with devices and machines. We've just got to look around at different generations and how they're interacting. We can see it happening right now.
[00:26:27] And when we start looking at things powered by the latest AI advancements, voice technology, for me, is undoubtedly poised to drastically alter the way we live and how companies do business. But I love how you busted a few myths today and how we're doing things wrong. And referring to smart speakers when they're smart mics, for example. But another thing, a great point you mentioned, I think we should also raise is how we have female smart digital assistants.
[00:26:56] And we're teaching children to demand things from a female or just yell at females, not use manners and please and thank yous, etc. So many big talking points. But just thank you for sharing that with me today. Thanks, Neil. It's been wonderful. I think today we traversed the landscape of voice tech together and achieved an understanding of its potential to revolutionise our digital interactions. And also, of course, the crucial role of trust in its adoption.
[00:27:24] And as we continue to navigate this exciting era of technological advancement, remember that the integration of voice technology into our digital experiences is not just about the technology itself, but about building a foundation of trust and understanding with users. And please make sure you join me again next time as we continue to explore how technology is addressing real world challenges, transforming our lives and businesses. But until then, keep embracing innovation, pushing the boundaries of what's possible.
[00:27:54] And until next time, don't be a stranger.

