In today's episode of Tech Talks Daily, we explore the intriguing intersection of artificial intelligence, machine learning, and real-world applications with Trevor Back, the Chief Product Officer at Speechmatics. But what does the journey from computational astrophysics to spearheading innovation in speech technology look like?
Trevor's unique trajectory from academia to being a pivotal figure at DeepMind and now leading Speechmatics unveils a narrative that challenges and reshapes our understanding of AI's potential.
The conversation critically examines AI's current dependence on large datasets. Trevor articulates why the industry's fixation on scale is a fallacy we must move beyond; instead, he advocates a shift towards more efficient learning methods. These approaches enhance quality and inclusivity and pave the way for models that comprehend human intent more accurately. But how can speech technology evolve to facilitate voice interactions so seamless that technology becomes invisible, woven into the fabric of our daily lives?
Trevor shares insights from his extensive experience, highlighting the transformative power of models capable of learning at higher levels of abstraction. This capability is not just about refining AI's understanding of language; it's about building systems that genuinely grasp context and intent, offering personalized and genuinely valuable interactions.
Furthermore, the discussion touches upon the significance of inclusive and accessible AI. By prioritizing sample efficiency, Speechmatics is at the forefront of developing technology that understands diverse voices and accents, reflecting a commitment to making speech technology human-like for all users.
Yet, the journey is not without its challenges. Trevor candidly addresses the limitations inherent in large language models, including their tendency towards hallucination and the lack of personalization. The future of speech technology, as envisioned by Trevor, is one where voice interactions become so natural and intuitive that technology fades into the backdrop of our lives. But what does it take to reach that future? And how close are we to making this vision a reality?
As we conclude, Trevor shares a personal touch, revealing a book that has inspired him and a favorite song that resonates with his journey. These reflections add depth to our understanding of the man behind the innovation and offer us a glimpse into the broader cultural and intellectual influences that shape the development of AI and speech technology.
Now, we turn the conversation over to you. How do you see the role of AI and speech technology evolving in the near future? What are your thoughts on moving beyond the fallacy of scale towards more efficient, personalized, and inclusive AI systems? Join the discussion and share your insights as we continue to explore the profound impact of technology on our lives and businesses.
[00:00:00] Have you ever wondered why despite the vast amounts of data that's fed into artificial
[00:00:06] intelligence systems, we're still grappling with technology that misinterprets our commands
[00:00:12] or fails to grasp the very nuance of our human speech?
[00:00:16] Well, in today's episode, we're going to attempt to peel back the layers of this pervasive
[00:00:20] myth in the tech world: the fallacy of scale. ... Remote work is ever expanding, and the security and efficiency of your managed file transfer solution are paramount. This is where Kiteworks sets a new benchmark, surpassing legacy MFT tools with its unparalleled security measures and user-centric design. They've even been awarded the prestigious FedRAMP Moderate authorisation, a recognition that is not easily
[00:01:43] obtained and that they've held since 2017 through the Department of Defence. ... So, I'm the Chief Product Officer at a company called Speechmatics. It's based in Cambridge in the UK. I joined about nine months ago or so, and Speechmatics is a speech recognition company whose mission is to understand every voice. We offer the most accurate speech-to-text system in the industry. My background is, I was basically obsessed with space as a kid,
[00:03:04] and so I ended up doing a PhD in computational astrophysics. ... One of the things I try and do every episode is demystify some of these complex technologies, talk about them in a language everyone can understand, and also talk about some of the areas that we don't see in our newsfeed. To set the scene for our conversation today, could you start by explaining the concept of the fallacy of scale in AI, and why at Speechmatics you advocate for a shift away from large
[00:04:22] data sets towards more efficient learning methods?
[00:04:25] Because it's something that we don't read about every day. ... Speechmatics is in the speech and the audio world, where this challenge has always been present, this challenge of having smaller data sets. It costs a lot more to label an
[00:05:42] audio data set than it does to label text or images. You need to have a human sit there and label it. ... Even with much less training data, we're still able to achieve much higher accuracy. And it also means that for people with accents or dialects or speech impediments, or for localization, those types of things, we're still able to achieve very high accuracy for individuals from less well-represented groups, because we're more efficient on this data problem.
[00:07:04] So I'm curious, how does focusing
[00:07:06] on smaller, more efficient data sets help? ... It means we can take voices that other systems are not able to handle as well and transcribe them in a very accurate way. It also means we can train on any new languages with just a handful of labeled data. We've recently released Hebrew and Persian, where it's only taken a matter of weeks to get from "we don't have a system there", to a few hours of training data,
[00:08:22] to now a highly accurate system that's
[00:08:24] being used by customers.
[00:08:26] But ultimately, it also enables us... The base model needs what's called a context window of around 300 milliseconds to be able to recognize those words. But a word on its own doesn't provide much context. Imagine if I said "Green Park" in a sentence: do I mean the Underground station in London, or do I just mean a park with a lot of greenery? The way we infer those types of semantics as people is we try and
[00:09:42] understand the broader context of what's being said. ... We can personalize the models for the different person that's using the system. And so it's really important that these systems
[00:11:02] are able to understand the broader context,
[00:11:04] are able to sort of learn from our history of interaction, and perhaps even understand sarcasm in the future, for example. Yeah, and I think that frustration with voice tech was illustrated perfectly in the new season of Curb Your Enthusiasm: Larry David completely losing it with Siri. I urge anyone that hasn't seen it to check it out. But another huge topic right now is increasing minority representation in speech, crucial for all the right reasons. And how does
[00:12:21] Speechmatics' approach contribute to that goal? And what are some of the challenges that you've faced? ... Imagine we didn't need screens and all that; we'd just be able to interact with technology with our voice. But if we don't include inclusivity and accessibility as a key design principle in how we develop these systems, you know, so many people will just be left behind. And so Speechmatics has this as, like, you know, a core part. It's what drives our sample efficiency, this sort of fallacy of
[00:13:40] scale problem, because it allows us to learn from the smaller data sets; it allows us to... How do you see this evolving? Yeah, the AI field is advancing so incredibly quickly. But with every new technology, there are always limitations. And a clear and obvious limitation of large language models at the moment is their hallucinations. This is when they make things up, but they also sound overly confident about what they've made up. And this makes them less accurate, less reliable.
[00:15:05] And so a really big question in the AI field at the moment is whether this limitation can be overcome. ... Another clear limitation that matters a lot to Speechmatics is that most of these large language models leverage mostly just text, right? And text is by far the largest data source being used today. Some projects, like Gemini from Google DeepMind, are moving towards what's being called
[00:16:21] large multimodal models or LMMs,
[00:16:25] which is using text, audio, images, and video.
And so here at Speechmatics... So, just bringing this to life, what are we talking about here? What kind of applications or potential applications do you see leveraging speech technology in the future? Any use cases or ideas that you can share with everyone, just to help them see what might be on the horizon? At this stage, I like to let my inner geek out and start talking about science fiction
[00:17:40] and that's a great one... telling whether you're talking to a machine rather than another human. And I think we've kind of passed a version of the Turing test for how we interact with chat-based systems through text, because we're happy to wait for somebody to message back via a keyboard; we know that takes time. But we're nowhere near being able to pass the Turing test for talking to an AI system.
[00:19:02] And that's because you don't get this seamlessness of interaction.
You don't get these interruptions. ... Yeah, well, so, you know, with AI and technology, humans are inherently bad at thinking about things in exponential ways, right? We always think linearly. And so we're always surprised by the rate of progress over ten years and disappointed by it over a year, right? So I do think that, as you were saying earlier, large language models have taken the world by storm and
[00:21:22] ...frustrating that people didn't. And so what I hope Speechmatics will be able to achieve is to make that so seamless that billions of people will use voice-based AI systems going forward and, you know, it'll pass that toothbrush test for technology.
[00:21:35] Wow, so much food for thought there.
And we started the podcast talking about your origin story, from being an early DeepMind employee with over a decade of experience in machine learning and AI. ... And it's really focused on that visionary rallying cry towards what a future with AI is going to look like, especially the cultural, societal and political ramifications of this new technology, and it tries to bring the technology to life in a way that a broader audience can
[00:23:01] understand the implications of how this technology is going to roll out. ... You'll be able to read more about us, but there's also the ability to try us on what we call our portal, which is a place where you can try us for free. You can put in any audio, or you can use your microphone, or you can put in your favorite YouTube video, and you can just see how accurate our system is. You can translate it into hundreds of different languages.
[00:24:20] And so yeah, please do come and try us.
[00:24:22] And if you can't find that, then you can add me, Trevor Back, on LinkedIn or on X. ... Trevor clearly illuminated the path forward for me today, where efficiency, context and inclusivity all lead the charge towards creating AI systems that don't just hear us, but listen, comprehend and understand us. And as we look forward to a future where technology seamlessly integrates into the very fabric
[00:25:40] of our daily lives like Samantha in the movie Her, I think it's clear that the innovations

