2966: How Pindrop Is Playing a Pioneering Role in Combating Deepfake Technology
Tech Talks Daily, July 18, 2024
29:36, 17.14 MB

2966: How Pindrop Is Playing a Pioneering Role in Combating Deepfake Technology

In this episode of the Tech Talks Daily Podcast, we sit down with Vijay Balasubramaniyan, the CEO and Founder of Pindrop, to discuss the alarming surge in AI attacks and the emerging threats posed by deepfake technology. With a staggering 450% increase in AI attacks in just the first four months of this year, the urgency to understand and combat these threats has never been greater.

Vijay offers an in-depth analysis of the drivers behind this surge, including the rise of voice cloning tools and the innovative, yet dangerous, uses of synthetic media. We delve into the recent instances of deepfake technology being used in political campaigns and elections, such as the false audio tape of President Biden, and the potential impact these technologies could have on the upcoming elections.

As the founder of the industry's first deepfake detection engine, Vijay shares the challenges faced in developing this groundbreaking technology and how it effectively detects synthetic media. He also highlights the crucial role of state and federal officials in defending against these threats, emphasizing the need for coordinated efforts to ensure the safety and integrity of our digital landscape.

Vijay discusses the importance of concerted state-federal coordination in combating the spread of synthetic media. He provides insights into the legislative actions being considered for 2024 and beyond, aimed at addressing the challenges posed by deepfake technology.

[00:00:00] Have you ever wondered how AI and deepfake technology could disrupt our lives? Not just in entertainment, but in the political and business arenas as well. Because in the first four months of this year alone, there was an alarming 450% increase in AI attacks.

[00:00:22] And with the rise of synthetic media, including deepfakes, we're now facing a new world of challenges that could potentially undermine trust. And that is across various sectors. So today I've asked Vijay to join me. He's the CEO and founder of a company called Pindrop.

[00:00:41] And they've got the industry's first deepfake detection engine. And Vijay has been at the forefront of combating these threats for some time. And in our conversation today we'll explore how deepfake technology is evolving, the risks that it poses, and also the measures being taken to protect our society.

[00:01:02] Now today's episode is made possible thanks to my sponsor, who helps me cover the expense of producing this daily tech podcast and ensures that you get at least 365 interviews a year from me. Legacy DRM failed to securely enable external collaboration on sensitive files. Can we admit that?

[00:01:24] For those organizations that face that risk trust contradiction, they must share content with untrusted parties yet protect their data. So it's time for a more modern DRM solution. Something that solves this dilemma without compromising security or productivity.

[00:01:42] So why not experience native editing across any file type with Kiteworks? Their platform supports seamless reading, navigation and editing of all file formats, not just Microsoft Office and PDFs, empowering you to enjoy dynamic features like commenting without the need for plugins and special software.

[00:02:00] So you can elevate your collaboration with Kiteworks' robust editing capabilities. Why not say goodbye to deployment headaches, file transfer risks, collaboration barriers and productivity constraints, and start experiencing the modern way to collaborate on sensitive content without sacrificing control or security?

[00:02:20] And you can do all that by simply visiting Kiteworks.com to get started today. But now on with today's show. So buckle up and hold on tight as I beam your ears all the way to Atlanta, Georgia, where

[00:02:32] we're going to talk about all these pressing issues and maybe uncover how we can safeguard our future in the age of AI. So a massive warm welcome to the show. Can you tell everyone listening a little about who you are, Vijay, and what you do? Thank you, Neil.

[00:02:51] Thank you for having me on the show. My name is Vijay Balasubramaniyan. I'm the CEO and founder of Pindrop. At Pindrop, we provide multi-factor authentication and deepfake detection on audio and voice.

[00:03:05] The way to think about it is, traditionally, when you call a bank, a healthcare provider or an insurance company, they ask you a litany of questions to identify who you are. And what we're able to do is avoid all of those questions and say that it's actually

[00:03:20] Neil on the other end of this interaction. And over time, that question has also become, is this Neil or a machine pretending to be Neil? And so that's what we do.

[00:03:31] We do it for eight of the top 10 banks, five of the top seven insurers, some of the biggest healthcare providers, some of the biggest retailers. They're all customers of ours. Fantastic. Well, I'm so glad you agreed to join me today because there's a lot of hype around artificial

[00:03:48] intelligence at the moment. You mentioned banking and insurance. I think just every industry is being impacted by AI. But I want to talk about that hype today. I want to talk about the 450% increase in AI attacks in the first four months of this year alone.

[00:04:05] So I've got to ask, what are the primary drivers behind this surge? How can organizations effectively respond? Because I know there's a lot of nerves around this increase at the moment. So one of the things with attackers is they're always trying to figure out new tools to be

[00:04:23] able to do their jobs more effectively. And AI is the tool that allows for the maximum amount of scaling. And just to give you a sense, we've been doing deepfake detection for eight years, and we have patents from seven years back.

[00:04:44] So we've been studying this phenomenon for a really long time. But with the advent of generative AI, there's a lot of great stuff happening: customer experience, employee productivity, all kinds of great new revenue streams.

[00:05:01] But the deep, dark secret of generative AI is the fact that you're now blurring the lines between what is human and what is machine. And in a post-COVID world, you have only remote interactions. We are on this Zoom call talking and it's all remote.

[00:05:19] And that's the norm right now. And so with this norm, we thought, okay, now we have to start protecting against deepfakes, so we started building technology to make sure that within our products we're constantly monitoring for this.

[00:05:34] So last year when we were monitoring, we'd see one attack every single month. So it was interesting and was really cool to see what the attackers were doing, but it was just one attack every single month.

[00:05:46] And this year in the first four months, it's gone from that one attack every single month to an attack per customer per day. And there are certain customers of ours who get six attacks, deepfake attacks every single day. So it's been a massive explosion.

[00:06:02] And the reason for this explosion is multifold. One is for the longest time here in the US when unemployment and PPP loans were going on, fraudsters found an easy way to make money. This is during the heyday of COVID. Now they've all lost that source of income.

[00:06:21] So they're trying to find ways in which they can make money. And the easiest way is to social engineer a minimum wage call center worker into handing over the keys to the kingdom of someone's healthcare records or someone's bank account. So that's what they're doing.

[00:06:37] And we've seen a 60% increase in the amount of fraud coming into these voice channels. The second is all the generative AI tools that are out there. When we started Pindrop, there was one voice cloning software. It was called Lyrebird.

[00:06:53] And it was cool, but it was the only software. At the end of last year, there were 120 voice cloning apps. And by March of this year, that number had jumped to 358. So there are just so many ways in which you can clone someone's voice so easily.

[00:07:13] I remember Google brought John Legend in and had him record for hours, I think 20-odd hours, before they could introduce John Legend's voice on your Google Home. So you could say, what's the temperature in San Francisco?

[00:07:29] And John Legend in his lilting voice would say, it's a balmy, whatever, 73 degrees Fahrenheit, right? Because we were talking about temperature. And fast forward now, it requires three to five seconds of your audio to mimic your voice. So the thing that's changed is fraudsters want to make money.

[00:07:52] So they've started going to the easiest ways to do this, which is voice. Two, there is a whole plethora of tools available there. And these tools have allowed these fraudsters to scale up their attacks in ways they could never do before.

[00:08:07] And that's why you're seeing this increase. Does that help? It really does. And before you came on the podcast today, I was doing a little research on this, and something else that will be of major importance to everybody listening around the world is

[00:08:19] the fact that it's an election year all around the world. There are something like 4 billion people going to the polls this year. So I'm curious, how do you see deepfake technology evolving?

[00:08:31] And what would you say are the most significant risks it poses to society as a whole, particularly in politics, as I just mentioned, but also entertainment and corporate America, business, etc.? Yeah. So I think deepfakes break trust, online trust in all kinds of ways.

[00:08:48] You know, the first place they break trust is commerce, right? Any time you do commerce right now, it's a remote interaction. And so now, if something else, a machine, can impersonate you so quickly, all commerce is

[00:09:06] broken. Then you have media, both news media and social media. And this is where politics comes in. You know, was it actually Tom Hanks selling dental plan ads, or not?

[00:09:24] And so those are the situations. All of news media, and we've seen reports that 90% of the videos coming out of the Israel-Hamas war are fake, so all of these news outlets don't know what's real and what's not. We're seeing this in social media as well.

[00:09:42] We were brought into an investigation where a very famous celebrity, a YouTube and Instagram influencer, was starting to sell blindness cures. And it wasn't him, right? Again, it was an impersonation of him. So again, across all of media, I think you've broken trust there.

[00:10:04] And then finally, communications, right? We're on Zoom calls, basic communication. We've seen such an explosion of deepfake attacks targeting the elderly here in North America. It's crazy. But what these fraudsters are doing to the question of scale is they're saying, OK, let's go to a particular county.

[00:10:25] Let's target all the senior citizens of that particular county and play recordings, or rather deepfakes, of their grandkids saying they're in trouble. And what we're finding is that people are getting attacked Friday to Sunday. And on Monday, all these senior citizens are showing up at the local law

[00:10:45] enforcement office saying, hey, I lost $20,000, I lost $200,000. Now, I have been doing security for a really long time, and I don't know the last time something broke commerce, media and communications, three pillars, in one fell swoop. And that's what we're contending against.

[00:11:07] So it is really crazy. And I'll dive into the election side of things as well because we have a great example there. But it fundamentally breaks all of these three things. And if we talk about the impact of deepfakes on things like the upcoming elections,

[00:11:23] obviously, there's a lot of potential for misinformation. What measures are you seeing being taken to prevent misinformation from spreading in such an important year? Yeah, so, you know, you're exactly right. You know, half the world, I think, is going to vote this year.

[00:11:39] So it's a crazy, crazy number. And so it's funny in the US, you know, we have the election cycle as well and it's going to happen in November. But in January of the election year, we had the first case of misinformation.

[00:11:53] So we have something called the Republican primaries, and President Biden got on the phone and called a whole bunch of people in New Hampshire, asking them not to come vote on Tuesday and to save their vote for November, which doesn't make any sense at all.

[00:12:12] But essentially that's election manipulation, right? And so the first case of election manipulation happened in January. We were the ones who were brought in to investigate what the AI application was. We discovered the AI application, and that AI application went ahead and shut down the user.

[00:12:32] And then a month later, a circus performer came out saying, yeah, you know, I used it, but for some strange reason, after a day, it got shut down. And that was us shutting the person down. But that's the thing, right?

[00:12:44] Like President Biden coming out and saying, you know, stop voting is a really scary thing. And the fact is that all of this went down in a matter of 24 hours. But you won't have 24 hours come November. And so there's a lot of this.

[00:13:03] We're starting to see very basic responses here. They're trying to do tabletop exercises to say what happens if there is a deepfake. But here is the thing, right? Like truly nefarious deepfakes aren't going to impersonate President Biden.

[00:13:18] They're going to impersonate the local polling officer saying, oh, my God, there's a hurricane. Don't come to vote. Right. Or things like that. And so we think the response is really poor and it's just across the board. Really, really poor.

[00:13:35] People are like chickens with their heads cut off. I mean, it's too hard to think about this thing. You know, at some level all is lost, so let's just roll the dice in this particular case.

[00:13:49] And so that's unfortunately been the response so far. We're seeing a few folks being very forward facing. So we interfaced with the New Hampshire attorney general, who worked really, really fast when this happened in New Hampshire.

[00:14:05] And he's been bringing us to attorney general panels to get everyone educated and things like that. But, you know, it's one where you need programmatic ways to determine that something bad is happening and react really quickly to it.

[00:14:21] And that's not yet there. And I've got to ask, as the founder of the industry's first deepfake detection engine, what challenges did you face in developing this technology, and how does it work to detect synthetic media? Anything you can share around that?

[00:14:37] Because I've got a feeling there's going to be a story there, right? Yeah. You know, it's interesting, right? Like so we've been for the longest time solving the right human problem.

[00:14:47] And what I mean by that is, OK, Neil's saying he's on the other end of a line and we need to identify it's Neil. And so we do that by saying, oh, yeah, it's Neil's device. It's Neil's voice. It's Neil's behavior.

[00:15:02] We're seeing all of these signals, all of these multiple factors by which we can authenticate that it's truly Neil on the other end. And it was, and still is, a very important problem, because if you look at

[00:15:13] the MGM hack, the way the MGM hack went down is the ransomware group ALPHV managed to find someone who had Okta credentials, called them up, got all their credentials and then hacked MGM. And this is a 44 billion dollar organization that was losing eight million

[00:15:30] every single day because their entire operations were being crippled. And if you actually look, the ALPHV guys came out and said, yeah, we took down MGM, and all it took was a 10-minute phone conversation. And so that is a fundamental issue.

[00:15:47] And solving the right human problem was always our focus. But as we kept seeing the sophistication of generative AI, we realized we have to solve not just the right human problem, but the real human problem. So is this a human with a pulse?

[00:16:02] And so we started on this about eight years back, knowing that this was coming. And there are these challenges called the Automatic Speaker Verification Spoofing (ASVspoof) challenges, where, you know, you submit your systems, people try to spoof your systems and you try to detect it.

[00:16:16] And about two years back, we were the ones who discovered the deepfakes in Anthony Bourdain's Roadrunner documentary, when, you know, the director didn't want to say which lines were deepfaked once he got flak.

[00:16:30] Right. And so the fact is that at the time, those were cool party tricks. But we kept developing our technology because we were like, OK, this is a real threat: determining whether it's human or machine. And then generative AI changed everything, just the scale, the speed.

[00:16:45] So right now, our entire thing is how do we keep track of all of these different AI applications? Because not only are we able to say it's a deepfake, we're able to say which deepfake it is, right?

[00:16:56] Which of those 358 voice cloning apps is being used to create this deepfake? And the way we're able to do this is because of humanness and 10,000 years of evolution. So the way to think about it is, you know, deepfake

[00:17:13] detection systems like ours look for either spatial or temporal anomalies. What I mean by that is, for example, when you say San Francisco, the 's' and the 'f' sounds are you channeling noise into an actual phoneme, a letter. And, you know, that's a really hard thing.

[00:17:37] The reason we're able to do that is we developed an overbite because we started cultivating soft foods and things like that. We have an entire human system that produces these sounds. A lot of these machine systems don't know how to represent noise really well.

[00:17:52] And so what ends up happening is, when you look at these sounds, known as fricatives, you start seeing, oh my God, these systems are always representing these fricatives in this really inhuman way. It's not quite human, and it has a very specific signature to it.

[00:18:06] So that's an example of a spatial anomaly. On the video side of things, a spatial anomaly is about the lighting: if there are lights on these two sides, where are the shadows falling, and things like that. You can detect that.
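To make the fricative point concrete, here is a rough sketch, not Pindrop's actual method. A classic, simple speech feature, the zero-crossing rate, separates noise-like sounds (such as the turbulent airflow of an /s/ or /f/) from voiced, tone-like sounds. The two signals below are synthetic stand-ins, not real speech:

```python
import math
import random

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)

def vowel_like(n=8000, freq=150.0, rate=8000.0):
    """A low-frequency tone, standing in for a voiced vowel."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def fricative_like(n=8000, seed=0):
    """White noise, standing in for the turbulent airflow of a fricative."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

vowel_zcr = zero_crossing_rate(vowel_like())
fricative_zcr = zero_crossing_rate(fricative_like())
# Noise-like fricatives cross zero far more often than a voiced tone,
# which is one crude way a detector can check whether the noisy parts
# of speech look statistically plausible.
```

Real detectors use far richer spectral features, but the design idea is the same: fricatives have a measurable statistical signature, and synthesis engines that model noise poorly leave that signature looking wrong.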

[00:18:20] On the temporal side, and this is where things get really interesting. It's when you say things like, hello, Paul. My mouth is wide open when I say hello and my mouth shuts down when I say Paul.

[00:18:31] The speed with which I can do that, because I'm human, is limited; I can only do it at a certain speed. And through those variations, my entire vocal tract, my voice, my oral cavity, my nasal cavity has to go through certain configurations, because I have physical limitations.

[00:18:49] And a lot of these machines don't care about things like that. So what ends up happening is if you come to Pindrop, every conference room is named after a fraudster we've caught. And one of the conference rooms is called Giraffe Man because when we analyze the

[00:19:01] audio, we're like, the only person who could have produced this audio is someone with a seven-foot-long neck, and that's not humanly possible. And so, you know, those are the kinds of things. And these temporal anomalies are incredible on the audio side because you have

[00:19:17] 8,000 samples of your voice every single second, even in the lowest fidelity channel, which is telephony, when you make a phone call. So if you have two seconds, you have 16,000 samples, and so on and so forth.
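The sampling arithmetic here can be sketched in a few lines. The 8,000 Hz rate is the standard narrowband telephony rate from the conversation; the 20 ms frame size is an assumption on my part, a common speech-analysis convention rather than anything stated in the episode:

```python
TELEPHONY_RATE_HZ = 8000  # narrowband phone audio, as described above

def sample_count(duration_s, rate_hz=TELEPHONY_RATE_HZ):
    """Total individual samples captured in a clip of this length."""
    return int(duration_s * rate_hz)

def frame_count(duration_s, frame_ms=20, rate_hz=TELEPHONY_RATE_HZ):
    """Non-overlapping analysis frames (20 ms is a common, assumed choice)."""
    samples_per_frame = rate_hz * frame_ms // 1000
    return sample_count(duration_s, rate_hz) // samples_per_frame

# Two seconds of phone audio: 16,000 samples, or 100 short frames,
# each of which a detector can check for anomalies.
two_second_samples = sample_count(2)
two_second_frames = frame_count(2)
```

The point of the arithmetic is that even a brief call hands a detector thousands of measurements per second in which temporal anomalies can show up.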

[00:19:30] So you have a lot of anomalies and we're able to use these anomalies to say this is Neil or this is a machine pretending to be Neil. I love that. And I've got to ask, are there any lessons you think that we could learn from previous

[00:19:45] voice cloning and deepfakes? Maybe they've been used in recent political campaigns or elections, or just things that you're seeing out there. As it all continuously evolves, any lessons we can learn from what you've seen? Lots of lessons.

[00:19:59] So, you know, the funny thing is, when you think about how a deepfake is made, lots of people think it's one system, but it's actually a modular system. There are, at the very least, nine distinct components that go into making these deepfakes.

[00:20:16] The interesting thing is a lot of the newer deepfake engines have modified one of these nine components, like what's known as the vocoder, which is how I translate bits and bytes to a voice, right?

[00:20:28] Like, how do I get more sophisticated and make it more human sounding? But they've not changed some of the other underlying components that deal with speech production. And so by knowing those architectures and those systems, even if someone

[00:20:42] changes something else, we're able to say, oh, this is still machine generated because all of these other eight systems are throwing the same fingerprints that they used to throw while this one is throwing a slightly off fingerprint.

[00:20:55] And we need to learn that new fingerprint, but we're still able to know it's a machine. And so knowing the history of voice cloning, at least what we have found is super important to be able to detect as these systems evolve.
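The modular-fingerprint idea described above can be illustrated with a toy sketch. Everything here is invented for illustration: the component names, the fingerprint strings, and the matching threshold are assumptions, not Pindrop's actual representation. The logic it demonstrates is the one Vijay describes: if an attacker upgrades one component, the other components still match a known engine's fingerprints:

```python
# Toy model: a cloning engine is a set of components, each leaving a
# characteristic fingerprint in the audio. All names are hypothetical.
KNOWN_ENGINES = {
    "engine_a": {
        "vocoder": "fp_vocoder_v1",
        "prosody_model": "fp_prosody_v1",
        "phoneme_aligner": "fp_aligner_v1",
    },
}

def classify(observed, known_engines, threshold=2):
    """Flag audio as machine-made if enough component fingerprints
    match a known engine, even when some components have changed."""
    best_name, best_hits = None, 0
    for name, engine in known_engines.items():
        hits = sum(
            1 for comp, fp in engine.items() if observed.get(comp) == fp
        )
        if hits > best_hits:
            best_name, best_hits = name, hits
    if best_hits >= threshold:
        return ("machine", best_name)
    return ("unknown", None)

# An attacker swaps in a new vocoder, but the other two components
# still carry engine_a's old fingerprints.
observed = {
    "vocoder": "fp_vocoder_v2_new",
    "prosody_model": "fp_prosody_v1",
    "phoneme_aligner": "fp_aligner_v1",
}
verdict, engine = classify(observed, KNOWN_ENGINES)
```

The design choice mirrors the argument in the interview: because attackers rarely replace every component at once, partial matching against known architectures keeps working while the new component's fingerprint is being learned.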

[00:21:11] Because what we've seen is that the entire system changes only once every 10 or 20 years. People don't change all of the components at once; they change these individual things. And that helps us in determining how these systems are evolving and therefore how

[00:21:30] to keep track of them. And what roles do you see state and even federal officials playing in defending against the threats posed by deepfake technology and how effective have these efforts been so far?

[00:21:45] Because it's kind of interesting how a lot of crimes that were committed maybe on streets or in public physical places are now taking place online. So what are you seeing here? Yeah, so this is where I think everyone needs to start adopting deepfake detection

[00:21:59] technologies. They're known as liveness because they're checking whether you are live or not, right? Like you're a real human or not. And so liveness is going to become really important just across the board, right?

[00:22:12] Like when someone flashes their driver's license and their face, is it really a human, or is it someone sitting behind two phones stacked next to each other? Because that's what's happening: they're holding up a fake license, or this person's

[00:22:31] license, and they're using their own face with a face swap, so the face that comes out, projected to the laptop, is actually Neil's face. So everything matches when they move.

[00:22:41] It moves. It does everything as if you would do it, because all it's doing is swapping out that fraudster's face with yours. You're seeing this in romance scams, right? Like fraudsters who look eerily like Ryan Gosling, you know, with just a slight twist.

[00:22:57] And the slight twist is not intentional. It's just that the face swap is not quite getting the face right, or they've used a not very flattering photo of Ryan Gosling. And in all of these situations, even with all of their flaws, there are people,

[00:23:12] both men and women, saying, oh my God, you look so similar, and I've always had a crush on him. And then they're asked, OK, if I need to come to the US, can you send me

[00:23:27] twenty thousand dollars? And they're like, yeah, yeah, for sure. Let me send you twenty thousand dollars. And the thing is that every system will now need to answer this as the fundamental question: am I talking to a human or am I talking to a machine?

[00:23:41] Because I don't know. And is that on a Zoom call? Is that on a communication call? Is that when people are trying to open up a new account online? Is that when people are getting information about their local celebrity, local

[00:23:54] politician? You're going to need liveness as the fundamental check across all of these platforms. And it seems like concerted coordination between state and federal agencies is imperative right now, especially if we're serious about combating the spread of synthetic media.

[00:24:15] But what strategies do you think can enhance this collaboration and ensure that it does happen? Yeah, it's a lot. And, you know, in the US, there are a lot of organizations doing some incredible work

[00:24:27] here. The FTC, for instance, has started to bring together not just technologies like us, which do liveness detection, but also the systems that do voice cloning, and say, hey, can you help the technologies that do liveness detection by giving them early

[00:24:45] access to your models so that they can see what's happening? So there are efforts underway. It's not an easy problem. And, you know, we're seeing all kinds of people, right, like people in, you

[00:24:58] know, commerce, people in media, people in acting and, you know, celebrities. So everybody's trying to come together to start building out these things. But most importantly, the federal and state organizations are starting to look at, OK, how do we start applying these technologies

[00:25:19] initially for some basic use cases? How do we protect individual members of the Senate and the Congress? And then how do we go beyond that to start protecting across the board? But, you know, these things take time. And we've been stressing that the bad guys are collaborating.

[00:25:37] We've seen in the dark web how the bad guys are teaching each other, oh, use this tool for video, use this tool for audio, combine it this way, stack two phones like this. They are writing out playbooks for each other to learn and become better.

[00:25:53] So they are absolutely collaborating on their end. Our stress is that we need to collaborate too, right? The federal organizations need to collaborate with each other and then pass it on to state and local bodies. And this has to be a coordinated effort.

[00:26:13] And we're starting to see a few of these efforts, like the FTC starting an effort and things like that. But again, it's going to take some time to get there. And I am impatient.

[00:26:25] I'm like, we see these attacks every single day, and we're not going fast enough. Well, thank you so much for joining me on the podcast today and bringing this topic to life. It's great to shine a light on it because it's so important, especially in

[00:26:39] this election year. So thank you for sharing your insights. But before I let you go, I'm going to ask you to leave all my listeners with one final gift. And that is either a book that we can add to our Amazon wish list or a song for

[00:26:51] our Spotify playlist. I don't mind which, but what would you like to leave and why? Yeah, so for me, I'd like to leave the song Keep Talking by Pink Floyd. You know, if you look at the initial part, it's got Stephen Hawking saying for millions

[00:27:07] of years, mankind lived just like the animals. Then something happened which unleashed the power of our imagination. We learned to talk. And I think it's an incredible human thing for us to speak, and we need to protect that. Wow, what a great song.

[00:27:22] It'll be getting added straight to the Spotify playlist. And for anyone listening who wants to dig a little deeper on all things Pindrop and explore what we're talking about here, where would you like to point everyone? So it's very, very simple. It's www.pindrop.com.

[00:27:38] If you want to look at what we do in deepfakes, it's just forward slash deepfake. And those are the two places you should go. As I said at the beginning of the podcast, in the first four months of this year alone,

[00:27:49] there was an alarming 450 percent increase in AI attacks. And we have seen so many different examples, from the obvious false deepfake audio tape of Biden urging voters to skip the New Hampshire primary onwards. But I think this is going to continue in everything from entertainment to politics to

[00:28:08] business. So thank you so much for joining me on the podcast and shining a light on this increasingly critical topic. Thanks again for joining me today. Thanks so much, Neil. Really, really appreciate it.

[00:28:20] Having spoken with Vijay, I think it's clear that the rise of deepfake technology and AI attacks poses a significant risk to our society, from undermining trust in commerce and media to potentially impacting political elections. Make no mistake, the implications are profound.

[00:28:39] However, with pioneers like Vijay and his team at Pindrop leading the charge, I think there's a lot of hope for robust defences against these threats. And the need for coordinated efforts between state and federal agencies and the private sector has never been more critical.

[00:28:58] But what are your thoughts on the rise of deepfakes and the impact on our world and business? What do you think we could do to better prepare for and combat these emerging threats? As always, share your thoughts with me.

[00:29:11] Techblogwriter at Outlook.com, or Twitter, LinkedIn, Instagram, just @NeilCHughes. Let me know your thoughts. But thank you for listening today. And until next time, don't be a stranger.