How is AI transforming the way we interact with sound, voice, and digital personas? In this episode of Tech Talks Daily, I speak with Alex Bordanova, Chief Product and Technology Officer at Voicemod, about the company's groundbreaking approach to AI-driven voice transformation and its expanding role beyond gaming.
Originally developed to enhance voice chat experiences for gamers, Voicemod has evolved into a powerful tool for real-time voice modulation, combining traditional digital signal processing (DSP) with cutting-edge AI. Alex shares how their technology enables users to alter their voice characteristics—changing timbre, pitch, and resonance—while maintaining clarity and intelligibility. With millions of users creating custom voices every month, Voicemod is at the forefront of AI-powered audio innovation.
We discuss the launch of Voicemod Key, a new hardware solution bringing voice transformation to console gaming and VR, allowing seamless integration between microphones and gaming platforms. Alex also highlights the importance of ethical AI practices, emphasizing Voicemod's commitment to using professionally recorded and consented datasets to train its AI models responsibly.
Beyond gaming, Voicemod is forging partnerships with major entertainment brands, collaborating with Warner Bros, Rovio, and other global IP holders to bring iconic voices to life. As audio technology catches up to the advancements in visual media, what does the future hold for voice augmentation in entertainment, communication, and beyond?
[00:00:03] Imagine transforming your voice in real time, turning everyday conversations into an immersive, dynamic experience, whether you're online or just having a bit of fun with it. Well, today my guest is from a company called Voicemod, and they're at the forefront of AI-driven voice transformation.
[00:00:22] Originally rooted in the gaming world, Voicemod has expanded its reach, combining traditional DSP effects with cutting-edge AI to enable real-time voice changes, sound effects and much more. But today I want to learn more about the journey of Voicemod from its early days in online gaming to its current role of redefining audio experiences across multiple platforms.
[00:00:47] I want to learn more about the technology that powers Voicemod, how their Voicemod Key hardware is bringing voice transformation to everything from consoles to VR, and the ethical considerations that they prioritize to ensure that their AI is used responsibly. But what does it take to match the quality of audio with the immersive visual worlds of gaming and media?
[00:01:10] And how is Voicemod tackling the challenges of expanding voice transformation across platforms while also maintaining ethical practices? Well, let's find out and get him on the podcast now. So a massive warm welcome to the show. Can you tell everyone listening a little about who you are and what you do? So I'm Alex Bordanova. I'm the chief product and technology officer at Voicemod, meaning that I take care of product and engineering. Yeah.
[00:01:40] So I just have the little mission of bringing to life all the ideas, all the user needs, all the problems that we detect and the opportunities that come with them. Well, it's a pleasure to have you on the podcast today. And Voicemod has positioned itself as somewhat of a leader in AI voice transformation. But for people listening today and hearing about you for the first time, just to set the scene, can you explain exactly what Voicemod is, why the world needs it,
[00:02:10] and how it might help foster creativity and connection in gaming, online spaces and so much more? For people hearing about it for the first time, tell me more about it. So Voicemod started in the gaming space at the time when online conversations were starting to sprout in game chats, right? We started to see games like PUBG and Fortnite having voice chats, and the need to collaborate together.
[00:02:39] And Voicemod was a layer that would boost the fun and the chances to have a better time while speaking in these game chats. So at the beginning, it was just a few DSP effects, like a cathedral effect or a radio effect.
[00:03:00] And more and more, it became something connected with the AI space, because at some point you cannot understand audio without AI boosting it, right? What AI basically allows us to do is change the timbre of the voice, which is something that could not be done with the older technology.
[00:03:24] And today we also have a lot of users not just changing their voice, but also playing sounds and memes, right? Those very interesting sound effects, the fun music that you can find on the internet, videos on TikTok, all that type of stuff. That also brings a new layer of audio augmentation to have more fun online. And that's also what the users do.
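The "radio effect" Alex mentions is a good example of the classical DSP he contrasts with AI. As a minimal sketch, assuming a common textbook recipe rather than anything Voicemod has published, it can be approximated by a band-pass biquad (coefficients from Robert Bristow-Johnson's Audio EQ Cookbook) that keeps the midrange, followed by tanh soft clipping. The function names `bandpass` and `radio_effect` and the default values (1.5 kHz centre, Q of 0.7, drive of 4) are hypothetical choices for illustration.

```python
import math
from typing import List


def bandpass(samples: List[float], sample_rate: float,
             center_hz: float, q: float) -> List[float]:
    """Band-pass biquad (Audio EQ Cookbook, constant 0 dB peak gain), Direct Form I."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalise every coefficient by a0 so the recurrence below has a leading 1.
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def radio_effect(samples: List[float], sample_rate: float,
                 center_hz: float = 1500.0, q: float = 0.7,
                 drive: float = 4.0) -> List[float]:
    """Hypothetical 'radio' preset (assumed recipe): keep the midrange, then soft-clip."""
    narrowed = bandpass(samples, sample_rate, center_hz, q)
    # tanh saturates loud peaks; dividing by tanh(drive) maps an input of 1.0 to 1.0.
    return [math.tanh(drive * s) / math.tanh(drive) for s in narrowed]
```

Because the filter only remembers the previous two input and output samples, it processes audio one sample at a time at very low cost, which is why presets like this can be applied with one click in real time. What a fixed filter like this cannot do is make you sound like a different speaker, which is the part Alex attributes to AI.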
[00:03:49] Please tell me you can do the Scream voice and I could go and play a game and say, do you like scary movies, Alex? Absolutely. Yeah. Yeah, yeah, yeah. We have it. I mean, as you can imagine, there are always going to be moments throughout the year when some voices, like the Scream one, are very, very heavily used, right? When it comes to Halloween, for instance, the Scream voice is one of those that's like a good friend who always comes back. And we can see that in the statistics.
[00:04:18] Usage of this type of voice just spikes. Oh, man, I could have so much fun with that. That sounds amazing. But of course, as it's a tech podcast, I want to find out more about how this works. So you've got real-time AI voice transformation technology. How does this tech work? And what makes it unique compared to other solutions on the market right now? Yeah. So as I mentioned, we come from being the kings of the DSP space, right?
[00:04:45] DSP is understood as classical processing, like EQ and reverb, the type of effects that we make very easy for users to apply with one click to embrace a new character, right? But in recent years, AI has been a great addition, because it allowed us, for the first time, to not just have the effects of a space or the effects of a device like a mask or a megaphone, right?
[00:05:15] With this technology, we also allow users to change how they actually sound. There's a very intrinsic component to sounding like someone else, which is about the natural position of the throat and the tongue, the age of the vocal cords and the stiffness of the different muscles of the mouth.
[00:05:40] There are so many different elements that traditional processing cannot reach, and AI, for the first time, is allowing us to do so, right? So we are now blending these two spaces. Having another timbre of the voice plus those effects allows us to do things like this: for instance, the Squid Game doll is very trendy right now, right?
[00:06:06] That type of voice, like a doll speaking, is very creepy. But it also has DSP effects, because it's being played through a megaphone on stage. That type of thing is the perfect blend for us, because it allows our users to play the Roblox games that are all about Squid Game right now. We know because we saw it kicking into our numbers again.
[00:06:31] And it's very, very simple for us to get that type of timbre, add the DSP effects and allow users to have instant fun with this voice with one click. And doing a little research before you came on the podcast, I think I read that there are more than 40 million gamers and streamers creating sonic identities out there. Is that figure right?
[00:06:55] So what we have at this point with Voicemod is the ability to create voices inside our product. About 40,000 users a month are creating new content, right? And then sharing it, because we're not just cooking up our own content; we also have a tool that we call VoiceLab, so everyone can create their own content and then share it with the community.
[00:07:24] And something else I wanted to raise: we're not just talking about the PC master race of gamers out there. The launch of Voicemod Key is also something of a milestone for console gamers like myself. So what inspired the development of the hardware device, and how does it bring real-time voice transformation to people on PS5, Xbox, etc.? Yes, a good question.
[00:07:47] So we mastered the ability to do voice transformation and soundboard playback on PC, right? We are the leaders in the market with this. And we also know there are two other markets coming that are very big: one is mobile and the other one is console. And we are dealing with them in separate ways. For console, we released this piece of hardware, which is Voicemod Key.
[00:08:16] Voicemod Key is basically a device that allows every user to connect it to their Xbox. We also have users already using it with VR headsets, so it can be used with anything that has a mini-jack input, basically. And what we do is we have this device, and this device is connected to your mobile phone.
[00:08:38] And although today the processing happens on the mobile phone, this device allows us to be a man in the middle, right? We intercept the signal from the microphone, we process it, and then we send it out to the Xbox or PS5. And it's working very nicely, because for the first time it's a way to be on the couch enjoying Voicemod with your friends on Discord, your friends on Fortnite, right? It's essentially the same experience, but running in your pocket.
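The "man in the middle" arrangement Alex describes (microphone in, processing, signal out to the console) can be illustrated as a buffer-by-buffer pipeline. This is a hypothetical sketch, not Voicemod Key's actual design: per Alex, the real processing runs on the phone, and the buffer size, the 48 kHz sample rate and the stand-in effects `gain` and `hard_clip` are assumptions chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]
Effect = Callable[[Frame], Frame]


def buffer_latency_ms(buffer_size: int, sample_rate: int) -> float:
    """Duration of one buffer; a stage that waits for a full buffer adds at least this delay."""
    return 1000.0 * buffer_size / sample_rate


def frames(samples: Iterable[float], size: int) -> Iterator[Frame]:
    """Chop a sample stream into fixed-size buffers (the last one may be shorter)."""
    buf: Frame = []
    for s in samples:
        buf.append(s)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def man_in_the_middle(mic: Iterable[float], chain: List[Effect],
                      size: int) -> Iterator[Frame]:
    """Intercept the mic signal, run every effect in order, then pass it on."""
    for frame in frames(mic, size):
        for effect in chain:
            frame = effect(frame)
        yield frame  # in a real device this would go to the console's input


# Hypothetical stand-in effects; a real chain would hold the voice model and DSP presets.
def gain(factor: float) -> Effect:
    return lambda frame: [factor * s for s in frame]


def hard_clip(limit: float) -> Effect:
    return lambda frame: [max(-limit, min(limit, s)) for s in frame]
```

The buffer size trades latency for efficiency: at the assumed 48 kHz, a 256-sample buffer holds 256 / 48,000 s, about 5.3 ms of audio, and every stage that has to wait for a full buffer adds at least that much delay before your friends hear the transformed voice. Running the chain on a phone only works if each buffer is processed in less time than it lasts.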
[00:09:08] And audio has always been described as the missing piece in creating full gaming immersion. So how would you say Voicemod addresses this gap? Ultimately, if we look beyond here, what role do you see audio playing in enhancing the gaming experience that we all love? I have to say that I've been around long enough to understand that audio is always going to lag behind the visual side.
[00:09:37] That's true when it comes to movies and when it comes to video games; well, everywhere you can see the same type of pattern. And more and more in this last year, video games are going through the same sort of process. We're starting to see the atmospheric impact of music, the atmospheric impact of being immersed in the video game,
[00:10:02] where you have a 360-degree sort of feeling when playing, not to mention horror video games. It can be anything, like Call of Duty, right? These games introduce new mechanics to perceive the position of the different elements in the game, and that becomes a base layer for a compelling experience. So imagine you're playing a video game, let's say, for instance, World of Warcraft,
[00:10:30] and you are a tauren, this big bull, right? These bulls that have disappeared. Or an orc. And then you hear the voice of a weak man like me, for instance. That would be very shocking. We want to go one step further and boost video games with the feeling that everyone around you matches the visuals, right?
[00:10:58] So being who you are and showing off who you are in a way that totally matches both the visual and the sonic side is part of what we are building. And we can tell, I mean, from the engagement in video games where people totally match voice and visuals, that it's a must.
[00:11:20] You can tell because, for instance, VRChat is one of those games where I would say 90% of users already match the visual and the sonic side. Most of them are using Voicemod because they want to look and sound the same way. And that's precisely our value. You make such a great point about how audio is often neglected.
[00:11:44] And I think everybody listening to this podcast will be holding a smartphone capable of shooting 4K video that looks amazing. But the sound on that video, without an additional microphone or anything, is tinny and ruined by wind. Audio is often neglected, isn't it? Yeah. I've been there. So many situations in a past life when I was recording movies.
[00:12:09] And I had to do very long runs just to put the cables in place, just to try to hide microphones. Because otherwise we would be putting all the attention on the visual side, but then the sound would be totally poor quality. And that is when you realize that the experience is not complete. This is when reality kicks in and breaks the mirage, right?
[00:12:37] It breaks the magic of being in the place. That's the type of thing that is critical about audio. If audio is not perfect, if it's not crisp, if it's not really in place, you will break the feeling that you are inside the space. And that is it. I think that's the superpower of audio, right? To make the picture totally complete. 100%. And voice, of course, is a key aspect of both identity and communication.
[00:13:06] So how would you say Voicemod helps users express themselves more authentically and creatively? And why is that so important in these digital environments where gamers and everybody online, to a certain degree, even people on Reddit, are almost pretending to be somebody else a lot of the time? Yeah. Yeah. Yeah.
[00:13:25] The situation with the voice, and I think you nailed it here, is that when it comes to visuals, the message is not as critical, right? But when it comes to voice, it needs to be super easy to understand. Otherwise, if you compromise that, then the message that you as a speaker want to put out is broken. Yeah.
[00:13:51] I mean, it's not just that it sounds cool and matches the visuals; it also needs to be intelligible. Intelligibility is one of those factors that we try to preserve as much as possible. And then we work on other layers, right? We try to add the connection with the visual side. Then we work on things like gender switch, which is one of the most critical things when it comes to voice processing.
[00:14:20] In my case, if I wanted to sound like a woman, right? Making me sound like a real woman is one part of the problem. But sometimes when we go too far into this, suddenly intelligibility starts to break, right? It sounds more natural, but when you try to grasp the message underneath, you start having trouble.
[00:14:40] And there are so many different components that you need to take into account when you try to create a perfect voice, like a woman's, or, for instance, a voice with a very deep tone. If you go too deep, it breaks the message again, right? Those types of components are very relevant. And Voicemod has evolved significantly since it was first founded.
[00:15:05] So can you tell me a bit more about the evolution of the company's technology, especially with the advancements in AI over the last few years? How has that changed everything for you as well? So the company was founded by three brothers. At the beginning, they were pursuing a vision in which mobile was the main line for them. That happened with the arrival of the iPhone 3G; that was the first model that ran Voicemod.
[00:15:33] It was an iPhone application. Then, over time, they discovered that there was a market fit in the desktop space. So they transitioned; they pivoted from one very specific application, or I think they had several, but one that was basically running the whole line, and they pivoted into desktop. This is where they finally found the way to monetize audio experiences.
[00:16:01] It was probably one of the first companies in the world to prove that audio experiences could be monetized at scale. And we did not stop there. I mean, we wanted to put Voicemod in people's hands on their mobile phones. So we also did this whole exercise to decouple the SDK, so the same SDK that was running on desktop is running on mobile.
[00:16:25] And that's been one of the biggest efforts we made with the arrival of AI, because AI processing is way more demanding. Processing on mobile has a different kind of demand than on desktop, because the devices are very different, not just in terms of architecture, but in terms of power. We are very keen on running on every device, and this is what we achieved this last year.
[00:16:53] We are the first company that can run this real-time voice conversion on a very, very wide range of mobile devices. And this same SDK is now being offered to third parties as well, to be integrated into their own products. So we're speaking with different video game companies, so they can run Voicemod inside as well, and other types of companies that have communication products.
[00:17:22] So people can also change their voice and play sounds while they are enjoying their games. Isolating the SDK so it can be consumed on every platform has been one of the biggest breakthroughs of last year. And there will be people listening, far more sensible than me, who will also warn about the more mischievous uses of this technology.
[00:17:45] And I think I should also highlight the Fairly Trained certification for ethically trained AI speech models, which is one of the major distinctions for Voicemod. So can you tell me a bit more about that, why earning that certification is so important and how it sets Voicemod apart in the industry? That's one of the big things that we've been discussing over the last few years, right?
[00:18:09] As AI started to kick in, we started to see the potential misuse of our technology. So from the beginning, we need to differentiate between two different areas in which AI could potentially be misused. One is grabbing data from people and using it to create new voices.
[00:18:38] And the other one comes from the bad use of that technology, right? For the first one, we always wanted to be sure that the technology we were putting in place was trained on datasets given with consent, from people who were totally aware that we were going to use them for that technology: datasets recorded by professionals in a studio who were properly paid, right?
[00:19:08] And that's what the Fairly Trained certification has shown, right? That we are totally aligned with the ethical standards that we want to have for our company. On misuse, well, it's more on us to create content that is fun, right? Content that is all about entertainment. Our users are using Voicemod to have fun online, and this is precisely the type of content that we are creating.
[00:19:36] And the tools that we provide for creating content are also directed toward voices that are built just to have fun with friends. And I was reading before you came on the podcast today that you've seen strategic growth through IP partnerships with companies far and wide, from Warner to Rovio, for example. So how do those partnerships enhance your offerings?
[00:19:59] And ultimately, if we look further into the future, what's next for Voicemod in terms of technology, roadmap, SDK, innovations, etc.? Anything you can share about the road ahead? Well, from the beginning, we connect with our users. We know what they like. We get their requests. We have all this information about what type of voices they want to use in their daily lives.
[00:20:23] So it comes as no big surprise that these are the type of IPs we were expecting, right? Warner with Rick and Morty, or movies like Batman and Superman; that type of content is what our users were looking forward to enjoying. So more and more, we started to pave that avenue, opening conversations with major IP owners.
[00:20:48] I cannot really disclose the ones we are currently in conversations with, but I can say those are the biggest IP owners in the world. There are a bunch of them in conversations, and we expect to have those in Voicemod. Basically, what we have is a business model on top of the subscription: a license that you pay for and can enjoy forever.
[00:21:13] So when you get that new Batman soundboard, we just add it to your collection of sounds, and the same with Rovio voices. We offer that to our users inside the product as premium content that is only used by those who pay on top of the subscription.
[00:21:35] We are very excited about what is coming, because there are major names in the game who are more and more interested in the opportunity to milk their IPs in a way they never did before, which is all around audio experiences. That's brand new for them. We're opening up a space where nobody else has gone before, right?
[00:21:59] And we are shaping that with our partners, because the first conversations are not easy, but more and more we are opening that up. Wow, you left us with a few teasers there. Sounds like we're going to have to get you on later in the year to find out more about some of those. People listening all around the world, I suspect there are a lot of people looking forward to entertaining themselves and their friends, thinking, I need to play with this.
[00:22:25] So if they're on a PC, Mac, phone, Xbox, PlayStation, whatever it is they've got, what's the best way of getting up and running and finding out more information about anything we talked about today? Yeah, just visit voicemod.net. You will be directed to the right product based on the platform that you're running. I mean, Voicemod is one single product that runs on multiple devices.
[00:22:53] And it's very simple to just put your feet in the right place. Well, as I said, I'm going to be checking that out. I might even add something to this podcast if I can find something suitable. So watch this space there. I will be getting you on later in the year to find out more information on what's coming and new features, etc. But more than anything, just thank you for getting me excited for the weekend and giving me something to play with and sharing your story. Thanks again. Thanks. Hello, Neil. Thanks. Thanks for bringing me here.
[00:23:23] So as we wrap up this discussion with Alex, I think it's evident that Voicemod is leading a transformation in how we experience and interact with audio. From gaming to broader applications, their blend of traditional DSP and AI technology is setting a new standard for immersive sound. But I must admit, when listening to our conversation, I was just thinking of the great fun that I could have with this. But what excites you most about the future of real-time voice transformation?
[00:23:51] Are there untapped possibilities that could redefine how we communicate and play? I'd love to hear your thoughts on this. As always, don't hesitate to join the conversation. Email me at techblogwriter@outlook.com, or find me on X, LinkedIn and Instagram, just at Neil C. Hughes. But until next time, stay curious. Keep exploring the ways that technology is reshaping our lives and also allowing us to have a little fun along the way. Speak with you all tomorrow with another guest. Bye for now.
[00:24:25] Bye for now.

