What happens when artificial intelligence starts accelerating cyberattacks faster than most organizations can test, fix, and respond?
In this episode of Tech Talks Daily, I sat down with Sonali Shah, CEO of Cobalt, to unpack what real-world penetration testing data is revealing about the current state of enterprise security. With more than two decades in cybersecurity and a background that spans finance, engineering, product, and strategy, Sonali brings a grounded, operator-level view of where security teams are keeping up and where they are quietly falling behind.

Our conversation centers on what happens when AI moves from an experiment to an attack surface. Sonali explains how threat actors are already using the same AI-enabled tools as defenders to automate reconnaissance, identify vulnerabilities, and speed up exploitation. We discuss why this is no longer theoretical, referencing findings from companies like Anthropic, including examples where models such as Claude have demonstrated both power and unpredictability. The takeaway is sobering but balanced. AI can automate a large share of the work, but human expertise still plays a defining role, both for attackers and defenders.
We also dig into Cobalt's latest State of Pentesting data, including why median remediation times for serious vulnerabilities have improved while overall closure rates remain stubbornly low. Sonali breaks down why large enterprises struggle more than smaller organizations, how legacy systems slow progress, and why generative AI applications currently show some of the highest risk with some of the lowest fix rates. As more companies rush to deploy AI agents into production, this gap becomes harder to ignore.
One of the strongest themes in this episode is the shift from point-in-time testing to continuous, programmatic risk reduction. Sonali explains what effective continuous pentesting looks like in practice, why automation alone creates noise and friction, and how human-led testing helps teams move from assumptions to evidence. We also address a persistent confidence gap, where leaders believe their security posture is strong, even when testing shows otherwise.
We close by tackling one of the biggest myths in cybersecurity. Security is never finished. It is a constant process of preparation, testing, learning, and improvement. The organizations that perform best accept this reality and build security into daily operations rather than treating it as a one-off task.
So as AI continues to accelerate both innovation and attacks, how confident are you that your security program is keeping pace, and what would continuous testing change inside your organization? I would love to hear your thoughts.
Useful Links
Thanks to our sponsor, Alcor, for supporting the show.
[00:00:04] What happens when the same AI tools that help teams move faster also help attackers move smarter? That is one of the many questions that sit at the heart of today's conversation. My guest today is from a company called Cobalt. We're going to talk about how AI is changing the pace and shape of modern cyber attacks and why many security teams are still struggling to keep up. And I think we've seen headlines recently, including claims from Anthropic,
[00:00:33] suggesting that threat actors are starting to automate large parts of the attack lifecycle, reportedly using AI. And many researchers are rightly questioning the detail, but the broader signal is impossible to ignore. The same technologies that give our businesses speed and scale and creativity are also giving cyber criminals new ways to probe, test and exploit systems.
[00:00:59] And Cobalt sits in a pretty rare position here. They run thousands of real-world penetration tests every year across web apps, APIs, cloud environments, and now LLMs. And this gives them a clear view into where defenses are improving, where execution is stalling, and where AI is quietly opening up new blind spots.
[00:01:25] So today you can expect to hear why vulnerability resolution has improved dramatically over the last decade, yet closure rates remain stubbornly flat. So if you are responsible for keeping pace with the business without leaving doors open for attackers, this episode will hopefully help you rethink what modern security actually looks like. Before I bring today's guest on, a quick thank you to my friends over at Denodo.
[00:01:54] AI is evolving fast, but the elephant in the room is initiatives are still failing. Not because the models aren't good, but because the data foundation isn't ready. That's why organizations are increasingly turning to Denodo and logical data management. Denodo unifies your data across every cloud and every system without the need for massive replication. So you can power trustworthy AI, accelerate lake house optimization,
[00:02:24] and build data products that make self-service real for every team. So if you're ready to make AI actually work, visit denodo.com and put logical data management to work today. But who is my guest? Well, it's time for me to officially introduce you to her now. So a massive warm welcome to the show. Can you tell everyone listening a little about who you are and what you do? My name is Sonali Shah.
[00:02:54] I'm the CEO of Cobalt. I've been in the cybersecurity industry for over 20 years now. And funny enough, I started actually as an investment banker. I then later moved on to operational roles. I've run marketing, engineering, product, strategy. And I took on this role at Cobalt in September of 2024. I'm based in Lexington, Massachusetts, which is just outside of Boston. Fantastic. And I'm looking forward to speaking with you today.
[00:03:24] So much that we're going to cover in just 30 minutes. And one of the big stories I keep hearing about is the recurring claims from companies like Anthropic that threat actors are beginning to automate attacks using AI. And cybersecurity has always been a game of cat and mouse. But from what Cobalt is seeing in real pen testing data, how real is that shift? And where are security teams most exposed? What are you seeing here? You know, Neil, this is absolutely a real threat.
[00:03:53] Any threat actor can use the same legitimate AI-enabled tools to find weaknesses in company systems just as defenders do. Right? And they can do it faster than ever before. AI, we know, is really good at conducting recon, finding known vulnerabilities.
[00:04:13] And with human involvement, AI can also harvest credentials, conduct social engineering attacks, and extract large amounts of private data. We're quickly getting to the point where AI agents can do 70 to 80% of the work.
[00:04:35] These agents can follow complex instructions, take automated actions, and chain together tasks with just minimal human guidance. So, yes, the threat is very real. However, I do want to point out that hallucinations are still very commonplace. In fact, Anthropic found that Claude occasionally made up credentials and claimed to have extracted secret data that was, in fact, public.
[00:05:05] Right? So, we're still seeing these types of hallucinations. The other thing is that the amount of automation versus human involvement also depends on the sophistication of the attack. So, the more complex exploits will require human ingenuity.
[00:05:26] So, definitely, there's a huge threat now with AI making it faster and easier to exploit systems and exfiltrate sensitive data. You ask, what are security teams doing about it? Well, what we're hearing from security teams is that they are shifting from point-in-time testing of their defenses to continuous testing.
[00:05:49] With an expanding attack surface and a changing threat landscape, they know that new vulnerabilities are always going to appear. And so, they need to test often, fix quickly, and assume that they're always being attacked. And to bring to life what we're talking about here, before you came on the podcast today, I was looking at some of your research.
[00:06:13] And the data shows that the median time to resolve serious vulnerabilities has dropped from 112 days to 37 days since 2017, which is great. But, of course, annual closure rates are still stuck around 55%. So, I've got to ask, why is execution failing to keep pace with the innovation there? Is there a big reason for this? Anything that you uncover there? Yeah, a couple of things.
[00:06:39] I mean, organizations have always struggled to fix any kind of CVE, with less than half of all exploitable vulnerabilities ever being fixed. But the good news is that the serious vulnerabilities are actually being addressed, right? So, why is this the case? Why is execution failing to keep up with innovation? Well, this is especially hard for companies that have a lot of tech debt, right? Companies are always under pressure to release new features.
[00:07:07] And for companies that have a lot of sort of legacy code, old systems, actually making those changes can take time. And so, there's always that struggle, right? How much do I spend fixing old things versus focusing on new releases? But, you know, the stats for Gen AI app flaws are even worse.
[00:07:27] Just 21% are currently being resolved, with the risks of not doing so including things like prompt injection, model manipulation, and data leakage. You know, we speak with security leaders all the time. And specifically in the report you were referring to, our annual State of Pentesting, we talked to a lot of our customers and found that 36% admit that Gen AI is moving faster than their teams can manage.
[00:07:57] And that's a sobering reality as organizations continue to embed AI deep into their core business operations. With the field of LLM security still evolving, many organizations actually lack the in-house talent to properly assess and remediate these new nuanced vulnerabilities. So, I think that really continues to be a problem for enterprises. They're using this new technology.
[00:08:21] They have to use it to speed up the pace of innovation, but they just don't have the in-house talent to properly assess the risk. And something else that stands out in that report is the very clear gap between large and small organizations, with enterprises taking more than twice as long to fix some very serious issues. So, what is it about scale that slows remediation down so dramatically?
[00:08:48] As I mentioned, large enterprises tend to have a lot of legacy code, and that tech debt makes it difficult to quickly fix issues. Right? As systems get older, fewer developers understand how to safely modify them. And fixing vulnerabilities in legacy systems is significantly more expensive and labor-intensive, and therefore it's often not addressed. So, I think that's what you see with large organizations, right?
[00:09:14] With smaller companies, one thing to consider is that they have more modern architectures and often more modern software development practices, really where they're embedding security into development. So, that makes it easier for them to fix issues. You know, the other thing we see is smaller customers often come to us because they need a pen test to be able to meet the needs of a customer or a partner.
[00:10:07] So, in these cases, every day the issue persists means a day of lost revenue. And so, their incentive to fix these issues is often that they want to win a deal or they want to be able to work with a new partner. So, that's something else to consider in terms of why smaller companies are able to remediate faster.
[00:10:07] And almost every organization is adopting generative AI now, but far fewer are regularly testing AI and LLM systems. So, a question I've got to ask, which is almost a podcast episode completely on its own. Why is AI security lagging behind AI deployment inside most businesses? What is the big cause here? Really, I think it's just this lack of in-house talent. I mean, there's a lack of security expertise globally, as we all know, right?
[00:10:37] So, that's just the base case. And then you look at LLM security, and that is something that most organizations do not have the talent to properly address. So, this leads to misplaced trust in vendors supplying foundational models, who are often struggling to prioritize security fixes themselves. So, the reality is many organizations are just flying blind by using previous-generation automated scanners.
[00:11:04] Things like vulnerability scanners or DAST scanners that did a reasonable job in previous years, but don't have the capability in this new AI era to really find the complex, context-dependent flaws that are inherent in LLMs, right? Like sophisticated prompt injections or business logic abuse.
[00:11:29] Mitigating these risks really requires human ingenuity, human creativity, and an adversarial mindset to discover. That talent is really hard to find. And I think that's the biggest issue as to why AI security is lagging behind AI deployment.
[00:11:48] And something else I really wanted to highlight today is that LLM pen tests show the highest proportion of serious vulnerabilities and yet the lowest remediation rates. And what I find particularly shocking about this is it comes at a time when many businesses this year are excited about launching thousands upon thousands of agents out there as well.
[00:12:09] But what is it that makes these issues harder to fix and what risks does that leave sitting inside those production systems as we continuously add more onto that as well? Yeah, Neil, it is absolutely a scary stat. This is a very new area, as I previously mentioned, and advancements in attacks and defenses are changing daily, right?
[00:12:32] So this is why it's so essential to have a human perform this type of testing since they're always leaning into the latest attack methodologies. Because there's just so much change, finding things and fixing things can be complex. And it's not just about adding some input sanitization to a line of code, but it's about updating and iteratively testing your RAG to reduce the risk of hallucinations, right?
[00:12:58] So we've tested many, many LLMs, as you can imagine, with our large customer base. And I'm just surprised at the number of safety issues that arise, especially if you think about applications related to education or physical and emotional well-being. When the LLM is hallucinating or putting out just false information, it has a real impact on the humans that are interacting with that LLM.
[00:13:27] So the risk is actually quite scary. And I think really what we encourage enterprises to do is to really seek out experts who can help them with this. You know, automated scanners will not find everything, all of the vulnerabilities in LLMs, particularly because you've got that behavioral portion that you need humans to test.
[00:13:51] So it's really important that LLMs are not tested in the same way as any other application, and that enterprises really take a look at the risk they are introducing by using them, working with people who understand how to test them, so that they're releasing code that is safe for their customers.
[00:14:15] And another scary stat in the report there is that many leaders believe that their security posture is strong. But pen testing shows a different story and less than half of the vulnerabilities are actually remediated. So why does this confidence gap persist even among experienced security teams? What are you seeing here? Yeah, Neil, overall in the data set, there's a disappointing tendency to hit the low hanging fruit and then move on to other business priorities.
[00:14:43] But I do want to highlight that there are some bright spots. We found that about 60% of organizations actually resolve most of their serious findings. So while the average is that companies on the whole are only addressing about half of the vulnerabilities discovered, the individual behavior can be quite distinct, right?
[00:15:04] And in speaking with the companies that are successfully resolving over 90% of their vulnerabilities, they have one thing in common. And that's a continuous security program that is programmatically reducing risk. And the message I'm hearing talking to you today and having looked at the report is moving from ad hoc pen testing to programmatic risk reduction.
[00:15:30] But for people listening, to give them that valuable takeaway, what does that shift look like in practice? And how does continuous pen testing better align with the pace of modern software delivery? Because that pace is breathtakingly fast. But there's also the reality it might never move this slowly again. So it is a tricky balance, isn't it? Absolutely.
[00:15:51] And not only is new code being released daily, but the threat landscape is also changing daily, right? And that's why continuous pen testing is so important, because it helps to maintain a strong security posture. But the assumption here is that the pen testing is not generating a continuous set of false positives or findings that have not been verified, right?
[00:16:18] One of the worst things you can do is continuously hand over garbage information to developers to fix. That just kills the cooperation between security and development. So automation has a key role to play here in continuous pen testing, right? There's no other way to do it efficiently.
[00:16:39] But it's super important to have a human in the loop to validate the findings and to use human creativity to find those issues that automation just cannot, right? So true resilience here really requires human-led pen testing to find those complex creative exploits that automated defenses miss so that organizations can move from assumption to evidence.
[00:17:05] Skilled pen testers use AI, use automation, but they leverage their experience to strip out that noise that tools produce and to address the threats that really do matter. Another important part of continuous pen testing is making sure that you're using the data to continuously improve, right? One of the key benefits of our model here at Cobalt is that we use findings from each test to improve the next.
[00:17:31] We use the data to identify trends, improve efficiency, and ultimately provide more accurate and actionable findings. A quick thank you to the sponsor that supports every podcast across the Tech Talks network and every episode. Because their help allows me to publish 60 interviews a month with founders and technologists who are keeping this industry moving. And this month I'm partnering with Alcor.
[00:17:56] And if you've ever tried to hire engineers in another country, you probably know just how painful it can be. Different laws, patchy support, and partners who don't truly understand engineering roles. So Alcor approaches this from a different point of view. They specialize in Eastern Europe and Latin America, and they're able to combine EOR capabilities with recruiting.
[00:18:21] So you get one partner handling everything, and they help you choose the best location for your stack, find developers with the right depth of experience, and run proper assessments so they can onboard people quickly. And they also give you a model that respects both transparency and margin. Most of your spend goes directly to your engineers, and the fee will decrease as the team expands.
[00:18:44] And you can even transition everyone in-house when you're ready without having to worry about a penalty. And that structure is why a mix of early stage and unicorn stage companies use them as they scale. So if you want to take a look, visit alcor.com slash podcast, or tap on the link in the show notes. But now, on with today's show. Cool.
[00:19:07] And if we were to dare to look at the weeks, months, and years ahead, I think it's safe to predict that AI will inevitably accelerate both innovation and attack techniques. So again, for the people listening, security leaders in particular, how should they be rethinking pen testing so it does keep up with business velocity without creating new blind spots? Anything else that you can add to that that would just help them?
[00:19:32] Yeah, you know, we talked about the need to have this always-on pen testing program that leverages AI, but is also human-led. For many companies, this sounds daunting, right? The reality is that not all of your assets need to be treated in the same way. Like, let's be real. No one has unlimited security budget or unlimited time to address all the findings.
[00:19:55] So what we tell security leaders, some of the key things they need to think about when rethinking their security program. One is create a program that makes sense for your organization based on your risk profile as well as your risk tolerance. The frequency, the scope, and the depth of testing are all variables that we can adjust to meet customer needs, right?
[00:20:22] So it's not continuously pen testing every single asset all the time. Really create a program that makes sense for your particular organization. The second thing we tell security leaders is demand quality from your vendors, right? As I previously mentioned, there's nothing like a load of garbage findings to really turn off development teams. And then the third thing we tell our customers is don't forget about your supply chain.
[00:20:49] Zero trust extends to the supply chain, and they need to aggressively manage vendors. Look at third-party models, platforms, and data, right? Organizations should demand evidence of security testing and vulnerability management, and ensure contractual agreements clearly define security responsibilities, right?
[00:21:09] So those are some of the key things that we tell our customers about how to create these continuous pen testing programs and what they should expect from them. Well, I've absolutely loved chatting with you today. And before I let you go, I always ask my guests, or I give them an opportunity to stand on my virtual soapbox and maybe bust a few myths or misconceptions that they see on their LinkedIn news feeds or in the news or anything at all around their work and their industry.
[00:21:39] There's a lot of myths out there at the moment, but what frustrates you that we can finally lay to rest today? Are there any myths about your job or your field of expertise that we can lay to rest once and for all? What would it be? I think it would be the idea that security is ever done. That would be the biggest myth, right? You see a lot of people talking about, oh, I've deployed this tool or I've done this technique and, you know, we've really improved our security posture.
[00:22:05] And yes, that may be true, but the reality is security is never done. It is a 24-7 job. It's a very hard job to do. And we talked about why, right? The environment is constantly changing. First it was DevSecOps and releasing code quickly. Now you've got AI, which puts that on steroids, right? You've also got the attack surface, which is constantly changing, right?
[00:22:31] The attack surface is increasing continuously with new code out there. But the attack vectors are changing. The threats are changing. The techniques are changing. And companies will never be 100% secure, right? So this is an ongoing job.
[00:22:52] Really, what companies need to have is a mindset of constant preparation, continuous risk assessment, and proactive testing of defenses to help them increase their security posture. The companies that are most successful at securing their digital footprint weave security into their daily operations and have metrics that show that they're getting better over time, right?
[00:23:19] So really, the key message is security is never done. You're never going to be 100% secure. The best you can do is be proactive about addressing risks, monitoring and managing risks, and making sure that you're getting better over time. Fantastic advice. Fantastic advice. So many big takeaways there.
[00:23:42] And for people listening, if they want to dig a little bit deeper on the report that we referenced there or keep up to speed with some of the big announcements, connect with you or your team, where would you like to point everyone listening who wants to carry on this conversation we started today? Yeah, reach out to us. Look at our website, cobalt.io. We've got a learning center with lots of great information, including a link to the State of Pentesting report that we do annually. So that's a great place.
[00:24:10] We'd love to continue the conversation. There's a lot here to unpack. Well, there were so many big key takeaways that we mentioned today from that state of the pen testing report, and we only scratched the surface. So for people listening, I'll have links to everything you mentioned there, including the report. Go check it out. Let me know your thoughts. Please let Cobalt know as well. Keep that conversation going. It's such a crucial topic right now.
[00:24:35] But more than anything, just thank you for sitting down with me today and bringing some of these big stats to life and also giving actionable takeaways as well. Thank you. Thank you, Neil, for having me on your show. It's been a pleasure. So where does this leave security leaders as AI accelerates both innovation and attack techniques? One of the many things that stayed with me today from this conversation is the idea that speed alone does not equal progress.
[00:25:03] Yes, we're fixing serious vulnerabilities faster than we did a decade ago. And yes, tooling has improved. But execution is still lagging, especially in larger organizations that are still carrying years upon years of technical debt and competing priorities. And one of the big takeaways, I think, is simple but uncomfortable. Security is never finished.
[00:25:29] There isn't a moment where the job is done and the risk disappears. If only it did. And the teams that are making real progress, they're the ones weaving security into their daily operations, measuring improvement over time and accepting that preparation is a work in progress. It's ongoing. So please check out the show notes where you'll find Cobalt's State of Pentesting report and the research referenced throughout today's episode.
[00:25:59] I'd also love to hear your take on everything you heard. Are AI-driven threats forcing a rethink of how your organization approaches testing and remediation? Have you tried pen testing LLMs? Are you still relying on models built for a slower era? As always, techtalksnetwork.com. You'll find everything you need there, including how to get in touch with me. But other than that, it's time for me to go. So thank you, everyone, for listening.
[00:26:28] And I'll speak with you all again tomorrow. Bye for now.

