3111: Unlocking the Power of Federated Learning in Business | The Tech Talks Network

What if your organization could unlock the full potential of AI without ever compromising on privacy or sharing sensitive data? In this episode of Tech Talks Daily, I am joined by Alexander Alten, Co-Founder and CEO of Scalytics, to explore how he is building the next-generation infrastructure layer for AI agents.

Alexander brings a wealth of expertise, having led data and product teams at industry giants like Cloudera, Allianz, and Healthgrades. With a background in startups such as X-Warp and Infinite Devices, he has a proven track record of developing customer-centric, data-driven solutions that not only disrupt conventional norms but also fuel measurable growth.

During our conversation at the IT Press Tour in Malta, Alexander introduces Scalytics Connect, a modern AI data platform designed to accelerate insights while preserving privacy. He unpacks the challenges of breaking down data silos and explains why centralizing data may not always be the optimal solution. We also demystify federated learning, shedding light on its potential to empower businesses, particularly in regulated industries, to collaborate on AI models without exposing their data.

The discussion extends to the value of open-source technologies and why they often emerge as long-term winners, citing examples like MySQL, Postgres, and WordPress. Alexander shares how Scalytics leverages open-source principles to provide scalable and transparent machine learning solutions for businesses looking to outperform in an increasingly data-driven world.

As AI continues to redefine the way we work and innovate, Alexander's insights provide a roadmap for navigating the complexities of decentralized machine learning, privacy-first AI, and scalable technology.

Could his approach to AI and data collaboration be the key to unlocking your organization's potential? Tune in to find out, and don't forget to share your thoughts on the future of AI-powered innovation.

[00:00:04] We're all living in an age where data is the driving force behind innovation. Yep, there's a lot of hype around AI, but without data, AI is next to useless.

[00:00:15] And I think the challenges of managing decentralized and regulated data environments have never been more pressing. Data is the lifeblood of every organization.

[00:00:27] So today I want to dive into the world of federated learning, decentralized computing and the future of AI driven insights.

[00:00:36] Because I want to explore how new emerging frameworks are actually breaking down data silos while also maintaining security and compliance.

[00:00:47] Providing solutions that bridge the gap between data access and regulatory constraints.

[00:00:54] So please, I invite you all to join me as we unravel some of these advancements and try and understand how they're shaping the next big wave in the IT industry.

[00:01:04] And to do that, I've got the perfect guest to join us today.

[00:01:08] So enough rambling from me. Let me introduce you to today's guest.

[00:01:13] So a big thank you for joining me today on Tech Talks Daily.

[00:01:17] Can you tell everyone listening a little about who you are, what you do?

[00:01:19] Hey, thank you for the invite. And yeah, so I'm Alex.

[00:01:23] So I'm co-founder of Scalytics and an open source project called Apache Wayang.

[00:01:27] And we built an AI agent infrastructure based on federated learning.

[00:01:32] Sounds complicated, but we think that it's the next big wave which comes in the IT industry going from centralized to decentralized data.

[00:01:41] One of the things I love doing on this podcast is finding a little bit more about the story behind a company.

[00:01:47] So what's your origin story?

[00:01:49] Can you tell me what inspired its creation and what specific challenges in the data space you set out to solve with Scalytics?

[00:01:56] So yeah, I worked at Cloudera and we built a set Hadoop in 2012, something like that.

[00:02:06] So 2013, 2014; I left in 2016.

[00:02:10] And that was quite cool because we put a lot of pressure on databases and data marts and promised that you don't have a data silo.

[00:02:19] It's a bigger data silo.

[00:02:21] And after I left, when I worked with E.ON, I realized you don't have only one; you typically have hundreds or thousands of different data points.

[00:02:28] And it's really hard to get a real use out of it.

[00:02:33] And I visited Qatar on an exchange program in 2018 and met Jorge, my co-founder.

[00:02:45] And we, yeah, it was a re-meet.

[00:02:49] So we met first in Palo Alto, Stanford in 2014.

[00:02:54] And he started to investigate distributed computing and created an idea paper called Rheem.

[00:03:03] That is fairly well known in the scene.

[00:03:05] And Rheem was introduced at the Spark Summit in 2017 as a decentralized data processing engine.

[00:03:14] It's really hard to do that.

[00:03:15] And he came really far.

[00:03:17] And that was the idea when we reconnected in 2018: the first thought was, oh, that could be something for cloud computing.

[00:03:24] You have different clouds and it's expensive.

[00:03:26] And then you put it in the middle.

[00:03:27] And then you get and choose the workloads.

[00:03:29] And what do you use?

[00:03:30] And then, yeah, that was the main point.

[00:03:32] And then we started into decentralized computing.

[00:03:34] It didn't really take off.

[00:03:36] Then we thought about, yeah, Scalytics, scale analytics.

[00:03:41] Okay, we built a Scalytics stack around it.

[00:03:43] It didn't take off either.

[00:03:45] And then we went, more or less, into the AI world.

[00:03:48] Because we realized pretty fast that AI is here and will be here forever.

[00:03:54] But nobody really can; well, "nobody" is wrong.

[00:03:56] But not many people can really deal with this.

[00:03:58] Because when you train models, you need to have data sets.

[00:04:01] Data is not centralized.

[00:04:03] It just sits where it is, because it is generated somewhere.

[00:04:04] And that's what drives us to build a framework which allows you to have what are now called AI agents, basically small AI models for dedicated tasks.

[00:04:16] And they can communicate securely together and create a more sophisticated model.

[00:04:22] It's called federated learning; that's it.

[00:04:23] And obviously, AI has dominated headlines this year.

[00:04:27] It will dominate them next year as well.

[00:04:29] But data is required to bring that AI to life.

[00:04:32] And data silos are a topic we don't talk about enough.

[00:04:36] And at Scalytics, you emphasize breaking down those data silos without sharing data.

[00:04:43] So how does your platform achieve this?

[00:04:45] And why is this approach so important in the business and regulatory environment that we find ourselves in right now?

[00:04:51] That's quite good.

[00:04:52] So typically, when you want to do something, you have to extract the data, ETL, put it somewhere central, and then you do something with that.

[00:04:58] So, and the problem is, first, you duplicate data.

[00:05:01] Second, you have different data sets after that.

[00:05:02] You have to clean it.

[00:05:03] It's a bunch of work.

[00:05:04] And you have ETL in the middle.

[00:05:05] And every admin knows ETL always breaks on Sundays.

[00:05:09] And then you have to rebuild everything for Monday because the report has to be done on Monday.

[00:05:13] So Scalytics breaks your analytical query into subparts, sends them to the connected platforms

[00:05:20] and executes them on those platforms.

[00:05:23] It can be a Postgres or a MySQL.

[00:05:25] It can be an Oracle or anything with JDBC.

[00:05:28] Of course, Spark/Databricks, Kafka/Confluent, and Apache Flink.

[00:05:34] So, and the data stays on the source.
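The idea of splitting a query into subparts that run where the data lives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "sources" are in-memory lists standing in for separate databases, and the names `local_partial` and `federated_average` are hypothetical, not part of Scalytics or any Apache project. Each source returns only a partial aggregate, and the coordinator merges those partials without ever seeing the raw rows.

```python
# Illustrative sketch only. The "sources" are in-memory lists standing in
# for separate databases; all names here are hypothetical.

def local_partial(rows, region):
    """Runs at the source: returns only (sum, count), never the raw rows."""
    amounts = [r["amount"] for r in rows if r["region"] == region]
    return sum(amounts), len(amounts)


def federated_average(sources, region):
    """Runs at the coordinator: merges the partial results of every source."""
    total, count = 0.0, 0
    for rows in sources.values():
        partial_sum, partial_count = local_partial(rows, region)
        total += partial_sum
        count += partial_count
    return total / count if count else None


sources = {
    "postgres_sales": [
        {"region": "EU", "amount": 100.0},
        {"region": "US", "amount": 50.0},
    ],
    "mysql_shop": [
        {"region": "EU", "amount": 200.0},
        {"region": "EU", "amount": 300.0},
    ],
}

eu_average = federated_average(sources, "EU")
print(eu_average)
```

In a real deployment the local step would be a query pushed down to each engine, but the shape is the same: small summaries travel, rows do not.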

[00:05:37] Let's say you have two different Databricks clusters, which is not unusual.

[00:05:41] When you go to larger companies, you have nine or ten different stages.

[00:05:44] But everything has a different data set.

[00:05:47] And what they use is ETL parts in the middle to update the data and sync it, which is quite costly

[00:05:52] and needs a lot of resources.

[00:05:55] When you use Apache Wayang in that case, you can write your machine learning model,

[00:06:00] like k-means, and you can execute subparts on the specific Databricks clusters.

[00:06:05] And the data never leaves the premises.
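If the model in question is something like k-means clustering, running subparts on each cluster could look roughly like the sketch below. It is a simplified, one-dimensional illustration, assuming each site can compute per-centroid sums and counts over its own points; the function names and the toy data are hypothetical and do not come from any real framework API.

```python
# Simplified 1-D sketch. Each "site" is a list of numbers standing in for a
# cluster's private data; function names are hypothetical.

def local_stats(points, centroids):
    """Runs inside one site: per-centroid (sum, count) over its own points."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        sums[nearest] += p
        counts[nearest] += 1
    return sums, counts


def federated_kmeans(sites, centroids, rounds=10):
    """Coordinator: merges the per-site statistics and moves the centroids."""
    for _ in range(rounds):
        total_sums = [0.0] * len(centroids)
        total_counts = [0] * len(centroids)
        for points in sites:
            sums, counts = local_stats(points, centroids)
            for i in range(len(centroids)):
                total_sums[i] += sums[i]
                total_counts[i] += counts[i]
        # Keep a centroid unchanged if no point was assigned to it.
        centroids = [
            total_sums[i] / total_counts[i] if total_counts[i] else centroids[i]
            for i in range(len(centroids))
        ]
    return centroids


site_a = [1.0, 2.0, 9.0]
site_b = [3.0, 10.0, 11.0]
final_centroids = federated_kmeans([site_a, site_b], centroids=[0.0, 5.0])
print(final_centroids)
```

Only the sums and counts cross the site boundary, which is why the coordinator can update the shared centroids without the points themselves being copied anywhere.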

[00:06:06] So, that's quite important.

[00:06:08] When you think about healthcare, when you think about regulated industries who need to have data

[00:06:12] in a certain secured environment and they can't copy it to a cloud.

[00:06:17] It's not even that.

[00:06:18] When you think about CRM systems, combining HubSpot with Salesforce is not possible.

[00:06:24] So, you need API calls in the middle, typically ETL stuff, and a lot of things.

[00:06:27] So, when you use Scalytics and API connectors, then you could directly work with data from HubSpot.

[00:06:34] In fact, I know a company in Sweden or Denmark that is working on that, connecting HubSpot with Apache Wayang

[00:06:43] and building a CRM monitoring system out of it.

[00:06:47] Yeah.

[00:06:48] I don't know how far they are, but it looks quite interesting.

[00:06:51] And, of course, decentralized machine learning and AI are often seen as incredibly complex

[00:06:56] and, indeed, cutting-edge concepts.

[00:06:58] But that complexity often scares a lot of business leaders.

[00:07:02] So, how would you demystify these for business leaders that are maybe listening,

[00:07:06] unfamiliar with the technical details or maybe afraid to approach?

[00:07:10] Yes.

[00:07:11] When you come at it from a scientific perspective, federated learning is complex.

[00:07:15] That's what we strive to demystify.

[00:07:18] So, when you work with Scalytics, it's about as easy as writing an SQL query.

[00:07:23] So, at the moment, we are able to interpret any language and translate it to the language of any platform that is connected.

[00:07:30] That means you could write your AI agent stuff in SQL, and you can work on TensorFlow or Spark or even Flink.

[00:07:38] And we would break it down into subparts and machine code on the different platforms.

[00:07:44] And that is the heavy lifting we do.

[00:07:46] For you as a customer, it should be as easy as possible because when it's too complicated, as you said,

[00:07:54] it's a risk and nobody wants to take it.

[00:07:57] You need more teams.

[00:07:57] And that would be really abstract and say, okay, no.

[00:08:00] Our part is from enterprise-driven company.

[00:08:04] We, as I said, we reduce the burden you have as a customer, but we want to make you successful.

[00:08:10] When you are successful, we are successful.

[00:08:12] So, you know, that's what...

[00:08:13] Yeah.

[00:08:13] And in your presentation today, one of the things that stood out to me is you argue that centralizing data doesn't make sense in most cases.

[00:08:21] So, can you walk me through the limitations and risks of centralized data systems and how you offer that better alternative?

[00:08:28] Because it's refreshing to hear, because so many are just going after centralized data, and that's quite contrary to what you're saying here.

[00:08:34] So, when you have centralized data, you basically create a new data lake.

[00:08:37] And in most cases when we work with companies, it's a data swamp.

[00:08:42] Yeah.

[00:08:42] A lot of things are in there, and nobody really knows what to do with them, and it's just data because the marketing or the sales guys store all their data there.

[00:08:50] So, SAP was exactly the same paradigm.

[00:08:52] And the second thing is you can't get access to the data you need.

[00:08:55] So, especially when you go into the streaming world and you have Kafka, Kafka typically can't access SAP.

[00:09:01] How should it be done?

[00:09:03] So, and in SAP, typically, when you go into larger enterprises, they have CRM systems or CRM data.

[00:09:10] They have operational data.

[00:09:11] They have potentially HR or sales or finance or whatever.

[00:09:17] But when you now want to build, let's say, an AI model for pre-auditing, to reduce the cost of your yearly audit from the Big Three or Big Four,

[00:09:28] at the moment, it's not possible.

[00:09:30] You need to extract into Excel sheets, and you potentially know that they have a bunch of Excel files, and you pass them around and you lose control.

[00:09:36] Or not control, but potentially insights.

[00:09:38] And using now a system like Scalytics would allow you to connect to different data sources or to the right data at the right time.

[00:09:47] And you can build your pre-modeling auditing stuff without moving data around.

[00:09:54] And, you know, that's the main thing.

[00:09:56] So, centralizing, especially when we talk about the finance world and the HR world, is often not possible.

[00:10:01] And it's a gray zone, to be honest.

[00:10:03] Yeah.

[00:10:04] That's why most energy companies don't use centralized data lakes that much, because they are critical infrastructure.

[00:10:12] They have certain regulations.

[00:10:14] Their data needs to be secure.

[00:10:15] The same is in healthcare.

[00:10:16] The same is in space.

[00:10:17] The same is in automotive, et cetera.

[00:10:19] So it's not that using centralized data is wrong.

[00:10:22] For the right use case, it's quite an important step.

[00:10:27] But not for all use cases.

[00:10:29] And just to put all data into a central place doesn't make so much sense,

[00:10:32] even when you go into, let's say, autonomous driving.

[00:10:35] Or autonomous drones.

[00:10:36] It doesn't make sense to copy IoT data into a data warehouse on Amazon,

[00:10:41] do something and give it back to the drone.

[00:10:43] The drone is away.

[00:10:44] It needs to be decentral.

[00:10:45] And one of the things I always try and do on this podcast is demystify some of those complex phrases

[00:10:51] that business leaders may hear about.

[00:10:53] Again, they may find them overwhelming and daunting.

[00:10:55] And federated learning is a term that's completely gaining traction in the AI era and this year and next year.

[00:11:03] But can you explain exactly how you see federated learning?

[00:11:06] What is it?

[00:11:07] How does it work?

[00:11:08] And what are the advantages for businesses?

[00:11:10] What kind of ROI will it deliver on their projects, particularly in heavily regulated industries?

[00:11:15] Can you shed some light on this just to help people understand it?

[00:11:18] Like in the early days of big data, you put everything together and then you try to make the best out of it.

[00:11:25] And we see the same thing here.

[00:11:28] When you centralize data in data marts or a data lake or whatever you name it,

[00:11:34] it's quite hard for developers to successfully build AI models, or machine learning in that case,

[00:11:41] because they don't have access to the right data.

[00:11:43] And then when you look at data catalogs, they have hopefully indexed a lot of your data or some of your data.

[00:11:50] Then you need to talk to people to get access.

[00:11:54] And then when you think about regulated industries, you need to audit who has access to the data point for what reason.

[00:12:02] So now the problem really starts.

[00:12:06] So that is what we try to do.

[00:12:09] So we mainly concentrate on an AI agent framework, which means that in Scalytics, we call it Edge; you can call it agent or whatever.

[00:12:19] But this is a closed system which does a dedicated task and learns for this dedicated task or dedicated data.

[00:12:26] But it does not have access to all data sets.

[00:12:28] So you have multiple agents in different stages and they communicate securely over a network.

[00:12:33] And then the output of those models will be computed centrally in a bigger AI model.

[00:12:44] And this model will update the smaller models and then you have the cycle.
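The cycle described here, with small local models, a central model computed from their outputs, and updates flowing back, resembles what is commonly called federated averaging. The sketch below is a toy version under loud assumptions: a one-parameter linear model, plain gradient descent, and made-up data. It is not how Scalytics or Apache Wayang implement it, and every name in it is hypothetical.

```python
# Toy federated averaging for a one-parameter model y = weight * x.
# Data, names, and hyperparameters are illustrative assumptions.

def local_train(weight, data, lr=0.01, epochs=20):
    """Runs where the data lives: refines the shared weight on local pairs."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight, len(data)


def federated_round(global_weight, silos):
    """Coordinator: averages the local weights, weighted by sample count."""
    updates = [local_train(global_weight, data) for data in silos]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


silos = [
    [(1.0, 2.0), (2.0, 4.0)],               # silo 1: its (x, y) pairs stay local
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],  # silo 2: likewise
]

weight = 0.0
for _ in range(30):
    # Each round: send the global weight out, train locally, average back.
    weight = federated_round(weight, silos)
print(weight)
```

Both silos follow y = 2x, so the shared weight settles near 2.0 even though neither silo ever reveals its pairs; only weights and sample counts are exchanged.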

[00:12:47] So that means the real insight comes from the data on-premise, in the data silos, and it never leaves this certain area, which is safe from a regulatory point of view.

[00:12:57] But the insights from that can be used to build something new.

[00:13:01] Yeah.

[00:13:02] That is what we're building or what we, yeah.

[00:13:06] And also something that stood out in your presentation today is how you integrate open source principles.

[00:13:12] And just listening to you, I could see you're quite passionate about that.

[00:13:15] So why do you believe open source often emerges as this ultimate winner in technology?

[00:13:20] And how does this align with your overall vision at Scalytics?

[00:13:24] So when we look at the market, open source always wins.

[00:13:27] Yeah.

[00:13:28] Always.

[00:13:28] Look at the most successful companies in the world: they are based on open source.

[00:13:33] You look at WordPress.

[00:13:35] It's an open source project.

[00:13:37] Roughly, I think, 80% of all CMS systems run on WordPress.

[00:13:41] You name it: MySQL, PostgreSQL.

[00:13:43] Big databases are used everywhere in the world.

[00:13:46] Open source-based.

[00:13:46] And we think, or we know, that open source is, first, a model which has certain security benefits because the code is auditable.

[00:13:55] You know what's in.

[00:13:56] When you find the bug, you can fix it.

[00:13:58] It's not like a closed system.

[00:13:59] And secondly, a lot of people bring new ideas.

[00:14:02] And these new ideas shape the product.

[00:14:04] And then you can take it and you can build a product out of it.

[00:14:06] Yeah, yeah.

[00:14:07] So that's the main sense of open source.

[00:14:09] And secondly, most important, it is also a kind of guarantee.

[00:14:15] If Scalytics ran out of money or didn't make it, the open source part is simply there.

[00:14:21] You as a customer, when you have a Scalytics stack, you would be compatible with Apache Wayang.

[00:14:26] That means you only, well, not really "only" in the strict sense, but you need to hire a few developers, and you could theoretically or practically run the stuff on your own.

[00:14:38] You don't lose anything.

[00:14:39] When you have a closed system and the closed system, yeah, goes off the market, what do you do?

[00:14:44] You're in a bad place quite quickly.

[00:14:46] In that case, yeah.

[00:14:47] And Scalytics is designed to thrive in regulated industries.

[00:14:51] So to bring to life some of what we're talking about here, are you able to share any of the examples of how the platform supports businesses in sectors, whether it be finance, healthcare, energy, or any regulated industry?

[00:15:04] Any kind of use cases or examples you could share to bring to life what we're talking about?

[00:15:08] Oh, yes.

[00:15:09] So as an example, ESA, they have a system, it's called Agora.

[00:15:13] And Agora is based on, or is using, Apache Wayang, which is the open source part, and they use it for satellite image processing.

[00:15:21] Satellite imagery is quite a beast because the images come from different sources.

[00:15:25] You have radar penetration.

[00:15:27] You have real pictures like photograph pictures.

[00:15:30] You have Earth's penetration, radioactive, radio waves, whatever.

[00:15:36] And to build a system for scientists to get real insight into certain parts of an area is hard.

[00:15:43] So we heard from TU Berlin about a scientific project.

[00:15:49] When they have PhDs working on different tasks, they need most of their time to get the right image before they can do the analysis.

[00:15:57] And when they have the right image, most of the time is already over.

[00:16:00] Yes.

[00:16:01] And then they can't really work on that.

[00:16:03] So they enabled the system, based on a graph database, so that you can tap into data in a more elegant way.

[00:16:10] So then it gets faster.

[00:16:11] You can do research.

[00:16:11] As an example, maybe how big the forest in Arizona is, or the forest in Arizona 10 years ago versus now.

[00:16:20] It's the same problem when you look at coastlines, how the coastline changed over the years.

[00:16:26] When you track fish swarms or whatever.

[00:16:29] So that's something which is called Earth observation.

[00:16:32] And it's using satellite images to process data to get insight out of it.

[00:16:37] That's one of them.

[00:16:39] In the finance industry, they are using our stack, especially for anti-fraud, anti-terror, and AML.

[00:16:47] And also in the defense industry, everyone who has different data sets needs something to build more insights out of.

[00:16:56] Of certain data sets, which are not really accessible or can't be integrated into something.

[00:17:02] And I think this year we've also seen an almost AI gold rush with businesses rushing to adopt AI.

[00:17:08] Some have found it difficult to find that elusive ROI from their projects.

[00:17:13] And it's such a balancing act, making sure everything fits.

[00:17:15] So how are you able to accelerate insights while also maintaining compliance and privacy?

[00:17:21] I would imagine it is quite a balancing act.

[00:17:24] Is there anything you can share on how you do that?

[00:17:26] Of course, it's quite a cool thing.

[00:17:27] So when you use a federated learning framework like Apache Wayang or Scalytics, the commercial part of it, you don't have a new platform.

[00:17:37] You don't need to free resources.

[00:17:39] Yes, you need a laptop or maybe a smaller system to run the main model.

[00:17:45] But that's all.

[00:17:45] You already have your systems.

[00:17:46] So you have Apache Kafka installed typically and you have your Spark, Databricks, Hadoop, Data Warehouse, whatever stuff in your company.

[00:17:53] So you have everything you need to build useful insights.

[00:17:56] But now what you miss is the piece to put all that together and to create something useful.

[00:18:01] So at the moment, the companies in the world that earn the most money out of AI are the consulting companies.

[00:18:07] So I heard that Accenture makes four or five billion a year just for PowerPoint.

[00:18:13] Yeah.

[00:18:14] But, you know, that's not delivering anything.

[00:18:18] So what we do when we work with you is support the implementation phase and also train your staff.

[00:18:26] And we help you to build better or faster insights and show you how it looks in the end.

[00:18:31] But when you think about it, Scalytics is not more than a connector to your different data sources, allowing you to process data where it is and gain new insights. That's it, simply put.

[00:18:46] The complexity is behind the technology, of course.

[00:18:49] But what we're doing is more like an infrastructure framework in that case.

[00:18:55] 2025 is now literally just a couple of weeks ahead.

[00:18:58] So how do you see the future of data collaboration evolving and what role do you see yourself at Scalytics playing in shaping this landscape?

[00:19:07] And what's next on your innovation roadmap?

[00:19:09] You're probably locked down as to what you can share.

[00:19:11] But are there any teasers that you can share about the road ahead for you and how you see it?

[00:19:15] Yes, of course.

[00:19:16] So the next step is that we dig deeper into regulation and provide a model which helps you to fulfill regulatory requirements in a more efficient way.

[00:19:27] So we know when you have an AI model running and which data is accessed, and we get transparency out of that.

[00:19:33] Then we can directly help you to create compliance rules, traceability, and a little bit of transparency.

[00:19:44] So it's not 100% transparency.

[00:19:46] That's not achievable yet; it needs a lot more research.

[00:19:49] But what we can do is tell you who has accessed what, for what reason, and when.

[00:19:57] And that is at least, I would say, 40-50% of the way to transparency because now you have at least a history of how your machine learning or your AI model accessed data.
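A history of who accessed what, for what reason, and when can be pictured as an append-only log of access records. The sketch below is a minimal illustration using only the Python standard library; the class names, field names, and sample entries are assumptions made for this example and do not describe the planned Scalytics feature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRecord:
    """One immutable entry: who read what, for what reason, and when."""
    who: str      # model or agent identifier (hypothetical naming)
    what: str     # data set that was read
    reason: str   # declared purpose of the access
    when: datetime


class AuditLog:
    """Append-only, in-memory log; a real system would persist entries."""

    def __init__(self):
        self._records = []

    def record(self, who, what, reason):
        entry = AccessRecord(who, what, reason, datetime.now(timezone.utc))
        self._records.append(entry)
        return entry

    def history(self, what):
        """All accesses to one data set, in the order they happened."""
        return [r for r in self._records if r.what == what]


log = AuditLog()
log.record("fraud-model-v1", "payments_2024", "AML training")
log.record("churn-agent", "crm_contacts", "weekly scoring")
log.record("fraud-model-v1", "crm_contacts", "feature lookup")

crm_history = log.history("crm_contacts")
for r in crm_history:
    print(r.who, r.reason, r.when.isoformat())
```

Such a log answers the "who, what, why, when" questions for an auditor, though it says nothing about how the model used the data internally, which matches the partial transparency described in the conversation.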

[00:20:09] So that is what we plan for next year.

[00:20:12] And, of course, next year is our bigger rollout.

[00:20:18] So we will have a few customers building really large infrastructures for AI modeling.

[00:20:25] It's also said that some service agencies are interested in running our stack to build models for their customers in a secure way, so that it's not necessary,

[00:20:35] as I said, to have centralized data.

[00:20:38] Well, it sounds like you've got an exciting year ahead.

[00:20:40] We will be staying in touch.

[00:20:42] I hope to get you back on next year, see how things are evolving.

[00:20:45] But for anyone listening who wants to dig a little bit deeper, maybe check out your white paper, or find out more about how you might be able to help them,

[00:20:52] is there anywhere you'd like to point everyone listening to find out more information?

[00:20:55] Yeah, you can check scalytics.io, which is our website, and also wayang.apache.org, which is the open source project.

[00:21:01] We have a mailing list there.

[00:21:02] And, of course, when you are on GitHub, just check the GitHub, give us a star, and we are happy.

[00:21:07] Well, so many big talking points from our conversation today.

[00:21:10] And I hope anybody listening checks you out, learns a little bit more about what you're doing, love what you've accomplished so far.

[00:21:16] And it is an incredibly exciting year next year.

[00:21:18] So we'll be chatting next year.

[00:21:20] But more than anything, thank you for stopping by and speaking with me today.

[00:21:22] Thank you for the invite.

[00:21:23] It was nice to meet you here in Malta.

[00:21:25] And I hope we talk soon.

[00:21:26] I think as we wrap up our discussion today, one thing is certainly clear.

[00:21:29] The shift from centralised to decentralised data systems holds immense promise, especially for industries that are navigating some of the complex challenges that we raised today.

[00:21:40] And by enabling smarter, more secure data collaboration, these technologies may be poised to redefine how businesses harness the power of AI.

[00:21:50] But what are your thoughts on the rise of federated learning and its potential to transform regulated industries?

[00:21:58] As always, let me know.

[00:22:00] Email me, techblogwriter@outlook.com, Instagram, X, LinkedIn, just @NeilCHughes.

[00:22:06] Let me know and let's keep this conversation going.

[00:22:09] But I'm afraid we've reached the end of this episode.

[00:22:13] So thank you as always for tuning in to Tech Talks Daily.

[00:22:16] Until next time, please stay curious, stay inspired and remember to join me again tomorrow.

[00:22:22] We'll do it all again with another guest on a completely different topic.

[00:22:26] Hopefully, I will speak with you all then.

[00:22:28] Bye for now.