As AI agents begin to influence how businesses operate, there's growing urgency around building infrastructure that supports their complexity without adding new risks. In this episode of IT Infrastructure as a Conversation, I speak with Alexander Alten, Co-Founder and CEO of Scalytics, about the architecture powering the next generation of AI and machine learning systems.
Alexander’s journey includes leadership roles at Cloudera, Allianz, and Healthgrades, and a deep commitment to building scalable, privacy-respecting technologies. At Scalytics, he's helping organizations avoid the limitations of centralizing data by building distributed systems that support federated learning. Rather than extracting and duplicating data across systems, Scalytics enables analysis directly at the source, making it easier for businesses in regulated industries to innovate with confidence.
Recorded live at the IT Press Tour in Malta, our conversation dives into the origins of Scalytics Connect, the company's AI agent infrastructure that leverages open-source frameworks like Apache Wayang. We explore why ETL pipelines often create fragility instead of flexibility, how decentralization supports both compliance and collaboration, and why open-source technologies continue to outperform closed systems over the long term.
For any CIO, CTO, or data architect looking to align AI capabilities with real-world constraints, Alexander’s perspective offers a refreshingly pragmatic path forward. His framework simplifies the complexity of federated machine learning while preserving data sovereignty, auditability, and future-proof flexibility.
If your organization is struggling with data silos, regulatory friction, or the scaling of AI models, this episode offers insight into a model that avoids duplication, improves trust, and accelerates results by treating infrastructure as the foundation for intelligent systems.
[00:00:00] Today I want to dive into the world of federated learning, decentralized computing, and the future of AI-driven insights, because I want to explore how emerging frameworks are actually breaking down data silos while also maintaining security and compliance, providing solutions that bridge the gap between data access and regulatory constraints.
[00:00:27] So please, I invite you all to join me as we unravel some of these advancements and try to understand how they're shaping the next big wave in the IT industry. And to do that, I've got the perfect guest to join us today. So enough rambling from me; let me introduce you to today's guest. A big thank you for joining me today on Tech Talks Daily. Can you tell everyone listening a little about who you are and what you do?
[00:00:52] Hey, thank you for the invite. I'm Alex, co-founder of Scalytics and of an open source project called Apache Wayang. We've built an AI agent infrastructure based on federated learning. It sounds complicated, but we think it's the next big wave coming in the IT industry: going from centralized to decentralized data.

One of the things I love doing on these podcasts is finding out a little bit more about the story behind a company.
[00:01:20] So what's your origin story? Can you tell me what inspired its creation and what specific challenges in the data space you set out to solve with Scalytics?

So yeah, I worked at Cloudera, and we built out Hadoop there, starting around 2012.
[00:01:39] So 2013, 2014; I left in 2016. And that was quite cool, because we put a lot of pressure on databases and data marts and promised that you wouldn't have a data silo. And as I left, I realized, when I worked with E.ON, that you don't have only one; you typically have hundreds or thousands of different data silos.
[00:02:01] And it's really hard to get real use out of them. I visited Qatar on an exchange program in 2018 and met Jorge, my co-founder. And, yeah, it was a reunion; we had first met in Palo Alto, at Stanford, in 2014.
[00:02:27] He had started to investigate distributed computing and written the original paper on Rheem, which is quite well known in the scene. Rheem was introduced at the Spark Summit in 2017 as a decentralized data processing engine. That's really hard to do, and he had come really far. And the idea, as we reconnected in 2018, was first: oh, that could be something for cloud computing.
[00:02:56] You have different clouds, and it's expensive. So you put something in the middle, and then you can choose the workloads and where they run. That was the main point. So we started with decentralized computing, but it didn't really take off. Then we thought about, yeah, Scalytics: scale analytics. We built the Scalytics stack around it; that didn't take off either. And then we went more or less into the AI world.
[00:03:21] Because we realized pretty fast that AI will be here, and will be here forever; nobody is wrong about that. But not many people can really deal with it, because when you train models, you need data sets, and data is not centralized. It sits where it's generated.
[00:03:37] And that's what drove us to build a framework that gives you what are now called AI agents: small AI models for dedicated tasks that can communicate securely with each other and together create a more sophisticated model. It's called federated learning.
[00:03:55] And obviously AI has dominated headlines this year, and it will dominate them next year as well. But data is required to bring that AI to life, and data silos are a topic we don't talk about enough. At Scalytics, you emphasize breaking down those data silos without sharing data. So how does your platform achieve this? And why is this approach so important in the business and regulatory environment we find ourselves in right now?
[00:04:24] That's a good question. Typically, when you want to do something, you have to extract the data via ETL, put it somewhere central, and then do something with it. The problem is, first, you duplicate data. Second, you end up with different data sets afterwards and have to clean them; it's a bunch of work. And you have ETL in the middle, and every admin knows ETL always breaks on Sundays, and then you have to rebuild everything because the report has to be done on Monday.
[00:04:45] Scalytics instead breaks your analytical query into subparts, sends them to the connected platforms, and executes them there. That can be a PostgreSQL or a MySQL, an Oracle, or any JDBC source.
[00:05:01] Of course also Spark and Databricks, Kafka and Confluent, and Apache Flink. And the data stays at the source. Let's say you have two different Databricks clusters, which is not unusual; when you go to larger companies, you have nine or ten different stages, but each has a different data set.
[00:05:20] And what they use is ETL jobs in the middle to update and sync the data, which is quite costly and needs a lot of resources. When you use Apache Wayang in that case, you can write your machine learning model once and execute subparts of it on the specific Databricks clusters, and the data never leaves the premises. That's quite important.
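The idea of executing subparts of a query at each source, rather than copying rows into a central store, can be illustrated with a minimal sketch. This is not the Scalytics or Wayang API; it is a hypothetical push-down of an aggregate to two in-memory SQLite "silos", where only partial results (counts and sums) ever cross the boundary.

```python
# Illustrative sketch only (not the Scalytics/Wayang API): push an
# aggregate subquery down to each connected source and combine the
# partial results centrally, so no raw rows are duplicated.
import sqlite3

def partial_count_sum(conn, table, column):
    """Run the pushed-down subquery on one source; return aggregates only."""
    cur = conn.execute(f"SELECT COUNT({column}), SUM({column}) FROM {table}")
    return cur.fetchone()

def federated_avg(sources, table, column):
    """Combine per-source partials into a global average."""
    total_n, total_s = 0, 0.0
    for conn in sources:  # each could be PostgreSQL, MySQL, any JDBC source
        n, s = partial_count_sum(conn, table, column)
        total_n, total_s = total_n + n, total_s + (s or 0.0)
    return total_s / total_n

# Demo with two in-memory "silos" whose rows never leave their connection.
silos = []
for values in ([10.0, 20.0], [30.0, 40.0, 50.0]):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(v,) for v in values])
    silos.append(conn)

print(federated_avg(silos, "orders", "amount"))  # → 30.0
```

The same decomposition generalizes from simple aggregates to the subplans of a larger analytical query, which is the heavy lifting a cross-platform engine takes on.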
[00:05:41] Think about healthcare, think about regulated industries that need to keep data in a certain secured environment and can't copy it to a cloud. And it's not even only that. Think about CRM systems: combining HubSpot with Salesforce directly is not possible. You need API calls in the middle, typically ETL stuff, and a lot of things. When you use Scalytics and its API connectors, you can work directly with data from HubSpot.
[00:06:07] In fact, I know a company in Sweden or Denmark working on that, connecting HubSpot with Apache Wayang and building a CRM monitoring system out of it. I don't know how far they are, but it looks quite interesting.

And of course, decentralized machine learning and AI are often seen as incredibly complex and indeed cutting-edge concepts, but that complexity often scares a lot of business leaders.
[00:06:34] So how would you demystify these for business leaders who are listening, unfamiliar with the technical details, or maybe afraid to approach them?

Yes. When you come at it from a scientific perspective, federated learning is complex. That's what we strive to demystify. When you work with Scalytics, it's as easy as writing an SQL query. At the moment we are able to interpret any language and translate it to any language that is connected.
[00:07:03] That means you could write your AI agent logic in SQL and it can run on TensorFlow or Spark or even Flink. We break it down into subparts and machine code on the different platforms, and that is the heavy lifting we do. For you as a customer, it should be that easy, because when it's too complicated, as you said, it's a risk and nobody wants to take it; you need more teams.
[00:07:30] So we abstract that away. Our background is in enterprise-driven companies, so we reduce the burden you carry as a customer, because we want to make you successful. When you are successful, we are successful.

Yeah. And in your presentation today, one of the things that stood out to me is that you argue centralizing data doesn't make sense in most cases.
[00:07:54] So can you walk me through the limitations and risks of centralized data systems and how you offer a better alternative? It's refreshing to hear, because everybody else is going after centralized data, and it's quite contrary to what you're saying here.

When you centralize data, you basically create a new data lake. In most cases when we work with companies, it's a data swamp. A lot of things go in and nobody really knows what to do with them; it's just data, because the marketing or sales people stored everything there.
[00:08:22] When you build Hadoop, it's exactly the same paradigm. And the second thing is you can't access the data you need. Especially when you go into the streaming world and you have Kafka: Kafka typically can't access SAP. How should that be done? And in SAP, typically, when you go into larger enterprises, you have CRM systems or CRM data, operational data, and potentially HR or sales or finance data, or whatever.
[00:08:50] But when you now want to build, let's say, an AI model for pre-auditing, to reduce the cost of your yearly audit from the Big Three or Big Four, at the moment it's not possible. You need to extract into Excel sheets, and you know how that goes: you end up with a bunch of Excel files flying around and you lose control. Or not control, but potentially insights. Using a system like Scalytics would allow you to connect to the different data sources, to the right data at the right time.
[00:09:20] And you can build your pre-audit modeling without moving data around, and that's the main thing. So centralizing, especially when we talk about the finance world and the HR world, is often not possible. And it's a gray zone, to be honest.
[00:09:42] Or take autonomous driving, or drones.

[00:10:09] It doesn't make sense to copy IoT data into a data warehouse on Amazon, do something with it, and send it back to the drone; the drone is gone by then. It needs to be decentralized.

And one of the things I always try to do on this podcast is demystify some of those complex phrases that business leaders may hear about and may find overwhelming and daunting. Federated learning is a term that's really gaining traction in the AI era, this year and next. But can you explain exactly how you see federated learning?
[00:10:39] What is it? How does it work? And what are the advantages for businesses? What kind of ROI will it deliver on their projects, particularly in heavily regulated industries? Can you shed some light on this to help people understand it?

It's like the early days of big data: you put everything together and then try to make the best out of it. And we see the same thing now.
[00:11:01] Centralizing data in data marts or data lakes, or whatever you name it, makes it quite hard for developers to build successful AI or machine learning models, because they don't have access to the right data. And when you look at data catalogs, they have hopefully indexed a lot of your data, or some of it, and then you need to talk to people to get access.
[00:11:26] And then, when you think about regulated industries, you need to audit who has access to which data point and for what reason. That's where the problems really start. So that is what we try to reduce. We mainly concentrate on an AI agent framework, which means that in Scalytics we have what we call an Edge; you may call it an Agent, or whatever.
[00:11:52] But this is a closed system that does one dedicated task and learns for that dedicated task from dedicated data. It has no access to all data sets. So you have multiple agents in different stages, and they communicate securely over a network. The outputs of all those models are then aggregated centrally into a bigger AI model, and this model updates the smaller models, and then you have the cycle.
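The cycle described above is, at its core, federated averaging: each edge trains on its own data, only model weights travel to the aggregator, and the averaged model is pushed back down. A minimal sketch of one such round, with illustrative names and plain NumPy logistic regression (not the Scalytics implementation), might look like this:

```python
# Hypothetical sketch of one federated-averaging round. Each "edge"
# trains a small local model; only weights, never raw rows, cross
# the network to the central aggregator.
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=5):
    """One edge: a few steps of logistic-regression gradient descent
    on the data that stays on its own premises."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-data @ w))        # sigmoid
        grad = data.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_w, silos):
    """Central step: average the edge updates, weighted by silo size."""
    sizes = np.array([len(y) for _, y in silos], dtype=float)
    updates = np.stack([local_update(global_w, X, y) for X, y in silos])
    return (updates * (sizes / sizes.sum())[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
# Two silos whose raw rows never leave "their premises".
silos = [(rng.normal(size=(100, 3)), rng.integers(0, 2, 100).astype(float))
         for _ in range(2)]

w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, silos)   # only weights cross the network
```

The returned global weights then flow back to the edges as the starting point for the next round, closing the cycle Alex describes.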
[00:12:20] So that means the real insight comes from the data on-premise, in the data silos, and the data never leaves that area, which is regulatory safe. But the insights from it can be used to build something new. That's what we're building.

And also something that stood out in your presentation today is how you integrate open source principles. Just listening to you, I could see you're quite passionate about that.
[00:12:48] So why do you believe open source so often emerges as the ultimate winner in technology? And how does this align with your overall vision at Scalytics?

When we look at the market, open source always wins. Always. Look at the most successful companies in the world; they're based on open source. Look at WordPress, the open source project: roughly 80% of all content management systems, I think, run on WordPress. Or take MySQL and PostgreSQL.
[00:13:16] Big open source databases are used everywhere in the world. And we think, or we know, that open source, first, is a model with a certain level of security, because the code is auditable. You know what's in it, and when you find a bug, you can fix it. It's not like a closed-source system. And secondly, a lot of people bring new ideas, and these new ideas shape the product, and then you can take it and build a product out of it. That's the main point of open source.
[00:13:42] And thirdly, and most importantly, it's also a guarantee. If Scalytics ran out of money or didn't make it, the open source part would simply still be there. As a customer running the Scalytics stack, you would be compatible with Apache Wayang. That means there is a little bit of lock-in, but not lock-in in the usual sense: you'd need to hire a few developers, and then you could theoretically, or practically, run the stack on your own. You don't lose anything.
[00:14:12] When you have a closed system and it goes off the market, what do you do? You're in a bad place quite quickly.

Sure. And Scalytics is designed to thrive in regulated industries. So, to bring to life some of what we're talking about here, are you able to share any examples of how the platform supports businesses in sectors like finance, healthcare, or energy, or any regulated industry? Any use cases or examples you could share?
[00:14:41] Oh, yes. As an example, ESA has a system called Agora, and Agora is built on Apache Wayang, the open source part, and they use it for satellite image processing. Satellite imagery is quite a beast, because the images come from different sources: you have radar penetration, real photographic pictures, earth penetration, radio waves, whatever.
[00:15:09] And building a system that lets scientists get real insight into certain parts of an area is hard. We heard from TU Berlin, from a scientific project, that when they have PhD students working on different tasks, they spend most of their time just finding the right image before they can do the analysis. And by the time they have the right image, the time is mostly gone, and then they can't really work on it.
[00:15:36] So they built a system based on a graph database so that you can tap into the data in a more elegant way, and then it gets faster. You can do research on, for example, how big the forest in Arizona is now versus ten years ago. The same goes for coastlines, how a coastline has changed over the years, or tracking fish swarms, or whatever. That field is called Earth observation.
[00:16:05] It uses satellite images to process data and extract insights. That's one example. The finance industry is using our stack too, especially for anti-fraud and anti-money-laundering (AML), and so is the defense industry. Everyone who has different data sets needs something to build more insight out of data sets that are not really accessible or can't be integrated into one central place.
[00:16:35] And I think this year we've also seen an almost AI gold rush, with businesses rushing to adopt AI. Some have found it difficult to find that elusive ROI from their projects, and it's such a balancing act making sure everything fits. So how are you able to accelerate insights while also maintaining compliance and privacy? I would imagine it is quite a balancing act, but is there anything you can share on how you do that?

Of course; it is quite a cool thing.
[00:17:00] When you use a federated learning framework like Apache Wayang, or Scalytics, its commercial counterpart, you don't get a new platform and you don't need to free up resources. Yes, you need a laptop, or maybe a small system, to run the main model, but that's all. You already have your systems: you typically have Apache Kafka installed, and you have your Spark, Databricks, Hadoop, data warehouse, whatever, in your company. So you already have everything you need to build useful insights.
[00:17:29] What you're missing is the piece that puts it all together to create something useful. And at the moment, the companies earning the most money from AI are the consulting companies. I heard Accenture makes four or five billion dollars a year just for PowerPoints. But, you know, that's not delivering anything. When we work with you, we support the implementation phase and also train your staff.
[00:17:59] And we help you build better or faster insights and show you how it works in the end. When you think about it, Scalytics is not more than a connector to your different data sources, allowing you to execute on data where it lives and gain new insights; simply spoken. The complexity is behind the technology, of course, but what we're doing is more like an infrastructure framework.
[00:18:28] 2025 is now literally just a couple of weeks away. So how do you see the future of data collaboration evolving? What role do you see Scalytics playing in shaping this landscape, and what's next on your innovation roadmap? You're probably locked down as to what you can share, but are there any teasers about the road ahead?

Yes, of course.
[00:18:49] The next step is that we dig deeper into regulation and provide a model that helps you fulfill regulatory requirements more efficiently. We know which AI model is running and which data is accessed, and we have a transparent, auditable trail. From that, we can directly help you create compliance rules, traceability, and a degree of transparency. It's not 100% transparency.
[00:19:18] That's not achievable yet; it still needs a lot of research. But what we can do is tell you who accessed what, for what reason, and when. And that is at least, I would say, 40 to 50% of the way to transparency, because now you at least have a history of how your machine learning or AI model accesses data. So that is what we plan for next year.
[00:19:45] And, of course, next year is our bigger rollout. We will have a few customers building really large infrastructures for AI modeling. Also, some service agencies are interested in running our stack to build models for their customers, in a secure way, so the data is not misused and, as I said, without centralized data.

Well, it sounds like you've got an exciting year ahead. We will be staying in touch.
[00:20:15] I hope to get you back on next year to see how things are evolving. But for anyone listening who wants to dig a little deeper, maybe check out your white paper, or find out more about how you might be able to help them: where would you like to point them for more information?

Yeah, you can check scalytics.io, which is our website, and also wayang.apache.org, which is the open source project. And, of course, if you are on GitHub, just check out the GitHub, give us a star, and we are happy.

Well, so many big talking points from our conversation today.
[00:20:43] And I hope anybody listening checks you out and learns a little more about what you're doing. I love what you've accomplished so far, and it's going to be an incredibly exciting year ahead, so we'll be chatting next year. But more than anything, thank you for stopping by and speaking with me today.

Thank you for the invite. It was nice to meet you here in Malta, and I hope we talk soon.

As we wrap up our discussion today, one thing is certainly clear: the shift from centralised to decentralised data systems holds immense promise, especially for industries navigating some of the complex challenges we raised today.
[00:21:13] By enabling smarter, more secure data collaboration, these technologies may be poised to redefine how businesses harness the power of AI. But what are your thoughts on the rise of federated learning and its potential to transform regulated industries? As always, let me know. Email me at techblogwriter@outlook.com, or find me on Instagram, X, and LinkedIn at Neil C. Hughes. Let me know, and let's keep this conversation going.
[00:21:42] But I'm afraid we've reached the end of this episode. So until next time, please stay curious, stay inspired, and remember to join me again tomorrow, when we'll do it all again with another guest on a completely different topic. Hopefully, I will speak with you all then.

