Snowflake software update caused 13-hour outage across 10 regions 19 Dec 2025, 2:36 pm
A software update knocked out Snowflake’s cloud data platform in 10 of its 23 global regions for 13 hours on December 16, leaving customers unable to execute queries or ingest data.
Customers saw “SQL execution internal error” messages when trying to query their data warehouses, according to Snowflake’s incident report. The outage also disrupted Snowpipe and Snowpipe Streaming file ingestion, and data clustering appeared unhealthy.
“Our initial investigation has identified that our most recent release introduced a backwards-incompatible database schema update,” Snowflake wrote in the report. “As a result, previous release packages errantly referenced the updated fields, resulting in version mismatch errors and causing operations to fail or take an extended amount of time to complete.”
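Snowflake has not published the technical specifics, but the general failure mode is easy to illustrate. The sketch below is a generic, hypothetical Java example (not Snowflake's code): a metadata record is written under a renamed field, and a reader from the previous release package, which still expects the old field name, fails at runtime with an opaque internal error.

import java.util.Map;

public class SchemaMismatchDemo {
    // Hypothetical reader shipped with the previous release package;
    // it still expects the old field name "warehouse_size".
    static int readWarehouseSize(Map<String, Object> metadataRow) {
        Object value = metadataRow.get("warehouse_size");
        if (value == null) {
            // Surfaces to callers as a generic internal error,
            // not a clear "the schema changed underneath you" message.
            throw new IllegalStateException("SQL execution internal error");
        }
        return (Integer) value;
    }

    public static void main(String[] args) {
        // Row written by the new release, which renamed the field.
        Map<String, Object> rowFromNewRelease = Map.of("compute_size", 16);
        readWarehouseSize(rowFromNewRelease); // old reader + new schema -> failure
    }
}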
The outage affected customers in Azure East US 2 in Virginia, AWS US West in Oregon, AWS Europe in Ireland, AWS Asia Pacific in Mumbai, Azure Switzerland North in Zürich, Google Cloud Platform Europe West 2 in London, Azure Southeast Asia in Singapore, Azure Mexico Central, and Azure Sweden Central, the report said.
Snowflake initially estimated service would be restored by 15:00 UTC that day, but later revised it to 16:30 UTC as the Virginia region took longer than expected to recover.
The company offered no workarounds during the outage, beyond recommending failover to non-impacted regions for customers with replication enabled.
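For customers that do have replication configured, the recovery path is to promote a secondary in an unaffected region. Below is a minimal, hedged sketch of what that could look like from a Java client over JDBC; the account URL, credentials, and failover group name are placeholders, and the ALTER FAILOVER GROUP ... PRIMARY statement comes from Snowflake's replication and failover feature (check current Snowflake documentation before relying on it).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class PromoteSecondary {
    public static void main(String[] args) throws Exception {
        // Hypothetical secondary account in an unaffected region.
        // Requires the Snowflake JDBC driver on the classpath.
        String url = "jdbc:snowflake://myorg-secondary.snowflakecomputing.com";
        Properties props = new Properties();
        props.put("user", System.getenv("SNOWFLAKE_USER"));
        props.put("password", System.getenv("SNOWFLAKE_PASSWORD"));

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {
            // Promote the secondary failover group so this account serves traffic.
            // "my_failover_group" is a placeholder name.
            stmt.execute("ALTER FAILOVER GROUP my_failover_group PRIMARY");
        }
    }
}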
It said it will share a root cause analysis (RCA) document within five working days.
“We do not have anything to share beyond this for now,” the company said.
Why multi-region architecture failed to protect customers
The type of failure that hit Snowflake — a backwards-incompatible schema change causing multi-region outages — represents a consistently underestimated failure class in modern cloud data platforms, according to Sanchit Vir Gogia, chief analyst at Greyhound Research. Schema and metadata sit in the control plane layer that governs how services interpret state and coordinate behavior across geographies, he said.
“Regional redundancy works when failure is physical or infrastructural. It does not work when failure is logical and shared,” Gogia said. “When metadata contracts change in a backwards-incompatible way, every region that depends on that shared contract becomes vulnerable, regardless of where the data physically resides.”
The outage exposed a misalignment between how platforms test and how production actually behaves, Gogia said. Production involves drifting client versions, cached execution plans, and long-running jobs that cross release boundaries. “Backwards compatibility failures typically surface only when these realities intersect, which is difficult to simulate exhaustively before release,” he said.
The issue raises questions about Snowflake’s staged deployment process. Staged rollouts are widely misunderstood as containment guarantees when they are actually probabilistic risk reduction mechanisms, Gogia said. Backwards-incompatible schema changes often degrade functionality gradually as mismatched components interact, allowing the change to propagate across regions before detection thresholds are crossed, he said.
Snowflake’s release documentation describes a three-stage deployment approach that “enables Snowflake to monitor activity as accounts are moved and respond to any issues that may occur.” The documentation states that “if issues are discovered while moving accounts to a full release or patch release, the release might be halted or rolled back,” with follow-up typically completed within 24 to 48 hours. The December 16 outage affected 10 regions simultaneously and lasted well beyond that window.
“When a platform relies on globally coordinated metadata services, regional isolation is conditional, not absolute,” Gogia said. “By the time symptoms become obvious, rollback is no longer a simple option.”
Rollback presents challenges because while code can be rolled back quickly, state cannot, Gogia said. Schema and metadata changes interact with live workloads, background services, and cached state, requiring time, careful sequencing, and validation to avoid secondary corruption when reversed.
Security breach and outage share common weakness
The December outage, combined with Snowflake’s security troubles in 2024, should fundamentally change how CIOs define operational resilience, according to Gogia. In mid-2024, approximately 165 Snowflake customers were targeted by criminals using stolen credentials from infostealer infections.
“These are not separate incidents belonging to different risk silos. They are manifestations of the same underlying issue: control maturity under stress,” Gogia said. “In the security incidents, stolen credentials exploited weak identity governance. In the outage, a backwards-incompatible change exploited weak compatibility governance.”
CIOs need to move beyond compliance language and uptime averages and ask how platforms behave when their assumptions break, Gogia said. “The right questions are behavioral. How does the platform behave when assumptions fail? How does it detect emerging risk? How quickly can blast radius be constrained?”
AI-assisted coding creates more problems – report 19 Dec 2025, 9:00 am
AI code generation appears to have a few kinks to work out before it can fully dominate software development, according to a new report from CodeRabbit. In the company’s pull-request analysis, AI-generated code produced 1.7 times more issues than human-written code.
AI coding assistants have become a standard part of the software development workflow, but developers have raised alarms, the report said. AI-co-authored pull requests averaged 10.83 issues each, compared with 6.45 for human-written pull requests, CodeRabbit said, and they also showed sharper spikes in issue counts. The more important story, according to CodeRabbit, was the distribution: AI-generated pull requests had a much heavier tail, meaning they produced far more “busy” reviews, and they were harder to review in multiple ways. Teams adopting AI coding tools should expect higher variance and more frequent spikes in pull-request issues that demand deeper scrutiny, according to the report.
Overall, pull requests of AI-generated code found the highest number of issues related to logic and correctness. But within every major category including correctness, maintainability, security, and performance, AI co-authored code consistently generated more issues than code generated by humans alone, said the report.
In the report, released on December 17, CodeRabbit said it had analyzed 470 open source GitHub pull requests: 320 that were AI-co-authored and 150 that were likely written by humans alone. In the blog post introducing the report, the company said the results were “Clear, measurable, and consistent with what many developers have been feeling intuitively: AI accelerates output, but it also amplifies certain categories of mistakes.” The report also found security issues appearing consistently more often in AI-co-authored pull requests. While none of the noted vulnerabilities were unique to AI-generated code, they appeared significantly more often, increasing the overall risk profile of AI-assisted development. AI makes dangerous security mistakes that development teams must get better at catching, the report advised.
There were, however, some advantages with AI, said the report. Spelling errors were almost twice as common in human-authored code (18.92 vs. 10.77). This might be because human coders write far more inline prose and comments, or it could just be that developers were “bad at spelling,” the report speculated. Testability issues also appeared more frequently in human code (23.65 vs. 17.85).
Nonetheless, the overall findings indicate that guardrails are needed as AI-generated code becomes a standard part of the workflow, CodeRabbit said. Project-specific context should be provided up-front, with models accessing constraints, such as invariants, config patterns, and architectural rules. To reduce issues with readability, formatting, and naming, strict CI rules should be applied. For correctness, developers should require pre-merge tests for any non-trivial control flow. Security defaults should be codified. Also, developers should encourage idiomatic data structures, batched I/O, and pagination. Smoke tests should be done for I/O-heavy or resource-sensitive paths. AI-aware pull-request checklists should be adopted, and a third-party code review tool should be used.
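As one concrete illustration of the “pre-merge tests for non-trivial control flow” recommendation, here is a hedged sketch using JUnit 5; the function under test and its thresholds are hypothetical, and the point is simply that every branch an AI assistant generates gets exercised before merge.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class DiscountCalculatorTest {
    // Hypothetical AI-suggested function with branching logic.
    static int discountPercent(int quantity, boolean loyaltyMember) {
        if (quantity >= 100) return loyaltyMember ? 20 : 15;
        if (quantity >= 10)  return loyaltyMember ? 10 : 5;
        return loyaltyMember ? 2 : 0;
    }

    @Test
    void coversEveryBranchBeforeMerge() {
        assertEquals(20, discountPercent(150, true));
        assertEquals(15, discountPercent(150, false));
        assertEquals(10, discountPercent(10, true));
        assertEquals(5, discountPercent(10, false));
        assertEquals(2, discountPercent(3, true));
        assertEquals(0, discountPercent(3, false));
    }
}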
Other findings from the report include the following:
- Severity escalates with AI, with more critical and major issues happening.
- AI introduced nearly two times more naming inconsistencies; unclear naming, mismatched terminology, and generic identifiers appeared frequently.
- AI code “looks right” at a glance but often violates local idioms or structure.
- AI-generated code often created issues correlated to real-world outages.
- Performance regressions are rare but are disproportionately AI-driven.
- Incorrect ordering, faulty dependency flow, or misuse of concurrency primitives appeared far more frequently in AI pull requests.
- Formatting problems were 2.66 times more common in the AI pull requests.
Why your next cloud bill could be a trap 19 Dec 2025, 9:00 am
A few months ago, I worked with a global manufacturer that considered itself conservative on AI. They focused on stabilizing their ERP migration to the cloud, modernizing a few key customer-facing apps, and tightening security. The CIO’s position on generative AI was clear: “We’ll get there, but not this year. We’re not ready.”
On paper, they were officially “not doing AI.” In reality, they were already deeply involved. Their primary cloud provider had quietly integrated AI-native features into the services they were already using. The search service they adopted for a new customer portal came with semantic and vector modes turned on by default. Their observability platform was now AI-assisted, changing how logs and telemetry were processed. Even their database service had a new “AI integration” checkbox in the console, which developers began enabling because it looked useful and was inexpensive to try.
Six months later, their infrastructure bill had risen sharply, and their architecture had become so entangled with the provider’s AI tooling that shifting away was dramatically harder. Key data stores were now optimized around that provider’s vector engine. Workflows were wired into proprietary AI agents and automation tools. The CIO’s team woke up to a hard truth: They had unintentionally become an AI-focused organization, more locked in than ever.
Whether you asked for it or not
For years, we have talked about cloud-first strategies, with the big hyperscalers competing on compute, storage, databases, and global reach. Generative AI changed the game. The center of gravity is shifting from generic infrastructure to AI-native platforms: GPUs, proprietary foundation models, vector databases, agent frameworks, copilots, and AI-integrated everything.
You can see the shift in how providers talk about themselves. Earnings calls now highlight GPU and AI accelerator spending as the new core investment. Homepages and conferences lead with AI platforms, copilots, and agentic AI, while traditional IaaS and PaaS take a back seat. Databases, developer tools, workflow engines, and integration services are all being refactored or wrapped with AI capabilities that are enabled by default or just a click away.
At first glance, this appears to be progress. You see more intelligent search, auto-generated code, anomaly detection, predictive insights, and AI assistants integrated into every console. However, behind the scenes, each of these conveniences typically relies on proprietary APIs, opinionated data formats, and a growing assumption that your workloads and data will stay within that cloud.
A bigger problem than you realize
Lock-in is not new. We have always had to balance managed services with portability. The difference now is the depth and systemic nature of AI-native lock-in. When you couple your workloads to a provider’s proprietary database, you can often extract the data and re-platform with effort. When you couple your entire data platform, embeddings, fine-tuned models, agent workflows, and security posture to a single AI stack, the cost and time to exit increase by an order of magnitude.
Training and inference pipelines are expensive to rebuild. Vector indexes and embeddings may be tied to a provider’s specific implementation. Agent frameworks are increasingly integrated with that cloud’s eventing, identity, and security systems. Once you start relying on a provider’s proprietary model behavior and tool ecosystem, you are no longer just “using compute.” You are buying into their approach to AI.
What worries me most is that many enterprises are drifting into this locked-in position rather than choosing it. Teams turn on AI-native features because they come bundled with existing services. Line-of-business units experiment with AI assistants hooked into core data without a broader architectural or financial strategy. Over a few release cycles, the posture shifts from “we’re just experimenting” to “we can’t move off this platform without a multi-year, multi-million-dollar transformation.”
What ‘AI-ready’ really means
Providers market their platforms as “AI-ready,” implying flexibility and modernization. In practice, “AI-ready” often means AI deeply embedded in your data, tools, and runtime environment. Your logs are now processed through their AI analytics. Your application telemetry routes through their AI-based observability. Your customer data is indexed for their vector search.
This is convenient in the short term. In the long term, it shifts power. The more AI-native services you consume from a single hyperscaler, the more they shape your architecture and your economics. You become less likely to adopt open source models, alternative GPU clouds, or sovereign and private clouds that might be a better fit for specific workloads. You are more likely to accept rate changes, technical limits, and road maps that may not align with your interests, simply because unwinding that dependency is too painful.
The rise of alt clouds is a signal
While hyperscalers race to become vertically integrated AI platforms, we are also seeing the emergence of alternative clouds. These include GPU-first providers, specialized AI infrastructure platforms, sovereign and industry-specific clouds, and environments run by managed service providers. These alt clouds are not always trying to be “AI everything.” In many cases, they prioritize providing raw GPU capacity, clearer economics, or environments where compliance, data residency, or control are the main value propositions.
For companies not prepared to fully commit to AI-native services from a single hyperscaler or in search of a backup option, these alternatives matter. They can host models under your control, support open ecosystems, or serve as a landing zone for workloads you might eventually relocate from a hyperscaler. However, maintaining this flexibility requires avoiding the strong influence of deeply integrated, proprietary AI stacks from the start.
Three moves to stay in control
First, be deliberate about where and how you adopt AI-native services. Don’t let free trials or default settings define your architectural strategy. For each major AI-integrated service a provider pushes—a vector database, agent framework, copilot, or AI search—ask explicitly: What will it cost us to switch later? What data formats, APIs, and operational dependencies does this introduce, and how difficult will it be to replicate them with another provider, an alt cloud, or a self-managed stack?
Second, design your AI and data strategy from the start with portability in mind, even if you don’t plan to move soon. Use open formats for embeddings whenever possible, store raw data in portable structures, and separate application logic from proprietary AI orchestration. When evaluating AI services, consider alternatives such as open source models, GPU-first alt clouds, or private and sovereign clouds that avoid a single provider’s AI ecosystem. It’s entirely reasonable to move some workloads away from providers that are heavily focused on AI if their AI services do not align with your current or upcoming needs.
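One practical way to keep that separation is to put any provider-specific AI service behind an interface the application owns. The Java sketch below is generic and illustrative (all names are made up): application code depends only on VectorStore, and the provider-specific adapter, shown here as an in-memory stand-in rather than a real SDK call, can be swapped for an open source or alt-cloud implementation later.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Port owned by the application: no provider types leak through it.
interface VectorStore {
    void upsert(String id, float[] embedding, String payload);
    List<String> similar(float[] queryEmbedding, int topK);
}

// One adapter per provider; only this class would know the proprietary SDK.
// (Shown here as an in-memory stand-in for illustration.)
class InMemoryVectorStore implements VectorStore {
    private final Map<String, float[]> vectors = new HashMap<>();
    private final Map<String, String> payloads = new HashMap<>();

    public void upsert(String id, float[] embedding, String payload) {
        vectors.put(id, embedding);
        payloads.put(id, payload);
    }

    public List<String> similar(float[] q, int topK) {
        // Rank stored vectors by cosine similarity, highest first.
        return vectors.entrySet().stream()
                .sorted((a, b) -> Double.compare(cosine(q, b.getValue()), cosine(q, a.getValue())))
                .limit(topK)
                .map(e -> payloads.get(e.getKey()))
                .toList();
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }
}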
Third, prioritize AI costs and dependency as key governance issues alongside security and compliance. Incorporate observability into AI deployment to track which teams enable AI-native features, understand how these affect costs, and identify long-term platform risks. Before choosing a cloud provider that is rapidly shifting toward AI-native solutions, step back and ask if their AI services truly match the problems you need to solve over the next three to five years. If AI is on your radar but not yet essential, consider a more neutral infrastructure approach and selective AI implementation rather than adopting every new AI-native feature your provider offers.
The bottom line is simple: AI-native cloud is coming, and in many ways, it’s already here. The question is not whether you will use AI in the cloud, but how much control you will retain over its cost, architecture, and strategic direction. Enterprises that pose tough questions now, focus on portability, and maintain real options across hyperscalers, alt clouds, and private environments will turn AI into a strategic advantage instead of a costly pitfall.
Agents, protocols, and vibes: The best AI stories of 2025 19 Dec 2025, 9:00 am
From autonomous agents to vibe coding, 2025 was the year generative AI stopped being theoretical and started doing real work—with a little fun along the way. Our readers gravitated toward features and tutorials that explored how to move AI into production software and reshape developer workflows, and to columnists who forced uncomfortable (and sometimes amusing) questions about the role of humans in the AI-driven workplace. Here’s a look back at some of InfoWorld’s most popular AI coverage this year.
The year agents took off
2025 may be remembered, among other things, as the year AI agents moved beyond research concepts and toy demos to drive real-world applications and platforms. Agents can now handle everyday software tasks, integrate into developer workflows, and are embedded into large-scale enterprise infrastructure. Some of the year’s most popular articles looked at how AI agents were being used in production:
- Agentic coding with Google Jules
Software developers are among AI’s most enthusiastic fans, and Google Jules is an agentic coding assistant with real heft. It fixes bugs, adds documentation, and integrates with your GitHub repos.
- How LinkedIn built an agentic AI platform
The careers behemoth built an enterprise-scale agent AI deployment, using an agentic platform that leverages distributed application techniques. Here’s a candid look at the real architectural decisions and practical engineering patterns used for agentic systems at scale.
- Multi-agent AI workflows: The next evolution of AI coding
Now multi-agent systems are emerging, with coordinated workflows capable of completing complex coding tasks. Agents are starting to interoperate in real development contexts by sharing state, governance, and human-in-the-loop control mechanisms.
- How AI agents will transform the future of work
AI agents are already reengineering software development, business processes, and customer experiences. What’s next?
Multi-agent systems? New protocols make it possible
As autonomous agents are embedded in real workflows, the next challenge is getting them to talk to each other and the tools they depend on. This year, open standards like the Model Context Protocol moved from experimental specs to practical infrastructure, enabling agents to share context, invoke external services, and participate in coordinated multi-agent workflows across environments:
- A developer’s guide to AI protocols: MCP, A2A, and ACP
The big three emerging agent communication standards—MCP for tool and data access, Agent-to-Agent (A2A) for peer collaboration, and ACP for messaging—help agents interoperate and operate in real systems. Here’s a guide to all three.
- AWS’ Serverless MCP Server to aid agentic development of managed applications
AWS released a serverless implementation of the Model Context Protocol, giving AI agents real, context-aware access to cloud tools and data to help design, deploy, and troubleshoot applications, bringing MCP into the world of practical engineering workflows.
- 10 MCP servers for devops
MCP is also being integrated into devops tooling, including server offerings from GitHub, AWS, Grafana, and Akuity.
Why code when you can vibe?
If AI agents are increasingly doing the heavy lifting of writing and coordinating code, it’s fair to ask what’s left for the rest of us to do. Enter vibe coding—a playful, almost rebellious approach to coding with AI. Some of the year’s most popular reads captured the excitement, the absurdity, and the potential dangers of working with AI-generated code:
- Vibe code or retire
Like it or not, vibe coding is here, and developers need to take it seriously. This article takes the stance that embracing AI-driven code generation isn’t optional; it’s a survival skill for modern developers.
- Writing code is so over
Nick Hodges takes it a step further, declaring that traditional coding is becoming obsolete, just as hand-coded assembly vanished after reliable compilers became available. Will developers soon command systems in spoken English rather than hand-typed code?
- Is vibe coding the new gateway to technical debt?
This article strikes a nerve, addressing vibe coding’s potential to amass more bad code, particularly when used as a substitute for real learning and experience.
Cloud native explained: How to build scalable, resilient applications 19 Dec 2025, 9:00 am
What is cloud native?
The term “cloud-native computing” refers to the modern approach to building and running software applications that exploit the flexibility, scalability, and resilience of cloud computing. The phrase is a catch-all that encompasses not just the specific architecture choices and environments used to build applications for the public cloud, but also the software engineering techniques and philosophies used by cloud developers.
The Cloud Native Computing Foundation (CNCF) is an open source organization that hosts many important cloud-related projects and helps set the tone for the world of cloud development. The CNCF offers its own definition of cloud native:
Cloud native practices empower organizations to develop, build, and deploy workloads in computing environments (public, private, hybrid cloud) to meet their organizational needs at scale in a programmatic and repeatable manner. It is characterized by loosely coupled systems that interoperate in a manner that is secure, resilient, manageable, sustainable, and observable.
Cloud native technologies and architectures typically consist of some combination of containers, service meshes, multi-tenancy, microservices, immutable infrastructure, serverless, and declarative APIs — this list is not exhaustive.
This definition is a good start, but as cloud infrastructure becomes ubiquitous, the cloud native world is beginning to spread beyond the core of this definition. We’ll explore that evolution as well, and look into the near future of cloud-native computing.
Cloud native architectural principles
Let’s start by exploring the pillars of cloud-native architecture. Many of these technologies and techniques were considered innovative and even revolutionary when they hit the market over the past few decades, but now have become widely accepted across the software development landscape.
Microservices. One of the huge cultural shifts that made cloud-native computing possible was the move from huge, monolithic applications to microservices: small, loosely coupled, and independently deployable components that work together to form a cloud-native application. These microservices can be scaled across cloud environments, though (as we’ll see in a moment) this makes systems more complex.
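As a toy illustration of “small, loosely coupled, and independently deployable,” the sketch below uses only the JDK’s built-in HTTP server to expose a single endpoint; a real microservice would layer a framework, configuration, and observability on top, and the service name and port here are arbitrary.

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class InventoryService {
    public static void main(String[] args) throws Exception {
        // One small service, one responsibility, independently deployable.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        server.createContext("/health", exchange -> {
            byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        System.out.println("inventory-service listening on :8080");
    }
}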
Containers and orchestration. In cloud-native architectures, individual microservices are executed inside containers — lightweight, portable virtual execution environments that can run on a variety of servers and cloud platforms. Containers insulate developers from having to worry about the underlying machines on which their code will execute; all they have to do is write to the container environment.
Getting the containers to run properly and communicate with one another is where the complexity of cloud native computing starts to emerge. Initially, containers were created and managed by relatively simple platforms, the most common of which was Docker. But as cloud-native applications got more complex, container orchestration platforms that augmented Docker’s functionality emerged, such as Kubernetes, which allows you to deploy and manage multi-container applications at scale. Kubernetes is critical to cloud native computing as we know it — it’s worth noting that the CNCF was set up as a spinoff of the Linux Foundation on the same day that Kubernetes 1.0 was announced — and adhering to Kubernetes best practices is an important key to cloud native success.
Open standards and APIs. The fact that containers and cloud platforms are largely defined by open standards and open source technologies is the secret sauce that makes all this modularity and orchestration possible. Standardized, documented APIs provide the means of communication between the distributed components of a larger application. In theory, anyway, this standardization means that every component should be able to communicate with other components of an application without knowing about their inner workings, or about the inner workings of the various platform layers on which everything operates.
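As a small illustration of components communicating through documented APIs rather than shared internals, here is a sketch using the JDK’s built-in HTTP client to call another component’s REST endpoint; the URL and response handling are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CatalogClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The caller only needs the documented endpoint and payload format,
        // not the catalog service's implementation details. URL is a placeholder.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://catalog.internal.example/api/v1/products/42"))
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}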
DevOps, agile methodologies, and infrastructure as code. Because cloud-native applications exist as a series of small, discrete units of functionality, cloud-native teams can build and update them using agile philosophies like DevOps, which promotes rapid, iterative CI/CD development. This enables teams to deliver business value more quickly and more reliably.
The virtualized nature of cloud environments also makes them great candidates for infrastructure as code (IaC), a practice in which teams use tools like Terraform, Pulumi, and AWS CloudFormation to manage infrastructure declaratively and version those declarations just like application code. IaC boosts automation, repeatability, and resilience across environments—all big advantages in the cloud world. IaC also goes hand-in-hand with the concept of immutable infrastructure—the idea that, once deployed, infrastructure-level entities like virtual machines, containers, or network appliances don’t change, which makes them easier to manage and secure. IaC stores declarative configuration code in version control, which creates an audit log of any changes.

There’s a lot to love about cloud-native architectures, but there are also several things to be wary of when considering them.
How the cloud-native stack is expanding
As cloud-native development becomes the norm, the cloud-native ecosystem is expanding; the CNCF maintains a graphical representation of what it calls the cloud native landscape that hammers home the expansive and bewildering variety of products, services, and open source projects that contribute to (and seek to profit from) cloud-native computing. And there are a number of areas where new and developing tools are complicating the picture sketched out by the pillars we discussed above.
An expanding Kubernetes ecosystem. Kubernetes is complex, and teams now rely on an entire ecosystem of projects to get the most out of it: Helm for packaging, ArgoCD for GitOps-style deployments, and Kustomize for configuration management. And just as Kubernetes augmented Docker for enterprise-scale deployments, Kubernetes itself has been augmented and expanded by service mesh offerings like Istio and Linkerd, which offer fine-grained traffic control and improved security.
Observability needs. The complex and distributed world of cloud-native computing requires in-depth observability to ensure that developers and admins have a handle on what’s happening with their applications. Cloud-native observability uses distributed tracing and aggregated logs to provide deep insight into performance and reliability. Tools like Prometheus, Grafana, Jaeger, and OpenTelemetry support comprehensive, real-time observability across the stack.
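To make that concrete, here is a hedged sketch using the OpenTelemetry Java API to wrap one operation in a trace span. The tracer and span names are illustrative, and in a real deployment the OpenTelemetry SDK and an exporter must be configured separately; without that, GlobalOpenTelemetry returns a no-op implementation.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CheckoutTracing {
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("checkout-service"); // illustrative name

    public static void processOrder(String orderId) {
        Span span = tracer.spanBuilder("process-order").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            // ... business logic; child spans created here join this trace ...
        } catch (RuntimeException e) {
            span.recordException(e); // surfaces the failure in the trace backend
            throw e;
        } finally {
            span.end();
        }
    }

    public static void main(String[] args) {
        processOrder("demo-123");
    }
}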
Serverless computing. Serverless computing, particularly in its function-as-a-service guise, offers to strip needed compute resources down to their bare minimum, with functions running on service provider clouds using exactly as much as they need and no more. Because these services can be exposed as endpoints via APIs, they are increasingly integrated into distributed applications, operating side-by-side with functionality provided by containerized microservices. Watch out, though: the big FaaS providers (Amazon, Microsoft, and Google) would love to lock you in to their ecosystems.
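For a sense of what function-as-a-service code looks like, here is a minimal sketch of an AWS Lambda handler in Java; the class name and event shape are hypothetical, and packaging, IAM permissions, and deployment configuration are omitted.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// Requires the aws-lambda-java-core dependency on the classpath.
public class ThumbnailHandler implements RequestHandler<Map<String, String>, String> {
    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        // Illustrative: read one field from the event and do a small unit of work.
        String imageKey = event.getOrDefault("imageKey", "unknown");
        context.getLogger().log("processing " + imageKey);
        return "thumbnail generated for " + imageKey;
    }
}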
FinOps. Cloud computing was initially billed as a way to cut costs — no need to pay for an in-house data center that you barely use — but in practice it replaces capex with opex, and sometimes you can run up truly shocking cloud service bills if you aren’t careful. Serverless computing is one way to cut down on those costs, but financial operations, or FinOps, is a more systematic discipline that aims to align engineering, finance, and product to optimize cloud spending. FinOps best practices make use of those observability tools to determine which departments and applications are eating up resources.
How cloud-native architecture is adapting to AI workloads
Enterprises are deploying larger AI models and relying on more and more real-time inference services. That’s putting new demands on cloud-native systems and forcing them to adapt to remain scalable and reliable.
For instance, organizations are re-engineering cloud environments around GPU-accelerated clusters, low-latency networking, and predictable orchestration. These needs align with established cloud-native patterns: containers package AI services consistently, while Kubernetes provides resilient scheduling and horizontal scale for inference workloads that can spike without warning.
Kubernetes itself is changing to better support AI inference, adding hardware-aware scheduling for GPUs, model-specific autoscaling behavior, and deeper observability into inference pipelines. These enhancements make Kubernetes a more natural platform for serving generative AI workloads.
AI’s resource demands are amplifying traditional cloud-native challenges. Observability becomes more complex as inference paths span GPUs, CPUs, vector databases, and distributed storage. FinOps teams contend with cost volatility from training and inference bursts. And security teams must track new risks around model provenance, data access, and supply-chain integrity.
Application frameworks for building distributed cloud-native apps
Microsoft’s Aspire is one of the most visible examples of a shift toward application frameworks that simplify how teams build distributed systems. Opinionated frameworks like Aspire provide structure, observability, and integration out of the box so developers don’t need to stitch together containers, microservices, and orchestration tooling by hand.
Aspire in particular is a prescriptive framework for cloud-native applications, bundling containerized services, environment configuration, health checks, and observability into a unified development model. Aspire provides defaults for service-to-service communication, configuration, and deployment, along with a built-in dashboard for visibility across distributed components.
While Aspire was originally aligned with Microsoft’s .NET platform, Redmond now sees it as having a polyglot future. This positions Aspire as part of a broader trend: frameworks that help teams build cloud-native, service-oriented systems without being locked into a single language ecosystem. Several other frameworks are gaining traction: Dapr provides a portable runtime that abstracts many of the plumbing tasks in cloud-native distributed applications, Orleans offers an actor-model-based framework for large-scale systems in the .NET world, and Akka gives JVM teams a mature, reactive toolkit for elastic, resilient services.
Frameworks and tools in the expanding cloud-native ecosystem
While frameworks like Aspire simplify how developers compose and structure distributed applications, most cloud-native systems still depend on a broader ecosystem of platforms and operational tooling. This deeper layer is where much of the complexity—and innovation—of cloud-native computing lives, particularly as Kubernetes continues to serve as the industry’s control plane for modern infrastructure.
Kubernetes provides the core abstractions for deploying and orchestrating containerized workloads at scale. Managed distributions such as Google Kubernetes Engine (GKE), Amazon EKS, Azure AKS, and Red Hat OpenShift build on these primitives with security, lifecycle automation, and enterprise support. Platform vendors are increasingly automating cluster operations—upgrades, scaling, remediation—to reduce the operational burden on engineering teams.
Surrounding Kubernetes is a rapidly expanding ecosystem of complementary frameworks and tools. Service meshes like Istio and Linkerd provide fine-grained traffic management, policy enforcement, and mTLS-based security across microservices. GitOps platforms such as Argo CD and Flux bring declarative, version-controlled deployments to cloud-native environments. Meanwhile, projects like Crossplane turn Kubernetes into a universal control plane for cloud infrastructure, letting teams provision databases, queues, and storage through familiar Kubernetes APIs. These tools illustrate how cloud-native development now spans multiple layers: developer-focused application frameworks like Aspire at the top, and a powerful, evolving Kubernetes ecosystem underneath that keeps modern distributed applications running.
Advantages and challenges for cloud-native development
Cloud native has become so ubiquitous that its advantages are almost taken for granted at this point, but it’s worth reflecting on the beneficial shift the cloud native paradigm represents. Huge, monolithic codebases that saw updates rolled out once every couple of years have been replaced by microservice-based applications that can be improved continuously. Cloud-based deployments, when managed correctly, make better use of compute resources and allow companies to offer their products as SaaS or PaaS services.
But cloud-native deployments come with a number of challenges, too:
- Complexity and operational overhead: You’ll have noticed by now that many of the cloud-native tools we’ve discussed, like service meshes and observability tools, are needed to deal with the complexity of cloud-native applications and environments. Individual microservices are deceptively simple, but coordinating them all in a distributed environment is a big lift.
- Security: More services executing on more machines, communicating by open APIs, all adds up to a bigger attack surface for hackers. Containers and APIs each have their own special security needs, and a policy engine can be an important tool for imposing a security baseline on a sprawling cloud-native app. DevSecOps, which adds security to DevOps, has become an important cloud-native development practice to try to close these gaps.
- Vendor lock-in: This may come as a surprise, since cloud-native is based on open standards and open source. But there are differences in how the big cloud and serverless providers work, and once you’ve written code with one provider in mind, it can be hard to migrate elsewhere.
- A persistent skills gap: Cloud-native computing and development may have years under its belt at this point, but the number of developers who are truly skilled in this arena is a smaller portion of the workforce than you’d think. Companies face difficult choices in bridging this skills gap, whether that’s bidding up salaries, working to upskill current workers, or allowing remote work so they can cast a wide net.
Cloud native in the real world
Cloud native computing is often associated with giants like Netflix, Spotify, Uber, and Airbnb, where many of its technologies were pioneered in the early 2010s. But the CNCF’s Case Studies page provides an in-depth look at how cloud native technologies are helping companies. Examples include the following:
- A UK-based payment technology company that can switch between data centers and clouds with zero downtime
- A software company whose product collects and analyzes data from IoT devices — and can scale up as the number of gadgets grows
- A Czech web service company that managed to improve performance while reducing costs by migrating to the cloud
Cloud-native infrastructure’s capability to quickly scale up to large workloads also makes it an attractive platform for developing AI/ML applications: another one of those CNCF case studies looks at how IBM uses Kubernetes to train its Watsonx assistant. The big three providers are putting a lot of effort into pitching their platforms as the place for you to develop your own generative AI tools, with offerings like Azure AI Foundry, Google Firebase Studio, and Amazon Bedrock. It seems clear that cloud native technology is ready for what comes next.
Learn more about related cloud-native technologies:
- Platform-as-a-service (PaaS) explained
- What is cloud computing
- Multicloud explained
- Agile methodology explained
- Agile development best practices
- Devops explained
- Devops best practices
- Microservices explained
- Microservices tutorial
- Docker and Linux containers explained
- Kubernetes tutorial
- CI/CD (continuous integration and continuous delivery) explained
- CI/CD best practices
React2Shell is the Log4j moment for front end development 19 Dec 2025, 2:37 am
Attackers have upped the ante in their exploits of a recently-disclosed maximum severity vulnerability in React Server Components (RSC), Next.js, and related frameworks.
Financially-motivated attackers have found a way to use the flaw, dubbed React2Shell (CVE-2025-55182), to execute arbitrary code on vulnerable servers through a single malicious HTTP request. This allows them to quickly and easily gain access to a corporate network and deploy ransomware, according to researchers at cybersecurity company S-RM and the Microsoft Defender Security Research Team.
Attackers initially exploited the vulnerability to introduce backdoor malware and crypto miners; this new method represents an escalation, and experts say it reveals a fundamental security flaw in front end development.
“For too long, we’ve treated front end development as low-end, low-risk work,” said David Shipley of Beauceron Security. “This is to the front end of applications what Log4j was to the back end, a massive opportunity for attackers.”
How attackers easily get ‘highly privileged’ access
React is widely used in enterprise environments, with Microsoft researchers identifying “tens of thousands of distinct devices across several thousand organizations” running React or React-based applications.
React2Shell is a pre-authentication remote code execution (RCE) vulnerability affecting React Server Components (RSC), the open-source framework Next.js, and other related frameworks. It has been rated a 10 on the Common Vulnerability Scoring System (CVSS) because it is easy to exploit, puts numerous exposed systems at risk, and is highly susceptible to automated attacks since it doesn’t require authentication to execute.
The vulnerability specifically impacts the Flight protocol, a core feature in the React development library and Next.js. RSC contains packages, frameworks, and bundlers that allow React apps to run parts of their logic on the server rather than in the browser.
Flight allows server and client to communicate; when the client requests data, the server receives and parses a payload, executes server-side logic, and returns a serialized response for the client to render.
With the React2Shell vulnerability, impacted RSCs fail to validate incoming payloads, allowing threat actors to inject malicious components that React identifies as legitimate. Attackers can send HTTP requests to trick the server into running compromised code, potentially giving them “highly privileged” access to unpatched systems, according to the S-RM researchers.
According to initial reporting on React2Shell, nation-state actors began exploiting the vulnerability within hours of public disclosure. While early impact was limited to the installation of persistent network backdoors and cryptocurrency mining, React2Shell is now being used as the initial access vector in a ransomware attack.
S-RM notes that it is likely being used by “less sophisticated actors” targeting public-facing web servers.
The Microsoft researchers warn of the dangers of this vulnerability: It can be exploited with just one HTTP request; default configurations are vulnerable, meaning there’s no special setup and attackers don’t have to wait for user mistakes; exploitation doesn’t require authentication because it occurs pre-authentication; and proof-of-concept exploits show near-100% reliability.
“For all the over-talk on zero trust, here’s a great example of where it would’ve been useful,” said Beauceron’s Shipley. “Way too much trust and access was built into the React model. And attackers figured out how to exploit it.”
What to look for
In an attack tracked by S-RM, immediately after the threat actor gained access to a targeted company’s network, they ran a hidden PowerShell command, establishing command and control (C2) by downloading a Cobalt Strike PowerShell stager, a tactic regularly used by red teamers, and installing a beacon to allow them to communicate with their external servers. They then disabled real-time protection in Windows Defender Antivirus.
The ransomware binary was dropped and executed “within less than one minute of initial access,” the S-RM researchers report. The attackers modified encrypted files, left recovery notes, created text files that included the target’s public IP address, and cleared event logs and backup snapshots.
The researchers noted that they did not observe lateral movement to other systems or attempts to steal data. The compromised server was taken down the day after it was discovered.
S-RM advises enterprises using RSC to verify that they are running a fully patched version; however, React has warned that even the initially released patches (versions 19.0.2, 19.1.3, and 19.2.2) are vulnerable.
Beyond patching, organizations should perform forensic reviews to check for:
- Unusual outbound connections that could indicate C2 was executed;
- Disabling of antivirus and endpoint protection, or log clearing or tampering;
- Unusual spikes in resource use, which could indicate crypto miners;
- Windows event logs or endpoint detection and response (EDR) telemetry indicating attackers executed files in memory from binaries related to Node or React;
- Indicators of compromise (IOCs) detailed in the advisory, both host-based and network-based.
Front end is no longer low-risk
This vulnerability reveals a fundamental gap in the development environment that has largely been overlooked, experts say.
“There is a dangerous, comforting lie we tell ourselves in web development: ‘The frontend is safe.’ It isn’t,” notes web engineer Louis Phang. He called this a “logic error in the way modern servers talk to clients” that turns a standard web request into a weapon, the result of developers focusing on reliability, scalability, and maintainability rather than security.
For years, the worst that happened when a front end developer made a mistake was a button that looked wrong, a broken layout, or, in a worst-case scenario, cross-site scripting (XSS), which allows attackers to inject malicious scripts into web pages, Phang said. With React rendering on the server, front end code has privileged access, and vulnerabilities serve as a backdoor into databases, keys, and data.
“React2Shell signifies the end of the front end developer as a low-risk role,” Phang contended.
Beauceron’s Shipley agreed, noting that the need for server-side heavy lifting changed the risk, but the tech stack didn’t respond accordingly.
“First, we had confusion about whether it was severe or not, then some were downplaying how much exploitation would happen, and now we’re in a feeding frenzy,” he said.
It’s concerning how long it’s taking to rouse the technology environment to deal with this threat; it could ultimately be a side effect of cuts to security teams and budgets and developer burnout, he noted.
“This is a concerning trend heading into 2026, which will be even more intense for zero days thanks to AI,” Shipley predicted.
Python type checker ty now in beta 18 Dec 2025, 8:00 pm
Touted as an extremely fast Python type checker and language server, ty has moved to beta.
Developers can install ty with uv tool install ty@latest, or via a Visual Studio Code extension. A stable release is eyed for 2026, according to ty steward Astral.
Written in Rust, ty is positioned as an alternative to tools such as Mypy, Pyright, and Pylance. In a December 16 blog post, Astral founder Charlie Marsh said ty’s architecture is built around “incrementality,” enabling necessary computations to be selectively re-run when a user edits a file or modifies an individual function. “This makes live updates extremely fast in the context of any editor or long-lived process,” Marsh said.
In developing ty, Astral focused on performance; being correct, pragmatic, and ergonomic; and being built in the open, by the Astral core team alongside active contributors under the MIT license, said Marsh. The type checker also features a diagnostic system inspired by the Rust compiler’s own error messages. A single ty diagnostic can pull in context from multiple files simultaneously to explain not only what is wrong but why, and, often, how to fix it, said Marsh. Even compared to Rust-based language servers like Pyrefly, ty can run orders of magnitude faster when performing incremental updates on large projects, Marsh stressed. Following the beta release, the company will prioritize supporting early adopters, he said.
What’s next for Azure infrastructure 18 Dec 2025, 9:00 am
As 2025 comes to an end, it seems fitting to look at how Microsoft’s Azure hyperscale cloud is planning to address the second half of the decade. As has become traditional, Azure CTO Mark Russinovich gave his usual look at that future in his presentations at Ignite, this time split into two separate talks on infrastructure and software.
The first presentation looked at how the underlying infrastructure of Azure is developing and how the software you use is adapting to the new hardware. Understanding what lies underneath the virtual infrastructure we use every day is fascinating, as it’s always changing in ways we can’t see. We don’t worry about the hardware under our software, because all we have access to are APIs and virtual machines.
That abstraction is both a strength and a weakness of the hyperscale cloud. Microsoft continually upgrades all aspects of its hardware without affecting our code, but we are forced either to wait for the cloud platform to make those innovations visible to everyone, or to move code to one of a handful of regions that get new hardware first, increasing the risks that come from reduced redundancy options.
Still, it’s worth understanding what Microsoft is doing, as the technologies it’s implementing will affect you and your virtual infrastructure.
Cooling CPUs with microfluidics
Russinovich’s first presentation took a layered approach to Azure, starting with how its data centers are evolving. Certainly, the scale of the platform is impressive: It now has more than 70 regions and over 400 data centers. They’re linked by more than 600,000 kilometers of fiber, including links across the oceans and around the continents, with major population centers all part of the same network.
As workloads evolve, so do data centers, requiring rethinking how Azure cools its hardware. Power and cooling demands, especially with AI workloads, are forcing redesigns of servers, bringing cooling right onto the chip using microfluidics. This is the next step in liquid cooling, where current designs put cold plates on top of a chip. Microfluidics goes several steps further, requiring a redesign of the chip packaging to bring cooling directly to the silicon die. By putting cooling right where the processing happens, it’s possible to increase the density of the hardware, stacking cooling layers between memory, processing, and accelerators, all in the same packaging.
The channels are designed using machine learning and are optimized for the hotspots generated by common workloads. Microsoft is doing the first generation of microfluidics etchings itself but plans to work with silicon vendors like Intel and AMD to pre-etch chips before they’re delivered. Microfluidic-based cooling isn’t only for CPUs; it can even be used on GPUs.
Boosting Azure Boost
Beyond silicon, Microsoft is enhancing Azure’s Open Hardware-based servers with a new iteration of its Azure Boost accelerators. Now fitted to more than 25% of Microsoft’s server estate and standard with all new hardware, Azure Boost is designed to offload Azure’s own workloads onto dedicated hardware so that user tenants and platform applications get access to as much server performance as possible. Code-named Overlake, the latest batch of Azure Boost accelerators adds 400Gbps of networking, giving 20Gbps of remote storage and 36Gbps of direct-attached NVMe storage at 6.6 million IOPS.
Under the hood is a custom system on a chip (SoC) that mixes Arm cores and a field-programmable gate array (FPGA) running the same Azure Linux as your Kubernetes containers. There’s added hardware encryption in Azure Boost to ensure compatibility with Azure’s confidential computing capabilities, keeping data encrypted across the boundary between servers and the Azure Boost boards.
Azure goes bare metal
One advantage of moving much of the server management to physical hardware is that Microsoft can now offer bare-metal hosts to its customers. This approach was originally used for OpenAI’s training servers, giving direct access to networking hardware and remote direct memory access to virtual machines. This last feature not only speeds up inter-VM communications, it also improves access to GPUs, allowing large amounts of data to move more efficiently. Azure’s RDMA service doesn’t just support in-cabinet or even in-data-center operations; it now offers low-latency connectivity within Azure regions.
Bare-metal servers give applications a significant performance boost but really only matter for big customers who are using them with regional RDMA to build their own supercomputers. Even so, the rest of us get better performance for our virtual infrastructures. That requires removing the overhead associated with both virtual machines and containers. As Russinovich has noted in earlier sessions, the future of Azure is serverless: hosting and running containers in platform-as-a-service environments.
That serverless future needs a new form of virtualization, one that goes beyond Azure’s secure container model of nested virtual machines, giving access to hardware while keeping the same level of security and isolation. Until now that’s been impossible, as nested virtualization required running hypervisors inside hypervisors to enforce the necessary security boundaries and prevent malicious code from attacking other containers on the same hardware.
A new direct virtualization technique removes that extra layer, running user and container VMs on the server hypervisor, still managed by the same Azure Host OS. This approach gets rid of the performance overheads that come from nested hypervisors and gives the virtualized clients access to server hardware like GPUs and AI inference accelerators. This update gives you the added benefit of faster migration between servers in case of hardware issues.
This approach is key to many of Microsoft’s serverless initiatives, like Azure Container Instances (ACI), giving managed containers access to faster networking, GPUs, and the like. Russinovich demonstrated a 50% performance improvement for PostgreSQL along with a significant reduction in latency. By giving containers access to GPUs, ACI gains the ability to host AI inferencing workloads, so you can bring your open source models to containers. This should allow you to target ACI containers from AI Foundry more effectively.
Custom hardware for virtual networks
AI has had a considerable influence on the design of Azure data centers, especially with big customers needing access to key infrastructure features and, where possible, the best possible performance. This extends to networking, which has been managed by specialized virtual machines to handle services like routing, security, and load balancing.
Microsoft is now rolling out new offload hardware to host those virtual network appliances, in conjunction with top-of-the-rack smart switches. This new hardware runs your software-defined network policies, managing your virtual networks for both standard Azure workloads and for your own specific connectivity, linking cloud to on-premises networks. The same hardware can transparently mirror traffic to security hardware without affecting operations, allowing you to watch traffic between specific VMs and look for network intrusions and other possible security breaches without adding latency that might warn attackers.
Speeding and scaling its storage
The enormous volume of training data used by AI workloads has made Microsoft rethink how it provisions storage for Azure. Video models require hundreds of petabytes of image data, at terabytes of bandwidth and many thousands of IOPS. That’s a significant demand for already busy storage hardware. This has led to Microsoft developing a new scaled storage account, which is best thought of as a virtual account on top of the number of standard storage accounts needed to deliver the required amount of storage.
There’s no need to change the hardware, and the new virtual storage can encompass as many storage accounts as you need to scale as large as possible. Because the data is spread across accounts, you can get very good performance as it is retrieved from each storage account in parallel. Russinovich’s Ignite demo showed it working with 1.5 petabytes of data in 480 nodes, with writes running at 22 terabits per second and reads from 695 nodes at 50 terabits per second.
While a lot of these advances are specialized and focused on the needs of AI training, it’s perhaps best to think of those huge projects as the F1 teams of the IT world, driving innovations that will impact the rest of us, maybe not tomorrow, but certainly in the next five years. Microsoft’s big bet on a serverless Azure needs a lot of these technologies to give its managed containers the performance they need by refactoring the way we deliver virtual infrastructures and build the next generation of data centers. Those big AI-forward investments need to support all kinds of applications as well, from event-driven Internet of Things to distributed, scalable Kubernetes, as well as being ready for platforms and services we haven’t yet begun to design.
Features like direct virtualization and networking offload look like they’re going to be the quickest wins for the widest pool of Azure customers. Faster, more portable VMs and containers will help make applications more scalable and more resilient. Offloading software-defined networking to dedicated servers can offer new ways to secure our virtual infrastructures and protect our valuable data.
What’s perhaps most interesting about Russinovich’s infrastructure presentation is that these aren’t technologies that are still in research labs. They’re being installed in new data centers today and are part of planned upgrades to the existing Azure platform. With that in mind, it’ll be interesting to see what new developments Microsoft will unveil next year.
High-performance programming with Java streams 18 Dec 2025, 9:00 am
My recent Java Stream API tutorial introduced Java streams, including how to create your first Java stream and how to build declarative stream pipelines with filtering, mapping, and sorting. I also demonstrated how to combine streams, collectors, and optionals, and I provided examples of functional programming with Java streams. If you are just getting started with Java streams, I recommend starting with the introductory tutorial.
In this tutorial, we go beyond the basics to explore advanced techniques with Java streams. You’ll learn about short-circuiting, parallel execution, virtual threads, and stream gatherers in the Java Stream API. You will also learn how to combine and zip Java streams, and we’ll conclude with a list of best practices for writing efficient, scalable stream code.
Short-circuiting with Java streams
A stream pipeline doesn’t always need to process every element. In some cases, we can use short-circuiting. These are operations that stop the stream processing as soon as a result is determined, saving time and memory.
Here’s a list of common short-circuiting operations:
- findFirst() returns the first match and stops.
- findAny() returns any match (more efficient in parallel).
- anyMatch()/allMatch()/noneMatch() stop the stream once the outcome is known.
- limit(n) defines an intermediate operation that processes only the first n elements.
Here’s an example of short-circuiting operations in a Java stream pipeline:
import java.util.List;
public class ShortCircuitDemo {
public static void main(String[] args) {
List<String> names = List.of("Duke", "Tux", "Juggy", "Moby", "Gordon");
boolean hasLongNames = names.stream()
.peek(System.out::println)
.anyMatch(n -> n.length() > 4);
}
}
The output for this pipeline will be:
Duke
Tux
Juggy
After "Juggy", the pipeline stops. That’s because it has served its purpose, so there is no need to evaluate Moby or Gordon. Short-circuiting takes advantage of the laziness of streams to complete work as soon as possible.
Parallel streams: Leveraging multiple cores
By default, streams run sequentially. When every element can be processed independently and the workload is CPU-intensive, switching to a parallel stream can significantly reduce processing time.
Behind the scenes, Java uses the ForkJoinPool to split work across CPU cores and merge the partial results when it’s done:
import java.util.List;
public class ParallelDemo {
public static void main(String[] args) {
List<String> names = List.of("Duke", "Juggy", "Moby", "Tux", "Dash");
System.out.println("=== Sequential Stream ===");
names.stream()
.peek(n -> System.out.println(Thread.currentThread().getName() + " -> " + n))
.filter(n -> n.length() > 4)
.count();
System.out.println("\n=== Parallel Stream ===");
names.parallelStream()
.peek(n -> System.out.println(Thread.currentThread().getName() + " -> " + n))
.filter(n -> n.length() > 4)
.count();
}
}
Here, we compare output from sequential and parallel processing in a typical multi-core run:
=== Sequential Stream ===
main -> Duke
main -> Juggy
main -> Moby
main -> Tux
main -> Dash
=== Parallel Stream ===
ForkJoinPool.commonPool-worker-3 -> Moby
ForkJoinPool.commonPool-worker-1 -> Juggy
main -> Duke
ForkJoinPool.commonPool-worker-5 -> Dash
ForkJoinPool.commonPool-worker-7 -> Tux
Sequential streams run on a single thread (usually main), while parallel streams distribute work across multiple ForkJoinPool worker threads, typically one per CPU core.
Use the following to check the number of available cores:
System.out.println(Runtime.getRuntime().availableProcessors());
Parallelism produces real performance gains only for CPU-bound, stateless computations on large datasets. For lightweight or I/O-bound operations, the overhead of thread management often outweighs any benefits.
Sequential versus parallel stream processing
The program below simulates CPU-intensive work for each element and measures execution time with both sequential and parallel streams:
import java.util.*;
import java.util.stream.*;
import java.time.*;
public class ParallelThresholdDemo {
public static void main(String[] args) {
List<Integer> sizes = List.of(10_000, 100_000, 1_000_000, 10_000_000);
for (int size : sizes) {
List<Integer> data = IntStream.range(0, size).boxed().toList();
System.out.printf("%nData size: %,d%n", size);
System.out.printf("Sequential: %d ms%n",
time(() -> data.stream()
.mapToLong(ParallelThresholdDemo::cpuWork)
.sum()));
System.out.printf("Parallel: %d ms%n",
time(() -> data.parallelStream()
.mapToLong(ParallelThresholdDemo::cpuWork)
.sum()));
}
}
static long cpuWork(long n) {
long r = 0;
// Reconstructed loop: any CPU-heavy per-element computation works here
for (int i = 0; i < 200; i++) {
r += (n * i) % 7;
}
return r;
}
static long time(java.util.function.Supplier<Long> task) {
long start = System.nanoTime();
task.get();
return (System.nanoTime() - start) / 1_000_000;
}
}
Now let’s look at some results. Here’s a snapshot after running both sequential and parallel streams on an Intel Core i9 (13th Gen) processor with Java 25:
| Data size | Sequential streams | Parallel streams |
| 10,000 | 8 ms | 11 ms |
| 100,000 | 78 ms | 41 ms |
| 1,000,000 | 770 ms | 140 ms |
| 10,000,000 | 7,950 ms | 910 ms |
At small scales (10,000 elements), the parallel version is slightly slower. This is because splitting, scheduling, and merging threads carries a fixed overhead. However, as the per-element workload grows, that overhead becomes negligible, and parallel processing begins to dominate.
Performance thresholds also differ across processors and architectures:
- Intel Core i7/i9 or AMD Ryzen 7/9: Parallelism pays off once you process hundreds of thousands of elements or run heavier computations per element. Coordination costs are higher, so smaller datasets run faster with sequential processing.
- Apple Silicon (M1/M2/M3): Thanks to unified memory and highly efficient thread scheduling, parallel streams often become faster even for mid-size datasets, typically after a few hundred to a few thousand elements, depending on the work per element.
The number of elements isn’t the key variable; what you want to watch is the amount of CPU work per element. If computation is trivial, sequential execution remains faster.
Guidelines for using parallel streams
If each element involves significant math, parsing, or compression, parallel streams can easily deliver five to nine times the processing speed of sequential streams. Keep these guidelines in mind when deciding whether to use parallel streams or stick with sequential processing:
- Cheap per-element work requires tens of thousands of elements before parallelism pays off.
- Benefits appear much sooner for expensive per-element work.
- Use sequential processing for I/O or order-sensitive tasks.
- Pay attention to hardware and workload specs—these will define where parallelism begins to make a difference.
Parallel streams shine when each element is independent, computation is heavy, and there’s enough data to keep all the CPU cores busy. Used deliberately, parallel streams can unlock large performance gains with minimal code changes.
Performance-tuning parallel streams
Parallel streams use the common ForkJoinPool, which, by default, creates enough threads to fully utilize every available CPU core. In most situations, this default configuration performs well and requires no adjustment. However, for benchmarking or fine-grained performance testing, you can run a parallel stream inside a custom ForkJoinPool:
import java.util.concurrent.*;
import java.util.stream.IntStream;
public class ParallelTuningExample {
public static void main(String[] args) {
ForkJoinPool pool = new ForkJoinPool(8);
long result = pool.submit(() ->
IntStream.range(0, 1_000_000)
.parallel()
.sum()
).join();
}
}
Using a dedicated ForkJoinPool lets you experiment with different levels of parallelism to measure their impact on performance, without affecting other parts of the application.
Remember: Parallel streams deliver benefits only for CPU-bound, stateless operations where each element can run independently in parallel. For small datasets or I/O-bound work, the overhead of parallelism usually outweighs its benefits. In these cases, sequential streams are faster and simpler.
Streams and virtual threads (Java 21+)
Virtual threads, introduced in Java 21 via Project Loom, have redefined Java concurrency. While parallel streams focus on CPU-bound parallelism, virtual threads are designed for massive I/O concurrency.
A virtual thread is a lightweight, user-mode thread that does not block an underlying operating-system thread while waiting. This means you can run thousands—or even millions—of blocking tasks efficiently. Here’s an example:
import java.util.concurrent.*;
import java.util.stream.IntStream;
public class ThreadPerformanceComparison {
public static void main(String[] args) throws Exception {
int tasks = 1000;
run("Platform Threads (FixedPool)",
Executors.newFixedThreadPool(100), tasks);
run("Virtual Threads (Per Task)",
Executors.newVirtualThreadPerTaskExecutor(), tasks);
}
static void run(String label, ExecutorService executor, int tasks) throws Exception {
long start = System.nanoTime();
var futures = IntStream.range(0, tasks)
.mapToObj(i -> executor.submit(() -> sleep(500)))
.toList();
// Wait for all to complete
for (var future : futures) {
future.get();
}
System.out.printf("%s finished in %.3f s%n",
label, (System.nanoTime() - start) / 1_000_000_000.0);
executor.shutdown();
}
static void sleep(long millis) {
try {
Thread.sleep(millis);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
Output example: Platform Threads ≈ 5 s, Virtual Threads ≈ 0.6 s.
You probably noticed there are two executors in this example. Here’s how each one works:
newFixedThreadPool(100): Creates an executor backed by 100 platform threads (real operating system threads). At most, 100 tasks run concurrently, while additional tasks wait in the queue until a thread is available. Each platform thread stays fully blocked during Thread.sleep() or I/O operations, which means those 100 threads can’t do other work until the blocking call completes.
newVirtualThreadPerTaskExecutor(): Creates one virtual thread per task. Virtual threads are cheap, user-mode threads that don’t tie up an operating system thread when blocked. An analogy would be a few delivery trucks (platform threads) handling millions of packages (virtual threads). Only a handful of trucks drive at once, but millions of deliveries happen efficiently over time.
In the example, each task simulates blocking I/O with Thread.sleep(500).
If we were to run newFixedThreadPool(100):
- Only 100 tasks run concurrently.
- 1000 tasks ÷ 100 threads = 10 batches × 0.5 s ≈ 5 s total.
If we were to run newVirtualThreadPerTaskExecutor():
- All 1,000 tasks run at once.
- Every task sleeps for 500 ms concurrently.
- Total ≈ 0.5–0.6 s—just the simulated delay, no waiting queue.
Virtual threads drastically reduce overhead by releasing their underlying operating-system threads whenever blocking occurs, allowing vast I/O concurrency with minimal resource cost. Both parallel streams and virtual threads offer performance benefits, but you have to know when to use them. As a rule of thumb:
- Use parallel streams for CPU-bound workloads that benefit from data parallelism.
- Use virtual threads for I/O-bound tasks where many concurrent operations block on external resources.
Stream gatherers (Java 22+)
Before Java 22, streams were great for stateless transformations like filtering or mapping. But when you needed logic that depended on earlier elements—things like sliding windows, running totals, and conditional grouping—you had to abandon streams entirely and write imperative loops with mutable state. That changed with the introduction of stream gatherers.
Before stream gatherers, let’s say we wanted to calculate a moving average over a sliding window of three elements:
List<Integer> data = List.of(1, 2, 3, 4, 5, 6);
List<Double> movingAverages = new ArrayList<>();
Deque<Integer> window = new ArrayDeque<>();
for (int value : data) {
window.add(value);
if (window.size() > 3) {
window.removeFirst();
}
if (window.size() == 3) { // Only calculate when window is full
double avg = window.stream()
.mapToInt(Integer::intValue)
.average()
.orElse(0.0);
movingAverages.add(avg);
}
}
System.out.println(movingAverages); // [2.0, 3.0, 4.0, 5.0]
This approach works but it breaks the declarative, lazy nature of streams. In this code, we are manually managing state, mixing imperative and functional styles, and we’ve lost composability.
Now consider the same example using Stream.gather() and built-in gatherers. Using stream gatherers lets us perform stateful operations directly inside the stream pipeline while keeping it lazy and readable:
List<Double> movingAverages = Stream.of(1, 2, 3, 4, 5, 6)
.gather(Gatherers.windowSliding(3))
.map(window -> window.stream()
.mapToInt(Integer::intValue)
.average()
.orElse(0.0))
.toList();
System.out.println(movingAverages); // [2.0, 3.0, 4.0, 5.0]
As you can see, windowSliding(3) waits until it has three elements, then emits [1,2,3] and slides forward by one: [2,3,4], [3,4,5], [4,5,6]. The gatherer manages this state automatically, so we can express complex data flows cleanly without manual buffering or loops.
Built-in gatherers
The Stream Gatherers API includes the following built-in gatherers:
- windowFixed(n): Used for non-overlapping batches of n elements (see the example after this list).
- windowSliding(n): Used to create overlapping windows for moving averages or trend detection.
- scan(seed, acc): Used for running totals or cumulative metrics.
- mapConcurrent(maxConcurrency, mapper): Supports concurrent mapping with controlled parallelism.
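Here’s a minimal sketch of windowFixed() at work, using the same style of integer data as the sliding-window example above. It batches elements into non-overlapping groups of three, with a shorter final window when the elements don’t divide evenly:

import java.util.List;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

public class WindowFixedDemo {
    public static void main(String[] args) {
        // Non-overlapping batches of three; the final window may be shorter
        List<List<Integer>> batches = Stream.of(1, 2, 3, 4, 5, 6, 7)
            .gather(Gatherers.windowFixed(3))
            .toList();
        System.out.println(batches); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}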
Collectors vs. gatherers
In my introduction to Java streams, you learned about collectors, which serve a similar purpose to gatherers but operate differently. Collectors aggregate the entire stream into one result at the end, such as a list or sum, while gatherers operate during stream processing, maintaining context between elements. An easy way to remember the difference between the two features is that collectors finalize data once, whereas gatherers reshape it as it flows.
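Here’s a minimal sketch of the collector side of that contrast (assuming the usual java.util.stream imports): the collector consumes the whole stream and produces a single value at the end, while the running-total gatherer in the next example emits a value for every element as it flows through.

// Collector: one final result, produced only after the entire stream is consumed
int total = Stream.of(2, 4, 6, 8)
    .collect(Collectors.summingInt(Integer::intValue));
System.out.println(total); // 20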
Example: Running total with stream gatherers
The following example demonstrates the benefits of stream gatherers:
Stream.of(2, 4, 6, 8)
.gather(Gatherers.scan(() -> 0, Integer::sum))
.forEach(System.out::println);
// 2, 6, 12, 20
Each emitted value includes the cumulative sum so far. The stream remains lazy and free of side-effects.
Like any technology, stream gatherers have their place. Use stream gatherers when the following conditions are true:
- The application involves sliding or cumulative analytics.
- The application produces metrics or transformations that depend on previous elements.
- The operation includes sequence analysis or pattern recognition.
- The code requires manual state with clean, declarative logic.
Gatherers restore the full expressive power of Java streams for stateful operations while keeping pipelines readable, efficient, and parallel-friendly.
Combining and zipping streams
Sometimes you need to combine data from multiple streams; an example is merging two sequences element by element. While the Stream API doesn’t yet include a built-in zip() method, you can easily implement one:
import java.util.*;
import java.util.function.BiFunction;
import java.util.stream.*;
public class StreamZipDemo {
public static <A, B, C> Stream<C> zip(
Stream<A> a, Stream<B> b, BiFunction<A, B, C> combiner) {
Iterator<A> itA = a.iterator();
Iterator<B> itB = b.iterator();
Iterable<C> iterable = () -> new Iterator<C>() {
public boolean hasNext() {
return itA.hasNext() && itB.hasNext();
}
public C next() {
return combiner.apply(itA.next(), itB.next());
}
};
return StreamSupport.stream(iterable.spliterator(), false);
}
// Usage:
public static void main(String[] args) {
zip(Stream.of(1, 2, 3),
Stream.of("Duke", "Juggy", "Moby"),
(n, s) -> n + " → " + s)
.forEach(System.out::println);
}
}
The output will be:
1 → Duke
2 → Juggy
3 → Moby
Zipping pairs elements from two streams until one runs out, which is perfect for combining related data sequences.
Pitfalls and best practices with Java streams
We’ll conclude with an overview of pitfalls to avoid when working with streams, and some best practices to enhance streams performance and efficiency.
Pitfalls to avoid when using Java streams
- Overusing streams: Not every loop should be a stream.
- Side-effects in map/filter: Retain pure functions.
- Forgetting terminal operations: Remember that streams are lazy.
- Parallel misuse: Helps CPU-bound work but hurts I/O-bound work.
- Reusing consumed streams: One traversal only (see the example after this list).
- Collector misuse: Avoid shared mutable state.
- Manual state hacks: Use gatherers instead.
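To make the stream-reuse pitfall concrete, here’s a minimal sketch: a second terminal operation on an already-consumed stream throws an IllegalStateException, so create a fresh stream, or a Supplier of streams, for each traversal.

import java.util.function.Supplier;
import java.util.stream.Stream;

public class StreamReuseDemo {
    public static void main(String[] args) {
        Stream<String> names = Stream.of("Duke", "Tux", "Juggy");
        System.out.println(names.count()); // 3
        // names.count(); // would throw IllegalStateException: stream has already been operated upon or closed

        // Safe pattern: produce a new stream for every traversal
        Supplier<Stream<String>> freshNames = () -> Stream.of("Duke", "Tux", "Juggy");
        System.out.println(freshNames.get().count());                       // 3
        System.out.println(freshNames.get().anyMatch(n -> n.length() > 4)); // true
    }
}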
Best practices when using Java streams
To maximize the benefits of Java streams, apply the following best practices:
- Keep pipelines small and readable.
- Prefer primitive streams for numbers (see the sketch after this list).
- Use peek() only for debugging.
- Filter early, before expensive ops.
- Favor built-in gatherers for stateful logic.
- Avoid parallel streams for I/O; use virtual threads instead.
- Use the Java Microbenchmark Harness or profilers to measure performance before optimizing your code.
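As a small sketch of the “prefer primitive streams” guideline, IntStream and LongStream avoid the per-element boxing that a Stream<Integer> incurs, which adds up on large datasets; as with all of these guidelines, confirm the difference on your own workload with JMH or a profiler.

import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PrimitiveStreamDemo {
    public static void main(String[] args) {
        // Boxed: every element is wrapped in an Integer object
        long boxedSum = Stream.iterate(1, n -> n + 1)
            .limit(1_000_000)
            .mapToLong(Integer::longValue)
            .sum();

        // Primitive: no boxing, specialized sum() on a numeric stream
        long primitiveSum = IntStream.rangeClosed(1, 1_000_000)
            .asLongStream()
            .sum();

        System.out.println(boxedSum == primitiveSum); // true (500000500000)
    }
}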
Conclusion
The advanced Java Stream API techniques in this tutorial will help you unlock expressive, high-performance data processing in modern Java. Short-circuiting saves computation; parallel streams use multiple cores; virtual threads handle massive I/O; and gatherers bring stateful transformations without breaking the declarative style in your Java code.
Combine these techniques wisely by testing, measuring, and reasoning about your workload, and your streams will remain concise, scalable, and as smooth as Duke surfing the digital wave!
Now it’s your turn: Take one of the examples in the Java Challengers GitHub repository, tweak it, and run your own benchmarks or experiments. Practice is the real challenge—and that’s how you’ll master modern Java streams.
Designing the agent-ready data stack 18 Dec 2025, 9:00 am
Executives increasingly believe AI will reshape their businesses, but many large organizations are still stuck at proofs of concept. McKinsey’s 2025 State of AI report shows widespread experimentation, but real business value is being seen by only a small set of “high performers.” Some 23% of respondents report their organizations are scaling an agentic AI system somewhere in their enterprises, but the use of agents is not yet widespread. Boston Consulting Group reports that around 70% of obstacles are people and process (not model) issues, yet poor data plumbing is still a major cause of project drag.
The database bottleneck
Many engineering teams still rely on architecture optimized for transactional apps, not for AI systems that mix structured and unstructured data and live event streams. This legacy architecture has three main characteristics that slow AI adoption: rigid schemas and silos, outdated logic, and AI implemented as a bolt-on.
Rigid schemas and silos
First, current ERP and CRM systems are built around rigid schemas and operate in silos. Data warehouses, search indexes, and vector stores live in different places with different contracts. Because they use different data models and APIs, you can’t ask one system a question that assumes the logic of another without doing translation or synchronization. Because these systems were not built to understand semantics and contextual relevance, the AI layer has to glue the pieces together before it can start making sense of the data itself.
Different data storage systems present different abstractions
| System | Optimized for | Data model | Query language |
| Data warehouse | Analytical queries (e.g. total sales by region) | Structured tables | SQL |
| Search index | Keyword or document retrieval (e.g. find documents about supply chain risk) | Inverted index | Search query syntax (Lucene, BM25, and others) |
| Vector store | Semantic or embedding similarity (e.g. find text similar to this paragraph) | Vectors (dense embeddings) | Similarity search (cosine, dot product, and others) |
Outdated logic
Secondly, if systems are updated only once nightly, gaps open between changes in the source data and the systems that consume it. Agents may be reasoning over yesterday’s data while the source looks completely different. Index drift has long-term consequences: vector stores and search indexes no longer match the reality in operational systems and on the shop floor.
Security policies can diverge too, since permissions or access controls updated in the source may not propagate immediately to the AI’s cached or copied data. Operating on outdated or inconsistent context reduces accuracy, seriously impacting compliance and trust.
AI as a bolt-on
Thirdly, because systems are not AI-native, AI is often bolted on as a sidecar, with separate security, lineage, and observability. This complicates audit and compliance, and treats AI as a separate feature set, rather than as part of a unified access, usage, and behavior trail. Operational blind spots are the consequence here. Incidents or data leaks may go undetected. Because the AI sidecar operates outside the organization’s governance framework, security teams have zero visibility into policy violations.
The RAND research organization’s report, The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed, bears these experiences out. It highlights that organizations underestimate the data quality, lineage, access control, and deployment scaffolding needed to make AI reliable.
Agentic AI collapses traditional data boundaries
Traditional database stacks assume clear boundaries: OLTP here, OLAP there, search elsewhere. Agentic AI use cases collapse these boundaries. Because agents need durable read-write interactions and real-time triggers, retrieval-augmented generation (RAG) must have low-latency joins across text, vectors, and graph relationships with consistent security. Legacy patterns that see data shipped to separate indexes and stores add latency, duplication, and significant risk.
The trend is convergent: bring semantic retrieval and policy closer to operational data. That’s why cloud platforms now connect vector and hybrid search into operational stores. MongoDB’s Atlas Vector Search, Databricks’s Mosaic AI Vector Search, and OpenSearch’s Neural/Vector Search are good examples. It’s also why the Postgres community extends the database with pgvector. However, we know that these less native “bolt-on” approaches bring their own problems. In response, a new wave of AI-native databases has the potential to close these gaps.
So, what are the technical steps engineering teams can take to prepare their systems for AI agents? And what are the choices available today?
Preparing the data layer for AI (and agents)
Migrating from one database to another is not an easy exercise. It’s complex, risky, and very expensive. The exception is where there are clear cost efficiencies to be gained, but if a large company is all-in on its database stack, it’s unlikely to shift yet.
For greenfield, AI-native projects, however, it’s important to be intentional, and choose a database model that understands what agentic systems need. Agentic systems plan, call tools, write back state, and coordinate across services, so they require specific conditions:
- Long-lived memory with persistence and retrieval, not just a chat window.
- Durable transactions that enable us to trust the updates agents issue.
- Event-driven reactivity that supports subscriptions and streams to keep UIs and other agents synchronized.
- Strong governance including low-level security, lineage, audit, and PII (personally identifiable information) protection.
Major frameworks emphasize these data requirements. LangGraph and AutoGen add persistent, database-backed memory, while Nvidia and Microsoft reference architectures center on data connectors, observability, and security in agent factories. Meanwhile, Oracle’s latest database release ships agent tooling into the core. This isn’t a model problem—it’s a state, memory, and policy problem.
Here’s what “done right” looks like when you rebuild the data layer for AI:
- First, build for adaptability. This means first-class support for mixed data (relational, document, graph, time series, vector) and flexible schemas, so AI can reason over entities, relationships, and semantics without struggling with brittle ETL.
- Next, commit to openness. Standard interfaces, open formats, and open-source participation allow teams to combine best-of-breed embedding models, re-ranking tools, and governance. The added bonus is avoiding vendor lock-in.
- Finally, embrace composability. Build in real-time subscriptions and streams, functions close to data, and unified security so that retrieval, reasoning, and action run against one trustworthy source of truth.
Which model for my use case?
Every organization works with a mix of (usually open-source) databases, all optimized for different workloads. For instance, MySQL and PostgreSQL handle transactional data, while MongoDB and Couchbase offer document storage suited to dynamic application data. Together, this mix of databases forms a polyglot persistence layer where teams select the right tool depending on the scenario.
Companies have invested heavily in creating the right mix of databases for their needs, so how can teams bring AI to existing stacks? Is this the right approach and what are the alternatives? And, are companies facing a complete database refactor to bring AI agents to their companies?
Unified operational stores with vector and hybrid search
MongoDB Atlas, Databricks Mosaic AI, and OpenSearch put approximate-nearest-neighbor and hybrid retrieval next to data, reducing sync drift. Postgres offers pgvector for teams standardizing on SQL. Oracle Database 23ai and 26ai add native vector search and agent builders into the core RDBMS, reflecting the shift toward AI-aware data layers.
This approach is a good fit for simple AI projects that rely on only one data tool (e.g. only MongoDB, only OpenSearch, or only Postgres). In reality, though, enterprise AI systems and agents typically rely on multiple data sources and tools, and searching and retrieving data across a variety of databases is difficult. Storing data natively in one place that supports a mix of data models, and searching across all of those models, lets teams harness the power within their data for building AI systems.
Purpose-built vector databases
Pinecone, Weaviate, and Milvus focus on vector scale and latency; many enterprises pair them with operational databases when they need specialized retrieval at scale. This is great when embedding and vector search is a key, large-scale workload requiring high performance and advanced vector features. The downside is that you need to manage and operate another, separate database system.
Multi-model databases
SurrealDB is one concrete approach to this convergence. It’s an open-source, multi-model database that combines relational, document, graph, and vector data with ACID transactions, row-level permissions, and live queries for real-time subscriptions. For AI workloads, it supports vector search and hybrid search in the same engine that enforces company governance policies, and it offers event-driven features (LIVE SELECT, change feeds) to keep agents and UIs in sync without extra brokers.
For many teams, this reduces the number of moving parts between the system of record, the semantic index, and the event stream.
What integrating with AI feels like today (and how it should feel tomorrow)
Trying to use AI in traditional environments is pretty painful. Engineering teams routinely face multiple copies of data, which leads to drift and inconsistent access control lists. Embedding refresh cycles and index rebuilds result in latency spikes and degraded quality. And separate policy engines lead to audit gaps across chat, retrieval, and actions.
If we have an AI-ready data layer, we can store entities, relationships, and embeddings together, and query them with one policy model. We can also use real-time subscriptions to push changes into agent memory, not nightly backfills. And we can enforce row-level security and lineage at the source, so every retrieval is compliant by default.
This isn’t hypothetical. Public case studies show tangible results when teams collapse data sprawl. For example, LiveSponsors rebuilt a loyalty engine and cut query times from 20 seconds to 7 milliseconds while unifying relational and document data. Aspire Comps scaled to 700,000 users in eight hours after consolidating back-end components. Many companies cite major gains in consolidation and AI-readiness.
Principles for AI-ready architecture
AI doesn’t stall because models are “not good enough.” It stalls because data architecture lags ambition. The fastest path from pilot to profit is to modernize the database layer so retrieval, reasoning, and action happen against one governed, real-time source of truth. There are a few essential considerations that should create the right conditions for AI agents to fulfill their potential.
Design for retrieval and relationships
Treat graph, vector, and keyword as a first-class trio. Knowledge graphs paired with RAG are becoming a standard for explainable, lineage-aware answers.
Co-locate state, policy, and compute
Keep embeddings next to the system of record and push policy (role-based access control and row-level security) into the database to minimize data hops.
Make memory durable
Agents need persisted memory and resumable workflows, supported by frameworks (e.g. LangGraph and AutoGen) and enterprise “AI factory” designs from Nvidia and Microsoft.
Prefer open building blocks
Open-source options (pgvector, Weaviate, Milvus, OpenSearch) de-risk lock-in and accelerate learning curves, especially when paired with an open operational database.
Prepare the environment
Then, there are three important practical steps to take to prepare the environment:
- Start with the bottlenecks. Inventory the extra hops between your app, your vectors, and your policies, then remove them.
- Adopt a unified, AI-aware data layer. Whether you evolve your incumbent platforms or adopt a unified engine (e.g. SurrealDB), collapse silos and co-locate semantics, relationships, and state.
- Measure business impact in milliseconds and dollars. Look for latency, accuracy, and productivity movement. In the wild, we’re seeing that sub-10-millisecond retrieval and significant simplification of stacks translate into feature velocity and cost savings. Public case studies from SurrealDB customers such as LiveSponsors, Aspire, and a collaboration between Verizon, Samsung, and Tencent illustrate both the technical and organizational dividends when the data layer is simplified.
Databases have always been at the core of traditional software applications. With the rise of agentic AI, the role of databases must evolve to serve as the “agentic memory” at the core of reliable agentic systems. The question isn’t whether to rethink data for agents, but how quickly to equip agents with the memory they need to drive decision velocity.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
JetBrains releases Kotlin 2.3.0 18 Dec 2025, 5:09 am
Kotlin 2.3.0 is now generally available, with the language update offering capabilities such as an unused return value checker and support for Java 25. The release has a variety of experimental-stage features including the value checker, Swift interoperability, and a new syntax for explicit backing fields.
JetBrains released the language update on December 16. Installation instructions can be found at blog.jetbrains.com. This general release of the Java rival follows a release candidate introduced November 18.
With Kotlin 2.3.0, a new checker for unused values helps prevent ignored results. It warns whenever an expression returns a value other than Unit or Nothing and is not passed to a function, checked in a condition, or otherwise used. The checker helps catch bugs where a function call produces a meaningful result that was silently dropped, which can lead to unexpected behavior or hard-to-trace issues. This feature is currently experimental.
Another experimental capability in Kotlin 2.3.0 improves Kotlin’s interoperability with Swift through Swift export, adding support for native enum classes and variadic function parameters. Previously, Kotlin enums were exported as ordinary Swift classes. With mapping now direct, developers can use regular native Swift enums.
Other features in Kotlin 2.3.0 include the following:
- The Kotlin compiler now can generate classes containing Java 25 bytecode.
- Explicit backing fields provide a new syntax for explicitly declaring the underlying field that holds a property’s value, in contrast to the existing implicit backing fields.
- Support for return statements in expression bodies with explicit return types is now enabled by default.
- Context-sensitive resolution, still an experimental feature, has been improved. The sealed and enclosing supertypes of the current type are now considered part of the contextual scope of the search; no other supertype scopes are considered. When type operators and equalities are involved, the compiler now reports a warning if using context-sensitive resolution makes the resolution ambiguous.
- Support for importing C and Objective-C capabilities to Kotlin/Native projects has moved to a beta stage.
- For Kotlin/Wasm (WebAssembly), Kotlin 2.3.0 enables fully qualified names for the Kotlin/Wasm targets by default and introduces compact storage for Latin-1 characters. The new WebAssembly exception handling proposal is also enabled by default for the wasmWasi target, ensuring better compatibility with modern WebAssembly runtimes.
- For Kotlin/JS, suspend functions now can be exported directly to JavaScript using the @JsExport annotation, and the BigInt64Array type now can be used to represent Kotlin’s LongArray type. These are both experimental features.
- Support is no longer available for the Ant build system.
What developers call themselves 17 Dec 2025, 10:00 am
I got an email from a friend this week in response to my column about coding domains that no developer understands. He wrote:
Bro, if you don’t understand Kubernetes, it just means your claim to have evolved from coder to developer is sketchy at best.
He’s known me a long time, and I’m guessing he was referring back to a long-lost review I wrote about Coder to Developer, a terrific book by Mike Gunderloy. Up until about four minutes ago in AI time, it was totally relevant to our profession. The book covers the things that you need to be doing that aren’t actually writing code in order to be an actual professional in this business (whatever “professional” means here).
All that got me thinking about the words that we use to describe ourselves and how those words convey different meanings and purposes and layers of exactly what it means to be in the software development business.
Sockets, switches, and dials
The first computers weren’t coded with words or languages, but by manipulating physical entities to do fairly basic calculations. “Programmers” would plug wires into sockets, set switches, turn dials, and spin rotors. It was, at the time, considered “women’s work” because it was mostly clerical. But setting that aside, it was all mechanical in nature. These workers didn’t call themselves “programmers” but “operators” because they physically operated the machine. There was no separation between the machine and the logic used to run the machine. They were the same thing.
It wasn’t until the abstraction of a “computer language” came along that the term “programmer” was introduced. Programming languages allowed for a distinct separation between the logic used to execute a program and the physical device used to execute that logic. As computers became more generalized, the notion of being a “computer programmer” arose.
Early on, computer programs were “linear” or task-bound—that is, they started at Point A and ran to Point B, most often doing calculations of some sort. Sure, they had branching, looping, and flow control, but most often the programs started with some input and produced some output.
I remember back in high school when our programming assignments consisted of typing out the program on cards, turning those cards in to the “computer rat” in the computer lab, who would run it and neatly wrap the paper output around the cards with a rubber band.
As an aside, I often fantasize about what you could do if you took a laptop computer in a time machine and showed the Army in 1942 an Excel spreadsheet that could calculate the ballistic tables that rooms full of people with adding machines were slowly and laboriously calculating.
Boundaries, modules, and interfaces
But systems grew more complex, the machines grew more generalized, and the software grew more varied. The notion of user interface became an issue. Code became unwieldy, reusable, varied, and intertwined. Programmers had to start thinking in terms of boundaries and modules. The interfaces between these boundaries and modules became of great concern. Versioning became a thing. Managing all of that gave rise to the notion of “software engineering.” (I personally have never liked that term. I’ve always fallen on the side that considers software development an art, not a science.)
All of these developments have rendered the term “programmer” a bit old-fashioned. Instead, the notion of being a “software developer” arose. Thirty years ago, one might have described oneself as a “computer programmer,” but I hardly ever hear that term used by people in the business anymore. Most everyone describes themself as a “software developer” or “software engineer” these days.
These terms imply a level of knowledge and skill above the mere act of writing code. As Gunderloy wrote in his book, there is a lot more to the job these days than writing code. Code is still at the core, but there are now layers of abstraction and complexity on top that render the term “programmer” inadequate for describing what it is we do.
And that brings us to the point we are at today — what, exactly, does it mean to be a “software developer” now? As I have argued, it seems like the “programming” portion of that is becoming less and less prevalent. Agentic AI is starting to do all the “clerical” work of software development. It writes the code. At most, we will be spot checking what it writes, but let’s face it, even that spot checking will go by the wayside.
Operator. Programmer. Developer. Engineer. What we call ourselves has changed over the years, as has exactly what it is that we do. At every step, it took a while for the function to get a new title. I wonder when we’ll come up with a new word to describe what it is that we do when wielding our coding agents. I’m pretty sure we eventually will.
Spring Boot tutorial: Get started with Spring Boot 17 Dec 2025, 9:00 am
Spring’s most popular offering, Spring Boot is a modern, lightweight extension of the original Spring Framework. Spring Boot offers the same power and range as the original Spring, but it’s streamlined with popular conventions and packages. Both agile and comprehensive, Spring Boot gives you the best of both worlds.
Dependency injection in Spring Boot
Like the Spring Framework, Spring Boot is essentially a dependency injection engine. Dependency injection is slightly different from inversion of control, but the basic idea is the same. Dependency injection is what Spring was created for originally, and it drives much of the framework’s power and popularity.
Modern applications use variable references to connect the different parts of the application. In an object-oriented application, these parts (or components) are objects. In Java, we connect objects using classes that refer to each other. Spring’s genius was in moving the fulfillment of variable references out of hard-coded Java and into the framework itself. The Spring framework uses settings you provide to wire together the different parts of your application.
Spring Boot takes things a step further, though, by simplifying the way you configure your application settings. It also provides out-of-the-box packages (detailed later in this article) that perform most common application tasks.
As an example, in vanilla Java, you might have a Knight class that is associated with a Weapon:
public class Knight {
private Sword weapon = new Sword();
public void attack() {
weapon.use();
}
}
In Spring, you create the two classes and annotate each one. Spring then automatically fulfills (or “injects”) the association between the two:
@Component
public class Knight {
private final Weapon weapon;
@Autowired
public Knight(Weapon weapon) {
this.weapon = weapon;
}
public void attack() {
weapon.use();
}
}
In recent versions of Spring (4.3 and higher) even the @Autowired annotation is optional if the class has a single constructor. This update makes Spring Boot feel even more like an extension of the Java platform, although some critique it as a hidden dependency.
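Here’s what that looks like in practice, as a minimal sketch that assumes Weapon is an interface implemented by the Sword component; with a single constructor, Spring injects the Weapon bean with no @Autowired annotation at all:

// Each of these types would normally live in its own file,
// with org.springframework.stereotype.Component imported where needed.
public interface Weapon {
    void use();
}

@Component
public class Sword implements Weapon {
    public void use() {
        System.out.println("Swish!");
    }
}

@Component
public class Knight {
    private final Weapon weapon;

    // Single constructor: Spring injects the Weapon bean without @Autowired
    public Knight(Weapon weapon) {
        this.weapon = weapon;
    }

    public void attack() {
        weapon.use();
    }
}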
Getting started with Spring Boot
There are a couple of ways to get started with a new Spring Boot application. One is to use the start.spring.io website to download a zipped-up version of a new application. Personally, I prefer using the Spring Boot command line.
Of course, to work with Spring, you will need Java installed first. SDKMAN is an excellent tool you can use to install both Java and Spring Boot. Or, if you are using Windows, consider WSL with SDKMAN or a native tool like Scoop. Assuming you already have Java installed, you can start with:
$ sdk install springboot
The Spring Boot CLI has many powers, but we just want a simple demo application to start with:
$ spring init --group-id=com.infoworld --artifact-id=demo \
--package-name=com.infoworld.demo \
--build=maven --java-version=17 \
--dependency=web \
demo
Although this looks like a big chunk of code, it only defines the basics of a Java application—the group, artifact, and package names, as well as the Java version and build tool—followed by one Spring Boot dependency: web.
Web development with Spring Boot
This web dependency lets you do all the things required for a web application. In Spring Boot, this kind of dependency is known as a “starter”; in this case, it’s the spring-boot-starter-web starter. This web starter adds a single umbrella dependency (like Spring MVC) to the Maven POM. When Spring Boot sees this dependency, it will automatically add web development support, like a Tomcat server.
The above command also adds directories and files to the new demo/ folder. One of them is demo/src/main/java/com/infoworld/demo/DemoApplication.java, the main executable for the demo program:
package com.infoworld.demo;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class DemoApplication {
public static void main(String[] args) {
SpringApplication.run(DemoApplication.class, args);
}
}
Configuring an application in Spring Boot
The @SpringBootApplication annotation is a shorthand for three other annotations that will do most of the up-front configuration of an application:
- @EnableAutoConfiguration: This powerful annotation instructs Spring to search and configure the context automatically, by guessing and configuring the beans your application may need. In Spring, a bean is any class that can be injected. This annotation creates new beans (like the previously mentioned Tomcat server) based on your application dependencies.
- @ComponentScan: This annotation enables Spring to search for components. It will automatically wire together the components you have defined (with annotations like @Component, @Repository, or @Bean) and make them available to the application.
- @Configuration: This annotation lets you define beans directly inside the main class and make them available to the application.
The overall effect of these three annotations is to instruct Spring to create standard components and make them available to your web application, while also allowing you to define some components yourself within the main class. These components are then wired up automatically. That’s what they call bootstrapping!
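For example, because @SpringBootApplication implies @Configuration, you can declare a bean directly in the main class. Here’s a minimal sketch that registers a java.time.Clock bean, which any other component can then receive through constructor injection:

package com.infoworld.demo;

import java.time.Clock;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    // Declared here thanks to the implicit @Configuration
    @Bean
    public Clock clock() {
        return Clock.systemUTC();
    }
}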
Adding a web controller
Since we added the web dependency when we created our example project, Spring Boot automatically made a variety of annotations available, including @RestController and @GetMapping.
We can use these to generate endpoints for our web application. Then, we can drop the new file right next to the main DemoApplication class:
// demo/src/main/java/com/infoworld/demo/HelloController.java
package com.infoworld.demo;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
@GetMapping("/")
public String hello() {
return "Hello, InfoWorld!";
}
}
Spring Boot will automatically find this class and load it when we run the application. When we visit the root path, it will simply return a “Hello, InfoWorld!” string.
Running a Spring Boot application
Now we can run the application from the command line:
$ ./mvnw spring-boot:run
This command uses the bundled Maven wrapper to call the spring-boot:run goal, which launches the application locally with its embedded server.
For a simple test, we can use cURL:
$ curl http://localhost:8080
Hello, InfoWorld!
As you can see, Spring Boot makes it exceedingly quick and easy to go from a blank canvas to a working endpoint.
Spring Boot starter packages
In addition to the web starter, Spring Boot includes a large ecosystem of other powerful starter packages that provide out-of-the-box support. This section is a quick overview of some of the most important and popular Spring Boot starters.
Core and web
- spring-boot-starter-web: The starter for RESTful APIs and an embedded Tomcat server. (You’ve already seen this one.)
- spring-boot-starter-webflux: This is the reactive version of the web starter, allowing for non-blocking endpoints using a Netty server.
- spring-boot-starter-thymeleaf: Adds the popular Thymeleaf HTML templating engine for Java.
- spring-boot-starter-websocket: Adds WebSocket support to Spring MVC.
Data access
- spring-boot-starter-data-jpa: Adds the Spring JPA with Hibernate data layer.
- spring-boot-starter-data-mongodb: Adds MongoDB support.
- spring-boot-starter-data-redis: Adds support for working with the Redis key-value data store.
- spring-boot-starter-jdbc: Adds standard JDBC with a connection pool.
- spring-boot-starter-data-r2dbc: Adds Reactive (non-blocking asynchronous) relational database support.
Security
- spring-boot-starter-security: Adds authentication and authorization support. This starter automatically configures the appropriate security based on the other starters in place.
- spring-boot-starter-oauth2-client: Provides OAuth2 and OpenID client support (i.e., JWT tokens sent with requests).
- spring-boot-starter-oauth2-resource-server: Used for handling OAuth requests, most commonly from the app’s own front-end clients, and validating the tokens sent to the endpoints.
Messaging
- spring-boot-starter-amqp: Adds support for RabbitMQ using Spring AMQP.
- spring-kafka: Adds Spring for Apache Kafka support for high-throughput messaging. (Spring Boot auto-configures it, although it isn’t packaged as a starter.)
- spring-boot-starter-artemis: Adds Apache Artemis as a JMS broker.
Operations and testing
- spring-boot-starter-actuator: Production-grade support for health checks, metrics, and monitoring (see the example after this list).
- spring-boot-starter-test: Testing libraries, including JUnit 5, Mockito, and Spring Test.
- spring-boot-starter-validation: JavaBean Validation with Hibernate Validator.
- spring-boot-starter-aop: Aspect-oriented programming with Spring AOP and AspectJ.
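As a quick illustration of the actuator starter referenced above, adding spring-boot-starter-actuator to the demo project and optionally listing the endpoints to expose in application.properties is usually all that’s needed; the endpoints shown here are just an example:

management.endpoints.web.exposure.include=health,info,metrics

With the application running, the health endpoint then responds over HTTP:

$ curl http://localhost:8080/actuator/health
{"status":"UP"}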
Configuring Spring Boot applications
When we started the example application, it created an application.properties file. This is a text file that we can use to make configuration changes in a central place.
For example, if we wanted to change the port that our app listens on from the default 8080, we could add this line:
server.port=8081
Or, if we wanted to provide a string value to be used in the application, we could add it here (and then we’d be able to modify it without recompiling):
welcome.greeting=Hello from InfoWorld!
We can then inject the message into the controller:
package com.infoworld.demo;
import org.springframework.beans.factory.annotation.Value; // Import this
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
@Value("${welcome.greeting}") // Inject the property
private String message;
@GetMapping("/")
public String hello() {
return this.message; // Return the property
}
}
Conclusion
Thanks to its versatility and the structure it provides, Spring is a mainstay of the enterprise. Spring Boot gives you a low-effort entry to Spring without losing any of the Spring Framework’s power. It is an excellent tool to learn both for the present and the future technology landscape.
Django tutorial: Get started with Django 6 17 Dec 2025, 9:00 am
Django is a one-size-fits-all Python web framework that was inspired by Ruby on Rails and uses many of the same metaphors to make web development fast and easy. Fully loaded and flexible, Django has become one of Python’s most widely used web frameworks.
Now in version 6.0, Django includes virtually everything you need to build a web application of any size, and its popularity makes it easy to find examples and help for various scenarios. Plus, Django provides tools to allow your application to evolve and add features gracefully, and to migrate its data schema if there is one.
Django also has a reputation for being complex, with many components and a good deal of “under the hood” configuration required. In truth, you can use Django to get a simple Python application up and running in relatively short order, then expand its functionality as needed.
This article guides you through creating a basic application using Django 6.0. We’ll also touch on the most crucial features for web developers in the Django 6 release.
Installing Django
Assuming you have Python 3.12 or higher installed, the first step to installing Django is to create a virtual environment. Installing Django in the venv keeps Django and its associated libraries separate from your base Python installation, which is always a good practice.
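If you haven’t created a venv before, a minimal sketch looks like this (the environment name django-env is just an example):

$ python -m venv django-env
$ source django-env/bin/activate    # on Windows: django-env\Scripts\activate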
Next, install Django in your chosen virtual environment via Python’s pip utility:
pip install django
This installs the core Django libraries and the django-admin command-line utility used to manage Django projects.
Creating a new Django project
Django instances are organized into two tiers: projects and apps.
- A project is an instance of Django with its own database configuration, settings, and apps. It’s best to think of a project as a place to store all the site-level configurations you’ll use.
- An app is a subdivision of a project, with its own route and rendering logic. Multiple apps can be placed in a single Django project.
To create a new Django project from scratch, activate the virtual environment where you have Django installed. Then enter the directory where you want to store the project and type:
django-admin startproject <project_name>
The <project_name> is the name of both the project and the subdirectory where the project will be stored. Be sure to pick a name that isn’t likely to collide with a name used by Python or Django internally. A name like myproj works well.
The newly created directory should contain a manage.py file, which is used to control the app’s behavior from the command line, along with another subdirectory (also with the project name) that contains the following files:
- An __init__.py file, which is used by Python to designate a subdirectory as a code module.
- settings.py, which holds the settings used for the project. Many of the most common settings will be pre-populated for you.
- urls.py, which lists the routes or URLs available to your Django project, or that the project will return responses for.
- wsgi.py, which is used by WSGI-compatible web servers, such as Apache HTTP or Nginx, to serve your project’s apps.
- asgi.py, which is used by ASGI-compatible web servers to serve your project’s apps. ASGI is a relatively new standard for asynchronous servers and applications, and requires a server that supports it, like uvicorn. Django only recently added native support for asynchronous applications, which will also need to be hosted on an async-compatible server to be fully effective.
Next, test the project to ensure it’s functioning. From the command line in the directory containing your project’s manage.py file, enter:
python manage.py runserver
This should start a development web server available at http://127.0.0.1:8000/. Visit that link and you should see a simple welcome page that tells you the installation was successful.
Note that the development web server should not be used to serve a Django project to the public. It’s solely for local testing and is not designed to scale for public-facing applications.
Creating a Django application
Next, we’ll create an application inside of this project. Navigate to the same directory as manage.py and issue the following command:
python manage.py startapp myapp
This creates a subdirectory for an application named myapp that contains the following:
- A migrations directory: Contains code used to migrate the site between versions of its data schema. Django projects typically have a database, so the schema for the database—including changes to the schema—is managed as part of the project.
- admin.py: Contains objects used by Django’s built-in administration tools. If your app has an admin interface or privileged users, you will configure the related objects here.
- apps.py: Provides configuration information about the app to the project at large, by way of an AppConfig object.
- models.py: Contains objects that define data structures, used by your app to interface with databases.
- tests.py: Contains any tests created by you and used to ensure that your site’s functions and modules are working as intended.
- views.py: Contains functions that render and return responses.
To start working with the application, you need to first register it with the project. Edit myproj/settings.py as follows, adding a line to the top of the INSTALLED_APPS list:
INSTALLED_APPS = [
"myapp.apps.MyappConfig",
"django.contrib.admin",
...
If you look in myproj/myapp/apps.py, you’ll see a pre-generated object named MyappConfig, which we’ve referenced here.
Adding routes and views to your Django application
Django applications follow a basic pattern for processing requests:
- When an incoming request is received, Django parses the URL for a route to apply it to.
- Routes are defined in urls.py, with each route linked to a view, meaning a function that returns data to be sent back to the client. Views can be located anywhere in a Django project, but they’re best organized into their own modules.
- Views can contain the results of a template, which is code that formats requested data according to a certain design.
To get an idea of how all these pieces fit together, let’s modify the default route of our sample application to return a custom message.
Routes are defined in urls.py, in a list named urlpatterns. If you open the sample urls.py, you’ll see urlpatterns already predefined:
urlpatterns = [
path('admin/', admin.site.urls),
]
The path function (a Django built-in) takes a route and a view function as arguments and generates a reference to a URL path. By default, Django creates an admin path that is used for site administration, but we need to create our own routes.
Add another entry, so that the whole file looks like this:
from django.contrib import admin
from django.urls import include, path
urlpatterns = [
path('admin/', admin.site.urls),
path('myapp/', include('myapp.urls'))
]
The include function tells Django to look for more route pattern information in the file myapp.urls. All routes found in that file will be attached to the top-level route myapp (e.g., http://127.0.0.1:8000/myapp).
Next, create a new urls.py in myapp and add the following:
from django.urls import path
from . import views
urlpatterns = [
path('', views.index)
]
Django prepends a slash to the beginning of each URL, so to specify the root of the site (/), we just supply a blank string as the URL.
Now, edit the file myapp/views.py so it looks like this:
from django.http import HttpResponse
def index(request):
return HttpResponse("Hello, world!")
django.http.HttpResponse is a Django built-in that generates an HTTP response from a supplied string. Note that request, which contains the information for an incoming HTTP request, must be passed as the first parameter to a view function.
Stop and restart the development server, and navigate to http://127.0.0.1:8000/myapp/. You should see “Hello, world!” appear in the browser.
Adding routes with variables in Django
Django can accept routes that incorporate variables as part of their syntax. Let’s say you wanted to accept URLs that had the format year/<int:year>. You could accomplish that by adding the following entry to urlpatterns:
path('year/<int:year>', views.year)
The view function views.year would then be invoked through routes like year/1996, year/2010, and so on, with the variable year passed as a parameter to views.year.
To try this out for yourself, add the above urlpatterns entry to myapp/urls.py, then add this function to myapp/views.py:
def year(request, year):
return HttpResponse('Year: {}'.format(year))
If you navigate to /myapp/year/2010 on your site, you should see Year: 2010 displayed in response. Note that routes like /myapp/year/rutabaga will yield an error because the int: constraint on the variable year allows only an integer in that position. Many other formatting options are available for routes.
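As a hedged illustration of those options, here is a sketch of how a few of Django’s built-in path converters could be used in myapp/urls.py (the views.user_profile and views.post_detail view functions are hypothetical, not part of our sample app):
from django.urls import path
from . import views

urlpatterns = [
    path('', views.index),
    path('year/<int:year>', views.year),
    # <str:...> matches any non-empty string that doesn't contain a slash (hypothetical view).
    path('user/<str:username>', views.user_profile),
    # <slug:...> matches letters, numbers, hyphens, and underscores (hypothetical view).
    path('post/<slug:post_slug>', views.post_detail),
]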
Django templates and template partials
You can use Django’s built-in template language to generate web pages from data.
Templates used by Django apps are stored in a directory within each app: <app name>/templates/<app name>/. For our myapp application, the directory would be myapp/templates/myapp/. This directory structure may seem awkward, but allowing Django to look for templates in multiple places avoids name collisions between templates with the same name across multiple apps.
In your myapp/templates/myapp/ directory, create a file named year.html with the following content:
Year: {{year}}
Any value within double curly braces in a template is treated as a variable. Everything else is treated literally.
Modify myapp/views.py to look like this:
from django.shortcuts import render
from django.http import HttpResponse
def index(request):
return HttpResponse("Hello, world!")
def year(request, year):
data = {'year':year}
return render(request, 'myapp/year.html', data)
The render function—a Django “shortcut” (a combination of multiple built-ins for convenience)—takes the existing request object, looks for the template myapp/year.html in the list of available template locations, and passes the dictionary data to it as context for the template. The template uses the dictionary as a namespace for variables used in the template. In this case, the variable {{year}} in the template is replaced with the value for the key year in the dictionary data (that is, data["year"]).
The amount of processing you can do on data within Django templates is intentionally limited. Django’s philosophy is to enforce the separation of presentation and business logic whenever possible. Thus, you can loop through an iterable object, and you can perform if/then/else tests, but modifying the data within a template is discouraged.
For instance, you could encode a simple “if” test this way:
{% if year > 2000 %}
21st century year: {{year}}
{% else %}
Pre-21st century year: {{year}}
{% endif %}
The {% and %} markers delimit blocks of code that can be executed in Django’s template language.
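Looping over an iterable works the same way. Here is a small sketch, assuming a hypothetical context variable years that holds a list passed in from the view:
{% for y in years %}
Year: {{y}}
{% endfor %}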
If you want to use a more sophisticated template processing language, you can swap in something like Jinja2 or Mako. Django includes back-end integration for Jinja2, but you can use any template language that returns a string—for instance, by returning that string in an HttpResponse object, as in the case of our “Hello, world!” route.
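As a rough sketch of what swapping in Jinja2 might involve, you could add a Jinja2 entry to the TEMPLATES setting in myproj/settings.py. The jinja2 directory location shown here is an assumption, and you would keep the generated DjangoTemplates entry so the admin keeps working:
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.jinja2.Jinja2",
        "DIRS": [BASE_DIR / "jinja2"],  # assumed project-level directory for Jinja2 templates
        "APP_DIRS": True,               # also look in each app's jinja2/ subdirectory
        "OPTIONS": {},
    },
    # ... keep the existing DjangoTemplates entry here, unchanged ...
]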
In versions 6 and up, Django supports template partials, a way to create portions of a template that can be defined once and reused throughout a template. This lets you precompute a given value once over the course of a given template—such as a fancy display version of a user name—and re-use it without having to recompute it each time it’s displayed.
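A minimal sketch of a template partial, assuming the {% partialdef %} and {% partial %} tags that Django 6 adopted from the django-template-partials project (the user variable here is hypothetical context supplied by the view):
{% partialdef fancy_name %}
<strong>{{ user.first_name }} {{ user.last_name }}</strong>
{% endpartialdef %}

Welcome back, {% partial fancy_name %}!
Signed in as {% partial fancy_name %}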
Doing more with Django
What you’ve seen here covers only the most basic elements of a Django application. Django includes a great many other components for use in web projects. Here’s a quick overview:
- Databases and data models: Django’s built-in ORM lets you define data structures and relationships between them, as well as migration paths between versions of those structures (see the sketch after this list).
- Forms: Django provides a consistent way for views to supply input forms to a user, retrieve data, normalize the results, and provide consistent error reporting. Django 6 added support for Content Security Policy, a way to prevent submitted forms from being vulnerable to content injection or cross-site scripting (XSS) attacks.
- Security and utilities: Django includes many built-in functions for caching, logging, session handling, handling static files, and normalizing URLs. It also bundles tools for common security needs, such as using cryptographic certificates or guarding against cross-site request forgery (CSRF) and clickjacking.
- Tasks: Django 6 added a native mechanism for creating and managing long-running background tasks without holding up a response to the user. Note that Django only provides ways to set up and keep track of tasks; it doesn’t include the actual execution mechanism. The only included back ends for tasks are for testing, so you will either need to add a third-party solution or write your own using Django’s back-end task code as a base.
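As an illustration of the first item in the list above, a minimal sketch of a Django data model with a relationship might look like this in myapp/models.py (the Author and Book models are hypothetical examples, not part of our sample app):
from django.db import models


class Author(models.Model):
    # A simple text field with a maximum length.
    name = models.CharField(max_length=100)


class Book(models.Model):
    title = models.CharField(max_length=200)
    published = models.DateField()
    # A many-to-one relationship: each Book points to one Author.
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

Running python manage.py makemigrations and then python manage.py migrate would generate and apply the corresponding schema changes.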
Microsoft deprecates IntelliCode for Visual Studio Code 17 Dec 2025, 1:59 am
Microsoft is officially deprecating the IntelliCode AI-assisted code completion extensions for the Visual Studio Code editor, and is recommending that C# developers use the GitHub Copilot Chat conversational AI assistant instead.
A Microsoft post on GitHub lists the following VS Code extensions as being deprecated: IntelliCode, IntelliCode Completions, IntelliCode for C# Dev Kit, and IntelliCode API Usage Examples. The company recommends that developers uninstall the IntelliCode for C# Dev Kit extension and continue using the built-in language server support from Roslyn or install GitHub Copilot Chat for advanced suggestions and inline completions. In the wake of the deprecations, developers will get the same language server-powered completion lists (IntelliSense) along with other language server features such as signature help, hover information, and syntax highlighting from the Roslyn .NET compiler platform in VS Code.
Bug fixes and support end immediately for the listed extensions, which will be marked as deprecated. The deprecation also means that starred completions in code completion lists (IntelliSense) will no longer be shown. Additionally, inline gray-text suggestions will be removed, and no new features will be added to the listed extensions.
Azul acquires enterprise Java middleware provider Payara 16 Dec 2025, 10:53 pm
Eying its competition with Oracle in the Java space, Azul has acquired Payara, a provider of enterprise solutions for Jakarta EE Java-based applications and microservices for cloud-native and hybrid cloud deployments.
Announced December 10, the deal enables Java platform provider Azul to offer faster, more efficient, more secure, and more cost-effective deployments in the Java application stack, Azul said. The company said the combination of Azul and Payara addresses pressing challenges enterprises face today: accelerating application modernization, achieving cloud-native agility, and reducing dependencies on proprietary platforms. The integrated offering gives users a unified, enterprise-grade Java platform based on open source that can support an organization’s full Java fleet – from business-critical applications to IoT, microservices, and modern Java frameworks, Azul said.
The acquisition marks a milestone in enterprise Java innovation and builds on nearly eight years of collaboration between Azul and Payara, according to Azul. The collaboration between the two companies began in 2018 with the introduction of Azul Platform Core embedded into Payara Server Enterprise. Payara adds engineering expertise and experience in Java Enterprise Edition, strengthening the Azul Java platform with complementary products and enhanced market reach, said Azul.
AWS AI Factories: Innovation or complication? 16 Dec 2025, 9:00 am
Last week at AWS re:Invent, amid many product announcements and cloud messages, AWS introduced AWS AI Factories. The press release emphasizes accelerating artificial intelligence development with Trainium, Nvidia GPUs, and reliable, secure infrastructure, all delivered with the ease, security, and sophistication you’ve come to expect from Amazon’s cloud. If you’re an enterprise leader with a budget and a mandate to “do more with AI,” the announcement is likely to prompt C-suite inquiries about deploying your own factory.
The reality warrants a more skeptical look. AWS AI Factories are certainly innovative, but as is so often the case with big public cloud initiatives, I find myself asking who this is actually for—and at what ultimate cost? The fanfare glosses over several critical realities that most enterprises simply cannot afford to ignore.
First, let’s get one uncomfortable truth out of the way: For many organizations, especially those beholden to strict regulatory environments or that require ultra-low latency, these “factories” are little more than half measures. They exist somewhere between true on-premises infrastructure and public cloud, offering AWS-managed AI in your own data center but putting you firmly inside AWS’s walled garden. For some, that’s enough. For most, it creates more headaches than it solves.
Innovative but also expensive
AWS AI Factories promise to bring cutting-edge AI hardware and foundation model access to your own facilities, presumably addressing concerns around data residency and sovereignty. But as always, the devil is in the details. AWS delivers and manages the infrastructure, but you provide the real estate and power. You get Bedrock and SageMaker, you bypass the procurement maze, and, in theory, you enjoy the operational excellence of AWS’s cloud—homegrown, in your own data center.
Here’s where theory and practice diverge. For customers that need to keep AI workloads and data truly local, whether for latency, compliance, or even corporate paranoia, this architecture is hardly a panacea. There’s always an implicit complexity to hybrid solutions, especially when a third party controls the automation, orchestration, and cloud-native features. Instead of true architectural independence, you’re just extending your AWS dependency into your basement.
What about cost? AWS has not formally disclosed pricing and almost certainly will not publish a simple pricing page. My experience tells me the price tag will come in at two to three (or more) times the cost of a private cloud or on-premises AI solution. That’s before you start factoring in the inevitable customizations, integration projects, and ongoing operational bills that public cloud providers are famous for. While AWS promises faster time to market, that acceleration comes at a premium that few enterprises can ignore in this economy.
Let’s also talk about lock-in, a subject that hardly gets the attention it deserves. With each layer of native AWS AI service you adopt—the glue that connects your data to their foundation models, management tools, and development APIs—you’re building business logic and workflows on AWS terms. It’s easy to get in and nearly impossible to get out. Most of my clients now find themselves married to AWS (or another hyperscaler) not because it’s always the best technology, but because the migrations that started five, eight, or ten years ago created a dependency web too expensive or disruptive to untangle. The prospect of “divorcing” the public cloud, as it’s been described to me, is unthinkable, so they stay and pay the rising bills.
What to do instead
My advice for most enterprises contemplating an AI Factories solution is simple: Pass. Don’t let re:Invent theatrics distract you from the basics of building workable, sustainable AI. The hard truth is that you’re likely better off building your own path with a do-it-yourself approach: choosing your own hardware, storage, and frameworks, and integrating only those public cloud services that add demonstrable value. Over the long term, you control your stack, you set your price envelope, and you retain the flexibility to pivot as the industry changes.
So, what’s the first step on an enterprise AI journey? Start by honestly assessing your actual AI requirements in depth. Ask what data you really need to stay local, what latency targets are dictated by your business, and what compliance obligations you must meet. Don’t let the promise of turnkey solutions lure you into misjudging these needs or taking on unnecessary risk.
Second, develop a strategy that guides AI use for the next five to ten years. Too often, I see organizations jump on the latest AI trends without a clear plan for how these capabilities should develop alongside business goals and technical debt. By creating a strategy that includes both short-term successes and long-term adaptability, it’s much less likely you’ll be trapped in costly or unsuitable solutions.
Finally, look at every vendor and every architectural choice through the lens of total cost of ownership. AWS AI Factories will likely be priced at a premium that’s hard to justify unless you’re absolutely desperate for AWS integration in your own data center. Consider hardware life-cycle costs, operational staffing, migration, vendor lock-in, and, above all, the costs associated with switching down the line if your needs or your vendor relationships change. Price out all the paths, not just the shiny new one a vendor wants to sell you.
The future has a bottom line
AWS AI Factories introduce a new twist to the cloud conversation, but for most real enterprise needs, it’s not the breakthrough the headlines suggest. Cloud solutions, especially those managed by your cloud provider in your own house, may be easy in the short term. However, that ease is always expensive, always anchored to long-term lock-in, and ultimately much more complex to unwind than most leaders anticipate.
The winners in the next phase of enterprise AI will be those who chart their own course, building for flexibility, cost-efficiency, and independence regardless of what’s splashed across the keynote slides. DIY is harder at the outset, but it’s the only way to guarantee you’ll hold the keys to your future rather than handing them over to someone else—no matter how many accelerators are in the rack.
5 key agenticops practices to start building now 16 Dec 2025, 9:00 am
AI agents combine language and reasoning models with the ability to take action through automations and APIs. Agent-to-agent protocols like the Model Context Protocol (MCP) enable integrations, making each agent discoverable and capable of orchestrating more complex operations.
Many organizations will first experiment with AI agents embedded in their SaaS applications. AI agents in HR can assist recruiters with the hiring process, while AI agents in operations address complex supply-chain issues. AI agents are also transforming the future of work by taking notes, scheduling meetings, and capturing tasks in workflow tools.
Innovative companies are taking the next steps and developing AI agents. These agents will augment proprietary workflows, support industry-specific types of work, and will be integrated into customer experiences. To develop these AI agents, organizations must consider the development principles, architecture, non-functional requirements, and testing methodologies that will guide AI agent rollouts. These steps are essential before deploying experiments or promoting AI agents into production.
Rapidly deploying AI agents poses operational and security risks, prompting IT leaders to consider a new set of agentic operations practices. Agenticops will extend devops practices and IT service management functions to secure, observe, and monitor AI agents and to respond to AI agent incidents.
What is agenticops?
Agenticops builds on several existing IT operational capabilities:
- AIops emerged several years ago to address the problem of having too many independent monitoring tools. AIops platforms centralize logfiles and other observability data, then apply machine learning to correlate alerts into manageable incidents.
- Modelops emerged as a separate capability to monitor machine learning models in production for model drift and other operational issues.
- Combining platform engineering, automating IT processes, and using genAI in IT operations helps IT teams improve collaboration and resolve incidents efficiently.
Agenticops must also support the operational needs unique to managing AI agents while providing IT with new AI capabilities.
DJ Sampath, SVP of the AI software and platform group at Cisco, notes that there are “three core requirements of agenticops”:
- Centralizing data from across multiple operational silos
- Supporting collaboration between humans and AI agents
- Leveraging purpose-built AI language models that understand networks, infrastructure, and applications
“AI agents with advanced models can help network, system, and security engineers configure networks, understand logs, run queries, and address issue root causes more efficiently and effectively,” he says.
These requirements address the distinct challenges involved with managing AI agents versus applications, web services, and AI models.
“AI agents in production need a different playbook because, unlike traditional apps, their outputs vary, so teams must track outcomes like containment, cost per action, and escalation rates, not just uptime,” says Rajeev Butani, chairman and CEO of MediaMint. “The real test is not avoiding incidents but proving agents deliver reliable, repeatable outcomes at scale.”
Here are five agenticops practices IT teams can begin to integrate now, as they begin to develop and deploy more AI agents in production.
1. Establish AI agent identities and security profiles
What data and APIs are agents empowered to access? A recommended practice is to provision AI agents the same way we do humans, with identities, authorizations, and entitlements using platforms like Microsoft Entra ID, Okta, Oracle Identity and Access Management, or other IAM (identity and access management) platforms.
“Because AI agents adapt and learn, they need strong cryptographic identities, and digital certificates make it possible to revoke access instantly if an agent is compromised or goes rogue,” says Jason Sabin, CTO of DigiCert. “Securing agent identities in this manner, similar to machine identities, ensures digital trust and accountability across the security architecture.”
Recommendation: Architects, devops engineers, and security leaders should collaborate on standards for IAM and digital certificates for the initial rollout of AI agents. But expect capabilities to evolve, especially as the number of AI agents scales. As the agent workforce grows, specialized tools and configurations may be needed.
2. Extend platform engineering, observability, and monitoring for AI agents
As a hybrid of application, data pipelines, AI models, integrations, and APIs, AI agents require combining and extending existing devops practices. For example, platform engineering practices will need to consider unstructured data pipelines, MCP integrations, and feedback loops for AI models.
“Platform teams will play an instrumental role in moving AI agents from pilots into production,” says Christian Posta, Global Field CTO of Solo.io. “That means evolving platform engineering to be context aware, not just of infrastructure, but of the stateful prompts, decisions, and data flows that agents and LLMs rely on. Organizations get observability, security, and governance without slowing down the self-service innovation AI teams need.”
Similarly, observability and monitoring tools will need to help diagnose more than uptime, reliability, errors, and performance.
“AI agents require multi-layered monitoring, including performance metrics, decision logging, and behavior tracking,” says Federico Larsen, CTO of Copado. “Conducting proactive anomaly detection using machine learning can identify when agents deviate from expected patterns before business impact occurs. You should also establish clear escalation paths when AI agents make unexpected decisions, with human-in-the-loop override capabilities.”
Observability, monitoring, and incident management platforms with capabilities supporting AI agents as of this writing include BigPanda, Cisco AI Canvas, Datadog LLM observability, and SolarWinds AI Agent.
Recommendation: Devops teams will need to define the minimally required configurations and standards for platform engineering, observability, and monitoring for the first AI agents deployed to production. Then, teams should monitor their vendor capabilities and review new tools as AI agent development becomes mainstream.
3. Upgrade incident management and root cause analysis
Site reliability engineers (SREs) often struggle to find root causes for application and data pipeline issues. With AI agents, they will face significantly greater challenges.
When an AI agent hallucinates, provides an incorrect response, or automates improper actions, SREs and IT operations must respond and resolve issues. They will need to trace the agent’s data sources, models, reasoning, empowerments, and business rules to identify root causes.
“Traditional observability falls short because it only tracks success or failure, and with AI agents, you need to understand the reasoning pathway—which data the agent used, which models influenced it, and what rules shaped its output,” says Kurt Muehmel, head of AI strategy at Dataiku. “Incident management becomes inspection, and root cause isn’t just ‘the agent crashed,’ it’s ‘the agent used stale data because the upstream model hadn’t refreshed.’ Enterprises need tools that inspect decision provenance and tune orchestration—getting under the hood, not just asking what went wrong.”
Andy Sen, CTO of AppDirect, recommends repurposing real-time monitoring tools and utilizing logging and performance metrics to track AI agents’ behavior. “When incidents occur, keep existing procedures for root cause analysis and post-incident reviews, and provide this data to the agent as feedback for continuous improvement. This integrated approach to observability, incident management, and user support not only enhances the performance of AI agents but also ensures a secure and efficient operational environment.”
Recommendation: Select tools and train SREs on the concepts of data lineage, provenance, and data quality. These areas will be critical to up-skilling IT operations to support incident and problem management related to AI agents.
4. Track KPIs on model accuracy, drift, and costs
Most devops organizations look well beyond uptime and system performance metrics to gauge an application’s reliability. SREs manage error budgets to drive application improvements and reduce technical debt.
Standard SRE practices of understanding business impacts and tracking subtle errors become more critical when tracking AI agents. Experts identified three areas where new KPIs and metrics may be needed to track an AI agent’s behaviors and end-user benefits continuously:
- Craig Wiley, senior director of product for AI/ML at Databricks, says, “Defining KPIs can help you establish a proper monitoring system. For example, accuracy must be higher than 95%, which can then trigger alert mechanisms, providing your organization with a centralized visibility and response system.”
- Jacob Leverich, co-founder and CPO of Observe, Inc., says, “With AI agents, teams may find themselves taking a heavy dependency on model providers, so it becomes critical to monitor token usage and understand how to optimize costs associated with the use of LLMs.”
- Ryan Peterson, EVP and CPO at Concentrix, says, “Data readiness isn’t a one-time check; it requires continuous audits for freshness and accuracy, bias testing, and alignment to brand voice. Metrics like knowledge base coverage, update frequency, and error rates are the real tests of AI-ready data.”
Recommendation: Leaders should define a holistic model of operational metrics for AI agents, one that can be applied to both third-party agents from SaaS vendors and proprietary agents developed in-house.
5. Capture user feedback to measure AI agent usefulness
Devops and ITops sometimes overlook the importance of tracking customer and employee satisfaction. Leaving the review of end-user metrics and feedback to product management and stakeholders is shortsighted, even in the application domain. Such review becomes a more critical discipline when supporting AI agents.
“Managing AI agents in production starts with visibility into how they operate and what outcomes they drive,” says Saurabh Sodani, chief development officer at Pendo. “We think about connecting agent behavior to the user experience and not just about whether an agent responds, but whether it actually helps someone complete a task, resolve an issue, or move through a workflow, all the while being compliant. That level of insight is what allows teams to monitor performance, respond to issues, and continuously improve how agents support users in interactive, autonomous, and asynchronous modes.”
Recommendation: User feedback is essential operational data that shouldn’t be left out of scope in AIops and incident management. This data not only helps to resolve issues with AI agents, but is critical for feeding back into AI agent language and reasoning models.
Conclusion
As more organizations develop and experiment with AI agents, IT operations will need the tools and practices to manage them in production. IT teams should start now by tracking end-user impacts and business outcomes, then work deeper into tracking the agent’s performance in recommending decisions and providing responses. Focusing only on system-level metrics is insufficient when monitoring and resolving issues with AI agents.
Nvidia bets on open infrastructure for the agentic AI era with Nemotron 3 16 Dec 2025, 4:13 am
AI agents must be able to cooperate, coordinate, and execute across large contexts and long time periods, and this, says Nvidia, demands a new type of infrastructure, one that is open.
The company says it has the answer with its new Nemotron 3 family of open models.
Developers and engineers can use the new models to create domain-specific AI agents or applications without having to build a foundation model from scratch. Nvidia is also releasing most of its training data and its reinforcement learning (RL) libraries for use by anyone looking to build AI agents.
[ Related: More Nvidia news and insights ]
“This is Nvidia’s response to DeepSeek disrupting the AI market,” said Wyatt Mayham of Northwest AI Consulting. “They’re offering a ‘business-ready’ open alternative with enterprise support and hardware optimization.”
Introducing Nemotron 3 Nano, Super, and Ultra
Nemotron 3 features what Nvidia calls a “breakthrough hybrid latent mixture-of-experts (MoE) architecture”. The model comes in three sizes:
- Nano: The smallest and most “compute-cost-efficient,” intended for targeted, highly-efficient tasks like quick information retrieval, software debugging, content summarization, and AI assistant workflows. The 30-billion-parameter model activates 3 billion parameters at a time for speed and has a 1-million-token context window, allowing it to remember and connect information over multi-step tasks.
- Super: An advanced, high-accuracy reasoning model with roughly 100 billion parameters, up to 10 billion of which are active per token. It is intended for applications that require many collaborating agents to tackle complex tasks, such as deep research and strategy planning, with low latency.
- Ultra: A large reasoning engine intended for complex AI applications. It has 500 billion parameters, with up to 50 billion active per token.
Nemotron 3 Nano is now available on Hugging Face and through other inference service providers and enterprise AI and data infrastructure platforms. It will soon be made available on AWS via Amazon Bedrock and will be supported on Google Cloud, CoreWeave, Microsoft Foundry, and other public infrastructures. It is also offered as a pre-built Nvidia NIM microservice.
Nemotron 3 Super and Ultra are expected to be available in the first half of 2026.
Positioned as an infrastructure layer
The strategic positioning here is fundamentally different from that of the API providers, experts note.
“Nvidia isn’t trying to compete with OpenAI or Anthropic’s hosted services — they’re positioning themselves as the infrastructure layer for enterprises that want to build and own their own AI agents,” said Mayham.
Brian Jackson, principal research director at Info-Tech Research Group, agreed that the Nemotron models aren’t intended as a ready-baked product. “They are more like a meal kit that a developer can start working with,” he said, “and make desired modifications along the way to get the exact flavor they want.”
Hybrid architecture enhances performance
So far, Nemotron 3 seems to be exhibiting impressive gains in efficiency and performance; according to third-party benchmarking company Artificial Analysis, Nano is the most efficient model of its size and leads in accuracy.
Nvidia says Nano’s hybrid Mamba-Transformer MoE architecture, which integrates three architectures into a single backbone, supports this efficiency. Mamba layers offer efficient sequence modeling, transformer layers provide precision reasoning, and MoE routing gives scalable compute efficiency. The company says this design delivers a 4X higher token throughput compared to Nemotron 2 Nano while reducing reasoning-token generation by up to 60%.
“Throughput is the critical metric for agentic AI,” said Mayham. “When you’re orchestrating dozens of concurrent agents, inference costs scale dramatically. Higher throughput means lower cost per token and more responsive real-time agent behavior.”
The 60% reduction in reasoning-token generation addresses the “verbosity problem,” where chain-of-thought (CoT) models generate excessive internal reasoning before producing useful output, he noted. “For developers building multi-agent systems, this translates directly to lower latency and reduced compute costs.”
The upcoming Nemotron 3 Super, Nvidia says, excels at applications that require many collaborating agents to achieve complex tasks with low latency, while Nemotron 3 Ultra will serve as an advanced reasoning engine for AI workflows that demand deep research and strategic planning.
Mayham explained that these as-yet-unreleased models feature latent MoE, which projects tokens into a smaller latent dimension before expert routing, “theoretically” enabling 4X more experts at the same inference cost because it reduces communication overhead between GPUs.
The hybrid architecture behind Nemotron 3 that combines Mamba-2 layers, sparse transformers, and MoE routing is “genuinely novel in its combination,” Mayham said, although each technique exists individually elsewhere.
Ultimately, Nemotron pricing is “attractive,” he said; open weights are free to download and run locally. Third-party API pricing on DeepInfra starts at $0.06/million input tokens for Nemotron 3 Nano, which is “significantly cheaper” than GPT-4o, he noted.
Differentiator is openness
To underscore its commitment to open source, Nvidia is revealing some of Nemotron 3’s inner workings, releasing a dataset with real-world telemetry for safety evaluations, and 3 trillion tokens of Nemotron 3’s pretraining, post-training, and RL datasets.
In addition, Nvidia is open-sourcing its NeMo Gym and NeMo RL libraries, which provide Nemotron 3’s training environments and post-training foundation, and NeMo Evaluator, to help builders validate model safety and performance. All are now available on GitHub and Hugging Face. Of these, Mayham noted, NeMo Gym might be the most “strategically significant” piece of this release.
Pre-training teaches models to predict tokens, not to complete domain-specific tasks, and traditional RL from human feedback (RLHF) doesn’t scale for complex agentic behaviors, Mayham explained. NeMo Gym enables RL with verifiable rewards — essentially computational verification of task completion rather than subjective human ratings. That is, did the code pass tests? Is the math correct? Were the tools called properly?
This gives developers building domain-specific agents the infrastructure to train models on their own workflows without having to understand the full RL training loop.
“The idea is that NeMo Gym will speed up the setup and execution of RL jobs for models,” explained Jason Andersen, VP and principal analyst with Moor Insights & Strategy. “The important distinction is NeMo Gym decouples the RL environment from the training itself, so it can easily set up and create multiple training instances (or ‘gyms’).”
Mayham called this “unprecedented openness” the real differentiator of the Nemotron 3 release. “No major competitor offers that level of completeness,” he said. “For enterprises, this means full control over customization, on premises deployment, and cost optimization that closed providers simply can’t match.”
But there is a tradeoff in capability, Mayham pointed out: Claude and GPT-4o still outperform Nemotron 3 on specialized tasks like coding benchmarks. However, Nemotron 3 seems to be targeting a different buyer: Enterprises that need deployment flexibility and don’t want vendor lock-in.
“The value proposition for enterprises isn’t raw capability, it’s the combination of open weights, training data, deployment flexibility, and Nvidia ecosystem integration that closed providers can’t match,” he said.
More Nvidia news:
- HPE loads up AI networking portfolio, strengthens Nvidia, AMD partnerships
- Nvidia’s $2B Synopsys stake tests independence of open AI interconnect standard
- Nvidia chips sold out? Cut back on AI plans, or look elsewhere
- Nvidia’s first exascale system is the 4th fastest supercomputer in the world
- Nvidia highlights considerable science-based supercomputing efforts
- Nvidia touts next-gen quantum computing interconnects
- Next-generation HPE supercomputer offers a mix of Nvidia and AMD silicon
- Cisco, Nvidia strengthen AI ties with new data center switch, reference architectures
InfoWorld’s 2025 Technology of the Year Award winners 15 Dec 2025, 9:00 am
InfoWorld celebrates the year’s best products
From AI-powered coding assistants to real-time analytics engines, the software stack is undergoing its biggest shakeup in decades. Generative AI (genAI) and agentic AI tools are redefining how code is written, tested, and deployed — even as experts debate the true productivity gains they facilitate. Data management is converging around unified lakehouse architectures, the Apache Iceberg table format, and streaming technologies such as Apache Kafka, bridging the gap between raw data and actionable insight. On virtually every front, from application programming interface (API) development to cloud security, new platforms promise automated intelligence and tighter governance, signaling a new era in which innovation and control must evolve in tandem.
Examples of this innovative era are prominent among the 99 finalists and 35 winners of InfoWorld’s 2025 Technology of the Year Awards.
The InfoWorld Technology of the Year Awards recognize the best and most innovative products in AI, APIs, applications, business intelligence (BI), cloud, data management, devops, and software development. Read on to meet our finalists and winners.
Award categories
AI
- AI and machine learning: Applications
- AI and machine learning: Governance
- AI and machine learning: Infrastructure
- AI and machine learning: MLOps
- AI and machine learning: Models
- AI and machine learning: Platforms
- AI and machine learning: Security
- AI and machine learning: Tools
APIs
- API development
- API management
- API security
Applications
- Application management
- Application networking
- Application security
Business intelligence
- Business intelligence and analytics
Cloud
- Cloud backup and disaster recovery
- Cloud compliance and governance
- Cloud cost management
- Cloud security
Data management
- Data management: Databases
- Data management: Governance
- Data management: Integration
- Data management: Pipelines
- Data management: Security
- Data management: Streaming
Devops
- Devops: Analytics
- Devops: Automation
- Devops: CI/CD
- Devops: Code quality
- Devops: Observability
- Devops: Productivity
Software development
- Software development: Platforms
- Software development: Security
- Software development: Testing
- Software development: Tools
AI and machine learning: Applications
Winner
- Mirror, Whatfix
Finalists
- Solve(X), GoExceed
- HP AI Companion, HP
- Fascia PROMIS, PROLIM
- Mirror, Whatfix
From the winner
Mirror is a genAI simulation training platform that empowers teams with hands-on experience in safe, immersive, and hyper-realistic environments. Designed for companies that need to scale training across tools, workflows, and customer-facing interactions, Mirror combines interactive application simulations with AI-driven conversational role play to deliver a complete training experience. Employees can practice real-life scenarios, navigate systems, respond to simulated conversations, and make decisions, all without the risk of live system exposure. Mirror addresses key limitations of traditional simulation training that slow down learning and reduce its real-world impact.
From the judges
“Really interesting use case and tech/AI implementation. What makes Mirror truly innovative is its ability to replicate any web application without the need for costly, fragile sandbox environments.”
AI and machine learning: Governance
Winner
- AI Policy Suite, Pacific AI
Finalists
- AI Policy Suite, Pacific AI
- CTGT Platform, CTGT
From the winner
Pacific AI’s AI Policy Suite is a free, comprehensive, and continuously updated framework designed to simplify AI compliance. It translates complex legal and regulatory requirements — spanning 150+ domestic and international AI-related laws, regulations, and standards, including the EU AI Act, as well as frameworks such as NIST and ISO — into clear, actionable policies. Organizations gain access to a single, centralized policy suite that deduplicates overlapping rules, streamlines governance, and lowers compliance overhead. The suite now includes an AI Incident Reporting Policy to help companies align with more than 100 U.S. laws and industry standards and manage operational and regulatory risk.
From the judges
“Pacific AI’s AI Policy Suite is highly relevant and should have a big impact on small businesses to help them navigate legal challenges.”
AI and machine learning: Infrastructure
Winner
- Cloudera AI Inference, Cloudera
Finalists
- Compute Orchestration, Clarifai
- Cloudera AI Inference, Cloudera
- Inworld Runtime, Inworld AI
- AIStor, MinIO
From the winner
Cloudera AI Inference service, accelerated by NVIDIA, is one of the industry’s first AI inference services to provide embedded NVIDIA NIM [NVIDIA Inference Microservices] microservice capabilities. With the latest update, Cloudera brings its ability to streamline the deployment and management of large-scale AI models to the data center, behind an organization’s firewall, for maximum security. Features like auto-scaling, canary rollouts, and real-time performance tracking ensure resilient, efficient operations. By uniting performance acceleration, security, and governance in a single solution, Cloudera AI Inference enables enterprises to deploy trusted AI solutions quickly and confidently.
From the judges
“Cloudera’s AI Inference service is technically strong, built on NVIDIA GPU acceleration and deployment options that offer great service along with flexibility. What particularly stands out is governance and security, which is critical for enterprise adoption. Overall, it combines solid engineering with practical business impact.”
AI and machine learning: MLOps
Winner
- JFrog ML, JFrog
Finalists
- JFrog ML, JFrog
- Runloop Platform, Runloop
From the winner
JFrog ML is an enterprise-grade MLOps solution integrated into the JFrog Software Supply Chain Platform, designed to streamline the development, deployment, and security of machine learning [ML] models alongside traditional software components. It equips data scientists, ML engineers, and AI developers with an end-to-end system to build, train, secure, deploy, manage, and monitor both classic ML models and genAI/LLM [large language model] workflows — all within a single trusted interface.
With seamless integration across AWS, GCP, and hybrid clouds as well as an out-of-the-box feature store that supports LLMOps [large language model operations], prompt management, batch and real-time deployment, JFrog ML shortens time to production while ensuring compliance and reducing tool chain complexity.
From the judges
“JFrog ML is a strong technical product as it offers a complete platform for managing the ML life cycle with amazing features like model registry, feature store, deployment, and security built in. Overall, this product is technically complete, enterprise-ready, and well positioned to drive adoption at scale.”
AI and machine learning: Models
Winner
- Medical LLMs, John Snow Labs
Finalists
- Medical LLMs, John Snow Labs
- voyage-context-3, Voyage AI by MongoDB
From the winner
John Snow Labs has developed a suite of Medical LLMs purpose-built for clinical, biomedical, and life sciences applications. John Snow Labs’ models are designed to deliver best-in-class performance across a wide range of medical tasks — from clinical reasoning and diagnostics to medical research comprehension and genetic analysis. The software has been validated by peer-reviewed papers to deliver state-of-the-art accuracy on a variety of medical language understanding tasks and is designed to meet the security and compliance needs unique to the healthcare industry. The LLMs’ modular architecture and plug-and-play design allow seamless integration across healthcare systems, providers, payers, and pharmaceutical environments.
From the judges
“John Snow Labs Medical LLMs are advanced domain-specific models with large context windows, multimodal capabilities and benchmark results that show strong performance. They address privacy and compliance requirements, which is critical. The technology seems robust, and the focus on domain specialization is highly innovative.”
AI and machine learning: Platforms
Winner
- Eureka AI Platform, SymphonyAI
Finalists
- Airia Platform, Airia
- Cognite Atlas AI, Cognite
- Generate Enterprise, Iterate.ai
- Eureka AI Platform, SymphonyAI
From the winner
Eureka AI is SymphonyAI’s vertical-first enterprise AI platform, purpose-built for Retail, Financial Services, Industrial, and Enterprise IT. Rather than offering a generic toolkit, Eureka AI comes pretrained with industry-specific models, knowledge graphs, and workflows — drawn from decades of domain expertise — so customers realize measurable ROI from day 1.
The platform powers specialized applications such as CINDE for retail analytics, Sensa AI for financial crime prevention, IRIS Foundry for manufacturing optimization, and APEX for IT operations automation. These applications share a common, secure core, enabling innovations developed in one vertical to rapidly benefit others.
From the judges
“The Eureka AI Platform brings AI to many companies, because it is tailored to different vertical industries. This is a highly practical approach to AI in the enterprise that will deliver significant value by saving implementation time.”
AI and machine learning: Security
Winner
- Vibe Coding Security, Backslash Security
Finalists
- Vibe Coding Security, Backslash Security
- AI Gatekeeper, Operant AI
- Pangea AI Detection and Response, Pangea
From the winner
Vibe coding and AI-assisted software development are being adopted at breakneck speeds, creating significant new risks in the software supply chain and IT environments of many organizations. Backslash provides a comprehensive solution for these new risks, addressing three key areas: visibility into the use of AI and vibe coding tools by developers; governance and security of the stack used for vibe coding; and the security of the code created using AI, ensuring prompts given for code generation include the right instructions to avoid creating vulnerabilities and exposures.
Backslash combines its App Graph technology, which maps all connections and dependencies within the application, with purpose-built IDE [integrated development environment] extensions, MCP [Model Context Protocol] server, and gateway to provide comprehensive coverage for the AI coding infrastructure and AI code-generation process.
From the judges
“Vibe Coding Security is differentiated and uniquely placed. Ensuring the focus remains on vulnerability identification and mitigation from the start of development is a great enabler, drastically reducing the pain on the software development life cycle using vibe coding.”
AI and machine learning: Tools
Winner
- Bloomfire Platform, Bloomfire
Finalists
- neuralSPOT, Ambiq Micro
- Bloomfire Platform, Bloomfire
From the winner
Bloomfire is an AI-powered knowledge platform that turns scattered files, chats, and tacit know-how into a governed “truth layer” your teams can trust. Ask AI delivers plain-language answers with clickable citations, so people can verify the source in a second. Our self-healing knowledge base continuously detects redundant, outdated, or trivial content and auto-routes refresh or archival, keeping your RAG [retrieval-augmented generation] inputs clean and your answers current. For IT and data leaders, Bloomfire operationalizes trustworthy retrieval, governance, and measurement [with] role-based permissions, audit trails, SOC 2 Type II security, usage analytics to expose gaps, and automated prompts that enlist subject matter experts to close them.
From the judges
“Bloomfire takes a fresh approach to knowledge management and search, combining the power of AI and extensive integrations to inject relevant, up-to-date information, including citations, into enterprise workflows.”
API development
Winner
- Postman API Platform, Postman
Finalist
- Postman API Platform, Postman
From the winner
Postman is a collaborative end-to-end platform for building and managing APIs. Its foundation is the collection — a structured container of API requests, test scripts, and documentation that developers can version, chain, and automate. Collections run within workspaces, which enable real-time collaboration across teams and external partners.
Postman’s Agent Mode, an AI assistant embedded in the platform, enables developers to describe what they want in natural language, and then Agent Mode turns that into tests, documentation, monitors, and more. By supporting both developers and autonomous agents, Postman delivers unmatched visibility, governance, and scale.
From the judges
“Postman API platform is a well-designed, very comprehensive solution for API development, testing, operationalization, and governance, with end-to-end functionality that is unrivaled.”
API management
Winner
- Kong Konnect, Kong
Finalists
- Kong Konnect, Kong
- Swagger, SmartBear
From the winner
Kong Konnect is a unified platform that enables organizations to securely build, run, discover, and govern APIs, AI workflows, and event streams. It uses a global control plane with distributed runtimes, such as Kong API Gateway, that can be deployed in any environment.
Users define policies, services, and governance rules in the Konnect platform, which can then be applied globally or selectively across all connected runtimes. Kong Konnect has specialized runtimes for different use cases such as the API Gateway for proxying traditional APIs such as REST and SOAP. With Konnect, customers see [shorter] time-to-market, stronger security, and reduced costs through platform consolidation.
From the judges
“Kong Konnect goes beyond traditional API management to include AI/event scenarios. A single control plane to control APIs, microservices, events, and AI is very valuable.”
API security
Winner
- Harness Cloud Web Application and API Protection, Harness
Finalist
- Harness Cloud Web Application and API Protection, Harness
From the winner
With Harness’s Cloud Web Application and API Protection (WAAP), enterprises get end-to-end API security without slowing development. It eliminates blind spots, stops advanced attacks in real time, and plugs directly into CI/CD [continuous integration and continuous delivery] so security becomes part of delivery.
Cloud WAAP continuously discovers and maps every API (including shadow and third-party); analyzes live traffic and data sensitivity to assess risk; and actively protects against fraud, attacks, abuse, and DDoS [distributed denial of service] threats. One platform unifies discovery, testing, and runtime defense, giving teams a real-time searchable view of their API estate and the controls to act.
From the judges
“Harness Cloud Web Application and API Protection is an innovative solution that offers a unified strategy to secure apps and APIs, dramatically reducing the effort/complexity of tasks including context-aware, behavior-based detection. Very relevant for all types of enterprises.”
Application management
Winner
- Omni, Sidero Labs
Finalists
- Komodor Platform, Komodor
- Omnissa App Volumes, Omnissa
- Omni, Sidero Labs
From the winner
Omni is a Kubernetes operations platform that brings SaaS [software as a service] simplicity to devops and infrastructure teams managing clusters across bare metal, cloud, and edge environments. Built on the hardened, immutable Talos Linux OS, Omni eliminates SSH [Secure Shell], configuration drift, and manual toil by delivering a centralized, declarative control plane that operates anywhere Kubernetes can run. Teams can create, scale, upgrade, and secure clusters with one click while maintaining full control over infrastructure and identity. Omni blends the reliability of a cloud-native Kubernetes stack with the flexibility of “bring your own infrastructure,” creating a radically simplified and portable alternative to legacy managed Kubernetes platforms.
From the judges
“Omni is transforming Kubernetes operations by combining infrastructure-agnostic management with a secure, declarative approach. Designed for modern devops teams, Omni has built-in security and reduces operational burden and manual toil. Proven impact in production and a standout alternative to legacy tools.”
Application networking
Winner
- noBGP, noBGP
Finalists
- noBGP, noBGP
- Ambient Mesh, Solo.io
- Tailscale, Tailscale
From the winner
noBGP is a cloud networking platform that eliminates one of the internet’s biggest sources of complexity and risk: the Border Gateway Protocol (BGP). Built for cloud-native, hybrid, and AI environments, noBGP replaces BGP with private routing that is automated, secure, and simple to deploy. With a noBGP router, enterprises can instantly connect cloud resources across AWS, Azure, GCP, Oracle, and on-prem environments, without public IPs, VPNs [virtual private networks], or manual routing tables.
This means faster deployment, reduced attack surfaces, and dramatically simplified operations. Traffic is encrypted end-to-end and zero trust is enforced by default. Devops teams get seamless cloud connectivity, while security teams gain a hardened infrastructure that removes entire categories of network threats.
From the judges
“An innovative solution to replace legacy services and protocols for hybrid and multi-cloud environments, especially for organizations looking to implement zero-trust architectures.”
Application security
Winner
Application Security Posture Management Platform, Legit Security
Finalists
- Apiiro Agentic Application Security Platform, Apiiro
- Application Security Posture Management Platform, Legit Security
- Oso Cloud, Oso Security
From the winner
The Legit Application Security Posture Management (ASPM) platform offers comprehensive visibility and risk management across the software development life cycle. This includes coverage of everything from source code repositories to runtime environments and cloud infrastructure. The platform integrates with a wide range of tools, including AST solutions, cloud security platforms, version control systems, artifact registries, identity providers, and API security tools.
AI is applied throughout the platform. Code-to-cloud correlation is supported through AI-driven analysis. Legit sits alongside coding assistants to keep code secure while developers write it and gives teams visibility into where AI is generating code.
From the judges
“It is good to see a product that is not just flagging risks but also zeroes in on what to fix first and how to fix it fast. This feature separates the solution from its competition.”
Business intelligence and analytics
Winner
- Plotly Dash Enterprise, Plotly
Finalists
- FICO Platform, FICO
- Plotly Dash Enterprise, Plotly
- Spotter, ThoughtSpot
From the winner
Dash Enterprise (DE) is an enterprise platform for creating customizable, interactive data applications in Python. Domain experts surface insights and take action through AI-powered development that transforms Python workflows into production apps instantly.
Python-native development builds on existing data science stacks while delivering true customization that creates exactly what stakeholders need, not generic dashboards. Enterprise controls provide built-in security, compliance, and governance, while self-service capabilities eliminate IT bottlenecks. Interactive design enables stakeholders to explore data rather than consume static reports. The transformation shifts teams from analytics support to strategic enablement.
From the judges
“Plotly Dash Enterprise stands out by turning Python workflows into secure, governed, interactive read/write data apps while retaining dev control. The platform is also able to provide contextual intelligence with domain-aware suggestions (across finance, healthcare, and telecom sectors).”
Cloud backup and disaster recovery
Winner
- United Private Cloud, UnitedLayer
Finalists
- Cayosoft Guardian Forest Recovery, Cayosoft
- CloudCasa, CloudCasa by Catalogic Software
- United Private Cloud, UnitedLayer
From the winner
United Private Cloud delivers cloud backup and disaster recovery through a layered, intelligent architecture. It integrates business continuity with high-availability architecture, performance, intelligence, hybrid colocation, and compliance for mission-critical workloads.
Workloads and data are continuously protected via four DR [disaster recovery] strategies. Real-time replication ensures that data is always protected in Tier 3+ data centers across 30+ private cloud regions and 175+ edge sites on five continents. Data is encrypted in transit and at rest.
UnitedLayer’s approach ensures rapid recovery, regulatory adherence, and seamless scaling — backed by repeated industry recognition and trusted by global enterprises for business continuity.
From the judges
“United Private Cloud guarantees zero downtime and minimal data loss through 99.999% high availability and real-time replication. This translates to measurable business value. A proactive, autonomous approach redefines disaster recovery by enabling predictive failure detection and instant restoration at scale.”
Cloud compliance and governance
Winner
- Secureframe, Secureframe
Finalists
- Kion Platform, Kion
- Secureframe, Secureframe
From the winner
Secureframe is a comprehensive, AI-powered platform that helps organizations meet security and compliance requirements more efficiently and effectively. With out-of-the-box support for 40+ frameworks, Secureframe helps organizations streamline audits, reduce manual effort, and improve visibility into their security posture.
The Secureframe platform identifies gaps, collects evidence, generates tailored policies, completes risk assessments and access reviews, and monitors compliance progress in real time. Built-in automation handles control testing, validates evidence, recommends remediations, and accelerates questionnaire responses with the help of our proprietary Comply AI engine. Customers report a 26% average reduction in annual compliance costs and complete audits up to 90% faster.
From the judges
“Secureframe applies contextual AI and 300+ integrations to automate evidence validation, remediation, and monitoring across hybrid environments, turning compliance into an intelligent, end-to-end technology solution. Breakthrough capabilities like AI Evidence Validation, Workspaces, and Custom Integrations redefine compliance automation, outpacing competitors with foundational shifts rather than incremental features.”
Cloud cost management
Winner
- Tangoe One Cloud, Tangoe
Finalists
- Stacklet Jun0, Stacklet
- Tangoe One Cloud, Tangoe
- UnityOne AI, UnityOne AI
From the winner
Tangoe One Cloud is an enterprise-grade FinOps [financial operations] platform that unifies expense and asset management across public and private IaaS [infrastructure as a service], SaaS, and UCaaS [unified communications as a service] environments. Designed to align IT, finance, and procurement, Tangoe One Cloud centralizes governance, chargebacks, anomaly detection, and multi-cloud transparency into one scalable platform.
Tangoe’s “Cloud Optimizer” applies machine learning, predictive models, and historical usage data to identify savings opportunities and recommend the most cost-effective cloud infrastructure options. Automated workflows drive cost allocation, chargebacks, and remediation actions, eliminating manual reporting cycles. The platform also delivers real-time anomaly alerts and AI workload governance through a new AI Cost Visibility Dashboard.
From the judges
“This AI-driven, multi-cloud architecture is a highly scalable and technically robust FinOps solution. Tangoe One Cloud’s capabilities set it apart from competitors by enabling real-time governance of AI/ML spending, automated chargebacks, and actionable insights across diverse enterprise infrastructures.”
Cloud security
Winner
- Cortex Cloud, Palo Alto Networks
Finalists
- Aviatrix Cloud Native Security Fabric, Aviatrix
- Cortex Cloud, Palo Alto Networks
- Ivanti Neurons Platform, Ivanti
From the winner
Cortex Cloud rearchitected Palo Alto Networks’ cloud-native application protection platform (CNAPP) on the AI-driven Cortex SecOps platform to deliver a unified user experience with persona-driven dashboards and workflows.
Cortex Cloud identifies and prioritizes issues across the application development pipeline. The platform improves multi-cloud risk management with AI-powered prioritization, guided fixes, and automated remediation. Cortex Cloud natively integrates the unified Cortex XDR agent, enriched with additional cloud data sources, to prevent threats with advanced analytics. It also integrates cloud data, context, and workflows within Cortex XSIAM to significantly reduce the mean time to respond (MTTR) to modern threats with a single, unified SecOps solution.
From the judges
“Cortex Cloud is one of the strongest offerings in cloud security, because it unifies cloud detection and response with CNAPP capabilities on a single AI-driven platform, giving teams real-time, context-rich protection across code, runtime, and cloud environments. By eliminating silos, reducing MTTR, and continuously learning from incidents, Cortex Cloud sets a new standard for enterprise-to-cloud protection.”
Data management: Databases
Winner
- Qdrant, Qdrant
Finalists
- Couchbase Capella, Couchbase
- EDB PostgresAI, EnterpriseDB
- Percona Everest, Percona
- Qdrant, Qdrant
From the winner
Qdrant is an open-source, dedicated vector search engine built for an era where over 90% of enterprise data is unstructured. It enables developers to build and support production-grade AI retrieval and vector search across any scale, modality, or deployment. Purpose-built in Rust for unmatched speed, memory safety, and scale, Qdrant delivers up to 40x faster retrieval and >4x higher throughput.
Qdrant is more than vector search: It gives AI agents grounded retrieval “memory” to plan, use tools, and act in real time. With Cloud Inference, hybrid search, and flexible reranking, teams build agentic workflows that stay relevant, responsive, and cost-efficient.
From the judges
“Qdrant is a powerful, technically advanced product that stands out in the highly competitive vector database space. Its deep support for agentic AI, multimodal search, and enterprise-scale deployments make it a foundational tool for the AI era.”
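As a rough illustration of the retrieval workflow Qdrant describes, the sketch below uses the open source qdrant-client Python package to create a collection, upsert vectors, and run a similarity search. The collection name, vector size, and toy vectors are assumptions for illustration; in practice, embeddings come from a separate model.

```python
# Minimal vector-search sketch with the qdrant-client Python package.
# Vectors here are toy values; real ones would come from an embedding model.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # in-process mode for local experiments

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0], payload={"title": "vector search"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.0, 0.1], payload={"title": "batch analytics"}),
    ],
)

# Nearest neighbors by cosine similarity for a query vector.
hits = client.search(collection_name="docs", query_vector=[0.2, 0.8, 0.1, 0.0], limit=1)
for hit in hits:
    print(hit.payload["title"], hit.score)
```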
Data management: Governance
Winner
- Actian Data Intelligence Platform, Actian
Finalists
- Actian Data Intelligence Platform, Actian
- Pentaho Platform, Pentaho
- Transcend, Transcend
From the winner
The Actian Data Intelligence Platform centralizes all enterprise metadata into a single source of truth, enabling effective discovery and utilization of data assets while ensuring regulatory compliance (GDPR, CCPA) and security controls. The cloud-native platform provides a searchable catalog of all data assets with automated metadata collection to ensure proper data documentation and context. Actian’s “governance by design” approach embeds governance into the DNA of every data interaction.
The platform’s innovative knowledge graph delivers deeper, more relevant search results by understanding complex relationships between concepts and entities, enabling faster data discovery, reduced regulatory compliance risk, and confident AI implementation.
From the judges
“The Actian Data Intelligence Platform addresses some of the most critical and persistent challenges in data governance with a highly modern, forward-looking approach. Actian’s platform is well architected and addresses real organizational pain points, including inconsistent data, a lack of trust in analytics, and growing compliance burdens.”
Data management: Integration
Winner
- SnapLogic Agentic Integration Platform, SnapLogic
Finalists
- Rocket DataEdge, Rocket Software
- SnapLogic Agentic Integration Platform, SnapLogic
- Merge, Merge
From the winner
SnapLogic is an all-in-one platform to create, integrate, and orchestrate data products, apps, APIs, and AI agents. Using a low-code, “click-not-code” interface and built-in genAI, SnapLogic empowers both IT and business users to automate workflows, transform data, and create intelligent agents that streamline operations — no matter where the data resides.
With natural-language prompts, anyone across the enterprise — from HR to Finance to Legal — can build IT-approved automations that eliminate repetitive tasks and deliver real business value. For example, legal teams can create IDP [intelligent document processing] agents to redline contracts, while finance teams can automate fraud detection workflows.
From the judges
“By embedding AI-powered agent creation directly into its integration fabric, SnapLogic transforms how enterprises operationalize genAI. With tools like Prompt Composer and Agent Visualizer, it delivers secure, enterprise-grade innovation well beyond traditional data movement platforms.”
Data management: Pipelines
Winner
- Prophecy, Prophecy
Finalists
- Airbyte Open Source, Airbyte Cloud, Airbyte Enterprise, Airbyte Enterprise Flex, Airbyte
- DataPelago Accelerator for Spark, DataPelago
- Prophecy, Prophecy
From the winner
Prophecy uses AI agents, a visual canvas, and automatic code generation to help analysts and business experts of all skill levels access the data they need, whenever they need it. An LLM AI assistant for data pipeline creation is powered by a knowledge graph of data sets, schemas, models, and pipelines. The Visual Canvas enables anyone who needs data to build and refine pipelines, using drag-and-drop components for extraction, transformation, enrichment, aggregation, and loading.
Prophecy identifies errors and suggests how to fix them. Visual pipelines are automatically compiled into production-ready code with full Git versioning, documentation, CI/CD support, and lineage tracking.
From the judges
“Prophecy delivers a rare combination of technical depth, user-centric design, and enterprise-grade controls. Its intelligent use of AI assistants, visual editing, and code compilation makes it uniquely powerful, particularly in large organizations seeking secure, self-service analytics at scale. It transforms how data engineering is done.”
Data management: Security
Winner
- QuProtect, QuSecure
Finalists
- Bedrock Platform, Bedrock Data
- QuProtect, QuSecure
- Sentra Data Security Platform, Sentra
From the winner
QuSecure’s QuProtect platform is a comprehensive suite of post-quantum cryptographic solutions designed to safeguard data across various platforms and applications. It includes quantum-resistant algorithms, cryptographic agility, key management systems, and secure communication protocols. QuProtect enables visibility into cryptography in use — a new level of insight.
QuProtect provides centralized control of quantum-resilient cryptography, which is ideal for managing and updating cryptographic protocols across the network. Post-quantum algorithms can be implemented and updated centrally, ensuring uniform security policies. QuProtect empowers security leaders with comprehensive visibility, adaptive cryptographic controls, and orchestrated protection to safeguard data against both traditional and emerging threats.
From the judges
“This is one of the most innovative products I have ever seen that offers quantum-resilient protection. It offers security without changing existing systems, which is a great feature to have. I can see this driving a lot of innovation in the cost-effective, post-quantum security of applications for enterprises.”
Data management: Streaming
Winner
- Confluent Cloud, Confluent
Finalists
- Confluent Cloud, Confluent
- Hydrolix, Hydrolix
- Lenses, Lenses.io
From the winner
Confluent Cloud offers a fully managed cloud-native data streaming platform, built and operated by the creators of Apache Kafka. This platform empowers teams to easily connect, process, and govern real-time data without the operational burden associated with open source solutions. The result is [shorter] time to insight, lower infrastructure costs, and the ability to build data-driven applications at scale.
On top of trusted features like Confluent Cloud for Apache Flink and Stream Governance, Confluent has introduced capabilities over the past year to unify streaming and batch data processing on a single serverless platform. Confluent Cloud features battle-tested security, compliance, and 99.99% uptime.
From the judges
“Confluent Cloud extends Kafka and Flink into a managed serverless platform that unifies streaming and batch, simplifying real-time data use for AI and analytics. Its innovations — Tableflow, Flink Native Inference, and Flink Search — move streaming closer to enterprise AI workflows, while strong uptime and governance features deliver reliability at scale.”
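For readers unfamiliar with the underlying model, here is a minimal sketch of producing events to a Kafka topic with the confluent-kafka Python client. The bootstrap server, credentials, and topic name are placeholders, and Confluent Cloud-specific capabilities such as managed Flink and Stream Governance are configured separately from this basic client code.

```python
# Minimal Kafka producer using the confluent-kafka Python client.
# Broker address, credentials, and topic are placeholders for a managed cluster.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",  # Confluent Cloud bootstrap endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",
    "sasl.password": "<API_SECRET>",
})

def on_delivery(err, msg):
    # Called once per message after the broker acknowledges (or rejects) it.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

event = {"order_id": 42, "status": "shipped"}
producer.produce("orders", key=str(event["order_id"]), value=json.dumps(event), callback=on_delivery)
producer.flush()  # block until all queued messages are delivered
```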
Devops: Analytics
Winner
- Azul Intelligence Cloud, Azul
Finalist
- Azul Intelligence Cloud, Azul
From the winner
Azul Intelligence Cloud provides actionable intelligence from production Java runtime data that dramatically boosts devops productivity. It supports any OpenJDK-based JVM (Java Virtual Machine) from any vendor or distribution. It consists of three services: Azul Code Inventory, the only solution that precisely catalogs what code runs in production across all Java workloads to accurately identify unused and dead code for removal; Azul JVM Inventory, which uses actionable intelligence available only at run time to continuously catalog running JVMs to help ensure ongoing Oracle license compliance; and Azul Vulnerability Detection, which uses Java class-level production runtime data to detect known security vulnerabilities.
From the judges
“Azul Intelligence Cloud delivers strong impact for Java-heavy organizations. Its class-level runtime approach is more precise than traditional SCA [software composition analysis] tools and yields measurable productivity gains. Overall, a differentiated and timely solution with high relevance for devops productivity.”
Devops: Automation
Winner
- Chef, Progress
Finalists
- Ciroos AI SRE Teammate, Ciroos
- Nutanix Database Service, Nutanix
- Chef, Progress
From the winner
Progress Chef addresses the full spectrum of devops and devsecops, using a single “as code” framework to configure, deploy, and manage virtually any asset on any cloud to any edge, supporting any infrastructure or application, including cloud-native assets like Kubernetes and public cloud services.
Using this framework, the Chef portfolio can help organizations streamline their continuous compliance posture and secure infrastructure support on-premises or in the cloud. Our Policy as Code approach brings configuration management, application delivery, security policy enforcement, and compliance into a single step, eliminating the security silo and moving everyone into a shared pipeline and framework.
From the judges
“Chef has earned its place at the forefront of the devops and devsecops fields. It brings innovative automation, robust security features, and scalability that address the challenges faced by modern organizations, especially in hybrid and multi-cloud environments.”
Devops: CI/CD
Winner
- Buildkite Platform, Buildkite
Finalists
- Buildkite Platform, Buildkite
- CircleCI, CircleCI
- CloudBees Unify, CloudBees
- Harness CI/CD, Harness
From the winner
As the volume, speed, and unpredictability of code creation accelerate with LLM-assisted development, traditional CI/CD pipelines can no longer keep up. Buildkite’s Model Context Protocol (MCP) Server enables real-time, adaptive automation, reducing build times by allowing models to optimize execution order and parallelism while supporting real-time adaptation, failure recovery, dynamic resource allocation, and context-driven decision-making.
Unlike other MCP implementations that wrap APIs or provide prebuilt insights, Buildkite’s MCP Server operates inside the pipeline context, turning scripted workflows into dynamic, self-optimizing systems. It acts as a conversational protocol between LLMs and the Buildkite platform, drawing on more than a decade of operational context.
From the judges
“Buildkite brings cutting-edge technology to CI/CD, drives significant business benefits for organizations, and demonstrates true innovation in the space. It’s a powerful tool that not only addresses current challenges but is well positioned to define the next wave of software delivery as AI becomes an increasingly integral part of development workflows.”
Devops: Code quality
Winner
- SonarQube, Sonar
Finalists
- Graphite, Graphite
- Moderne, Moderne
- SmartBear AI, SmartBear
- SonarQube, Sonar
From the winner
SonarQube is an automated code review platform that helps developers deliver high-quality, secure code. It integrates into the CI/CD pipeline on devops platforms for automated, continuous code inspections. Deployable on-prem or in the cloud, it scans repositories for bugs, vulnerabilities, and code quality issues. Seamless integration into IDEs, via SonarQube for IDE, ensures quality and security at the start of development.
SonarQube’s quality gate alerts developers in real time when there is something to fix or review in changed or added code. The AI CodeFix feature takes this a step further, leveraging LLMs to automatically generate code fixes for discovered issues.
From the judges
“SonarQube’s developer-first approach integrates seamlessly into CI/CD and IDEs, ensuring issues are caught early — before they become costly. The AI CodeFix feature goes beyond detection, offering intelligent one-click fixes. With industry-wide adoption and proven ROI, SonarQube not only supports but elevates modern software development in the age of AI-assisted coding.”
Devops: Observability
Winner
- Observo AI, Observo AI
Finalists
- DataBahn, DataBahn
- Grafana Cloud, Grafana Labs
- Honeycomb.io, Honeycomb
- Observo AI, Observo AI
From the winner
Observo AI is an AI-powered data pipeline that transforms how enterprises manage security and observability. It analyzes telemetry data from security and devops tools, using machine learning and agentic AI to flag anomalies; detect privacy risks; perform data transformations; and strip irrelevant data from cloud flow, firewall, operating system, CDN [content delivery network], and application logs.
What sets Observo AI apart is its flexible, non-scripted automation. Instead of asking users to manually write and maintain thousands of inflexible rules, Observo’s ML models automatically detect schemas, summarize normal events, and enrich data with contextual intelligence.
From the judges
“Observo AI addresses one of the hardest problems in security by moving beyond brittle rule-based pipelines to adaptive machine learning and agentic automation. It redefines how telemetry pipelines are built and managed, putting it ahead of competitors that are still bound by static rules.”
Devops: Productivity
Winner
- Flow, Appfire
Finalists
- Flow, Appfire
- Cortex Internal Developer Portal, Cortex
- Harness Internal Developer Portal, Harness
From the winner
Appfire Flow is redefining how modern engineering organizations elevate code quality and team performance. As a core component of Appfire’s Software Engineering Intelligence (SEI) platform, Flow helps teams move beyond intuition and manual analysis by delivering real-time insights across the development life cycle. By unifying Git and ticket data from tools like GitHub, GitLab, and Jira, Flow gives teams a clear, objective understanding of where bottlenecks occur, how code review processes are functioning, and how collaboration impacts the quality and velocity of software delivery.
By translating complex engineering data into clear, research-backed insights, Flow makes software delivery understandable and actionable for everyone, from engineers to product leaders to executives.
From the judges
“Flow bridges the gap between engineering work and business outcomes. Its combination of developer empathy, innovation, and actionable insights makes it stand out from legacy tools and rigid DORA [DevOps Research and Assessment]–only dashboards.”
Software development: Platforms
Winner
- Azul Platform Prime, Azul
Finalists
- Azul Platform Prime, Azul
- Uno Platform, Uno Platform
- 1NCE OS, 1NCE
From the winner
Azul Platform Prime is a high-performance Java platform that provides superior speed, startup and warmup, and consistency vs. other OpenJDK distributions to increase responsiveness, reduce cloud compute costs by 20%+, and boost operational efficiency — all without recompiling or changing application code.
With Platform Prime, companies can reduce cloud waste and improve application performance. It works across the most popular Java long-term support (LTS) releases: Java 8, 11, 17, and 21+. It is ideal for demanding, business-critical applications, including those used for massive data sets in distributed data processing.
From the judges
“Runtime-level improvements without application changes are compelling and reduce adoption friction. The cost-savings angle is strong: fewer servers, higher CPU thresholds, and better autoscaling leads directly to opex reduction in cloud spend.”
Software development: Security
Winner
- Chainguard Containers, Chainguard
Finalists
- Chainguard Containers, Chainguard
- SonarQube Advanced Security, Sonar
From the winner
Chainguard Containers is a curated catalog of over 1,800 minimal, vulnerability-free container images that have a reduced attack surface, broad customization capabilities, and improved supply chain integrity for containerized applications. By providing trusted open source software (OSS), built from source and updated continuously, Chainguard helps organizations eliminate threats in their software supply chains.
Chainguard Containers reduces the cost of engineering toil that comes with patching software and strengthens an organization’s security posture by ensuring [that the organization always has] the latest version of the software in production.
From the judges
“Chainguard Containers represents a step-change in how organizations consume open source software by moving from reactive vulnerability scanning to proactively delivering zero-CVE [common vulnerabilities and exposures] images built and maintained continuously. This product is a redefinition of supply chain security, setting a new benchmark for the industry.”
Software development: Testing
Winner
- CloudBees Smart Tests, CloudBees
Finalists
- CloudBees Smart Tests, CloudBees
- Sauce Labs Platform for Test, Sauce Labs
- Harness AI Test Automation, Harness
From the winner
CloudBees Smart Tests is an AI-powered intelligent testing solution built to support enterprise dev-test workloads. It reduces cycle times, improves triage accuracy, and enhances visibility into test behavior across teams.
CloudBees Smart Tests enables faster dev-test iteration with an analytics engine that flags flaky, long-running, and reliable tests, giving both engineers and leaders a shared view of test performance. Predictive Test Selection (PTS) finds failures 40% to 90% faster by running only the most relevant tests. It also offers accelerated test failure resolution through smart classification, pattern detection, and unified session insights, plus automated alerts that keep code owners informed and engaged.
From the judges
“CloudBees Smart Tests reimagines enterprise testing by applying ML to prioritize only the tests that matter, eliminating waste and accelerating feedback loops. Its innovation lies in combining predictive selection with easy integration, turning testing from a bottleneck into a driver of speed, quality, and confidence.”
Software development: Tools
Winner
- Tabnine, Tabnine
Finalists
- Progress Telerik & Progress Kendo UI, Progress Software
- Tabnine, Tabnine
- Warp Agentic Development Environment, Warp
From the winner
Tabnine is the only AI coding assistant purpose-built for enterprise teams with complex codebases, mixed tech stacks, and strict security and compliance requirements. Whether deployed as SaaS, VPC [virtual private cloud], on-prem, or fully air-gapped, Tabnine provides full control over the environment — no outbound connections, no telemetry, no silent updates.
With deep integrations across Git and other SCMs [source code managers], IDEs, and tools like Jira and Confluence, Tabnine understands the entire organizational context, helping teams write, test, document, review, and maintain code faster and more consistently while following teams’ internal standards and compliance rules.
From the judges
“Tabnine is technically strong, enterprise-focused, and full of meaningful differentiators. Its commitment to secure, offline, and customized AI coding assistance gives it an edge in regulated industries. The product clearly aligns with market needs and shows a credible track record of impact.”
About our judges
Ashutosh Datar is a seasoned technology leader specializing in distributed systems, scalable API infrastructure, and intelligent storage. With more than 20 years of experience at companies such as Pure Storage, Hewlett Packard Enterprise, and Nimble Storage, he has led the design of next-generation systems that power large-scale storage and data platforms. At Pure Storage, Datar plays a key role in advancing the Fusion platform — an intelligent, policy-driven infrastructure that unifies storage management across heterogeneous systems.
Anshul Gandhi is an AI and product leader with a track record of building 0→1 systems and scaling 1→N AI platforms across enterprise, consumer, and go-to-market domains. His work bridges cutting-edge research and real-world applications, translating AI innovation into products that deliver exceptional user experiences and measurable business impact. He has led AI strategy and platform initiatives across sectors including healthcare, SaaS, manufacturing, and cybersecurity and holds multiple patents in applied AI.
Sahil Gandhi is a senior data scientist at Amazon and a product-minded AI leader with over a decade of experience in analytics, experimentation, and applied machine learning. He specializes in building and scaling AI-powered data products, including AI agents, RAG-based systems, and enterprise analytics platforms that drive smarter decisions and measurable business growth. He also serves as an Advisory Council Member at Products That Count, contributing to industry leadership and best practices.
Stan Gibson is an award-winning editor, writer, and speaker with 41 years’ experience covering information technology. Formerly executive editor of eWEEK and PC Week and senior editor at Computerworld, he is currently an adjunct analyst at IDC. As principal of Stan Gibson Communications, he writes for many websites, including CIO.com, and is a popular host for online events.
Arun Krishnakumar is a seasoned leader in e-commerce product strategy; digital transformation; and emerging technologies such as AI, machine learning, and blockchain. An author, startup mentor, and master class instructor, Krishnakumar has built and scaled cloud-based web and mobile platforms that serve millions of users. His experience spans customer acquisition and retention, conversion optimization, and sustainable growth, delivering impactful products for multibillion-dollar international brands.
Gaurav Mittal is a software engineer and seasoned IT manager adept at guiding teams in developing and deploying cutting-edge technology solutions. He specializes in implementing innovative automation solutions that unlock substantial cost savings and enhance operational efficiency.
Shipra Mittal is an accomplished IT professional with more than a decade of experience in software engineering, data quality, and analytics. She has advanced from leading client-focused software projects to building and guiding QA and data teams and now focuses on transforming data into actionable insights that enhance business performance. Recognized for her strong technical foundation and commitment to continuous learning, Mittal brings deep expertise in data validation, quality assurance, and analytics innovation to her role as a judge.
Priyank Naik is a principal engineer with more than 20 years of experience in the financial industry, specializing in building complex, real-time distributed, cloud-enabled systems for front-office operations, risk management, and fixed income research. In his current position, he is also involved in integrating genAI for automating financial reporting and forecasting cash flows.
Peter Nichol is a data and analytics leader for North America at Nestlé Health Science. He is a four-time author and an MIT Sloan and Yale School of Management speaker dedicated to helping organizations connect strategy to execution to maximize performance. His career has focused on driving and quantifying business value by championing disruptive technologies such as data analytics, blockchain, data science, and artificial intelligence. He has contributed to CIO.com and has been recognized for digital innovation by CIO 100, MIT Sloan, the BRM Institute, Computerworld, and PMI.
Anton Novikau is a seasoned software development leader with nine years of experience spearheading innovative technology solutions. As head of mobile development at Talaera, an EdTech start-up, Novikau drives the technical vision and execution of transformative learning experiences while pioneering AI integration across the company’s product suite. His expertise spans full-stack development, cloud architecture, and leveraging artificial intelligence to enhance educational outcomes.
Rahul Patil, based in New York, is a seasoned professional in the tech industry with 18 years of experience. Currently working at a hedge fund, he has honed his skills in back-end development with a particular focus on Java. His deep passion for technology drives him to constantly explore and use cloud-native services such as AWS and GCP.
Kautilya Prasad is a distinguished expert in software development, specializing in digital experience platforms and artificial intelligence. With more than 18 years of experience driving digital transformation for numerous Fortune 500 clients, Prasad excels at integrating artificial intelligence, digital experience, and data analytics to deliver innovative solutions that elevate customer engagement. He is an active contributor to the technology community, participating in peer reviews and shaping discussions on emerging technologies.
Shafeeq Ur Rahaman is an accomplished leader and researcher in data analytics and digital infrastructure, with over a decade of experience developing transformative, data-driven solutions that drive business performance. As the associate director of analytics and data infrastructure at Monks, he leads global initiatives in data pipeline automation, cloud architecture, and advanced analytics, including the design of mixed media marketing models to optimize campaign effectiveness.
Ramprakash Ramamoorthy leads the AI efforts for Zoho Corporation. He has been instrumental in setting up Zoho’s AI platform from scratch and brings more than 12 years of experience building AI for enterprises at Zoho. The platform currently serves over a billion requests a day and continues to grow. Ramamoorthy is a passionate leader with a level-headed approach to emerging technologies and a sought-after speaker at tech conferences.
Monika Rathor is a lead application engineer at Level Home, where she builds smart home solutions spanning smart access, automation, and building intelligence to improve apartment living and management in the most impactful, cost-effective way possible. A performance improvement enthusiast, she has driven optimizations like cutting latency from 200ms to just 50ms. Rathor also loves mentoring her team, helping its members grow and learn.
Isaac Sacolick is a lifelong technologist who has served in CTO and CIO roles and is the founder of StarCIO, a digital transformation leadership, learning, and advisory company. He is a writer and keynote speaker and the author of the Amazon bestseller Driving Digital, a playbook for leading digital transformation, and Digital Trailblazer, a career guide for technology and business professionals. Recognized as a top digital influencer, Sacolick is a frequent contributor to InfoWorld and CIO.com.
Scott Schober is the president and CEO of Berkeley Varitronics Systems, a 54-year-old New Jersey–based provider of advanced, world-class wireless test and security solutions. He is the author of three best-selling security books: Hacked Again, Cybersecurity is Everybody’s Business, and Senior Cyber. Schober is a highly sought-after author and expert for live security events, media appearances, and commentary on the topics of ransomware, wireless threats, drone surveillance and hacking, cybersecurity for consumers, and small business.
Kumar Srivastava is a seasoned technology executive and entrepreneur with more than two decades of experience in building and scaling AI-driven platforms across consumer packaged goods (CPG), retail, cybersecurity, supply chain, and digital transformation. Currently serving as chief technology officer at Turing Labs, Inc., he leads the development of the industry’s most advanced AI formulation platform, helping the world’s largest CPG companies accelerate innovation, optimize costs, and bring products to market faster with scientific precision.