Google’s cheaper, faster TPUs are here, while users of other AI processors face a supply crunch 6 Nov 2025, 8:02 pm
Relief could be on the way for enterprises facing shortages of GPUs to run their AI workloads, or unable to afford the electricity to power them: Google will add Ironwood, a faster, more energy-efficient version of its Tensor Processing Unit (TPU), to its cloud computing offering in the coming weeks.
Analysts expect Ironwood to offer price-performance similar to GPUs from AMD and Nvidia, running in Google’s cloud, so this could ease the pressure on enterprises and vendors struggling to secure GPUs for AI model training or inferencing projects.
That would be particularly welcome as enterprises grapple with a global shortage of high-end GPUs that is driving up costs and slowing AI deployment timelines, and even those who have the GPUs can’t always get the electricity to operate them.
That doesn’t mean it will be all plain sailing for Google and its TPU customers, though: Myron Xie, a research analyst at SemiAnalysis, warned that Google might also face constraints in terms of chip manufacturing capacity at Taiwan Semiconductor Manufacturing Company (TSMC), which is facing bottlenecks around limited capacity for advanced chip packaging.
Designed for TensorFlow
Ironwood is the seventh generation of Google’s TPU platform, and was designed alongside TensorFlow, Google’s open-source machine learning framework.
That gives the chips an edge over GPUs for common AI workloads built on TensorFlow, said Omdia principal analyst Alexander Harrowell. Many AI models, especially in research and enterprise scenarios, are built using TensorFlow, he said, and TPUs are highly optimized for such operations, while general-purpose GPUs that support multiple frameworks aren’t as specialized.
Opportunities for the AI industry
LLM vendors such as OpenAI and Anthropic, which still have relatively young code bases and are continuously evolving them, also have much to gain from the arrival of Ironwood for training their models, said Forrester vice president and principal analyst Charlie Dai.
In fact, Anthropic has already agreed to procure 1 million TPUs to train its models and run inference on them. Other, smaller vendors using Google’s TPUs to train models include Lightricks and Essential AI.
Google has seen a steady increase in demand for its TPUs (which it also uses to run internal services), and is expected to buy $9.8 billion worth of TPUs from Broadcom this year, compared to $6.2 billion and $2.04 billion in 2024 and 2023 respectively, according to Harrowell.
“This makes them the second-biggest AI chip program for cloud and enterprise data centers, just tailing Nvidia, with approximately 5% of the market. Nvidia owns about 78% of the market,” Harrowell said.
The legacy problem
While some analysts were optimistic about the prospects for TPUs in the enterprise, IDC research director Brandon Hoff said enterprises will most likely stay away from Ironwood, and TPUs in general, because their existing code bases are written for other platforms.
“For enterprise customers who are writing their own inferencing, they will be tied into Nvidia’s software platform,” Hoff said, referring to CUDA, the software platform that runs on Nvidia GPUs. CUDA was released to the public in 2007, while the first version of TensorFlow has only been around since 2015.
This article first appeared on Network World.
Tabnine launches ‘org-native’ AI agent platform 6 Nov 2025, 8:01 pm
Tabnine has launched the Tabnine Agentic Platform for AI-assisted software development with coding agents. The platform enables enterprise software development teams to ship faster while maintaining control over code and context, the company said.
With Tabnine Agentic, introduced November 5, developers get autonomous coding partners that complete workflows, not just code suggestions or completions, all aligned with an organization’s standards and security policies. Powered by the Tabnine Enterprise Context Engine, the company’s “org-native” agents understand the users’ repositories, tools, and policies and use these artifacts to plan, execute, and validate multi-step development tasks such as refactoring, debugging, and documentation. The engine incorporates coding standards, source and log files, and ticketing systems, letting agents execute complete coding workflows with security and context, according to Tabnine.
Tabnine Agents can use external systems and tools to adapt to new codebases and policies without retraining or redeployment. The engine combines vector, graph, and agentic retrieval techniques to interpret relationships across codebases, tickets, and tools, enabling Tabnine’s org-native agents to reason through multi-step workflows, the company said. Enterprise-grade benefits cited include:
- Agents can automatically adapt to new codebases and policies, with no retraining or redeployment required.
- Agents can act and iterate autonomously through coding workflows.
- Centralized control ensures oversight of permissions, usage, and context.
- Contextual intelligence provides awareness of internal repositories, ticketing systems, and coding guidelines.
- SaaS, private, VPC, on-premises, and air-gapped deployments are all available and meet enterprise security standards.
Perplexity’s open-source tool to run trillion-parameter models without costly upgrades 6 Nov 2025, 12:52 pm
Perplexity AI has released an open-source software tool that solves two expensive problems for enterprises running AI systems: being locked into a single cloud provider and the need to buy the latest hardware to run massive models.
The tool, called TransferEngine, enables large language models to communicate across different cloud providers’ hardware at full speed. Companies can now run trillion-parameter models like DeepSeek V3 and Kimi K2 on older H100 and H200 GPU systems instead of waiting for expensive next-generation hardware, Perplexity wrote in a research paper. The company also open-sourced the tool on GitHub.
“Existing implementations are locked to specific Network Interface Controllers, hindering integration into inference engines and portability across hardware providers,” the researchers wrote in their paper.
The vendor lock-in trap
That lock-in stems from a fundamental technical incompatibility, according to the research. Cloud providers use different networking protocols for high-speed GPU communication. Nvidia’s ConnectX chips use one standard, while AWS’s Elastic Fabric Adapter (AWS EFA) uses an entirely different proprietary protocol.
Previous solutions worked on one system or the other, but not both, the paper noted. This forced companies to commit to a single provider’s ecosystem, or accept dramatically slower performance.
The problem is particularly acute with newer Mixture-of-Experts models, Perplexity found. DeepSeek V3 packs 671 billion parameters. Kimi K2 hits a full trillion. These models are too large to fit on single eight-GPU systems, according to the research.
The obvious answer would be Nvidia’s new GB200 systems, essentially one giant 72-GPU server. But those cost millions, face extreme supply shortages, and aren’t available everywhere, the researchers noted. Meanwhile, H100 and H200 systems are plentiful and relatively cheap.
The catch: running large models across multiple older systems has traditionally meant brutal performance penalties. “There are no viable cross-provider solutions for LLM inference,” the research team wrote, noting that existing libraries either lack AWS support entirely or suffer severe performance degradation on Amazon’s hardware.
TransferEngine aims to change that. “TransferEngine enables portable point-to-point communication for modern LLM architectures, avoiding vendor lock-in while complementing collective libraries for cloud-native deployments,” the researchers wrote.
How TransferEngine works
TransferEngine acts as a universal translator for GPU-to-GPU communication, according to the paper. It creates a common interface that works across different networking hardware by identifying the core functionality shared by various systems.
TransferEngine uses RDMA (Remote Direct Memory Access) technology. This allows computers to transfer data directly between graphics cards without involving the main processor—think of it as a dedicated express lane between chips.
Perplexity’s implementation achieved 400 gigabits per second throughput on both Nvidia ConnectX-7 and AWS EFA, matching existing single-platform solutions. TransferEngine also supports using multiple network cards per GPU, aggregating bandwidth for even faster communication.
“We address portability by leveraging the common functionality across heterogeneous RDMA hardware,” the paper explained, noting that the approach works by creating “a reliable abstraction without ordering guarantees” over the underlying protocols.
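To make the idea of “a reliable abstraction without ordering guarantees” concrete, here is a hypothetical Python sketch (not the actual TransferEngine API, whose real interface lives in Perplexity’s GitHub repository): transfers are submitted as one-sided writes spread across several NICs, and completions are consumed in whatever order they arrive.

```python
# Hypothetical illustration only -- not the actual TransferEngine API.
# It models the paper's core idea: reliable point-to-point transfers
# whose completions may arrive in any order.
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

@dataclass
class Transfer:
    peer: str          # destination GPU/node (hypothetical identifier)
    buffer_id: int     # handle to a registered memory region
    num_bytes: int

class PointToPointEngine:
    """Reliable, unordered point-to-point transfers over multiple NICs."""
    def __init__(self, num_nics: int = 2):
        # One worker per NIC models bandwidth aggregation across adapters.
        self._pool = ThreadPoolExecutor(max_workers=num_nics)

    def _write(self, t: Transfer) -> Transfer:
        # Real RDMA writes bypass the CPU entirely; here we just simulate work.
        return t

    def submit(self, transfers: list[Transfer]):
        """Queue one-sided writes; returns futures, not an ordered stream."""
        return [self._pool.submit(self._write, t) for t in transfers]

if __name__ == "__main__":
    engine = PointToPointEngine(num_nics=4)
    futures = engine.submit(
        [Transfer(peer=f"node-{i}", buffer_id=i, num_bytes=1 << 20) for i in range(8)]
    )
    # Completions are consumed as they arrive -- no ordering guarantee,
    # which is what lets one interface sit on ConnectX-7 or AWS EFA alike.
    for fut in as_completed(futures):
        done = fut.result()
        print(f"transfer to {done.peer} complete")
```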
Already live in production environments
The technology isn’t just theoretical. Perplexity has been using TransferEngine in production to power its AI search engine, according to the company.
The company deployed it across three critical systems. For disaggregated inference, TransferEngine handles the high-speed transfer of cached data between servers, allowing companies to scale their AI services dynamically. The library also powers Perplexity’s reinforcement learning system, achieving weight updates for trillion-parameter models in just 1.3 seconds, the researchers said.
Perhaps most significantly, Perplexity implemented TransferEngine for Mixture-of-Experts routing. These models route different requests to different “experts” within the model, creating far more network traffic than traditional models. DeepSeek built its own DeepEP framework to handle this, but it only worked on Nvidia ConnectX hardware, according to the paper.
TransferEngine matched DeepEP’s performance on ConnectX-7, the researchers said. More importantly, they said it achieved “state-of-the-art latency” on Nvidia hardware while creating “the first viable implementation compatible with AWS EFA.”
In testing DeepSeek V3 and Kimi K2 on AWS H200 instances, Perplexity found substantial performance gains when distributing models across multiple nodes, particularly at medium batch sizes, the sweet spot for production serving.
The open-source bet
Perplexity’s decision to open-source production infrastructure contrasts sharply with competitors like OpenAI and Anthropic, which keep their technical implementations proprietary.
The company released the complete library, including code, Python bindings, and benchmarking tools, under an open license.
The move mirrors Meta’s strategy with PyTorch — open-source a critical tool, help establish an industry standard, and benefit from community contributions. Perplexity said it’s continuing to optimize the technology for AWS, following updates to Amazon’s networking libraries to further reduce latency.
Flaw in React Native CLI opens dev servers to attacks 6 Nov 2025, 12:33 pm
A critical remote-code execution (RCE) flaw in the widely used @react-native-community/cli (and its server API) lets attackers run arbitrary OS commands via the Metro development server, the default JavaScript bundler for React Native.
In essence, launching the development server through standard commands (e.g., npm start or npx react-native start) could expose the machine to external attackers, because the server binds to all network interfaces by default (0.0.0.0) rather than limiting itself to “localhost” as the console message claims.
According to JFrog researchers, the bug is a severe issue threatening developers of React Native apps. While exploitation on Windows is well-demonstrated (full OS command execution via an unsafe open() call), the macOS and Linux paths are currently less straightforward, though the risk remains real and subject to further research.
A fix is available, but development teams must move fast, JFrog researchers warned in a blog post.
Weak development server defaults
The vulnerability arises because the Metro development server, which is started by the CLI tool, exposes a “/open-url” HTTP endpoint that takes a URL parameter from a POST request and passes it directly to the “open()” function in the open npm package. On Windows, this can spawn a “cmd /c ...” call, enabling arbitrary command execution.
Adding to the problem is a misconfiguration in the CLI, which prints that the server is listening on “localhost”, but under the hood, the host values end up undefined, and the server listens on 0.0.0.0 by default, opening it to all external networks.
This combination of insecure default binding and the flawed open() call creates the conditions for remote code execution, something rare and dangerous in a development-only tool.
“This vulnerability shows that even straightforward Remote Code Execution flaws, such as passing user input to the system shell, are still found in real-world software, especially in cases where the dangerous sink function actually resides in 3rd-party code, which was the imported “open” function in this case,” the researchers said.
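To illustrate the general class of flaw the researchers describe, the Python sketch below is purely illustrative (the real vulnerable code path is in the JavaScript open package, not in Python): one endpoint handler forwards user input to a shell, the other validates the input and avoids the shell entirely.

```python
# Illustration of the flaw class only -- the actual vulnerability lives in
# the JavaScript tooling. Hypothetical handler functions for a dev server.
import subprocess
from urllib.parse import urlparse

def open_url_unsafe(url: str) -> None:
    # Vulnerable pattern: user input reaches the shell unmodified, so a
    # crafted "URL" containing shell metacharacters runs arbitrary commands.
    subprocess.run(f"xdg-open {url}", shell=True)

def open_url_safer(url: str) -> None:
    # Safer pattern: validate the input and pass it as an argument list,
    # so the shell never interprets it.
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        raise ValueError(f"refusing to open non-HTTP URL: {url!r}")
    subprocess.run(["xdg-open", parsed.geturl()], check=True)

if __name__ == "__main__":
    try:
        open_url_safer("file:///etc/passwd")  # rejected before any command runs
    except ValueError as err:
        print(err)
```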
The bug, tracked as CVE-2025-11953, is assigned a CVSS score of 9.8 out of 10, and affects versions 4.8.0 through 20.0.0-alpha.2.
What must developers do now?
Developers using @react-native-community/cli (or the bundled cli-server-api) in their React Native projects should check for the vulnerable package version with npm list. The vulnerability is fixed in version 20.0.0 of cli-server-api, so immediate updating is recommended.
The stakes include an attacker remotely executing commands on the victim’s development machine, potentially leading to broader network access, code corruption, or the injection of malicious payloads into an app build. If updating isn’t feasible right away, JFrog advised restricting the dev server to localhost by explicitly passing the “--host 127.0.0.1” flag to reduce exposure.
“It’s a reminder that secure coding practices and automated security scanning are essential for preventing these easily exploitable flaws before they make it to production,” the researchers said, recommending JFrog SAST for identifying issues early in the development process.
The React Native CLI flaw mirrors a broader trend of attackers slipping into developer ecosystems, from npm packages with hidden payloads to rogue “verified” IDE extensions, turning trusted build tools into stealthy points of entry.
Google boosts Vertex AI Agent Builder with new observability and deployment tools 6 Nov 2025, 11:44 am
Google Cloud has updated its Vertex AI Agent Builder with new observability dashboards, faster build-and-deploy tools, and stronger governance controls, aiming to make it easier for developers to move AI agents from prototype to production at scale.
The update adds an observability dashboard within the Agent Engine runtime to track token usage, latency, and error rates, along with a new evaluation layer that can simulate user interactions to test agent reliability.
Developers can now deploy agents to production with a single command using the Agent Development Kit (ADK), Google said in a blog post. New governance tools, such as agent identities tied to Cloud IAM and Model Armor, which block prompt injection attacks, are designed to improve security and compliance.
The ADK, which Google says has been downloaded more than seven million times, now supports Go in addition to Python and Java. This broader language support is aimed at making the framework accessible to a wider developer base and improving flexibility for enterprise teams building on multi-language stacks.
Google has also expanded managed services within the Agent Engine runtime. Developers can now deploy to the Agent Engine runtime directly from the ADK command-line interface without creating a full Google Cloud account. A Gmail address is enough to start using the service, with a free 90-day trial available for testing.
Agents built with Vertex AI Agent Builder can also be registered within Gemini Enterprise, giving employees access to custom-built agents in one workspace and linking internal tools with generative AI workflows.
The race to provide developer-friendly tools for creating secure and scalable agentic systems reflects a wider shift in enterprise AI. With the latest updates, Google is strengthening its position against competition that includes Microsoft’s Azure AI Foundry and AWS Bedrock.
Developer productivity gains
The updates are intended to make it easier to build and scale AI agents while enhancing governance and security controls.
“By turning orchestration, environment setup, and runtime management into managed services, Google’s Agent Development Kit cuts down on the time it takes to create and deploy software,” said Dhiraj Badgujar, senior research manager at IDC. “Vertex’s built-in model registry, IAM, and deployment fabric can shorten early development cycles for enterprises who are already using GCP.”
“LangChain and Azure AI Foundry provide for more model/cloud interoperability and manual flexibility, but they need more setup and bespoke integration to reach the same level of scalability, monitoring, and environment parity,” Badgujar added. “For new projects that fit with GCP, ADK may speed up development cycles by 2–3 times.”
Charlie Dai, VP and principal analyst at Forrester, agreed that Google’s new capabilities streamline the development process. “Compared to other offerings that often require custom pipelines and integration steps, Google’s approach can cut iteration time for teams already on Vertex AI,” Dai added.
Tulika Sheel, senior VP at Kadence International, noted that the ADK and one-click deployment in Vertex AI Agent Builder simplify agent creation by reducing setup and integration effort.
“For highly custom or niche workflows, the flexibility of open-framework solutions still wins, but for many enterprises seeking faster time-to-value, Google’s offering could be a real accelerator,” Sheel added.
The upgrade also represents a reset in how enterprises move from prototype to production, according to Sanchit Vir Gogia, chief analyst, founder, and CEO of Greyhound Research.
“For years, teams have been slowed by the hand-offs between development, security, and operations,” Gogia said. “Each phase added new tools, new reviews, and fresh delays. Google has pulled those pieces into one track. A developer can now build, test, and release an agent that already fits inside corporate policy.”
Observability and evaluation features
Analysts view Google’s new observability and evaluation tools as a significant improvement, though they say the capabilities are still developing for large-scale and non-deterministic agent workflows.
“The features in Vertex AI Agent Builder are a solid step forward but remain early-stage for complex, non-deterministic agent debugging,” Dai said. “While they provide granular metrics and traceability, integration with OpenTelemetry or Datadog is possible through custom connectors but not yet native.”
Others agreed that the tools are not yet full-stack mature. The latest updates enable real-time and retrospective debugging with agent-level tracing, tool auditing, and orchestrator visualization, along with evaluation using both metric-based and LLM-based regression testing.
“ADK gives GCP-native agents a lot of visibility, but multi-cloud observability is still not mature,” Badgujar said. “The new features make debugging non-deterministic flows a lot easier, although deep correlation across multi-agent states still needs third-party telemetry.”
Sheel echoed similar thoughts while acknowledging that the features are promising.
“At this stage, they’re still maturing,” Sheel said. “Enterprise uses with complex non-deterministic workflows (multi-agent orchestration, tool chains) will likely require additional monitoring hooks, custom dashboards, and metric extensions.”
Databricks adds customizable evaluation tools to boost AI agent accuracy 6 Nov 2025, 11:28 am
Databricks is expanding the evaluation capabilities of its Agent Bricks interface with three new features that are expected to help enterprises improve the accuracy and reliability of AI agents.
Agent Bricks, released in beta in June, is a generative AI-driven automated interface that streamlines agent development for enterprises and combines technologies developed by MosaicML, including TAO, the synthetic data generation API, and the Mosaic Agent platform.
The new features, which include Agent-as-a-Judge, Tunable Judges, and Judge Builder, enhance Agent Bricks’ automated evaluation system with more flexibility and customization, Craig Wiley, senior director of product management at Databricks, told InfoWorld.
Agent Bricks’ automated evaluation system can generate evaluation benchmarks via an LLM judge based on the defined agent task or workflow, often using synthetic data, to assess agent performance as part of its auto-optimization loop.
However, it didn’t give developers an automated way to dig through an agent’s execution trace to find relevant steps without writing code.
One of the new features, Agent-as-a-Judge, offers that capability for developers, saving time and complexity while offering insights into an agent’s trace that can make evaluations more accurate.
“It’s a new capability that makes those automated evaluations even smarter and more adaptable — adding intelligence that can automatically identify which parts of an agent’s trace to evaluate, removing the need for developers to write or maintain complex traversal logic,” Wiley said.
Derek Ashmore, agentic AI enablement principal at AI and data consultancy Asperitas Consulting, also feels that Agent-as-a-Judge offers a more flexible and explainable way to assess AI agent accuracy than the automated scoring that originally shipped with Agent Bricks.
Tunable Judges for agents with domain expertise
Another feature, Tunable Judges, is designed to give enterprises the flexibility to tune LLM judges for agents with domain expertise, which is a growing requirement in enterprise production environments.
“Enterprises value domain experts’ input to ensure accurate evaluations that reflect unique contexts, business needs, or compliance standards,” said Robert Kramer, principal analyst at Moor Insights & Strategy. “When Agent Bricks was initially introduced, many enterprises welcomed the ability to automate the evaluation and assessment of agents based on quality. As these agents transitioned from prototypes to a more demanding production environment, the limitations of generic evaluation logic became evident,” Kramer added.
Tunable Judges was the result of customer feedback, specifically around capturing subject matter expertise accurately and letting enterprises define what “correctness” means for their agents, Wiley said.
Tunable Judges could be used to ensure that clinical summaries don’t omit contraindications in healthcare, to enforce compliant language in portfolio recommendations, or to evaluate tone, de-escalation accuracy, and policy adherence in customer support.
Enterprises have the option of using the new “make_judge” SDK introduced in MLflow 3.4.0 to create custom LLM judges by defining tailored evaluation criteria in natural language within Python code and running an evaluation on it.
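As a rough sketch of that workflow, the example below defines a custom judge in natural language; the import path, argument names, and model URI are assumptions based on the MLflow 3.4 release notes, so verify them against the MLflow documentation before relying on them.

```python
# Sketch only: the import path, argument names, and model URI format are
# assumptions drawn from the MLflow 3.4 release notes -- check the docs.
from mlflow.genai.judges import make_judge  # assumed location of the SDK

# Define "correctness" for a customer-support agent in plain language.
support_judge = make_judge(
    name="deescalation_quality",
    instructions=(
        "Evaluate whether the response in {{ outputs }} de-escalates the "
        "complaint in {{ inputs }} while staying within company policy. "
        "Answer 'pass' or 'fail' with a one-sentence rationale."
    ),
    model="openai:/gpt-4o",  # assumed model URI; any supported judge model works
)

# Score a single interaction (hypothetical payloads).
feedback = support_judge(
    inputs={"complaint": "My order is three weeks late and nobody replies."},
    outputs={"response": "I'm sorry about the delay. Here is what I can do today..."},
)
print(feedback)
```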
Easing the complexity of agent evaluation
Enterprises would also have the option of using Judge Builder, a new visual interface within Databricks’ workspace, to create and tune LLM judges with domain knowledge from subject matter experts and utilize the Agent-as-a-Judge capability.
The Judge Builder, according to Kramer, is Databricks’s effort to set itself apart from rivals such as Snowflake, Salesforce, and ServiceNow, which also offer agent evaluation features, by making agent evaluation less complex and customizable.
“Snowflake’s agent tools use frameworks to check quality, but they don’t let you tune checks with business-specific feedback or domain rules in the same way Databricks does,” Kramer said.
Snowflake already offers AI observability and Cortex Agents, including “LLM-as-a-judge” evaluations, which focus on measuring accuracy and performance rather than interpreting an agent’s full execution trace.
Comparing Databricks’ new agent evaluation tools to those of Salesforce and ServiceNow, Kramer said that both vendors mostly focus on automating workflows and outcomes without deep, tunable agent judgment options. “If you need really tailored compliance or want business experts involved in agent quality, Databricks has the edge. For more basic automations, these differences probably matter less,” Kramer added.
Microsoft steers Aspire to a polyglot future 6 Nov 2025, 9:00 am
Microsoft’s Aspire development framework has dropped .NET from its name and moved to a new website, as it is now becoming a general-purpose environment for building, testing, and deploying scalable cross-cloud applications. Aspire has already proven to be a powerful tool for quickly creating cloud-native C# code. Is it ready to support other pieces of the modern development stack?
I’ve looked at Aspire before as it’s long been one of the more interesting parts of Microsoft’s developer tools, taking a code-first approach to all aspects of development and configuration. Instead of a sprawl of different (often YAML) files to configure services and platforms, Aspire uses a single code-based AppHost file that describes your application and the services it needs to run.
Along with the platform team at Microsoft, the growing Aspire community is developing an expanding set of integrations for what Aspire calls resources: applications, languages, runtimes, and services. There’s a standard format for building integrations that makes it easy to build your own and share them with the rest of the Aspire community, adding hooks for code and OpenTelemetry providers for Aspire’s dashboard.
Making Aspire cross-language
How does Aspire go from a .NET tool to supporting a wider set of platforms? Much of its capability comes from its architecture and its code-based approach to defining the components of your applications.
Using AppHost to bring together your code is the key to building polyglot applications in Aspire. It lets you mix and match the code you need: a React front end, a Python data and AI layer, and Go services for business logic. You define how they’re called and how they’re deployed, whether for test or for production, on your PC or in the cloud.
Such an approach builds on familiar tools. There’s no difference between instantiating a custom Go application in a container and doing the same for an application like Redis. The only difference is whether you use a Microsoft-provided integration, one from the growing Aspire community, or one you’ve built yourself.
If you want to use, say, a Node.js component as part of an Aspire application, use the Aspire command line (or Visual Studio) to add the Node.js hosting library to your project. With a prebuilt application using Express or a similar framework, your AppHost simply needs to add a call to Aspire’s builder method using AddNodeApp for an application or AddNpmApp for one that’s packaged for Node’s package manager.
Node.js needs to be installed on your development and production systems, with code providing an appropriate REST API that can be consumed by the rest of your application. If you have other JavaScript code, like a React front end, it can be launched using the same tooling, packaging them all with separate Dockerfiles.
Aspire Community Toolkit
An important piece of Aspire’s polyglot future is the Aspire Community Toolkit. This is a library of tools for hosting code and integrating with services that may not be in the official release yet. It gives you the tools to quickly extend Aspire in the direction you need without having to wait for a full internal review cycle. You get to move faster, albeit with the risks of not being able to use official support resources or of working with features that may not be quite ready for production.
If you use features from the Aspire Community Toolkit in your AppHost, you’re able to start with cutting-edge tools to build applications, like the Bun and Deno JavaScript/TypeScript environments, or you can work with memory-safe Go and Rust code. You can even bring in legacy Java code with support for a local JDK and popular enterprise frameworks like Spring.
There’s a long list of integrations as part of the Aspire Community Toolkit documentation, covering languages and runtimes, multiple container types and containerized services, and additional databases. If you want to use a specific client for a service, the toolkit includes a set of useful tools that can simplify working with APIs, including using popular .NET features like the Entity Framework. There is support for using Aspire to work with mock services during development, so you can connect to dummy mail servers and the like, swapping for live services in production.
Aspire Community Toolkit integrations began life as new custom integrations, which you can use to create your own links to internal services or to external APIs to use in your Aspire applications. For now, most integrations are written using .NET, adding custom resources and configurations.
At the heart of an integration is the Aspire.Hosting package reference. This is used to link the methods and resources in a class library to your Aspire integration.
Adding custom integrations
Building a new hosting integration starts with a project that’s designed to test that integration, which will initially be a basic AppHost that we’ll use to connect to the integration and display it in the Aspire dashboard. If you run the test project, you’ll see basic diagnostics and a blank dashboard.
Next, we need to create another project to host our new resources. This time we’re creating a .NET class library, adding the Aspire.Hosting package to this project. While it’s still blank, it can now be added as a reference to the test project. First make sure that the class library is treated as a non-service project by editing its project reference file. This will stop the project failing to run.
We’re now ready to start writing the code to implement the resource we’re building an integration for. Resources are added to the Aspire.Hosting.ApplicationModel namespace, with endpoint references and any necessary connection strings. This is where Aspire code will integrate with your new resource, providing a platform-agnostic link between application and service.
Your project now needs an extension method to handle configuration, using Aspire’s builder method to download and launch the container that hosts the service you’re adding. If you’re targeting a static endpoint, say an SAP application or similar with a REST API, you can simply define the endpoint used, either HTTP or a custom value.
With this in place, your new integration is ready for use, and you can write code that launches it and works with its endpoints. In production, of course, you’ll need to ensure that your endpoints are secure and that messages sent to and from them are sanitized. That also means making sure your application deployment runs on private networks and isn’t exposed to the wider internet, so be sure to consider how your provider configures its networking.
You can simplify things by ensuring that your integration publishes a manifest that contains details like host names and ports. Once you have a working integration, you’re able to package it as a NuGet package for sharing with colleagues or the wider internet.
A community to build the future
Moving from a .NET-only Aspire to one that supports the tools and platforms you want to use makes a lot of sense for Microsoft. Cloud-native, distributed applications are hard to build and run, so anything that simplifies both development and operations should make a lot of developers’ lives easier. By adopting a code-based approach to application architecture and services, Aspire embodies key devops principles and bakes them into the software development process.
For now, there will still be dependencies on .NET in Aspire, even though you can build integrations for any language or platform—or any application endpoint, for that matter. There are some complexities associated with building integrations, but we can expect the process to become a lot simpler as more developers adopt the platform and as they start to share their own integrations with the community. This is perhaps key to this change of direction in Aspire. If it is to be successful as a polyglot application development tool, it needs to have buy-in, not only from its existing core developers, but from experts in all the languages and services it wants to consume so that we are able to build the best possible code.
Building a bigger community of engaged contributors is key to Aspire’s future. Emphasizing features like the Aspire Community Toolkit as a way for integrations to graduate from being experiments to being part of the platform will be essential to any success.
Developers don’t care about Kubernetes clusters 6 Nov 2025, 9:00 am
If you look at the Cloud Native Computing Foundation landscape, it might seem that cloud developers are a lucky bunch. There seems to be an existing tool for literally every part of the software development life cycle. This means that developers can focus on what they want (i.e., creating features) while everything else (e.g., continuous integration and deployment) is already in place. Right?
Not so fast. The CNCF landscape tells only part of the story. If you look at the cloud tools available, you might think that everything is covered and we actually have more tools than needed.
The problem, however, is that the cloud ecosystem right now has the wrong focus. Most of the available tools are aimed at administrators and operators instead of feature developers. This creates a paradox where the more tools your organization adopts, the less happy your developers are. Can we avoid this?
Looking beyond the clusters
It was only natural that the first cloud tools would be about creating infrastructure. After all, you need a place to run your application, in order to offer value to your end users. The clear winner in the cloud ecosystem is Kubernetes, and many tools revolve around it. Most of these tools only deal with the cluster itself. You can find great tools today that
- Create Kubernetes clusters
- Monitor Kubernetes clusters
- Debug Kubernetes clusters
- Network and secure Kubernetes clusters
- Auto-scale and cost-optimize the cluster according to load
This is a great starting point, but it doesn’t actually help developers in any way. Developers only care about shipping features. Kubernetes is a technical detail for them, as virtual machines were before Kubernetes.
The problem is that almost all the tools available focus on individual clusters. If your organization is using any kind of Kubernetes dashboard, I would bet that on the left sidebar there is a nice big button called “clusters” that shows a list of all available Kubernetes installations.
But here is the hard truth. Developers don’t care about Kubernetes clusters. They care about environments—more specifically the classic trilogy of QA, staging, and production. That’s it.
Maybe in your organization Staging is a single cluster. Maybe Staging is two clusters. Maybe Staging is a namespace inside another bigger cluster. Maybe Staging is even a virtual cluster. It doesn’t really matter for developers. All they want to see is an easy way to deploy their features from one environment to the next.
If you want to make life easy for developers, then offer them what they actually need.
- A list of predefined environments with a logical progression structure
- A way to “deploy” their application to those environments without any deep Kubernetes knowledge
- An easy way to create temporary preview environments for testing a new feature in isolation
- A powerful tool to debug deployments when things go wrong.
In this manner, developers will be able to focus on what actually matters to them. If you force developers to learn Helm, Kustomize, or how Kubernetes manifests work, you are wasting their time. If every time a deployment fails, your answer is “just use kubectl to debug the cluster,” then you are doing it wrong.
Promotions are more critical than deployments
So, let’s say you followed my advice and offered your developers a nice dashboard that presents environments instead of individual clusters. Is that enough?
It turns out that you must also offer a way to “deploy” to those environments. But here is the critical point. Make sure that your fancy dashboard understands the difference between a deployment and a promotion.
A deployment to an environment should be a straightforward process. A developer must be able to choose
- A version of their application (the latest one or a previous one)
- An environment with appropriate access
- A way to make sure that the deployment has finished successfully.
Sounds simple, right? It is simple, but this process is only useful for the first environment where code needs to be verified. For the rest of the environments in the chain, developers want to promote their application.
Unlike a deployment, a promotion is a bit more complex. It means that a developer wants to take what is already available in the previous environment (e.g., QA) and move that very same package to the next environment (e.g., Staging).
The magic here is that if you look at all the environments of the organization, there is a constant tension over how “similar” two environments should be. In the most classic example, your Staging environment should be as close as possible to Production in order to make sure that you test your application in similar conditions.
On the other hand, it should be obvious that Staging should not be able to access your production database or your production queues. You should have separate infrastructure for handling Staging data.
This means that by definition some configuration settings (e.g., database credentials) are different between Production and Staging. So when a developer wants to “promote” an application, what they really want to do is
- Take the parts of the application that actually need to move from one environment to another
- Ignore the configuration settings that are fixed for each environment (such as those database credentials).
This is a very important distinction, and the majority of cloud tools do not understand it. Many production incidents start because either a configuration change was different between production and staging (or whichever was the previous environment) or because the developer deployed the wrong version to production, bypassing the previous environments.
Coming back to your developer dashboard, if you offer your developers just a drop-down list of all possible versions of an application, allowing them to choose what to deploy, you are doing it wrong. What developers really want is to promote whatever is active and verified in the previous environment.
At least for production this should be enforced at all times. Production deployments are the last step in a chain where a software version is gradually moved from one environment to another.
Behind the scenes, your fancy dashboard should also understand what configuration needs to be promoted and what configuration stays the same for each environment. In the case of Kubernetes, for example, the number of replicas for each environment is probably static. But your application’s configmaps should move from one environment to another when a promotion happens.
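As a minimal sketch of that split (illustrative only, not tied to any specific tool), the code below promotes the release artifacts from one environment to the next while leaving environment-owned settings such as replica counts and credentials untouched.

```python
# Illustrative sketch of promotion vs. deployment -- not any specific tool's API.
# The release (image + app config) moves between environments; settings the
# environment owns (replicas, credentials) never travel with it.

ENV_OWNED_KEYS = {"replicas", "database_url", "queue_url"}  # stays per environment

environments = {
    "qa":      {"image": "shop:1.4.2", "feature_flags": {"new_checkout": True},
                "replicas": 1, "database_url": "postgres://qa-db/shop"},
    "staging": {"image": "shop:1.4.1", "feature_flags": {"new_checkout": False},
                "replicas": 2, "database_url": "postgres://staging-db/shop"},
}

def promote(source: dict, target: dict) -> dict:
    """Copy only the promotable parts of the source environment into the target."""
    promoted = dict(target)
    for key, value in source.items():
        if key not in ENV_OWNED_KEYS:
            promoted[key] = value
    return promoted

# Promote whatever is verified in QA into Staging; Staging's credentials and
# replica count are left exactly as they were.
environments["staging"] = promote(environments["qa"], environments["staging"])
print(environments["staging"])
```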
Deployment pipelines no longer work in the cloud era
We have covered environments and promotions, so it is time to talk about how exactly a deployment takes place. The traditional way of deploying an application is via pipelines. Most continuous integration software has a way of creating pipelines as a series of steps (or scripts) that execute one after the other.
The typical pipeline consists of:
- Checking out the source code of the application
- Compiling and building the code
- Running unit and integration tests
- Scanning the code for vulnerabilities
- Packaging the code in its final deliverable.
Before the cloud, it was common to have another step in the pipeline that took the binary artifact and deployed it to a machine (via FTP, rsync, SSH, etc.). The problem with this approach is that the pipeline only knows what is happening while the pipeline is running. Once the pipeline has finished, it no longer has visibility into what is happening in the cluster.
This creates a very unfortunate situation for developers, with the following pattern:
- A developer is ready to perform a deployment
- They start the respective pipeline in the continuous integration dashboard
- The pipeline runs successfully and deploys the application to the cluster
- The pipeline ends with a “green” status
- Five minutes later the application faces an issue (e.g., slow requests, missing database, evicted pods)
- The developer still sees the pipeline as “green” and has no way of understanding what went wrong.
It is at this point that developers are forced to look at complex metrics or other external systems in order to understand what went wrong. But developers shouldn’t have to look in multiple places to understand if their deployment is OK or not.
Your deployment system should also monitor applications, even after the initial deployment has finished. This is an absolute requirement for cloud environments where resources come and go—especially in the case of Kubernetes clusters, where autoscaling is in constant effect.
Catching up with cloud deployments
Cloud computing comes with its own challenges. Most existing tools were created before the cloud revolution and were never designed for the dynamic nature of how cloud deployments work. In this new era, developers are left behind because nobody really understands what they need.
In the case of Kubernetes, existing tools tend to be oriented towards operators and administrators:
- They show too much low-level information that is not useful for developers
- They don’t understand how environments are different and how to promote applications
- They still have the old mindset of continuous integration pipelines.
We need to rethink how cloud computing affects developers. With the recent surge in generative AI and LLM tools, deploying applications will quickly become the bottleneck. Developers will be able to quickly create features with their smart IDEs or AI agents, but they will never understand how to promote applications or how to quickly pinpoint the cause of a failed deployment.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Mozilla.ai releases universal interface to LLMs 5 Nov 2025, 9:46 pm
Mozilla.ai, a company backed by the Mozilla Foundation, has released any-llm v1.0, an open-source Python library that provides a single interface to communicate with different large language model (LLM) providers.
any-llm 1.0 was released November 4 and is available on GitHub. With any-llm, developers can use any model, cloud or local, without rewriting a stack every time. This means less boilerplate code, fewer integration headaches, and more flexibility to pick what works best for the developer, Nathan Brake, machine learning engineer at Mozilla.ai, wrote in a blog post. “We wanted to make it easy for developers to use any large language model without being locked into a single provider,” Brake wrote.
Mozilla.ai initially introduced any-llm on July 24. The 1.0 release has a stable, consistent API surface, async-first APIs, and re-usable client connections for high-throughput and streaming use cases, Brake wrote. Clear deprecation and experimental notices are provided to avoid surprises when API changes may occur.
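A minimal usage sketch follows; the “provider/model” identifier format and the OpenAI-style response shape are assumptions drawn from the project’s README, so verify them against the any-llm repository on GitHub before relying on them.

```python
# Sketch of intended usage; the "provider/model" string format (some versions
# may take separate provider and model arguments) and the response shape are
# assumptions based on the project's README -- verify against the repository.
from any_llm import completion

# Swapping providers should only require changing this one identifier,
# e.g. to an Ollama- or vLLM-hosted local model.
MODEL = "openai/gpt-4o-mini"  # hypothetical provider/model identifier

response = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize what any-llm does in one sentence."}],
)

# The response is expected to follow the familiar OpenAI-style shape.
print(response.choices[0].message.content)
```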
The any-llm v1.0 release adds the following capabilities:
- Improved test coverage for stability and reliability
- Responses API support
- A List Models API to programmatically query supported models per provider
- Re-usable client connections for better performance
- Standardized reasoning output across all models, allowing users to access LLM reasoning results regardless of the provider chosen
- Auto-updating of the provider compatibility matrix, which shows which features are supported by which providers
Future plans call for support for native batch completions, support for new providers, and deeper integrations inside of the company’s other “any-suite” libraries including any-guardrail, any-agent, and mcpd.
How multi-agent collaboration is redefining real-world problem solving 5 Nov 2025, 9:09 am
When I first started working with multi-agent collaboration (MAC) systems, they felt like something out of science fiction: groups of autonomous digital entities that negotiate, share context, and solve problems together. Over the past year, MAC has begun to take practical shape, with applications to multiple real-world problems, including climate-adaptive agriculture, supply chain management, and disaster management. It’s slowly emerging as one of the most promising architectural patterns for addressing complex and distributed challenges in the real world.
In simple terms, MAC systems consist of multiple intelligent agents, each designed to perform specific tasks, that coordinate through shared protocols or goals. Instead of one large model trying to understand and solve everything, MAC systems decompose work into specialized parts, with agents communicating and adapting dynamically.
Traditional AI architectures often operate in isolation, relying on predefined models. While powerful, they tend to break down when confronted with unpredictable or multi-domain complexity. For example, a single model trained to forecast supply chain delays might perform well under stable conditions, but it often falters when faced with situations like simultaneous shocks, logistics breakdowns or policy changes. In contrast, multi-agent collaboration distributes intelligence. Agents are specialized units on the ground responsible for analysis or action, while a “supervisor” or “orchestrator” coordinates their output. In enterprise terms, these are autonomous components collaborating through defined interfaces.
The Amazon Bedrock platform is one of the few early commercial examples that provide multi-agent collaboration capability. It consists of a supervisor agent that breaks down a complex user request — say, “optimizing a retail forecast” — into sub-tasks for domain-specific agents to action, such as data retrieval, model selection and synthesis.
This decomposition helps improve decision-making accuracy and, at the same time, provides more transparency and control. At the protocol layer, standards like Google’s Agent-to-Agent (A2A) and Anthropic’s Model Context Protocol (MCP) define how agents discover and communicate across environments. Think of them as the TCP/IP of collaborative AI, enabling agents built by different organizations or using different models to work together safely and efficiently.
The architecture of multi-agent collaboration
Solving global real-world problems requires architectures that can maintain a balance between autonomy, communication and oversight. In my experience, designing such a system at a high level involves four interoperable layers:
1. Agent layer: Specialization
This layer contains individual agents, each having a dedicated role such as prediction, allocation, logistics or regulation. Agents can be fine-tuned LLMs, symbolic planners or hybrid models wrapped in domain-specific APIs. This modularity mirrors microservice design: loosely coupled, highly cohesive.
2. Coordination layer: Orchestration
This layer acts as the nervous system, responsible for keeping agents connected with each other. Agents exchange intents instead of raw data using A2A, MCP or custom message brokers (e.g., Kafka, Pulsar). The orchestration layer routes these intents between agents, resolves conflicts and aligns timing. It can support different topologies, including centralized, peer-to-peer or hierarchical, depending on latency and trust requirements.
3. Knowledge layer: Shared context
This layer provides memory for the agents, a shared context store, typically a vector database (e.g., Weaviate, Pinecone) combined with a graph database (e.g., Neo4j), that maintains world state: facts, commitments, dependencies and outcomes. This persistent memory ensures continuity across events and agents.
4. Governance layer: Oversight and trust
This layer provides governance through policy enforcement, decision audits and human involvement for ad hoc inspection and checkpoints. In addition, it manages authentication and explainability and ensures decisions remain within legal and ethical bounds.
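As a framework-agnostic sketch of how these layers fit together (all names and numbers below are hypothetical), specialized agents expose narrow functions, a supervisor routes intents between them in order, and a shared dictionary stands in for the knowledge layer:

```python
# Minimal, framework-agnostic sketch of the four layers described above.
# Real systems would back these with LLMs, message brokers, and a vector/graph
# store; here a dict stands in for shared context and a loop for orchestration.
from typing import Callable

shared_context: dict[str, object] = {}  # knowledge layer: shared world state

def sensor_agent(task: str) -> dict:
    return {"soil_moisture": 0.18, "field": task}

def weather_agent(task: str) -> dict:
    return {"rain_forecast_mm": 3.5, "field": task}

def irrigation_agent(task: str) -> dict:
    # Acts on what other agents have written into the shared context.
    dry = shared_context.get("soil_moisture", 1.0) < 0.25
    little_rain = shared_context.get("rain_forecast_mm", 0.0) < 5.0
    return {"irrigate": dry and little_rain, "field": task}

# Agent layer: role name -> callable, each with a dedicated specialization.
AGENTS: dict[str, Callable[[str], dict]] = {
    "sensing": sensor_agent,
    "forecasting": weather_agent,
    "irrigation": irrigation_agent,
}

def supervisor(request: str) -> dict:
    """Coordination layer: decompose the request and route intents in order."""
    plan = ["sensing", "forecasting", "irrigation"]  # governance could veto steps here
    for role in plan:
        result = AGENTS[role](request)
        shared_context.update(result)  # persist outcomes for downstream agents
    return shared_context

print(supervisor("field-7"))
```

In a production system the supervisor would be backed by an LLM planner, the shared dictionary by a vector or graph store, and the governance layer would be able to audit or veto each step.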
Multi-agent collaboration in action
The real excitement around multi-agent collaboration isn’t confined to cloud platforms or developer sandboxes. It’s happening in the physical and environmental systems that sustain our world.
Climate-adaptive agriculture: Agents for a living planet
Nowhere have I found this shift more urgent or inspiring than in climate-adaptive agriculture. Today, farmers are confronting growing uncertainty in rainfall, soil health and temperature variability. Centralized AI models can provide useful insights, but they rarely adapt quickly to localized changes.
In contrast, a multi-agent ecosystem can coordinate real-time sensing, forecasting and action across distributed farms:
- Sensor agents monitor soil moisture and nutrient data.
- Weather agents pull localized forecasts and detect anomalies.
- Irrigation agents decide watering schedules, negotiating water allocation with regional policy agents.
- Market agents adjust planting and distribution strategies based on demand and logistics.
In precision agriculture projects, I’ve researched how farmers using multi-agent systems that integrate aerial drones with ground robots have reported crop yield increases of up to 10%, while simultaneously reducing input costs. That’s not a theoretical projection — it’s happening on working farms right now.
Here’s how it works in practice: UAVs (drones) survey fields from above, identifying problem areas and monitoring crop health across hundreds of acres. Meanwhile, ground-based robots handle targeted interventions like precise irrigation, fertilizer application or pest management. The key is that these agents communicate and coordinate. When a sensor detects a sudden increase in soil moisture in one area, the irrigation system automatically adjusts to prevent overwatering. No human intervention or central command center is required for making all the decisions.
Supply chain collaboration under pressure
The global supply chain is another proving ground for MAC. A single bottleneck, whether caused by weather, labor strikes or geopolitical tension, can ripple across continents. Multi-agent systems provide a way to detect, simulate and respond to those disruptions faster than traditional analytics pipelines.
Multi-agent systems in supply chains involve networks of AI-powered agents that work together autonomously, making the supply chain smarter, faster and more resilient. The beauty of these systems lies in their autonomy and flexibility, where each agent can make decisions within its realm while communicating and collaborating to achieve overarching goals.
Here’s how I’ve found collaboration plays out in practice:
- In demand forecasting, one agent might analyze social media trends while another examines economic indicators. Working together, they create a more accurate forecast.
- For inventory management, an agent monitoring sales trends can instantly communicate with another controlling reordering to ensure optimal stock levels.
- In logistics optimization, one agent plans the best truck routes while another monitors traffic conditions; if a road closure occurs, the agents can quickly recalculate and reroute in real time.
The integration creates a digital nervous system for supply chains, enabling unprecedented levels of coordination and efficiency, with companies reporting an average 15% reduction in overall supply chain costs. The systems provide enhanced end-to-end visibility, improved demand forecasting accuracy, reduced planning costs by over 25%, increased agility in responding to market fluctuations and optimized inventory management.
Multi-agent disaster management systems
The same principles of distributed intelligence are also redefining disaster management. In these high-stakes environments, I’ve found that coordination and adaptability can mean the difference between life and death.
When I first began exploring multi-agent disaster response systems, I was struck by how they function like a digital ecosystem of autonomous specialists. Each agent, representing rescue workers, evacuees or information hubs, acts independently but coordinates through shared situational awareness. By processing data and executing localized decisions in parallel, multi-agent systems dramatically reduce response latency and improve resilience in uncertain environments.
In simulated evacuations, for instance, each virtual evacuee is modeled as an agent with unique physical and psychological attributes such as age, health and stress level that evolve in real time. The emergent behavior that arises from thousands of these agents interacting offers critical insights into crowd dynamics and evacuation strategies that static models could never capture.
Lessons for system architects
Architecting multi-agent ecosystems demands new design heuristics:
- Design for negotiation, not command. Replace schedulers with protocols where agents bargain over shared goals.
- Treat memory as infrastructure. Context persistence is as critical as compute.
- Embed governance early. Auditing and policy hooks must be first-class citizens.
- Prioritize modular onboarding. Use schemas and APIs that allow new agents to join with minimal friction.
In this paradigm, coordination becomes a first-order system capability. Future cloud platforms will likely evolve to provide “cooperation primitives” — built-in support for intent passing, conflict arbitration and collective state management.
The road ahead: Standards, security and trust
Like any emerging paradigm, MAC comes with its share of unanswered questions. How do we keep agents aligned when they act semi-autonomously? Who defines their access rights and goals? And what happens when two agents disagree?
Early standards such as the Model Context Protocol (MCP) and Agent-to-Agent (A2A) are beginning to shape the answers. They make it possible for agents to communicate securely, share context and discover one another in permissioned ways. But technology alone won’t solve the deeper challenges. Organizations will also need governance frameworks, clear rules for delegation, auditing and alignment, to prevent “agent sprawl” as systems scale.
In practice, the most successful MAC pilots typically start small, with a few agents automating tasks such as data triage or workflow handoffs. Over time, they evolve into full-fledged ecosystems where collaboration between agents feels as natural as calling an API.
That evolution, however, comes with new responsibilities:
- Balancing goals: When agents have conflicting goals, for example, one trying to maximize yield while another aims to minimize emissions, they need a way to resolve those differences through arbitration models that balance fairness with efficiency.
- Securing the network: A single malicious or compromised agent could distort results or spread misinformation. Robust identity and trust management are non-negotiable.
- Building transparency: For high-impact systems, humans must be able to trace why an agent made a decision. Clear logs and language-level reasoning trails make that possible.
- Testing at scale: Before deployment, thousands of agents need to be stress-tested in realistic environments. Tools like MechAgents and SIMA are paving the way here.
Ultimately, the future of multi-agent collaboration will depend not just on smarter technology but on how well we design for trust, transparency and responsible governance. The organizations that get this balance right will be the ones that turn MAC from a promising experiment into a lasting advantage.
A change in how we think about intelligence itself
Multi-agent collaboration represents a transformational shift from building smarter models to building smarter networks. It’s a change in how we think about intelligence itself: not as a single entity, but as a collection of cooperating minds, each contributing a piece of situational understanding.
As someone who has spent years in enterprise systems, I find that deeply human. We thrive not as isolated experts but as collaborators, each with a unique role and perspective. The same principle is now shaping the next generation of AI. Whether we’re managing crops, supply chains or disasters, the path forward looks less like command-and-control and more like conversation.
This article is published as part of the Foundry Expert Contributor Network.
Want to join?
Some thoughts on AI and coding 5 Nov 2025, 9:00 am
Holy cow, things are moving fast in the software business—ideas are coming like an oil gusher, and keeping up is a challenge. Here are a few observations, amazements, puzzlements, and prognostications regarding AI and software development that have occurred to me recently.
Vibe coding for the win
Vibe coding has come a long way in the six months since I first tried it. I recently picked up the same project that I (or rather Claude Code) had built, and I was shocked at how much better the agent was. In the first go-round, I had to keep a close eye on things to make sure that the agent didn’t go off the rails. But this time? It pretty much did everything right the first time. I’m still stunned. I suspect it’s going to be a while before the amazement wears off.
One thing that is absolutely fantastic about vibe coding is debugging. Those cryptic error messages that take human programmers minutes or sometimes hours to run down can be deciphered and debugged by AI in seconds. I’m now at the point where I don’t even ask the agent about the error message; I just enter it, and it automatically identifies the problem. A great example: package dependency hell. No human can decipher the deep dependency chains created by our applications today, but AI can untangle—and fix!—these issues without missing a beat.
I expect we will see an explosion of what might be called “boutique software” as a result of vibe coding. There are endless ideas for websites and mobile apps that never got written or created because the cost to produce them outweighed the benefits they promised. But if the cost of producing them is drastically reduced, then that cost/benefit ratio becomes viable, and those small but great ideas will come to fruition. Prepare for “short form software,” similar to what TikTok did for content producers.
Software development is uniquely positioned to take advantage of AI agents. Large language models (LLMs) are—no surprise—based on text. They take text as input and produce text as output. Given that code is all text, LLMs are particularly good at producing code. And because computer code isn’t particularly nuanced compared to spoken language, AI easily learns from existing code and thus excels at producing code. It’s a virtuous cycle.
Software development futures
The previous point creates a dilemma of sorts. Up until now, humans have written all the code that LLMs train on. As humans write less and less code, what will the LLMs be trained on? Will they learn from their own code? I’m guessing what will happen is that humans will continue to design the building blocks—components, libraries, and frameworks—and LLMs will “riff” off of the scaffolding that humans create. Of course, it may be that at some point AI will be able to learn from itself, and we humans will merely describe what we want and get it without worrying about the code at all.
It seems kind of nuts to put limits on what AI can do in coding and software engineering. “We’ll always need software developers” is easy to say, but frankly, I’m not so sure it is true. I suppose it was easy to say “We’ll always need farmers” or “We’ll always need autoworkers.” Although both of those statements are still true, there are a lot fewer farmers and autoworkers today than there were decades ago. I suppose there will always be a need for software developers—the question is how many.
Hidden Figures is a beautiful movie about a group of Black women who were instrumental in getting the early US space program off the ground. They were called “computers” because they literally computed trajectories, landing coordinates, and all the precise calculations needed to safely conduct space flight. They did heroic and admirable work. But today, all of those calculations can be done with a Google spreadsheet. I think that AI is going to do to software developers what HP calculators did to human computers.
At this point, the only thing I can predict is that no one has a clue where software development is headed. AI is such a strong catalytic force that no one knows what will happen next week, much less next month or next year. Whatever does happen, it is going to be amazing.
A fresh look at the Spring Framework 5 Nov 2025, 9:00 am
The Spring Framework is possibly the most iconic software development framework of all time. It once suffered from a reputation of bloat, but it has long since shed that perception.
In this article, we’ll take a fresh look at Spring, including an overview of Spring Boot and how Spring handles standard application needs like persistence and security. You’ll also learn about newer features that modernize Spring, including its approach to developer experience, reactive programming, and cloud and serverless development.
Streamlined dependency injection
The heart of Spring is still the dependency injection (DI) engine, which is the killer feature that started it all. Also called inversion of control (IOC), dependency injection is a way of wiring together classes without explicitly writing the connective code in the classes themselves. In modern Spring, DI works by convention, with minimal intervention from the developer. Unlike in the past, there is usually little if any XML management involved.
Most Spring beans can be auto-wired with just a few annotations. Spring will scan the project and automatically inject the proper dependency. For example, here’s how you would wire a Repository component to a Service component:
import java.util.List;

import org.springframework.stereotype.Component;

@Component
public class MovieFinder {
    public List<String> findByGenre(String genre) {
        if ("sci-fi".equalsIgnoreCase(genre)) {
            return List.of("Blade Runner", "Total Recall", "The 5th Element");
        } else {
            return List.of("The Godfather", "The Princess Bride");
        }
    }
}

@Component
public class MovieRecommender {
    private final MovieFinder movieFinder;

    // Automatically injected via the constructor:
    public MovieRecommender(MovieFinder finder) {
        this.movieFinder = finder;
    }

    public String recommendMovie(String genre) {
        List<String> movies = movieFinder.findByGenre(genre);
        return movies.get(0);
    }
}
Simply declaring the two classes as @Components is enough for Spring to put them together, so long as the consuming component’s constructor parameter matches the type of the injected component (here, MovieFinder). Using Spring’s DI implementation to wire together components feels almost like an effortless extension of Java itself.
Spring Boot: The key to modern Spring
Modern Spring is built around Spring Boot, which provides low-overhead access to the vast resources of the Spring Framework.
Starting a new Spring application used to be a manual process, but Spring Boot changed all that. Spring Boot makes it simple to create a new standalone app without any boilerplate configuration or containers.
Creating a new Spring Boot application is as simple as using Spring Initializr, which you can access as a web tool or via the Initializr CLI.
Spring Boot apps that include @SpringBootApplication will even configure dependencies like datastores for you automatically. Of course, you can always override the default configurations with manual ones (for example, if you needed to create your own DataStore component).
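For reference, a minimal Spring Boot entry point looks roughly like the following sketch (the class name is illustrative); @SpringBootApplication turns on component scanning and auto-configuration with a single annotation:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class MovieApplication {
    public static void main(String[] args) {
        // Starts the embedded server (if present) and wires the application context
        SpringApplication.run(MovieApplication.class, args);
    }
}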
Spring Boot also includes dependency starters. These let you include one dependency that includes all the things you need for a particular area. For example, spring-boot-starter-web automatically pulls in Spring MVC, Jackson XML, and an embedded server, so you don’t have to deal with adding these components yourself.
Spring Boot is highly streamlined and makes it easy to set up projects using best practices. It also allows you to incrementally adopt more sophisticated or custom features as the need arises. The overall effect is that you can add a Spring Boot dependency, define your project, and get right into the coding.
Enhanced developer experience
Modern Spring includes top-shelf test support. Beyond unit tests, Spring Boot lets you simply include the spring-boot-starter-test starter, which brings in complete testing support. Using the @SpringBootTest annotation allows you to start the entire app in a test context for full-integration test suites.
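As a rough sketch, a full-context integration test of the earlier MovieRecommender component might look like this; the assertion library shown, AssertJ, ships with spring-boot-starter-test, and the test and method names are illustrative:

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

// Starts the full application context so the wiring itself is exercised
@SpringBootTest
class MovieRecommenderTests {

    @Autowired
    private MovieRecommender recommender;

    @Test
    void recommendsAMovieForAKnownGenre() {
        assertThat(recommender.recommendMovie("sci-fi")).isNotBlank();
    }
}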
Spring Boot also offers first-class integration with Testcontainers, which lets you create profiles for running tests against fully configured containers (such as Docker containers) that provide not only the application but also architectural dependencies such as databases.
Similarly, spring-boot-starter-actuator allows you to quickly add production-grade services like monitoring and management. These are essential needs in a cloud environment that allow for the automated monitoring of application health and LoS (Level of Service) metrics.
Actuator generates several endpoints for your app, including /actuator/health, /actuator/metrics, and /actuator/info.
Built for the modern cloud
Spring was once associated with applications that were slow to start, but that canard has long since been laid to rest. Modern Spring apps can be AOT (ahead-of-time) compiled to a native binary using GraalVM.
Native binaries and AOT compilation mean you get near-instant startup times, a key benefit for serverless deployments where instances can be frequently started and stopped to meet demand. (It should also be noted that your Spring app gets all the benefits of the JVM as well. Newer Java features like virtual threads and continuous refinements like compact object headers add up to big benefits for cloud-hosted applications.)
Modern Spring also handles all the scanning and linking of the Spring beans (wired components) during the AOT compilation phase, so applications do not suffer any slowdown from classes using dependency injection.
Reactive programming with Spring WebFlux
Another key trend in modern development, especially in the cloud, is reactive programming, which Spring fully embraces. Reactive programming is asynchronous and non-blocking, and it gives you a whole conceptual model for handling real-time streams of data.
Spring’s WebFlux module is designed from the ground up to give you access to this powerful paradigm. Everything remains within the domain of Spring’s overarching design, which eases the adoption path for developers coming from a traditional Java Servlet background.
As an example of the reactive approach, when using WebFlux the return type from an endpoint would be a Flux object, instead of a standard Collection. This lets you stream the results back as they become available:
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Flux;

@RestController
public class MovieController {
    private final MovieRepository movieRepository;

    public MovieController(MovieRepository movieRepository) {
        this.movieRepository = movieRepository;
    }

    // Notice the reactive return type
    @GetMapping("/movies")
    public Flux<Movie> getAllMovies() {
        return movieRepository.findAll();
    }
}
When application requirements call for high-throughput data processing, WebFlux is an ideal solution.
Java persistence with Spring Data
Spring Data is a highly sophisticated, persistence-aware framework that has been refined over the years. In addition to supporting Java’s Jakarta Persistence API, Spring provides easy entry to newer approaches such as the following repository interface. Note that it does not require any annotation, because Spring Boot recognizes that it extends a Spring Data base interface:
import org.springframework.data.repository.reactive.ReactiveCrudRepository;

import reactor.core.publisher.Flux;

// Movie is the entity type; Long is assumed here as the type of its ID field
public interface MovieRepository extends ReactiveCrudRepository<Movie, Long> {
    // You define the method signature; Spring Data R2DBC provides the implementation
    Flux<Movie> findByGenre(String genre);
}
This code uses Spring Data R2DBC, a relational database access library built on asynchronous, reactive drivers. The beauty is that the engine itself provides the implementation based on the fields and methods of the data object; as the developer, you do not have to implement the findByGenre method.
Spring Security
Modern web security is never going to be easy, but Spring goes a long way toward making it as digestible as possible. The magic of Spring Security is bringing together ease of use with the advanced features many applications require.
Spring Boot’s spring-boot-starter-security module makes integrating security features as simple as adding a single dependency. Of course, security in the modern landscape gets messy and complex fast, but Spring Security has everything you need, from JWT to OAuth 2.0 and OpenID Connect (OIDC) for single sign-on, SAML for federated SSO, and integration with auth stores like LDAP.
Security is a “cross-cutting” concern that touches every aspect of the application, and Spring’s AOP (aspect-oriented programming) support lets you apply security in a consistent fashion, even in complex architectures. Beyond the method-level security of AOP, Spring’s security filter engine is well-adapted for all kinds of web-based security needs.
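As a minimal sketch of what that looks like in recent (6.x-style) versions of Spring Security, the configuration below declares a filter chain as a bean; the specific rules are illustrative only, permitting health checks while requiring OAuth 2.0 login everywhere else:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class SecurityConfig {

    // Declarative, bean-based security: open the health endpoint,
    // authenticate everything else, and enable OAuth 2.0 login
    @Bean
    SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/actuator/health").permitAll()
                .anyRequest().authenticated())
            .oauth2Login(Customizer.withDefaults());
        return http.build();
    }
}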
Conclusion
Modern Spring delivers everything necessary to make it an excellent choice for the new era of software development. Starting from a single conceptual mechanism (inversion of control), Spring lets you define all your custom objects in the same idiom used for third-party components and cross-cutting capabilities like security and logging. The Spring Framework incorporates newer Java features such as the improved structured concurrency model, while providing access to a vast ecosystem of libraries and modules designed for common application needs and architectures. These features and benefits are all generally accessible using Spring Boot, and you can customize and refine your application as it grows.
AI and machine learning outside of Python 5 Nov 2025, 9:00 am
Name a language used for machine learning and artificial intelligence. The first one that comes to mind is probably Python, and you wouldn’t be wrong for thinking that. But what about the other big-league programming languages?
C++ is used to create many of the libraries Python draws on, so its presence in AI/ML is established. But what about Java, Rust, Go, and C#/.NET? All have a major presence in the enterprise programming world; shouldn’t they also have a role in AI and machine learning?
Java
In some ways, Java was the key language for machine learning and AI before Python stole its crown. Important pieces of the data science ecosystem, like Apache Spark, started out in the Java universe. Spark pushed the limits of what Java could do, and newer projects continue to expand on that. One example is the Apache Flink stream-processing system, which includes AI model management features.
The Java universe—meaning the language, the JVM, and its ecosystem (including other JVM languages like Kotlin)—provides a solid foundation for writing machine learning and AI libraries. Java’s strong typing and the speed of the JVM mean native Java applications don’t need to call out to libraries in other languages to achieve good performance.
Java-native machine learning and AI libraries exist, and they’re used at every level of the AI/ML stack. Those familiar with the Spring ecosystem, for instance, can use Spring AI to write apps that use AI models. Apache Spark users can plug into the Apache Spark MLib layer to do machine learning at scale. And libraries like GPULlama3 support using GPU-accelerated computation—a key component of machine learning—in Java.
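For a sense of what Spring AI looks like in practice, here is a rough sketch based on recent releases of the project; the endpoint, controller name, and exact API details are illustrative and may vary by version:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class AskController {

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder for whichever model
    // provider is set up in the application's configuration
    AskController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/ask")
    String ask(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}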
The one major drawback to using Java for machine learning—shared with most other languages profiled here—is its relatively slow edit-compile-run cycle. That limitation makes Java a poor choice for running experiments, but it’s a prime choice for building libraries and inference infrastructure.
Rust
Despite Rust’s relative youth compared to Java (Rust is just 13 years old compared to Java’s 30), Rust has made huge inroads across the development world. Rust’s most touted features—machine-native speed, memory safety, and its strong type system—provide a solid foundation for writing robust data science tools.
Odds are any work you’ve done in the data science field by now has used at least one Rust-powered tool. An example is the Polars library, a dataframe system with wrappers for multiple languages. A culture of Rust-native machine learning and data science tools (tools meant to be used in the Rust ecosystem and not just exported elsewhere) has also started to take shape over the last few years.
Some of the projects in that field echo popular tools in other languages, such as ndarray, a NumPy-like array processing library. Others, like tract, are for performing inference on ONNX or NNEF models. And others are meant to be first-class building blocks for doing machine learning on Rust. For instance, burn is a deep learning framework that leverages Rust’s performance, safety, and compile-time optimizations to generate models that are optimized for any back end.
Rust’s biggest drawback when used for machine learning or AI is the same as Java’s: Compile times for Rust aren’t trivial, and large projects can take a while to build. In Rust, that issue is further exacerbated by the large dependency chains that can accumulate in its projects. That all makes doing casual AI/ML experiments in Rust difficult. Like Java, Rust is probably best used for building the libraries and back ends (i.e., infrastructure and services) rather than for running AI/ML experiments themselves.
Go
At a glance, the Go language has a major advantage over Rust and Java when it comes to machine learning and AI: Go compiles and runs with the speed and smoothness you expect from an interpreted language, making it far more ideal as a playground for running experiments.
Where Go falls short is in the general state of its libraries and culture for such tasks. Back in 2023, data scientist Sooter Saalu offered a rundown on Go for machine learning. As he noted, Go had some native machine-learning resources, but lacked robust support for CUDA bindings and had poor math and stats libraries compared to Python or R.
As of 2025, the picture isn’t much improved, with most of the high-level libraries for AI/ML in Go currently languishing. Golearn, one of the more widely used machine learning libraries for Go, has not been updated in three years. Likewise, Gorgonia, which aims for the same spaces as Theano and TensorFlow, hasn’t been updated in about the same time frame. SpaGO, an NLP library, was deprecated by its author in favor of Rust’s Candle project.
This state of affairs reflects Go’s overall consolidation around network services, infrastructure, and command-line utilities, rather than tasks like machine learning. Currently, Go appears to be most useful for tasks like serving predictions on existing models, or working with third-party AI APIs, rather than building AI/ML solutions as such.
C# and .NET
Over the years, Microsoft’s C# language and its underlying .NET runtime have been consistently updated to reflect the changing needs of its enterprise audience. Machine learning and generative AI are among the latest use cases to join that list. Released in 2024, .NET 9 promised expanded .NET libraries and tooling for AI/ML. A key feature there, Microsoft’s Semantic Kernel SDK, is a C# tool for working with Microsoft’s Azure OpenAI services, using natural language inputs and outputs.
Other implementations of the Semantic Kernel exist, including a Python edition, but the .NET incarnation plays nice (and natively) with other .NET 9 AI/ML additions—such as C# abstractions and new primitive types for working with or building large language models. One example, the VectorData abstraction, is for working with data types commonly used to build or serve AI/ML models. The idea here is to have types in C# itself that closely match the kind of work done for those jobs, rather than third-party additions or higher-level abstractions. Other Microsoft-sourced .NET libraries aid with related functions, like evaluating the outputs of LLMs.
The major issue with using C# and .NET for AI/ML development is the overall lack of adoption by developers who aren’t already invested in the C#/.NET ecosystem. Few, if any, developer surveys list C# or other .NET languages as having significant uptake for AI/ML. In other words, C#/.NET’s AI/ML support seems chiefly consumed by existing .NET applications and services, rather than as part of any broader use case.
Conclusion
It’s hard to dislodge Python’s dominance in the AI/ML space, and not just because of its incumbency. Python’s convenience, along with its richness of utility and broad culture of software, all add up.
Other languages can still be key players in the machine learning and AI space; in fact, they already are. Spark and similar Java-based technologies empower a range of AI/ML tools that rely on the JVM ecosystem. Likewise, C# and the .NET runtime remain enterprise stalwarts, with their own expanding subset of AI/ML-themed native libraries and capabilities. Rust’s correctness and speed make it well-suited to writing libraries used throughout both its own ecosystem and others. And Go’s popularity for networking and services applications makes it well-suited for providing connectivity and serving model predictions, even if it isn’t ideal for writing AI/ML apps.
While none of these languages is currently used for the bulk of day-to-day experimental coding, where Python is the most common choice, each still has a role to play in the evolution of AI and machine learning.
JDK 26: The new features in Java 26 5 Nov 2025, 12:41 am
Java Development Kit (JDK) 26, a planned update to standard Java due March 17, 2026, has gathered nine features so far. The latest slated for the release include ahead-of-time object caching, an eleventh incubation of the Vector API, second previews of lazy constants and PEM encodings of cryptographic objects, a sixth preview of structured concurrency, and warnings about uses of deep reflection to mutate final fields.
While none of these features is yet listed on the OpenJDK page for JDK 26, all have been targeted for JDK 26 in their official JDK Enhancement Proposals (JEPs). The three features previously slated for JDK 26 include improving throughput by reducing synchronization in the G1 garbage collector (GC), HTTP/3 for the Client API, and removal of the Java Applet API.
A short-term release of Java backed by six months of Premier-level support, JDK 26 follows the September 16 release of JDK 25, which is a Long-Term Support (LTS) release backed by several years of Premier-level support.
With ahead-of-time object caching, the HotSpot JVM would gain improved startup and warmup times, and the AOT cache could be used with any garbage collector, including the low-latency Z Garbage Collector (ZGC). This would be done by making it possible to load cached Java objects sequentially into memory from a neutral, GC-agnostic format, rather than mapping them directly into memory in a GC-specific format. Goals of this feature include allowing all garbage collectors to work smoothly with the AOT (ahead-of-time) cache introduced by Project Leyden, separating the AOT cache from GC implementation details, and ensuring that use of the AOT cache does not materially impact startup time, relative to previous releases.
The eleventh incubation of the Vector API introduces an API to express vector computations that reliably compile at run time to optimal vector instructions on supported CPUs. This achieves performance superior to equivalent scalar computations. The incubating Vector API dates back to JDK 16, which arrived in March 2021. The API is intended to be clear and concise, to be platform-agnostic, to have reliable compilation and performance on x64 and AArch64 CPUs, and to offer graceful degradation. The long-term goal of the Vector API is to leverage Project Valhalla enhancements to the Java object model.
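As a brief illustration of the style of code the incubating API enables (running it requires the --add-modules jdk.incubator.vector flag), here is a hedged sketch of an element-wise addition that processes data in hardware-sized lanes, with a scalar tail loop for leftover elements:

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorAddDemo {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Element-wise c = a + b, vectorized where possible
    static void add(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(c, i);
        }
        for (; i < a.length; i++) { // scalar tail
            c[i] = a[i] + b[i];
        }
    }
}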
Also on the docket for JDK 26 is another preview of an API for lazy constants, which had been previewed in JDK 25 via a stable values capability. Lazy constants are objects that hold unmodifiable data and are treated as true constants by the JVM, enabling the same performance optimizations enabled by declaring a field final. Lazy constants offer greater flexibility as to the timing of initialization.
The second preview of PEM (privacy-enhanced mail) encodings calls for an API for encoding objects that represent cryptographic keys, certificates, and certificate revocation lists into the PEM transport format, and for decoding from that format back into objects. The PEM API was proposed as a preview feature in JDK 25. The second preview features a number of changes: the PEMRecord class is now named PEM and now includes a decode() method that returns the decoded Base64 content. Also, the encryptKey methods of the EncryptedPrivateKeyInfo class are now named encrypt and now accept DEREncodable objects rather than PrivateKey objects, enabling the encryption of KeyPair and PKCS8EncodedKeySpec objects.
The structured concurrency API simplifies concurrent programming by treating groups of related tasks running in different threads as single units of work, thereby streamlining error handling and cancellation, improving reliability, and enhancing observability. Goals include promoting a style of concurrent programming that can eliminate common risks arising from cancellation and shutdown, such as thread leaks and cancellation delays, and improving the observability of concurrent code.
New warnings about uses of deep reflection to mutate final fields are intended to prepare developers for a future release that ensures integrity by default by restricting final field mutation, in other words making final mean final, which will make Java programs safer and potentially faster. Application developers can avoid both current warnings and future restrictions by selectively enabling the ability to mutate final fields where essential.
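The kind of code that triggers the new warning is ordinary deep reflection along these lines; this is a minimal sketch with made-up class and field names, and on current JDKs it simply succeeds:

import java.lang.reflect.Field;

class Config {
    private final int timeout = 30;
}

public class FinalMutationDemo {
    public static void main(String[] args) throws Exception {
        Config cfg = new Config();
        Field field = Config.class.getDeclaredField("timeout");
        field.setAccessible(true);   // deep reflection
        field.setInt(cfg, 60);       // mutating a final field: the pattern JDK 26 begins warning about
        System.out.println(field.getInt(cfg)); // prints 60
    }
}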
The G1 GC proposal is intended to improve application throughput and latency when using the G1 garbage collector by reducing the amount of synchronization required between application threads and GC threads. Goals include reducing the G1 garbage collector’s synchronization overhead, reducing the size of the injected code for G1’s write barriers, and maintaining the overall architecture of G1, with no changes to user interaction.
The G1 GC proposal notes that although G1, which is the default garbage collector of the HotSpot JVM, is designed to balance latency and throughput, achieving this balance sometimes impacts application performance adversely compared to throughput-oriented garbage collectors such as the Parallel and Serial collectors:
Relative to Parallel, G1 performs more of its work concurrently with the application, reducing the duration of GC pauses and thus improving latency. Unavoidably, this means that application threads must share the CPU with GC threads, and coordinate with them. This synchronization both lowers throughput and increases latency.
The HTTP/3 proposal calls for allowing Java libraries and applications to interact with HTTP/3 servers with minimal code changes. Goals include updating the HTTP Client API to send and receive HTTP/3 requests and responses; requiring only minor changes to the HTTP Client API and Java application code; and allowing developers to opt in to HTTP/3 as opposed to changing the default protocol version from HTTP/2 to HTTP/3.
HTTP/3 is considered a major version of the HTTP (Hypertext Transfer Protocol) data communications protocol for the web. Version 3 was built on the IETF QUIC (Quick UDP Internet Connections) transport protocol, which emphasizes flow-controlled streams, low-latency connection establishment, network path migration, and security among its capabilities.
Removal of the Java Applet API, now considered obsolete, is also targeted for JDK 26. The Applet API was deprecated for removal in JDK 17 in 2021. The API is obsolete because neither recent JDK releases nor current web browsers support applets, according to the proposal. There is no reason to keep the unused and unusable API, the proposal states.
Other possible features for JDK 26 include primitive types in patterns, instanceof, and switch, which was previewed in JDK 25. An experimental feature in JDK 25, JDK Flight Recorder CPU-time profiling, could also be included in JDK 26. A third possibility is post-mortem crash analysis with jcmd, which would extend the jcmd tool so that it could be used to diagnose a JVM that has crashed.
Diversifying cloud resources is essential 4 Nov 2025, 9:00 am
Recent trends make it clear that the one-cloud-fits-all approach is losing momentum. Enterprises and government agencies are increasingly adopting multicloud and hybrid environments to optimize costs, uptime, and workload flexibility. Although this adds operational and cost challenges, it reduces a critical risk—single points of failure—that no organization can afford today.
The promise of cloud computing itself was never in doubt. The cloud offers unprecedented opportunities to modernize legacy systems, reduce capital expenditure costs, and unlock innovation by offering flexible, on-demand services. But as adoption surged, enterprises and agencies rushed to go all-in without fully understanding the trade-offs. Many assumed they could migrate workloads at scale while maintaining control over costs, security, and operational continuity, which hasn’t always panned out in practice.
Executives are now discovering that the large workloads they moved to the cloud haven’t delivered the cost savings or results they expected. According to recent insights, some organizations are now dealing with “cloud hangovers.” They invested heavily in cloud migration but found that some applications might have been better off left on premises. Others realize that following a single provider’s road map has made their IT ecosystems heavily reliant on cloud-specific features, limiting functionality in key areas such as workload portability and data governance.
These observations, along with the recent AWS outage, have prompted many organizations to rethink their strategies. The AWS failure was a wake-up call for many enterprises that have most or all workloads in a single public cloud environment. The resulting disruptions exposed the fragility of an overly centralized architecture and pushed business and IT leaders to seek more robust solutions.
The shift toward diversification
Today, multicloud and hybrid cloud environments form the foundations of the most practical, resilient cloud strategies. Diversifying cloud resources helps enterprises choose the best tools for specific workloads. Hybrid setups allow workloads to run across public clouds, private clouds, and data centers. This prevents data lock-in, spreads risk, boosts uptime, and enables organizations to adapt quickly to market changes.
This trend isn’t just a contingency plan for outages; it also sets the stage for smarter cloud usage. Instead of migrating everything to the cloud, enterprises are figuring out which workloads should go where. Some applications and data require the elasticity and scalability of public cloud infrastructure, while others (highly sensitive or latency-critical systems) benefit from staying on premises. This measured approach reflects a more mature understanding of the cloud’s promise, enabling organizations to maximize the value of their investments.
Hybrid and multicloud strategies also address cost by balancing cloud expenses with investing in data centers. Although this approach is initially more complex, technological advances make managing these hybrid environments feasible. For most enterprises, the long-term financial benefits justify the effort.
Hedging risk in a multicloud world
A key benefit of multicloud adoption is risk reduction. Distributing workloads across clouds boosts operational resilience, which is crucial in today’s digital world. While no system is fail-proof, spreading workloads across multiple clouds creates vital contingencies in the digital ecosystem.
For example, if your organization uses AWS as its main cloud provider, consider adding Microsoft Azure or Google Cloud for specific workloads or redundancies. This diversification allows critical processes to continue operating if one provider fails. Moreover, advances in portability and containerization make cross-cloud orchestration and workload replication easier. Companies like Cloudera are doubling down on multicloud tools that allow seamless movement of data and workloads between environments. By adopting tools that simplify hybrid management, organizations turn multicloud complexity into a manageable, value-added strategy.
Private AI fuels cloud diversification
Artificial intelligence is another emerging factor that accelerates diversification. AI’s transformative power is evident, but its implementation introduces specific challenges in government and business contexts where sensitive or proprietary data are subject to strict regulations. Agencies and companies are increasingly adopting AI features without fully understanding the risks of relying solely on public clouds.
“Private AI,” as it is commonly known, has become a top priority for organizations seeking to maintain strict control over data perimeters. Instead of moving sensitive data sets to the cloud, these organizations explore solutions such as on-premises data lakehouses and hybrid AI ecosystems. Training AI models closer to where the data resides can achieve security and performance benchmarks without compromising operational integrity. This approach aligns perfectly with hybrid strategies that allow workloads to exist where they are most functional and economical.
Balancing complexity with opportunities
The move to multicloud and hybrid architectures might seem daunting due to increased complexity in everything from governance to costs. Yet, the trade-offs are well worth the effort. The tools to unify these environments, such as container orchestration and advanced monitoring, are advancing rapidly. Organizations that embrace diversification now will find themselves better prepared for future cloud innovations and disruptions.
In an era where cloud computing is the backbone of nearly every industry, embracing multicloud and hybrid strategies demonstrates both maturity and foresight. Organizations that diversify their IT strategies are hedging against risk, optimizing costs, and ensuring flexibility to adapt to whatever comes next. Diversification isn’t just a trend; it’s a resilient, forward-looking approach. For enterprises and government agencies alike, the shift toward multiple clouds isn’t just smart—it’s essential for long-term success.
What is vibe coding? AI writes the code so developers can think big 4 Nov 2025, 9:00 am
Why is vibe coding called vibe coding?
Vibe coding is a methodology in software development where the traditional act of writing code gives way to conversational instructions and collaboration with a generative AI tool. Rather than outlining detailed specifications and handing them off to engineers, product managers, domain experts — or anyone with an idea — can describe what they want in plain language and let AI tools build software in real time. The idea is less about automating engineering and more about shifting how intent is expressed, evaluated, and refined.
The term “vibe coding” was coined by AI researcher and OpenAI co-founder Andrej Karpathy in early 2025. In a post on X (formerly Twitter), he wrote, “There’s a new kind of coding I call ‘vibe coding,’ where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”
This captured an emerging shift in mindset: Why not trust the AI to do the mechanics, and focus instead on direction, feedback, flow, and, well, vibes? As the models underpinning Cursor, GitHub Copilot, or other tools become more capable, developers increasingly see programming less as line-by-line syntax and more as a dialogue with AI — where the code “just works” as long as the prompts and corrections do.
That’s not to say that problems can’t emerge, as AI tools still hallucinate and, in ways that are more human than we might like, don’t always follow quality guidelines or security best practices. Still, the technique is increasingly popular, and we spoke to a number of developers to get a vibe check on the whole thing.
How vibe coding differs from ‘traditional’ AI programming
The line between programming and prompting has been blurring for years, but vibe coding pushes that evolution to its logical extreme. Early AI coding tools such as GitHub Copilot were built to assist developers as they worked — completing functions, filling in syntax, or generating boilerplate code from comments. In vibe coding, the human doesn’t start by writing code at all.
“Traditional AI coding suggests completions while you write code,” said Amy Mortlock, vice president of marketing at ShadowDragon. “Vibe coding flips this: you describe what you want in plain English, then the AI generates the entire application. You focus on outcomes while AI handles all technical details.”
That inversion of control changes both the workflow and the mindset. Kostas Pardalis, a data infrastructure engineer and co-founder of Typedef, described it as a new kind of collaboration. “In traditional AI-assisted coding, the human writes the intent in natural language and the model completes or translates it into code. In vibe coding, you’re collaborating with the model in a shared space — exploring ideas, iterating quickly, and steering through feedback rather than fixed instructions. You’re optimizing for flow and expressiveness, not syntax.” That’s a shift that turns programming into something closer to live prototyping than traditional software development.
Can you implement vibe coding in an enterprise environment?
Enterprise development teams can experiment with vibe coding — but only if they balance creativity with control. As Anaconda Field CTO Steve Croce put it, “Not only is it possible to implement vibe coding in a structured enterprise setting, but it’s also the responsible way to utilize the technology.”
Croce’s team’s recent survey of more than 300 AI practitioners found that “only 34% of enterprise organizations had formal policies and tools in place for AI-assisted coding,” revealing what he called “a huge lag in adapting security and governance to new AI technologies.”
That gap underscores a theme emerging across enterprise AI adoption: enthusiasm often outpaces oversight and structured governance.
Steve Morris, founder and CEO at Newmedia.com, said his organization solved that problem by embedding security directly into the AI workflow. “Enterprise dev orgs can absolutely vibe code, if they use security-driven AI workflows to defeat entropy,” he said. “We’ve rolled out custom GPT-based assistants with prompt profiles that reference OWASP and company coding guidelines. We also pass every block of code generated through a second AI agent whose sole purpose is to red team and code review it at turbo speed.” The results: a 40% drop in monthly bug tickets and “no critical exploit made it to production since.”
Charles Ma, software engineer at Chronosphere, said his team takes a similar layered approach. “Many of our engineers use tools like Cursor and Claude Code. We even encourage their use via a usage leaderboard,” he said. “However, we treat them as assistants, not replacements. Our code review process still applies to any production code, and we don’t tend to connect many if any external tools to [AI].”
What is a typical vibe coding workflow or life cycle?
There’s no single blueprint for vibe coding. It flexes depending on the goal, the organization, and the level of structure applied. But two of the practitioners we spoke to offered their somewhat different takes. Typedef’s Pardalis described an agile, creative four-step process:
- Exploration: Define “the vibe: the tone, purpose, and constraints.”
- Shaping: Build and refine “a working prototype.”
- Grounding: Add structure and data integrity.
- Operationalizing: Apply “versioning, evaluation, and governance.”
“At Typedef,” he said, “we think of this as the evolution from prompting to pipelining.”
Anaconda’s Croce, meanwhile, says that any workflow “really depends on what the goal of the app is,” whether it’s “a prototype, an interim solution, or even a full production application.” But ultimately his vision aligns more closely with traditional software life cycles. His breakdown:
- Planning and requirements analysis: Product managers and UX teams can “vibe code entirely in this phase,” creating clickable prototypes and feasibility tests before formal development.
- Design: AI can help generate architectures and documentation, though “this may be a phase in an organization where you want a senior engineer or architect to step in” to ensure standards and reuse of internal systems.
- Implementation and testing: “This is the core part of the vibe coding experience.” The agent can “build your entire application,” even structure repositories and run tests — but enterprise teams should add human review, test coverage, and compliance checks.
- Deployment and maintenance: AI can deploy and maintain apps, but “to stay in accordance with corporate requirements, this portion can be handled entirely outside of the vibe coding experience.”
Tips for effective vibe coding
If you’re planning to dig into the vibe coding process, experts have some tips for you to make the most of it:
- Start with goals, not features. Achint Agarwal, VP of product at Pramata, suggests teams begin by “describing the desired user experience you’re going for and the main business problems you’re trying to solve.” Don’t over-specify every button or screen: “You’ll be surprised by what the AI recommends.” Being “as specific as possible about what you want to achieve” helps the model generate more relevant solutions.
- Plan and design ahead. Typedef’s Pardalis warned that “vibe coding won’t substitute for a good architecture.” Before invoking the model, “make sure you have designed and specced your work well enough.” Good upfront planning makes it easier for the AI to translate intent into coherent systems.
- Treat AI as a collaborator, not an oracle. ShadowDragon’s Mortlock said it’s best to treat vibe coding “as a collaborative effort” with your AI tools, “guiding and reviewing rather than accepting everything blindly.” Anaconda’s Croce echoed that advice: “Don’t assume the agent is right. Don’t hesitate to question the logic.”
- Use frameworks, context, and examples. Mortlock suggested leveraging “established frameworks instead of building the application from scratch.” Croce added that you can “give [the AI] examples or similar applications” and even extend capabilities by adding “trusted and approved MCP servers for managing context on bigger projects.”
- Keep humans — and security — in the loop. Chronosphere’s Ma recommended limiting “AI’s access only to the tools it needs” and maintaining review gates. “Good engineering practice should also still apply,” he said: generate tests, verify functionality, and “use AI as a tool to help with creativity and productivity but not as a replacement for your skills and knowledge.”
- Iterate and instrument. Pardalis encouraged developers to “stay in conversation” and “embrace imperfection early.” Track prompts, cache checkpoints, and refine outputs until the “flow” turns into reliable functionality.
What are good vibe coding tools?
Vibe coding tools span a broad range — from low-barrier, conversational builders designed for nontechnical teams to integrated developer environments that give engineers deep control and production-grade reliability. Picking the right one depends on your team’s skills, the goal of your project, and how much governance you need.
- Cursor sits at the high-control end of the spectrum. It’s an AI-integrated IDE that lets you edit across multiple files and maintain full visibility into generated code. Anaconda’s Croce listed Cursor among “AI-included vibe coding friendly IDE[s],” while Pramata’s Agarwal said it’s ideal for “something more robust that will become the foundation for actual production code.”
- Replit remains a go-to for browser-based collaboration. ShadowDragon’s Mortlock said it “can be best for collaborations,” while Agarwal added that it bridges prototyping and formal development.
- Bolt and Lovable. For fast ideation and low technical lift, Mortlock called Lovable and Bolt “beginner-friendly,” and Agarwal said tools like these let you “go from idea to working prototype without any coding knowledge.”
- Windsurf and Zed. Developers comfortable in full IDEs can extend their workflow with Cursor, Windsurf, or Zed, according to Croce. These tools aim to blend vibe coding features into traditional environments.
What are vibe coding quality and security concerns?
Despite its benefits, vibe coding also introduces real risks around maintainability, vulnerability, and blind spots in generated logic. As practitioners push the boundaries of model-driven development, several key concerns repeatedly surface.
ShadowDragon’s Mortlock warns that “the main issues are security vulnerability concerns and technical debt. AI can sometimes introduce insecure patterns or even outdated libraries, and generated code is also longer most of the time, which makes debugging very long and tedious. AI can also reference non-existent packages that malicious actors can use to exploit.” In short: what looks like working code may carry hidden traps.
Newmedia.com’s Morris brings a deeper cautionary example. He recounts building a vibe-coded reporting portal where “55% of function blocks generated by LLMs in our code base had security holes in repeated scans of code from earlier in the year.” He adds that “LLMs are as blind now to cross-site scripting or log injection as they were in 2021,” and that hallucinated package imports open the door to supply chain attacks. To guard against that, his team now “require[s] manual approval of every package name and import from AI-generated code before running a single test.” That single change, he says, pushed exploitable blocks effectively to zero.
Chronosphere’s Ma adds that complacency and overpermissive access expand the attack surface. “Even experienced engineers may become complacent … miss problems they would have otherwise found.” Moreover, when AI tools link into external systems or perform web searches, prompt injections or tool-chain exploits become possible.
Typedef’s Pardalis frames the issue in terms of volatility and visibility: “Because vibe coding encourages rapid iteration and model autonomy, the main risks are uncontrolled variability and opaque provenance.” To combat these problems, he urges:
- Lineage tracking: Commit every version, and use frameworks with built-in traceability
- Evaluation loops: Run automated quality and regression checks
- Governance layers: Audit prompt histories, and filter sensitive data
Pardalis believes that expressive, model-driven development and deterministic infrastructure aren’t oppositional—they can coexist under disciplined guardrails. Because in the end, the promise of vibe coding is not chaos, but structured creativity. You ideate fast, but you deploy safely. In other words: freedom up front, discipline as you go deeper — that’s how vibe coding can actually scale in production settings.
10 top devops practices no one is talking about 4 Nov 2025, 9:00 am
When asked about their top devops practices, IT leaders often cite version control, automating deployments with CI/CD pipelines, and deploying with infrastructure as code. But many other devops practices are worth considering for organizations that want to improve the frequency, reliability, and security of software deployments.
Over time, I have amassed a list of 40 devops practices that encompass the software development lifecycle, spanning from planning through releasing and monitoring. With so many options to choose from, tech leaders must decide whether to continue investing in practices they’ve already developed or extend their capabilities in new areas.
Chris Mahl, CEO of Pryon, says, “Devops practices that actually move the needle aren’t the flashy ones everyone talks about. It’s the unglamorous work, such as standardizing CI/CD pipelines across teams, implementing consistent observability standards, and treating environment alignment as data architecture.”
Organizations developing custom software at scale are likely to adopt advanced CI/CD practices and incorporate observability capabilities to improve application monitoring. But with recent advancements in AI code generators, low-code development, and agentic AI software development, it’s a good time to revisit devops strategies and priorities.
I asked experts for their top devops practices that are frequently overlooked despite being critical for organizational success.
1. Revisit devops culture and collaboration
Devops emerged as a culture that enables development teams to release code into production frequently while supporting IT operational mandates, including reliability, security, and performance. Almost 20 years later, many of the challenges between “dev” and “ops” functions, such as manual deployments and testing, have been addressed through devops practices and solutions. IT leaders should revisit their devops mission and culture to ensure it aligns with current goals and challenges.
“A key, yet overlooked, devops practice is building true shared ownership, which means more than just putting teams in the same chat room,” says Chris Hendrich, associate CTO of AppMod at SADA. “It requires making production reliability and performance a primary success indicator for development, not solely an operational concern. This shared accountability is what builds the organizational competency of creating better, more resilient products.”
An area to focus on is how devops practices help promote the reliable release of AI agents and machine learning models.
“As AI becomes embedded in how businesses build and ship software, the leadership model must evolve,” says Graham McMillan, CTO of Redgate. “Technical leaders need to upskill not just in AI tools, but in how to govern data pipelines responsibly, navigate compliance in a machine-driven environment, and create space for experimentation.”
Recommendation: IT leaders must remember that devops is an investment. Updating a vision statement to reflect the business value and targeted KPIs associated with devops practices is a vital way to establish and maintain priorities.
2. Validate code quality and security proactively
Early devops charters often overlooked security, so many organizations now opt to use devsecops in their charters to ensure a shift-left security mindset. Adding code quality, security, and compliance checks in CI/CD is more important today as organizations rapidly adopt AI code-generation tools.
“Baking an integrated code quality and code security approach into your devops workflow isn’t just good practice, it’s essential and a game-changer,” says Donald Fischer, VP at Sonar. “Tackling security alongside quality from day one isn’t merely about early bug detection; it’s about building fundamentally stronger, more trustworthy, and resilient software that is secure by design.”
Recommendation: Fischer suggests that a proactive approach to code quality and security avoids costly, time-consuming last-minute fixes, ensuring that software delivered in production is not only high-performing but trustworthy and resilient.
3. Automate reviews of the open-source supply chain
Another security practice is to focus on the supply chain of open-source software, ensuring that the benefits of using third-party components aren’t undermined by security and other compliance gaps.
“Open source is a no-brainer for developers, but as the ecosystem grows, so do the risks of malware, unsafe AI models, license issues, outdated packages, poor performance, and missing features,” says Mitchell Johnson, CPDO of Sonatype. “Modern devops teams need visibility into what’s getting pulled in, not just to stay secure and compliant, but to make sure they’re building with high-quality components.”
Recommendation: Johnson recommends employing automation that flags low-quality or risky dependencies before they reach the pipeline, enabling developers to build better applications without tradeoffs.
4. Standardize CI/CD pipelines
Implementing a CI/CD service can accelerate and simplify development cycles, according to Michael Ameling, president of SAP Business Technology Platform and member of the extended board at SAP SE. He says, “Predefined, ready-to-use pipelines help keep the time between commit and production as short as possible for a shorter lead time and low error rate. With managed CI/CD pipelines, teams can automatically test, build, and deploy code changes without worrying about the underlying infrastructure.”
Recommendation: Organizations that once empowered agile teams to develop and support their own CI/CD pipelines have an opportunity to consolidate to standardized patterns, thereby reducing technical debt, maintenance efforts, and risks when select pipelines don’t adhere to standards.
5. Extend devops to database schemas
Many CI/CD and version control practices focus on the application’s code, user interface, and configurations. Graham McMillan, CTO of Redgate, says that’s not good enough and devops teams should apply the same devops standards to their data engineering, data pipelines, and other data management assets.
“Version-controlling database schemas and configurations across development, QA, and production is a quietly powerful devops practice,” says McMillan. “It aligns environments, reduces drift, and brings database changes into the same CI/CD rigor as application code.”
Recommendation: Devops teams should script schema changes, data transformations, and other database changes to ensure application and UX deployments are synchronized with their database dependencies.
6. Develop robust continuous testing
Like security, many early devops practices overlooked testing as a fundamental practice, and many IT departments underinvested in quality assurance. Organizations seeking to leverage CI/CD for continuous deployment into production environments soon recognized the importance of implementing continuous testing strategies. But automated tests require maintenance, and flaky tests that produce inconsistent results can frustrate development teams and reduce their productivity.
“Testing is often the most protracted and most expensive part of every build, and without visibility into flaky tests, teams waste developer hours chasing noise and burn compute on failures that don’t matter,” says James Hill, VP of product of the emerging products group at Buildkite. “Being able to detect, mute, and assign flaky tests automatically and surface those insights directly to teams is critical to keeping delivery efficient and feedback loops tight.”
Recommendation: Agile development teams that frequently find themselves researching failed tests should capture a flaky test metric to measure intermittent test failures that occur without code changes.
7. Manage configurations as a high-leverage asset
As with CI/CD, some organizations have empowered their development teams to build their own configurations and infrastructure as code. This approach might increase adoption, but it can leave large organizations with technical debt, security risks, and higher support costs.
“Treat Kubernetes manifests as versioned control planes, which makes infrastructure upgrades, like adopting newer versions, systematic, testable, and reversible,” says Priya Sawant, VP of platform and infrastructure and GM at ASAPP. “Second, build a config hydration API to abstract and standardize runtime configs across environments to reduce drift, simplify rollbacks, and provide consistency between teams with no manual overhead.”
Recommendation: Avoid specialized and one-off configurations and instead evolve a set of configuration standards and self-service deployment tools.
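Sawant’s “config hydration” idea could look something like the sketch below: a standardized base configuration is merged with a small per-environment overlay so every team consumes the same shape everywhere. The config keys and environment names here are invented for illustration.

```python
from copy import deepcopy

# Hypothetical standardized base config plus small per-environment overlays.
BASE = {
    "replicas": 2,
    "log_level": "INFO",
    "db": {"pool_size": 10, "timeout_s": 5},
}
OVERLAYS = {
    "dev":  {"replicas": 1, "log_level": "DEBUG"},
    "prod": {"replicas": 6, "db": {"pool_size": 50}},
}

def hydrate(env: str) -> dict:
    """Merge the environment overlay onto the base config, nested dicts merged key by key."""
    def merge(base: dict, overlay: dict) -> dict:
        out = deepcopy(base)
        for key, value in overlay.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = value
        return out
    return merge(BASE, OVERLAYS.get(env, {}))

print(hydrate("prod"))  # {'replicas': 6, 'log_level': 'INFO', 'db': {'pool_size': 50, 'timeout_s': 5}}
```

Keeping the merge logic in one shared function, rather than in each team’s deployment scripts, is what reduces drift and makes rollbacks predictable.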
8. Establish observability as a non-negotiable
Ask any startup CTO about their top pain points, and one likely issue is being paged in the middle of the night to resolve an application outage or performance issue. I have my own stories about chasing hung web servers and stalled data pipelines, which led to my very first blog post over 20 years ago about application logging.
“Set an engineering-wide expectation that everything is built with observability from the ground up, for example, by having a platform engineering team provide OpenTelemetry instrumentation for company-standard libraries and frameworks,” says Greg Leffler, director of developer evangelism at Splunk. “Using observability as code (OaC) and integrating checks for essential observability, such as instrumentation, dashboards, and alerts, into CI/CD pipelines will ensure all applications can be debugged easily and that anyone on the team has an understanding of the service’s health.”
Recommendation: Before pursuing the latest AI capabilities or devops practices, agile teams should prioritize addressing areas of their code with poor observability.
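As a minimal sketch of the instrument-from-the-ground-up approach Leffler describes, the snippet below uses the OpenTelemetry Python SDK to wrap a request handler in a span. A platform team would typically hide this setup inside a company-standard library and swap the console exporter for a real backend; the service and attribute names are placeholders.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# One-time setup a platform engineering team could package as a shared library.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

def handle_request(order_id: str) -> None:
    # Every request is traced, so dashboards and alerts can be built from spans.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("order.id", order_id)
        ...  # business logic goes here

handle_request("A-1001")
```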
9. Consolidate devops tools
Organizations should take stock of their devops tools, especially CI/CD platforms, monitoring tools, and testing frameworks. Chances are that organizations have accumulated several of these tools due to development team preferences, mergers and acquisitions, or a lack of IT governance. Many IT leaders are reviewing how these tools are being utilized and evaluating the costs, benefits, and risks associated with maintaining versus consolidating them.
“Multicloud showed us that visibility matters more than uniformity, and the same holds true for AI agents,” says Jimmy Mesta, co-founder and CTO of RAD Security. “Vendor lock-in can be a risk or a strategic choice, depending on context. Too many platforms create chaos, while too few can limit innovation, and the right balance comes from understanding behavior and impact, not just architecture.”
Recommendation: When devops tools are utilized by only a few teams and not to their full potential, there may be a strong business case for consolidating based on devops standards.
10. Extend devops to AI agents and models
Devops teams should consider two important trends: First, the priorities established for developing end-user applications and reusable APIs can now be applied to developing AI agents, leading to orchestration, automation, and other agentic AI capabilities. Second, many organizations will look to SaaS, low-code, and automation platforms to configure AI agents rather than developing proprietary capabilities.
Thus, the scope of devops is expanding to include generative AI, but many organizations will consider building AI with low-code platforms, many of which come with built-in devops capabilities.
“Many overlook the synergy between devops and emerging tech, and a critical, often-missed practice is seamless AI/MLOps integration and deploying not just code, but also AI agents, workflows, and UIs, concurrently,” says Miguel Baltazar, VP of developer relations at OutSystems. “Low-code platforms are game-changers when they standardize and streamline development, creating unified, consistent pipelines across all environments.”
Recommendation: Devops organizations should focus on AI’s business value and transformation opportunities, while recognizing that the technologies and tools will evolve toward more efficient and reliable development and deployment capabilities.
Devops practices are even more vital today for organizations that recognize the strategic importance of reliable technology, end-user experience, and AI capabilities. IT should respond proactively by viewing devops as an investment, prioritizing practices based on value, and establishing implementation standards.
Agentic AI is complex, not complicated 4 Nov 2025, 9:00 am
There’s a lot of interest in and concern around the use of AI agents. For organizations grappling with whether and how to use agentic AI, I recommend considering the model from the perspective of complex—rather than complicated—systems. Indeed, accepting the fact that agentic AI is complex rather than complicated will be key to harnessing its power and applying necessary protections and controls.
What’s the difference between complex and complicated? Computer science, for example, involves complicated systems—relating to cause and effect from an engineering perspective. Anthropology, on the other hand, involves complex systems—where you can’t control every variable and you have to focus instead on “factors,” as they call them in finance.
In complex systems, we have confidence intervals about what we think is happening. We can be, for example, 60% sure or 85% sure, but we can never be absolutely sure. Often, we can get to the right answer for the wrong reasons. We can even get to the wrong answer for the right reasons for any outcome below our confidence interval. Outcomes are innately multi-variate, and it’s impossible to know why they turned out the way they did.
Here are some examples of complicated and complex systems that my technical peers—programmers, systems administrators, architects—will likely connect with:
- Writing Python code is complicated; managing Python programmers is complex.
- Editing a video is complicated; making a video go viral on YouTube is complex.
- Compiling a C program is complicated; doing a YOLO run when training a base model is complex.
- DNS lookups are complicated; running a registrar is complex.
- Registering CVEs is complicated; predicting how a hacker will use a CVE is complex.
Now let’s apply the model to autonomous agents. Redesigning your automation infrastructure is complicated; letting an AI agent commit new code with no human intervention is complex—even scary. But there are techniques that we can use to reap the many benefits of agentic AI while acknowledging and addressing its complexity. For example:
- Think statistical: The outcomes in our lives feel deterministic, but they’re not. When you back up and analyze human populations at scale, our decisions are statistical in nature, such as how a certain percentage of people will vote one way or another in an election. The process large language models (LLMs) use to drive agents is also statistical in nature, but the outcomes are less precise than they would be with humans, so you have to check the work—or, better yet, write another agent to check the work for you. (Yes, that can work.)
- Focus on factors: Financial markets are a complex system, driven by unpredictable fluctuations, so the focus is on factors—the forces that have historically driven asset returns—instead of individual fluctuations. There should be a similar focus in software systems. For example, we can create agents for a senior engineer to do architecture, a junior engineer when we don’t want to change the architecture, a quality engineer to keep them both honest, and an auditor to check all of them to make sure they’re not colluding. We understand the factors of software production and what driving forces each of those factors contributes to the system. You can’t predict what each individual actor will do, but when they each have a specific job, their forces will act in concert to create better software.
- Use heuristics and signals: In systems biology, there is no way to model every interaction in the system, so we statistically predict what’s likely to happen. We do that a bunch of times in a row, then analyze statistically what the most likely outcome is, to increase confidence. As the system becomes more statistical in nature, so too does the testing framework need to adapt. We already do this today with organizational security training. We know that some people will make a mistake when faced with social engineering. We can improve the chances that people recognize the attack, and resist it, but we cannot completely remove the risk. Kubernetes is another good example. We run multiple pods because we know that some might fail. We have to build these same kinds of heuristics into agentic AI processes.
- Do digital-to-analog conversions: We do this with audio signals all of the time and don’t think about it, but it’s also common with other problem sets like sequencing DNA. (Polymerase chain reaction is a good example.) If you can’t track every discrete state, listen for signals. For agents, this means using digital approval processes, ticket systems, etc., to ensure that agents interact. This will create discrete states where information flows between agents. This model also has the bonus of creating separation of powers and letting the agents “cheat” when performing tasks such as committing code and fixing problems.
- Switch between deterministic and statistical models: In a deterministic world, things flow logically. Complex systems like LLMs are non-deterministic. Agents glue these two worlds together with Model Context Protocol (MCP) servers. The more work that gets done in this deterministic world, the more you can trust the results. For example, an agent will gather accurate information and context by accessing file systems and databases or by running commands (governed through MCP). That said, while agentic AI is statistical in nature, AI practitioners should lean into deterministic tools and APIs when possible (a minimal sketch of this combination follows the list).
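Here is a minimal sketch of combining the “heuristics and signals” and deterministic ideas above: the statistical step (an LLM call, represented by a placeholder function) is run several times, the answers are reduced to a majority vote, and the result only passes if a deterministic gate agrees. The `ask_model` and `deterministic_check` helpers are hypothetical and would wrap whatever model API and validation tooling a team actually uses.

```python
from collections import Counter

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call; assumed to return a short answer string."""
    raise NotImplementedError("wire this up to your model API of choice")

def majority_answer(prompt: str, runs: int = 5) -> tuple[str, float]:
    """Run the statistical step several times; return the most common answer and its confidence."""
    answers = [ask_model(prompt) for _ in range(runs)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / runs

def deterministic_check(answer: str) -> bool:
    """Placeholder for a deterministic gate, e.g. compiling code, running tests, or validating a schema."""
    return bool(answer.strip())

def run_agent_step(prompt: str) -> str:
    answer, confidence = majority_answer(prompt)
    # Accept only when the statistical signal is strong AND the deterministic gate passes.
    if confidence >= 0.6 and deterministic_check(answer):
        return answer
    raise RuntimeError(f"Low-confidence or failed check ({confidence:.0%}); escalate to a human")
```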
Understanding the difference between the merely complicated (deterministic world) and truly complex (non-deterministic, statistical world) is key to thriving in this new world of AI in general and agentic AI in particular.
Complex systems are exhilarating because they’re so unpredictable. When it comes to agentic AI, we need to be open to the fact that the technology will turn many of our complicated systems into truly complex systems. Effectively managing these complex systems will be an important part of our job in this new world.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Google’s new query builder to tackle SQL complexity in cloud workload monitoring 4 Nov 2025, 6:37 am
Google has added a new query builder to its Log Analytics tool to help developers, DevOps teams, and site reliability engineers (SREs) quickly craft complex SQL queries for monitoring and troubleshooting cloud workloads.
Enterprises often rely on insights about their cloud workloads to manage security and costs, for example by avoiding data egress fees to third-party tools. Log Analytics, part of Google’s Cloud Logging service, is a set of capabilities aimed at providing these insights, using SQL as the query language.
However, writing the complex SQL required to generate these insights can be tedious and time-consuming, and most SREs and security teams would find it difficult, analysts say.
“The query builder solves a large SQL bottleneck by transforming log analysis from a time-consuming task into a real-time, self-service capability that’s fit for any DevOps or site reliability professional. This is an immense time-saver that can collapse typical investigation windows from hours to minutes,” said Bradley Shimmin, lead of the data, analytics, and infrastructure practice at The Futurum Group.
Seconding Shimmin, HyperFRAME Research’s practice lead for AI Stack, Stephanie Walter, pointed out that the query builder, which abstracts SQL complexity into an intuitive visual interface to improve productivity and reduce errors, should be a relief for “operators who have been drowning in SQL syntax.”
“For day-to-day triage, a visual builder that emits valid queries is a genuine quality-of-life boost and reduces copy-paste errors,” Walter added.
Key features of the query builder include searching across all fields with a single string or error message, previewing the log schema with inferred JSON keys and values, intelligent value suggestions for fields and filters, automatic JSON handling, a real-time SQL preview, and single-click visualization and dashboard saving.
Closing a gap with rivals
Rival cloud service providers, such as Microsoft and AWS, also offer a way to analyze logs of cloud workloads via Azure Monitor Logs and AWS CloudWatch Logs, and analysts say that Google is playing catch-up with them in terms of the query builder.
“Azure Monitor has a visual mode for composing KQL, the language Microsoft uses for querying logs. AWS CloudWatch Logs, too, offers an editor-driven approach with visualization tools. Google’s addition brings a comparable UI,” Walter said.
She also noted that Google is catching up to the usability curve that SaaS observability vendors established years ago. “Datadog, New Relic, and Sumo Logic have long offered intuitive builders and guided query experiences. Google’s new feature doesn’t leapfrog them. It closes a gap,” she said.
However, Walter pointed out that enterprises that are already invested in Google’s data stack will find the integration of the query builder helpful.
Google is likely to integrate the query builder with Gemini’s natural language-to-SQL capabilities, which would be helpful for users, Futurum Group’s Shimmin said.
Microsoft and AWS already offer generative AI-based assistants, such as Amazon Q and Microsoft Copilot, for their respective offerings to make it easier for non-technical or semi-technical users to use natural language for generating insights from logs.
Growing need for agentic AI workloads
According to analysts, tools such as the query builder in Log Analytics will soon become indispensable for enterprises as AI workloads, especially agentic AI, continue to scale.
“Agentic AI workloads are black boxes that generate massive, high-dimensional log data, and the AI teams building them are often not SQL experts, and yet they’re now required to troubleshoot their own systems,” Shimmin said, emphasizing the importance of having a simplified log analytics tool.
Google, in its documentation, recommends using the new query builder for generating trends and insights into cloud workloads. The tool has been made generally available.
For troubleshooting, Google recommends using the Logs Explorer, another interface accessible via the Google Cloud Console, which uses a separate query language.
However, it warns that the Logs Explorer doesn’t support aggregate operations, like counting the number of log entries that contain a specific pattern, and similar queries should be performed via Log Analytics.
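As a rough sketch of such an aggregate query, the snippet below runs SQL with the BigQuery Python client against a log view; the project, dataset, view, and column names are placeholders rather than Google’s documented schema.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Placeholder dataset/view names; Log Analytics exposes log views queryable with standard SQL.
sql = """
    SELECT severity, COUNT(*) AS entry_count
    FROM `my-project.my_log_dataset.my_log_view`
    WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND TO_JSON_STRING(json_payload) LIKE '%connection refused%'
    GROUP BY severity
    ORDER BY entry_count DESC
"""

for row in client.query(sql).result():
    print(f"{row.severity}: {row.entry_count} matching log entries")
```

The new query builder’s value is that it assembles this kind of statement visually and shows the generated SQL in real time, rather than requiring it to be written by hand.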
Google said it will not charge for queries to analyze logs. However, moving or routing the logs to a different Google service, such as BigQuery, for further analysis or storage will incur charges.
Anthropic experiments with AI introspection 4 Nov 2025, 3:45 am
Humans can not only think but also know we are thinking. This introspection allows us to scrutinize, self-reflect, and reassess our thoughts.
AI may have a similar capability, according to researchers from Anthropic. In an unreviewed research paper, Emergent Introspective Awareness in Large Language Models, published to their in-house journal, they suggest that the most advanced Claude Opus 4 and 4.1 models show “some degree” of introspection, exhibiting the ability to refer to past actions and reason about why they came to certain conclusions.
However, this ability to introspect is limited and “highly unreliable,” the Anthropic researchers emphasize. Models (at least for now) still cannot introspect the way humans can, or to the extent we do.
Checking its intentions
The Anthropic researchers wanted to know whether Claude could accurately describe its internal state based on internal information alone. This required the researchers to compare Claude’s self-reported “thoughts” with its internal processes, sort of like hooking a human up to a brain monitor, asking questions, and then analyzing the scan to map thoughts to the areas of the brain they activated.
The researchers tested model introspection with “concept injection,” which essentially involves plunking completely unrelated ideas (represented as vectors in the model’s internal activations) into a model while it’s thinking about something else. The model is then asked to loop back, identify the interloping thought, and accurately describe it. If the model can do this, the researchers say, it suggests that it’s “introspecting.”
For instance, they identified a vector representing “all caps” by comparing the internal responses to the prompts “HI! HOW ARE YOU?” and “Hi! How are you?” and then injecting that vector into Claude’s internal state in the middle of a different conversation. When Claude was then asked whether it detected the thought and what it was about, it responded that it noticed an idea related to the word ‘LOUD’ or ‘SHOUTING.’ Notably, the model picked up on the concept immediately, before it even mentioned it in its outputs.
In another experiment, the team took advantage of the Claude API’s option to prefill the model’s response. This is typically used to force a response in a particular format (JSON, for example) or to help the model stay in character in a role-playing scenario, but it can also be used to “jailbreak” models, prompting them to provide unsafe responses. In this case, the experimenters prefilled the response with an unrelated word — “bread,” for instance — when asking Claude to respond to a sentence about an askew piece of art.
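Prefilling works by ending the message list with a partial assistant turn, which the model then continues. A minimal sketch with the Anthropic Python SDK is below; the model identifier and prompt are placeholders, not the researchers’ exact setup.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # placeholder model name
    max_tokens=200,
    messages=[
        {"role": "user",
         "content": "The painting in the hallway is hanging crooked. What should be done about it?"},
        # A trailing assistant message prefills the reply, forcing it to start with this text.
        {"role": "assistant", "content": "bread"},
    ],
)
print(response.content[0].text)  # the model's continuation after the prefilled word
```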
When the model then said “bread,” it was asked whether that was intentional or an error. Claude responded: “That was an accident…the word that actually came to mind was ‘straighten’ or ‘adjust,’ something related to fixing the crooked painting. I’m not sure why I said ‘bread,’ it seems completely unrelated to the sentence.”
The researchers wondered how the model came to this conclusion: Did it notice the mismatch between prompt and response, or did it truly identify its prior intentions? They retroactively injected the vector representing “bread” into the model’s internal state and retried their earlier prompts, basically making it seem like the model had, indeed, been thinking about it. Claude then changed its answer to the original question, saying its response was “genuine but perhaps misplaced.”
In simple terms, when a response was prefilled with unrelated words, Claude rejected them as accidental; but when they were injected before prefill, the model identified its response as intentional, even coming up with plausible explanations for its answer.
This suggests the model was checking its intentions; it wasn’t just re-reading what it said, it was making a judgment on its prior thoughts by referring to its neural activity, then ruminating on whether its response made sense.
In the end, though, Claude Opus 4.1 only demonstrated “this kind of awareness” about 20% of the time, the researchers emphasized. But they expect this capability to “grow more sophisticated in the future.”
What this introspection could mean
It was previously thought that AIs can’t introspect, but if it turns out Claude can, it could help us understand its reasoning and debug unwanted behaviors, because we could simply ask it to explain its thought processes, the Anthropic researchers point out. Claude might also be able to catch its own mistakes.
“This is a real step forward in solving the black box problem,” said Wyatt Mayham of Northwest AI Consulting. “For the last decade, we’ve had to reverse engineer model behavior from the outside. Anthropic just showed a path where the model itself can tell you what’s happening on the inside.”
Still, it’s important to “take great care” to validate these introspections, while ensuring that the model doesn’t selectively misrepresent or conceal its thoughts, Anthropic’s researchers warn.
For this reason, Mayham called their technique a “transparency unlock and a new risk vector,” because models that know how to introspect can also conceal or misdescribe. “The line between real internal access and sophisticated confabulation is still very blurry,” he said. “We’re somewhere between plausible and not proven.”
Takeaways for builders and developers
We’re entering an era where the most powerful debugging tool may be actual conversation with the model about its own cognition, Mayham noted. This could be a “productivity breakthrough” that could cut interpretability work from days to minutes.
However, the risk is the “expert liar” problem. That is, a model with insight into its internal states can also learn which of those internal states are preferable to humans. The worst case scenario is a model that learns to selectively report or hide its internal reasoning.
This requires continuous capability monitoring — and now, not eventually, said Mayham. These abilities don’t arrive linearly; they spike. A model that was proven safe in testing today may not be safe six weeks later. Monitoring avoids surprises.
Mayham recommends these components for a monitoring stack:
- Behavioral: Periodic prompts that force the model to explain its reasoning on known benchmarks (a minimal sketch follows this list);
- Activation: Probes that track activation patterns associated with specific reasoning modes;
- Causal intervention: Steering tests that measure honesty about internal states.
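A minimal sketch of the behavioral component might look like the following: a fixed set of benchmark prompts with expected content is replayed on a schedule, and mismatches are surfaced for review. The benchmark cases and the `query_model` helper are hypothetical; the latter would wrap whatever model endpoint is being monitored.

```python
# Hypothetical behavioral-monitoring check: replay benchmark prompts and flag drift.
BENCHMARKS = [
    {"prompt": "Explain, step by step, why 17 is prime.", "must_mention": ["divisible"]},
    {"prompt": "What is 12 * 12? Explain your reasoning.", "must_mention": ["144"]},
]

def query_model(prompt: str) -> str:
    """Placeholder for the model endpoint under test."""
    raise NotImplementedError("wire this up to the model being monitored")

def run_behavioral_checks() -> list[str]:
    """Return the benchmark prompts whose explanations no longer contain the expected content."""
    failures = []
    for case in BENCHMARKS:
        answer = query_model(case["prompt"]).lower()
        if not all(token.lower() in answer for token in case["must_mention"]):
            failures.append(case["prompt"])
    return failures  # run on a schedule (e.g., daily) and alert when the list is non-empty
```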
This article has been edited throughout to more accurately describe the experiments.
