August 2025: AI updates from the previous month


Anthropic begins testing a Claude extension for Chrome

The extension will allow Claude to take action on websites on behalf of the user. "We've spent recent months connecting Claude to your calendar, documents, and many other pieces of software. The next logical step is letting Claude work directly in your browser," the company says.

The company is starting with a small pilot of 1,000 Max plan users, and will gradually expand the program to more people if the pilot goes well.

According to Anthropic, one of the big safety challenges with browser-using agents is prompt injection attacks. Among the steps the company has taken to defend against them are site-level permissions and required action confirmations. This pilot will test how well those defenses hold up in real-world scenarios.

Google integrates Gemini CLI into Zed code editor

Google announced that it has brought the Gemini CLI to the open source code editor Zed. The new integration will enable Zed users to generate and refactor code in the editor, get instant answers about code or error messages, and chat naturally in the terminal.

Developers will be able to follow along live as the Gemini agent makes changes. Once the agent is done working, Zed will display the changes in a review interface that shows a clear diff for each edit, which can be reviewed, accepted, or modified, providing the same level of control as a code review.

Users will also be able to provide context beyond the codebase by pointing the agent to external sources like a URL with documentation or an API spec.

Microsoft packs Visual Studio August update with smarter AI features

Microsoft has released the August update for Visual Studio 2022, adding several features related to AI-assisted development.

The company announced that GPT-5 is now integrated into the IDE, and support for MCP is generally available as well. MCP support allows developers to authenticate with any OAuth provider directly from the IDE, perform one-click installation of MCP servers, and manage MCP access from GitHub policy settings.

Copilot Chat was updated with the ability to surface relevant code snippets more reliably, using improved semantic code search to determine when queries should trigger a code lookup. Developers can now connect models from OpenAI, Google, and Anthropic to Visual Studio Chat as well.

Agent Mode in Gemini Code Assist now available in VS Code and IntelliJ

This mode was released last month in the Insiders Channel for VS Code to expand the capabilities of Code Assist beyond prompts and responses, supporting actions like multiple file edits, full project context, and integration with built-in and ecosystem tools.

Since being added to the Insiders Channel, several new features have arrived, including the ability to edit code changes using Gemini's inline diff, user-friendly quota updates, real-time shell command output, and state preservation between IDE restarts.

Separately, the company also announced new agentic capabilities in its AI Mode in Search, such as the ability to make dinner reservations based on factors like party size, date, time, location, and preferred type of food. U.S. users opted in to the AI Mode experiment in Labs will also now see results that are more specific to their own preferences and interests. Google also announced that AI Mode is now available in over 180 new countries.

GitHub's coding agent can now be launched from anywhere on the platform using new Agents panel

GitHub has added a new panel to its UI that allows developers to invoke the Copilot coding agent from anywhere on the site.

From the panel, developers can assign background tasks, track running tasks, or review pull requests. The panel is a lightweight overlay on GitHub.com, but developers can also open it in full-screen mode by clicking "View all tasks."

The agent can be launched with a single prompt, like "Add integration tests for LoginController" or "Fix #877 using pull request #855 as an example." It can also run multiple tasks concurrently, such as "Add unit test coverage for utils.go" and "Add unit test coverage for helpers.go."

Anthropic adds Claude Code to Enterprise, Team plans

With this change, both Claude and Claude Code will be available under a single subscription. Admins will be able to assign standard or premium seats to users based on their individual roles. By default, seats include enough usage for a typical workday, but extra usage can be added during periods of heavy use. Admins can also set a maximum limit on additional usage.

Other new admin capabilities include a usage analytics dashboard and the ability to deploy and enforce settings, such as tool permissions, file access restrictions, and MCP server configurations.

Microsoft adds Copilot-powered debugging features for .NET in Visual Studio

Copilot can now suggest appropriate locations for breakpoints and tracepoints based on current context. Similarly, it can troubleshoot non-binding breakpoints and walk developers through the potential cause, such as mismatched symbols or incorrect build configurations.

Another new feature is the ability to generate LINQ queries over large collections in the IEnumerable Visualizer, which renders data into a sortable, filterable tabular view. For example, a developer could ask for a LINQ query that surfaces the problematic rows causing a filter issue. Additionally, developers can hover over any LINQ statement to get an explanation from Copilot of what it is doing, evaluate it in context, and highlight potential inefficiencies.

Copilot can also now help developers deal with exceptions by summarizing the error, identifying potential causes, and offering targeted code fix suggestions.

Groundcover launches observability solution for LLMs and agents

The eBPF-based observability provider groundcover announced an observability solution built specifically for monitoring LLMs and agents.

It captures every interaction with LLM providers like OpenAI and Anthropic, including prompts, completions, latency, token usage, errors, and reasoning paths.

Because groundcover uses eBPF, it operates at the infrastructure layer and can achieve full visibility into every request. This enables it to do things like follow the reasoning path of failed outputs, investigate prompt drift, or pinpoint when a tool call introduces latency.

IBM and NASA release open-source AI model for predicting solar weather

The model, Surya, analyzes high-resolution solar observation data to predict how solar activity affects Earth. According to IBM, solar storms can damage satellites, impact airline travel, and disrupt GPS navigation, which can harm industries like agriculture and disrupt food production.

The solar images Surya was trained on are 10x larger than typical AI training data, so the team had to create a multi-architecture system to handle them.

The model was released on Hugging Face.

Preview of NuGet MCP Server now available

Last month, Microsoft announced support for building MCP servers with .NET and publishing them to NuGet. Now, the company is announcing an official NuGet MCP Server that integrates NuGet package information and management tools into AI development workflows.

"Since the NuGet package ecosystem is always evolving, large language models (LLMs) get out-of-date over time, and there's a need for something that assists them in getting information in real time. The NuGet MCP server provides LLMs with information about new and updated packages that were published after the models, as well as tools to complete package management tasks," Jeff Kluge, principal software engineer at Microsoft, wrote in a blog post.

Opsera's Codeglide.ai lets developers easily turn legacy APIs into MCP servers

Codeglide.ai, a subsidiary of the DevOps company Opsera, is launching its MCP server lifecycle platform, which enables developers to turn APIs into MCP servers.

The solution continuously monitors API changes and updates the MCP servers accordingly. It also provides context-aware, secure, and stateful AI access without the developer needing to write custom code.

According to Opsera, large enterprises may maintain 2,000 to 8,000 APIs (60% of which are legacy APIs), and MCP provides a way for AI to interact with those APIs efficiently. The company says the new offering can reduce AI integration time by 97% and costs by 90%.

Confluent announces Streaming Agents

Streaming Agents is a new feature in Confluent Cloud for Apache Flink that brings agentic AI into data stream processing pipelines. It allows users to build, deploy, and orchestrate agents that can act on real-time data.

Key features include tool calling via MCP, the ability to connect to models or databases using Flink, and the ability to enrich streaming data with non-Kafka data sources, like relational databases and REST APIs.

"Even your smartest AI agents are flying blind if they don't have fresh business context," said Shaun Clowes, chief product officer at Confluent. "Streaming Agents simplifies the messy work of integrating the tools and data that create real intelligence, giving organizations a solid foundation to deploy AI agents that drive meaningful change across the business."

Anthropic expands Claude Sonnet 4's context window to 1M tokens

With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. That allows it to better understand project architecture and cross-file dependencies, and to make suggestions that fit the whole system design.

The longer context window is now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud's Vertex AI.

For prompts over 200K tokens, pricing increases to $6 per million tokens (MTok) for input and $22.50/MTok for output. Pricing for requests under 200K tokens remains $3/MTok for input and $15/MTok for output.
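As a worked example of the tiered rates above, here is a minimal sketch of a cost estimate. The helper function is hypothetical (Anthropic does not publish this code), and it assumes requests land on the standard tier unless the input exceeds 200K tokens:

```python
def sonnet4_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Claude Sonnet 4 request cost from the tiered rates above.

    Prompts over 200K tokens are billed at the long-context rates
    ($6/MTok in, $22.50/MTok out); smaller requests use the standard
    rates ($3/MTok in, $15/MTok out).
    """
    MTOK = 1_000_000
    if input_tokens > 200_000:
        in_rate, out_rate = 6.00, 22.50
    else:
        in_rate, out_rate = 3.00, 15.00
    return input_tokens / MTOK * in_rate + output_tokens / MTOK * out_rate

# A 500K-token prompt with a 4K-token reply crosses the threshold:
print(round(sonnet4_cost_usd(500_000, 4_000), 4))   # 3.09
# The same reply on a 100K-token prompt stays on the standard tier:
print(round(sonnet4_cost_usd(100_000, 4_000), 4))   # 0.36
```

Note that the whole request flips to the higher tier once the prompt crosses 200K tokens; the tiers are not blended per token.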

The company also extended its learning mode, designed for students, to Claude.ai and Claude Code. Learning mode asks users questions to guide them through concepts instead of providing immediate answers, in order to promote critical thinking about problems.

OpenAI adds GPT-4o as a legacy model in ChatGPT

With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.

The model picker for GPT-5 also now includes Auto, Fast, and Thinking modes. Fast prioritizes giving the quickest answers, Thinking prioritizes deeper answers that take longer to think through, and Auto chooses between the two.

The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.

Google releases Gemma 3 270M

The new model is "designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in," according to Google.

It is ideal in situations where there is a high-volume, well-defined task; speed and cost matter; user privacy needs to be protected; or there is a desire for a fleet of specialized task models.

Both pretrained and instruction-tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.

NVIDIA releases latest models in Llama Nemotron family

Llama Nemotron is a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget that gives developers more control over token generation.

"This combination lets the models reason more deeply and respond faster, without needing more time or computing power. That means better results at a lower cost," the company wrote in an announcement.

Google's coding agent Jules gets critique functionality

Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.

"In a world of rapid iteration, the critic moves the review to earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don't just write code, they question it. And now, so does Jules," Google wrote in a blog post.

According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is "unafraid to point out when you've reinvented a bad wheel."

GitHub to be folded into Microsoft's CoreAI org

GitHub CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.

In a memo to employees, he said that Microsoft doesn't plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft's CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.

"Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting," he wrote. "We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and launched the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub."

Sentry launches MCP monitoring tool

Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.

With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are used most, or which tools are running slow. They can also correlate errors with events like traffic spikes or new release deployments, or determine whether errors are only occurring on one type of transport.

According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, errors are inevitable, and existing monitoring tools were struggling with MCP servers.

bitHuman launches SDK for creating AI avatars

AI company bitHuman has announced a visual SDK for creating avatars for use as chat agents, instructors, virtual coaches, companions, and experts in different fields.

According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.

Because of their small footprint, these characters can be brought to a wide range of environments, including classrooms, kiosks, mobile apps, and edge devices.

OpenAI launches GPT-5

OpenAI announced the availability of GPT-5, which it says is "smarter across the board" compared to previous models.

Specifically for coding, GPT-5 achieved significant improvements in complex front-end generation and debugging larger repositories. Early testers said it made better design choices in terms of spacing, typography, and white space, according to the company.

"We think you'll love using GPT-5 much more than any previous AI," CEO Sam Altman said during the livestream. "It's useful. It's smart. It's fast. It's intuitive."

Anthropic releases Claude Opus 4.1

The latest update improves the model's research and data analysis skills, and it achieves 74.5% on SWE-bench Verified (compared to 72.5% for Opus 4).

It is available to paid Claude users, in Claude Code, and on Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI.

The company plans to release larger improvements across its models in the coming weeks as well.

AWS introduces Automated Reasoning checks to reduce AI hallucinations

Automated Reasoning checks are part of Amazon Bedrock Guardrails and validate the accuracy of AI-generated content against domain knowledge. According to AWS, the feature delivers 99% verification accuracy.

The capability was first released as a preview at AWS re:Invent, and with this general availability release, several new features are being added, including support for large documents in a single build, simplified policy validation, automated scenario generation, enhanced policy feedback, and customizable validation settings.

Google adds Gemini CLI to GitHub Actions

The new offering is designed to act as an agent for routine coding tasks. At launch, it includes three workflows: intelligent issue triage, pull request reviews, and the ability to mention @gemini-cli in any issue or pull request to delegate tasks.

It is available in beta, and Google is offering free-of-charge quotas for Google AI Studio. It is also supported in Vertex AI and the Standard and Enterprise tiers of Gemini Code Assist.
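To make the shape of this concrete, an issue-triage workflow might be wired up roughly as below. This is a hedged sketch only: the announcement above does not show configuration, so the action name (`google-github-actions/run-gemini-cli`), version tag, input names, and secret name are all assumptions, not documented values.

```yaml
# Hypothetical workflow: runs the Gemini CLI agent on newly opened issues.
name: gemini-issue-triage
on:
  issues:
    types: [opened, reopened]

jobs:
  triage:
    runs-on: ubuntu-latest
    permissions:
      issues: write   # the agent needs write access to apply labels
    steps:
      - uses: google-github-actions/run-gemini-cli@v0   # assumed action name
        with:
          gemini_api_key: ${{ secrets.GEMINI_API_KEY }}  # assumed input name
          prompt: "Triage this issue: summarize it and suggest labels."
```

Consult the workflows shipped with the beta for the actual trigger and input names before using anything like this.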

OpenAI announces two open weight reasoning models

OpenAI is joining the open weight model game with the launch of gpt-oss-120b and gpt-oss-20b.

Gpt-oss-120b is optimized for production, high-reasoning use cases, while gpt-oss-20b is designed for lower-latency or local use cases.

According to the company, the open models are comparable to its closed models in terms of performance and capability, but at a much lower cost. For example, gpt-oss-120b running on an 80 GB GPU achieved similar performance to o4-mini on core reasoning benchmarks, while gpt-oss-20b running on an edge device with 16 GB of memory was comparable to o3-mini on several common benchmarks.

Google DeepMind launches Genie 3

Genie 3 is a frontier model for generating real-world environments. It can model physical properties of the world, like water, lighting, and environmental actions.

Users can also use prompts to change the generated world, adding new objects and characters or changing the weather conditions, for example.

According to DeepMind, this research is important because it could enable AI agents to be trained in a variety of simulated environments.
