January 2026: AI updates from the previous month

February 1, 2026


Coder creates AI Maturity Self-Assessment and AI Maturity Curve

These new tools will allow software development teams to assess how effectively they’ve adopted AI. The assessment asks teams questions such as how standardized their developer environment is, what their governance approach for AI is, how they handle risks like sensitive data exposure, and more.

“As AI agents take on more responsibility within engineering workflows, organizations need a clearer, more tangible way to understand maturity and governance readiness,” said Eric Paulsen, field CTO at Coder. “Without that baseline, it becomes difficult to scale agentic AI safely or predictably. Our self-assessment gives teams a concrete view of where they stand, so they can plan adoption intentionally, manage risk and scale with confidence.”

Anthropic makes tools within Claude interactive

Anthropic has announced that users will now be able to interact directly with certain tools within Claude.

Claude could already connect to tools and take actions on a user’s behalf in those tools; what’s new is that users can now open those tools and interact with them directly in the Claude window.

The tools that currently support interactivity include Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack, with support for Salesforce planned soon.

OpenAI will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT

The company had originally deprecated GPT-4o when it launched GPT-5, but brought it back after users said they needed more time to make the transition and preferred the older model’s conversational style and warmth.

OpenAI has since incorporated that feedback into GPT-5.1 and GPT-5.2 by improving personality, offering greater support for creative ideation, and adding more ways to customize responses.

“We’re announcing the upcoming retirement of GPT‑4o today because these improvements are now in place, and because the vast majority of usage has shifted to GPT‑5.2, with only 0.1% of users still choosing GPT‑4o each day,” OpenAI wrote in a post.

Teleport tackles agentic trust with new Agentic Identity Framework

Teleport has announced the launch of its new Agentic Identity Framework, which defines policies, practices, developer tools, and a reference architecture for securely deploying agents in production.

According to the company, agentic AI introduces new security challenges, as agents invoke tools, access sensitive data, delegate tasks, and operate across environments at scale, all without human involvement.

Teleport says existing identity, access, and security models weren’t designed for non-deterministic systems, and current attempts at deploying agentic systems have led to identity fragmentation, secrets sprawl, limited visibility, and systemic risk.

The Agentic Identity Framework aims to solve these issues by establishing an identity layer that is secured cryptographically with a hardware root of trust. It enables zero trust authentication, zero standing privileges, and real-time visibility into identity behavior.
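“Zero standing privileges” means an agent holds no long-lived access: it receives a narrowly scoped credential just in time, and every request is checked against that credential. The sketch below illustrates the idea with short-lived HMAC-signed tokens. It is not Teleport’s implementation; the token format, the `issue_credential` and `authorize` names, and the in-code signing key are all assumptions, and a real hardware root of trust would keep the key inside an HSM or TPM rather than in source.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: in production this key would stay inside hardware (an HSM or TPM,
# i.e. the "hardware root of trust"); a constant keeps this sketch self-contained.
SIGNING_KEY = b"demo-key-kept-in-hardware-in-practice"


def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def issue_credential(agent_id: str, scope: str, ttl_seconds: int,
                     now: Optional[float] = None) -> str:
    """Mint a short-lived credential scoped to a single task for one agent."""
    issued_at = time.time() if now is None else now
    claims = {"sub": agent_id, "scope": scope, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)


def authorize(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Check signature, expiry, and scope on every call; nothing is granted permanently."""
    payload, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(payload.encode())):
        return False  # tampered or forged token
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return current < claims["exp"] and claims["scope"] == required_scope


token = issue_credential("deploy-agent", "repo:read", ttl_seconds=300, now=1_000.0)
print(authorize(token, "repo:read", now=1_100.0))   # True: in scope, not yet expired
print(authorize(token, "repo:read", now=1_400.0))   # False: expired at 1300
print(authorize(token, "repo:write", now=1_100.0))  # False: scope was never granted
```

Because the credential expires on its own, a leaked token is only useful for minutes, and the agent has to request fresh, task-specific access rather than accumulating permissions.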

Apiiro announces Guardian Agent

Guardian Agent rewrites developer prompts to make them more secure and to ensure they meet the current needs of the software architecture, runtime environments, organizational policies, and regulatory requirements.

According to the company, AI is adding security debt faster than it can be fixed, and asking developers to fix vulnerabilities after code is written is no longer sufficient. “The reality is clear: Detection will never scale at the speed of AI. Only prevention will,” the company wrote in a blog post.

Ai2 releases Open Coding Agents

Open Coding Agents is a family of open agents built with a training method that makes it easier for developers to build their own coding agents trained on their internal codebases.

The first release is SERA (Soft-verified Efficient Repository Agent), which uses a fine-tuning method that can be specialized to any codebase. The company is also releasing SERA’s training data to help researchers study what worked and improve on it.

“Accessible open models can now inherit strong agentic behavior through a simple, reproducible pipeline: no large-scale RL infrastructure or engineering team required. Case in point, SERA was built largely by a single Ai2 researcher,” Ai2 wrote in a blog post.

Rocket Software launches AI assistant for operational diagnostics

Rocket EVA allows teams to ask questions about their core systems and trace issues from initial symptoms, through system interactions, to the responsible code. It also provides recommendations to help teams resolve issues more quickly.

“By tracing issues from the first symptom to the exact line of code, EVA provides a unified path to insight without the multi-product complexity other vendors require,” said Michael Curry, president of data modernization at Rocket Software. “Its ability to extend diagnostics across platforms and integrate third-party MCP tools reduces the time to resolve issues, setting a new benchmark for how enterprises maintain resilient, high‑performing systems.”

Report: AI hallucinates 27% of upgrade recommendations for open source projects

AI and automation are accelerating open-source adoption, but developers need to proceed with caution to ensure they’re not introducing additional risk into their software supply chain.

Brian Fox, co-founder and CTO of Sonatype, explained that AI can accelerate good engineering, but it can also scale mistakes faster, especially if it doesn’t have real-world data to draw from. For example, if a model doesn’t know which versions exist or which ones have vulnerabilities, it predicts and fills in the blank, leading to upgrades to versions that don’t exist or recommendations that break builds.

In its 2026 State of the Software Supply Chain report, Sonatype analyzed over 1.2 million malicious packages, 1,700 vulnerability records, and 37,000 AI-driven upgrade recommendations. It found that AI models recommended over 10,000 non-existent versions, a 27.75% hallucination rate.
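One straightforward guard against hallucinated upgrades is to check every recommended version against what the package registry actually published before applying it. The sketch below assumes you already have a snapshot of published versions per package; the `published` dictionary and the package names are hypothetical. It flags non-existent versions and computes the hallucination rate as flagged recommendations divided by total recommendations, the same kind of ratio behind the report’s 27.75% figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Recommendation:
    package: str
    version: str


def find_nonexistent(recs: List[Recommendation],
                     published: Dict[str, Set[str]]) -> List[Recommendation]:
    """Return recommendations pointing at a version the registry never published."""
    return [r for r in recs if r.version not in published.get(r.package, set())]


def hallucination_rate(recs: List[Recommendation],
                       published: Dict[str, Set[str]]) -> float:
    """Share of recommendations that name a non-existent version."""
    if not recs:
        return 0.0
    return len(find_nonexistent(recs, published)) / len(recs)


# Hypothetical registry snapshot and AI suggestions (illustrative names only).
published = {
    "example-http": {"2.4.0", "2.5.1"},
    "example-utils": {"1.0.3"},
}
recs = [
    Recommendation("example-http", "2.5.1"),
    Recommendation("example-http", "3.0.0"),   # never released
    Recommendation("example-utils", "1.0.3"),
    Recommendation("example-utils", "1.1.0"),  # never released
]
print(find_nonexistent(recs, published))
print(hallucination_rate(recs, published))  # 0.5
```

In a pipeline, the flagged list would block the upgrade before it reaches a build, rather than letting a failed build reveal the problem afterward.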

GitHub Copilot SDK now in technical preview

The SDK allows developers to embed agentic capabilities into their applications using the same execution loop used by the GitHub Copilot CLI. The SDK repository includes setup instructions, starter examples, and SDK references for all of the supported languages.

GitHub recommends starting by defining a single task, such as updating files or running a command, and letting Copilot plan and execute the steps while the application supplies domain-specific tools and constraints.
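GitHub’s recommended pattern splits responsibilities: the agent proposes steps for one narrow task, and the application decides which domain-specific tools exist and which of them the agent may use. The sketch below shows only the application side of that split, in plain Python. It does not use the Copilot SDK. `ToolRegistry`, the tool names, and the hard-coded `plan` (a stand-in for what an agent might propose) are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical application-side pieces; this is NOT the Copilot SDK API.
Step = Tuple[str, str]  # (tool name, argument) as an agent might propose it


class ToolRegistry:
    """Domain tools the application exposes, plus an allowlist acting as a constraint."""

    def __init__(self, allowed: Set[str]) -> None:
        self._allowed = allowed
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def execute(self, plan: List[Step]) -> List[str]:
        """Run each proposed step only if its tool is both registered and allowed."""
        results = []
        for tool, arg in plan:
            if tool in self._allowed and tool in self._tools:
                results.append(self._tools[tool](arg))
            else:
                results.append(f"blocked: {tool}")
        return results


# A single, narrow task: update one file in an in-memory workspace.
files = {"README.md": "old text"}


def read_file(path: str) -> str:
    return files[path]


def update_file(path: str) -> str:
    files[path] = "new text"
    return f"updated {path}"


registry = ToolRegistry(allowed={"read_file", "update_file"})
registry.register("read_file", read_file)
registry.register("update_file", update_file)

# Stand-in for the steps an agent might plan for this task.
plan = [("read_file", "README.md"), ("run_command", "rm -rf build"), ("update_file", "README.md")]
results = registry.execute(plan)
print(results)  # ['old text', 'blocked: run_command', 'updated README.md']
```

Keeping the allowlist in application code means the constraint holds no matter what the agent proposes.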

Anthropic drafts new constitution for Claude models

The constitution is Anthropic’s vision for Claude’s values and behavior. The main sections in this updated version include specifications related to helpfulness, ethics, safety, and Claude’s nature, along with guidelines for handling specific issues, such as medical advice or cybersecurity requests.

“The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior. Training models is a difficult task, and Claude’s outputs might not always adhere to the constitution’s ideals. But we think that the way the new constitution is written, with a thorough explanation of our intentions and the reasons behind them, makes it more likely to cultivate good values during training,” Anthropic wrote.

OpenAI adds age prediction to ChatGPT

The company announced that it will use age prediction technology on ChatGPT consumer plans to determine whether a user is under 18.

“Age prediction builds on protections already in place. Teens who tell us they’re under 18 when they sign up automatically receive additional safeguards to reduce exposure to sensitive or potentially harmful content. This also enables us to treat adults like adults and use our tools in the way that they want, within the bounds of safety,” OpenAI wrote in a post.

GitLab’s Duo Agent Platform is now generally available

GitLab has made its Duo Agent Platform generally available, providing development teams with agentic AI automation that has access to an organization’s full context, standards, and guardrails.

The GA release includes Agentic Chat, which provides context-aware assistance throughout the GitLab platform. Agentic Chat builds on the previously released Duo Chat, brings in context from issues, merge requests, pipelines, security findings, and more, and can perform actions on a developer’s behalf.

For example, in the Web UI, Agentic Chat can create issues, epics, and merge requests, highlight key findings, and create actionable guidance based on organizational context. In the IDE, it can generate code, configurations, and infrastructure-as-code, as well as fix bugs, generate tests, and produce documentation.

Agentic Chat can also help developers understand, configure, or troubleshoot CI/CD pipelines or create new ones. On the security front, it can explain vulnerabilities, help with issue prioritization, and recommend fixes.

Codenotary updates its free SBOM scanning tool with capabilities that better support AI apps

Codenotary is adding new capabilities to its SBOM.sh service, which provides free analysis of software bills of materials (SBOMs).

According to the company, the updates were made with AI applications in mind, and the tool now treats datasets as software supply chain artifacts.

“Traditional SBOM tools were built for an earlier era, focusing primarily on source code to improve visibility into the software supply chain,” said Moshe Bar, CEO and co-founder of Codenotary. “Security teams are swimming in SBOMs, but they’re not getting the actionable clarity they need, especially as AI transforms software and AI applications are built on datasets that traditional SBOMs completely ignore.”
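Treating a dataset as a supply chain artifact means recording it in the SBOM alongside code dependencies and pinning its exact contents, so a change to training data is as visible as a change to a library version. The sketch below shows one way to express that in the CycloneDX JSON format, whose 1.5 specification includes a `data` component type. It is an illustration of the general idea, not how SBOM.sh works, and the component names and sample data are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def dataset_component(name: str, version: str, content: bytes) -> Dict:
    """Describe a dataset as a CycloneDX component of type "data", pinned by hash."""
    return {
        "type": "data",
        "name": name,
        "version": version,
        "hashes": [{"alg": "SHA-256", "content": hashlib.sha256(content).hexdigest()}],
    }


def build_sbom(components: List[Dict]) -> Dict:
    """Wrap components in a minimal CycloneDX 1.5 JSON document."""
    return {"bomFormat": "CycloneDX", "specVersion": "1.5", "version": 1,
            "components": components}


def dataset_unchanged(component: Dict, content: bytes) -> bool:
    """True if the dataset bytes still match the SHA-256 recorded in the SBOM."""
    recorded = {h["alg"]: h["content"] for h in component["hashes"]}
    return recorded.get("SHA-256") == hashlib.sha256(content).hexdigest()


training_data = b"text,label\ngreat product,positive\nbroken on arrival,negative\n"
sbom = build_sbom([
    {"type": "library", "name": "example-tokenizer", "version": "0.9.2"},  # hypothetical
    dataset_component("support-tickets-train", "2026-01", training_data),  # hypothetical
])
print(json.dumps(sbom, indent=2))
```

Checking `dataset_unchanged` before training or deployment turns a silent change in the data into a detectable supply chain event.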

Testlio launches new AI-powered QA evaluation solution

Testlio has announced the release of a new AI-driven QA evaluation solution called LeoInsights.

The new platform is powered by the company’s intelligence layer, LeoAI Engine, which was trained on 13 years of testing data, 2.6+ million test cases, and 600,000+ devices.

It can provide executive summaries featuring key changes, emerging risks, and critical issues, consolidating multiple QA reports into one that can be shared with leaders.

LeoInsights also offers a calculator that quantifies efficiency gains, cost savings, and quality impact, helping QA teams better demonstrate their value to leadership. The calculator can aggregate data across workspaces, perform scenario modeling with adjustable inputs, and generate PDFs that can be shared with executives for budgeting and investment discussions.

New Relic adds monitoring for ChatGPT apps

New Relic customers will now be able to monitor their custom ChatGPT apps to ensure they are delivering the intended performance, reliability, and user experience.

“Bringing enterprise services into the natural flow of a ChatGPT conversation is a powerful, intuitive, and revenue-generating strategy,” said Brian Emerson, chief product officer of New Relic. “But once your carefully crafted application instantiates inside ChatGPT, it traditionally enters a black box where standard browser monitoring tools can fail.”

The company went on to explain that when an app is rendered in a conversation, developers can’t see issues like layout shifts or broken buttons. Additionally, security headers, content security policies, iframe sandbox rules, and limits on client-side storage can hide critical performance and user experience data.

Google unveils new open-source standard for agentic commerce

Google has announced a new open-source standard for agentic commerce called the Universal Commerce Protocol (UCP).

Developed in collaboration with numerous commerce companies, including Shopify, Etsy, Wayfair, Target, and Walmart, UCP establishes a common language and set of primitives for the commerce journey between consumer surfaces, businesses, and payment providers.

“As consumers embrace conversational experiences, they expect seamless transitions from brainstorming and research to final purchase. That means it’s critical to support real-time inventory checks, dynamic pricing, and instant transactions, all within the user’s current conversational context,” Google wrote in a blog post.

Newly redesigned Slackbot is now generally available

Salesforce announced that the newly redesigned Slackbot is now generally available, offering users an out-of-the-box AI agent that lives inside Slack.

“By bringing the full power of the Agentic Enterprise to where billions of workplace conversations already happen each week, working with enterprise-grade AI becomes as natural as talking to a coworker,” Salesforce wrote in an announcement.

According to Salesforce, Slackbot uses context within Slack and connected tools to help find answers, organize work, create content, schedule meetings, and take action.

Kaggle introduces Community Benchmarks to allow for custom evaluations of AI models

Kaggle has announced that it now offers Community Benchmarks, enabling AI practitioners to design, run, and share their own benchmarks for evaluating AI models.

Kaggle is a community platform run by Google that offers models and resources for data scientists and machine learning practitioners. Last year, it launched Kaggle Benchmarks to provide evaluations from research groups, such as Meta’s MultiLoKo and Google’s FACTS suite of benchmarks.

This latest announcement extends that capability to the community as a whole, allowing members to create benchmarks specific to their own use cases. According to Google, AI capabilities are evolving so quickly that existing ways of benchmarking and evaluating them can’t keep up. With Community Benchmarks, the company hopes to bridge this gap and provide a more flexible and transparent framework for evaluation.
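At its core, a custom benchmark is a set of domain-specific test cases plus a scoring rule applied to a model’s outputs. The sketch below shows that structure in plain Python. It does not use Kaggle’s Community Benchmarks interface; the `model` argument is any callable from prompt to answer, and here a hypothetical lookup table stands in for a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Case- and whitespace-insensitive exact match; swap in any scorer you need."""
    return output.strip().lower() == expected.strip().lower()


def run_benchmark(model: Callable[[str], str], cases: List[Case],
                  scorer: Callable[[str, str], bool] = exact_match) -> float:
    """Return the fraction of cases the model passes."""
    if not cases:
        raise ValueError("benchmark has no cases")
    passed = sum(scorer(model(case.prompt), case.expected) for case in cases)
    return passed / len(cases)


# A tiny domain-specific benchmark and a stand-in "model" (a lookup table).
cases = [
    Case("Capital of France?", "Paris"),
    Case("2 + 2 =", "4"),
    Case("Opposite of hot?", "cold"),
]
canned = {"Capital of France?": " paris", "2 + 2 =": "5", "Opposite of hot?": "Cold"}
score = run_benchmark(lambda prompt: canned.get(prompt, ""), cases)
print(f"{score:.2f}")  # 0.67
```

Keeping the scorer pluggable is what makes the harness reusable: a use case that needs partial credit or rubric grading only has to supply a different `scorer`.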

Copilot Studio Extension now available in VS Code

Microsoft has announced the general availability of its Copilot Studio Extension for Visual Studio Code.

The extension allows developers to build and manage Copilot Studio agents directly from within their IDE.

According to Microsoft, the extension is useful because developers need the same kinds of controls and processes when building agents as they do for other applications: source control, pull requests, change history, and repeatable deployments.

Box Extract intelligently pulls information from unstructured content to help with workflow automation

Box announced the launch of Box Extract, which intelligently pulls information from content and saves it as metadata, helping organizations automate workflows and accelerate decision-making by making information more easily accessible.

According to the company, much organizational information lives in contracts, product specs, policy documents, charts, and other types of unstructured content. Box Extract uses agentic capabilities and AI models from Google, Anthropic, and OpenAI to extract this information accurately.

Box explained that legacy tools often focus only on extracting text, whereas Box Extract understands document structure and meaning. It breaks a document down into elements like paragraphs, tables, and charts, and then pulls out important information from those elements.
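The difference between flat text extraction and structure-aware extraction can be illustrated with a toy two-stage pipeline: first split a document into elements (paragraphs and tables), then pull fields from each element in a way that suits its type. The sketch below uses simple heuristics (blank-line splitting, pipe-delimited tables, `Key: value` lines) in place of the AI models Box Extract uses, and the sample contract is hypothetical. It shows the shape of the idea, not Box’s implementation.

```python
import re
from typing import Dict, List

# Toy stand-in: Box Extract uses AI models; this uses plain-text heuristics only.
FIELD = re.compile(r"^(?P<key>[A-Za-z ]+):[ \t]*(?P<value>.+)$", re.MULTILINE)


def segment(text: str) -> List[Dict]:
    """Split a document into paragraph and table elements (pipe-delimited rows)."""
    elements = []
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        lines = block.splitlines()
        if all("|" in line for line in lines):
            rows = [[cell.strip() for cell in line.strip().strip("|").split("|")]
                    for line in lines]
            elements.append({"kind": "table", "rows": rows})
        else:
            elements.append({"kind": "paragraph", "text": block})
    return elements


def extract_metadata(elements: List[Dict]) -> Dict[str, str]:
    """Pull 'Key: value' lines from paragraphs and two-column rows from tables."""
    metadata: Dict[str, str] = {}
    for element in elements:
        if element["kind"] == "paragraph":
            for match in FIELD.finditer(element["text"]):
                metadata[match["key"].strip().lower()] = match["value"].strip()
        else:
            for row in element["rows"]:
                if len(row) == 2:
                    metadata[row[0].lower()] = row[1]
    return metadata


contract = """Master Services Agreement
Party: Acme Corp
Effective date: 2026-01-15

| Term | 24 months |
| Renewal | Automatic |"""

print(extract_metadata(segment(contract)))
```

A flat text extractor would see the table rows as loose words; segmenting first is what lets the second stage read `Term` and `24 months` as a key and its value.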

Google releases TranslateGemma

TranslateGemma is a suite of open translation models built on Gemma 3. The models were trained and evaluated on 55 language pairs, and were additionally trained on nearly 500 language pairs as a starting point for researchers, even though those pairs have not been evaluated yet.

According to Google, TranslateGemma significantly reduces translation error rates compared to the baseline Gemma models alone.

The 4B model is optimized for mobile and edge deployment, the 12B model is optimized for consumer laptops, and the 27B model is designed for maximum fidelity and can run on a single H100 GPU or TPU in the cloud.
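A rough weights-only estimate (parameter count times bytes per parameter) shows why the three sizes map to those deployment targets. The sketch assumes bf16 weights (2 bytes per parameter) and 4-bit quantization (0.5 bytes per parameter). These precisions are assumptions rather than Google’s published requirements, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); ignores activations and KV cache."""
    return params_billion * bytes_per_param


BF16, INT4 = 2.0, 0.5  # assumed bytes per parameter

for size in (4, 12, 27):
    print(f"{size}B: {weight_memory_gb(size, BF16):.1f} GB in bf16, "
          f"{weight_memory_gb(size, INT4):.1f} GB at 4-bit")
```

At bf16, the 27B model’s weights come to about 54 GB, which fits within the 80 GB of memory on an H100 with room left for inference overhead, while the 4B model at 4-bit needs only about 2 GB.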