
Microsoft has launched ExCyTIn-Bench, an open-source benchmarking tool developed to evaluate the performance of AI systems in cybersecurity investigations.
The tool simulates multistage cyberattack scenarios in a security operations centre (SOC) environment built on Microsoft Azure, using live queries across 57 log tables from Microsoft Sentinel and related services.

Its methodology reflects the data volume and operational complexity that security teams encounter during real incidents.
Unlike earlier benchmarks that rely on static data or multiple-choice questioning, ExCyTIn-Bench generates question-answer sets from incident graphs built by human analysts.
These bipartite alert-entity graphs allow for assessments grounded in genuine SOC data, requiring AI models to plan and execute investigative steps across multiple data sources.
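To illustrate the idea of a bipartite alert-entity graph, the following Python sketch links alerts to the entities they mention and derives one question-answer pair from a path through the graph. It is an assumption-laden illustration, not Microsoft's implementation: the alert names, entity names, and the functions `build_bipartite_graph`, `shortest_path`, and `make_question` are all hypothetical.

```python
from collections import deque

# Hypothetical incident data: each alert maps to the entities it mentions.
ALERT_ENTITIES = {
    "alert:phishing-email": {"user:alice", "url:bad.example"},
    "alert:suspicious-logon": {"user:alice", "ip:203.0.113.7"},
    "alert:data-exfiltration": {"ip:203.0.113.7", "host:fileserver-01"},
}


def build_bipartite_graph(alert_entities):
    """Return an undirected adjacency map; edges only join alerts to entities."""
    graph = {}
    for alert, entities in alert_entities.items():
        for entity in entities:
            graph.setdefault(alert, set()).add(entity)
            graph.setdefault(entity, set()).add(alert)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def make_question(graph, start_alert, end_alert, answer_entity):
    """Build one question-answer pair whose solution follows a graph path."""
    path = shortest_path(graph, start_alert, end_alert)
    if path is None or answer_entity not in graph.get(end_alert, ()):
        return None
    return {
        "context": start_alert,
        "question": f"Starting from {start_alert}, which entity is tied to {end_alert}?",
        "answer": answer_entity,
        "solution_path": path + [answer_entity],
    }


graph = build_bipartite_graph(ALERT_ENTITIES)
qa = make_question(
    graph, "alert:phishing-email", "alert:data-exfiltration", "host:fileserver-01"
)
print(qa["solution_path"])
```

In this toy example the model would be shown only the starting alert and would have to pivot through the shared user and IP address to reach the answer, which is the kind of multi-step investigation the article describes.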
The benchmark produces granular, stepwise feedback on each investigative action, moving beyond binary pass-fail grading.
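One way to picture stepwise feedback, as opposed to pass-fail grading, is a scorer that gives partial credit for each intermediate finding on the expected investigation path. The Python sketch below is only an illustration under assumed rules: the function `stepwise_score`, the `partial_cap` parameter, and the weighting scheme are hypothetical and are not taken from ExCyTIn-Bench itself.

```python
def stepwise_score(solution_steps, agent_findings, expected_answer,
                   submitted_answer, partial_cap=0.5):
    """Score an investigation with per-step feedback.

    Assumed scheme: a correct final answer earns 1.0; otherwise the agent
    earns up to `partial_cap`, scaled by the share of expected intermediate
    steps it uncovered.
    """
    found = set(agent_findings)
    # Record, for every expected step, whether the agent surfaced it.
    per_step = [(step, step in found) for step in solution_steps]
    answer_correct = submitted_answer == expected_answer

    if answer_correct:
        score = 1.0
    elif solution_steps:
        hits = sum(1 for _, ok in per_step if ok)
        score = partial_cap * hits / len(solution_steps)
    else:
        score = 0.0

    return {"score": score, "answer_correct": answer_correct, "per_step": per_step}


# Hypothetical run: the agent found the user but not the IP, and answered wrongly.
result = stepwise_score(
    solution_steps=["user:alice", "ip:203.0.113.7"],
    agent_findings=["user:alice", "url:bad.example"],
    expected_answer="host:fileserver-01",
    submitted_answer="host:unknown",
)
print(result["score"])     # 0.25
print(result["per_step"])  # [('user:alice', True), ('ip:203.0.113.7', False)]
```

A binary grader would report this run as a plain failure, whereas the per-step record shows where the investigation stalled.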
Microsoft applies ExCyTIn-Bench internally to test AI-driven security features and identify detection or workflow gaps in its own models.
The company also uses it to evaluate integrations with Microsoft Security Copilot, Microsoft Sentinel, and Microsoft Defender, tracking both model performance and associated operational costs.
The framework aims to offer chief information security officers (CISOs), IT leaders, and buyers a consistent means of evaluating AI capabilities in security contexts.
By capturing how AI agents decompose investigative goals, interact with tools, and synthesise evidence, ExCyTIn-Bench addresses the limitations seen in benchmarks based on static evidence or trivia-style questioning.
Microsoft points out that even recent industry efforts such as CyberSOCEval do not fully capture the requirement for agents to interact with live, noisy data in a controlled SOC environment.
ExCyTIn-Bench is available as an open-source resource on GitHub, with Microsoft inviting participation from model developers and security teams.
The company indicated that future updates would include options for tailoring benchmarks to specific threat scenarios at the customer tenant level.
In September 2025, Microsoft integrated Anthropic’s Claude models into Copilot Studio, adding to its existing support for OpenAI’s large language models.
The rollout has started for early release customers and will be available in preview across all environments within two weeks, with full production deployment expected by the end of 2025.

