Microsoft launches open-source tool to assess AI performance

October 15, 2025

Microsoft uses ExCyTIn-Bench internally to evaluate AI-driven security features and pinpoint detection or workflow gaps in its models. Credit: nitpicker/Shutterstock.com.

Microsoft has launched ExCyTIn-Bench, an open-source benchmarking tool developed to assess the performance of AI systems in cybersecurity investigations.

The tool simulates multistage cyberattack scenarios in a security operations centre (SOC) environment built on Microsoft Azure, using live queries across 57 log tables from Microsoft Sentinel and related services.

Its methodology reflects the data volume and operational complexity that security teams encounter during real incidents.

Unlike earlier benchmarks that rely on static data or multiple-choice questioning, ExCyTIn-Bench generates question-answer sets from incident graphs constructed by human analysts.

These bipartite alert-entity graphs allow for assessments grounded in authentic SOC data, requiring AI models to plan and execute investigative steps across multiple data sources.
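As a rough illustration of the idea, the sketch below builds a small bipartite graph in which edges only ever link an alert to an entity, then derives one question-answer pair from it. This is not ExCyTIn-Bench's code: the alert names, entity names, and the functions `build_bipartite_graph` and `make_question` are all hypothetical, and the benchmark's real graphs come from human analysts working on Microsoft Sentinel data.

```python
from collections import defaultdict

# Hypothetical incident data: each alert lists the entities it mentions.
ALERT_ENTITIES = {
    "alert:phishing-email": ["user:alice", "host:ws-01"],
    "alert:suspicious-login": ["user:alice", "ip:203.0.113.7"],
    "alert:data-exfiltration": ["host:ws-01", "ip:203.0.113.7"],
}


def build_bipartite_graph(alert_entities):
    """Return adjacency sets; every edge links one alert to one entity."""
    graph = defaultdict(set)
    for alert, entities in alert_entities.items():
        for entity in entities:
            graph[alert].add(entity)
            graph[entity].add(alert)
    return graph


def make_question(graph, start_alert, end_alert):
    """Build one question-answer pair: which entities link two alerts?"""
    shared = sorted(graph[start_alert] & graph[end_alert])
    question = f"Which entities connect {start_alert} to {end_alert}?"
    return {"question": question, "answer": shared}


graph = build_bipartite_graph(ALERT_ENTITIES)
qa = make_question(graph, "alert:phishing-email", "alert:suspicious-login")
print(qa["question"])
print(qa["answer"])
```

In this toy case the answer is the one entity shared by both alerts. A real investigation would need the model to find that link by querying log tables, not by reading it from a ready-made graph.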

The benchmark produces granular, stepwise feedback on each investigative action, moving beyond binary pass-fail grading.
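The difference between stepwise and pass-fail grading can be sketched in a few lines. The example below is an assumption for illustration only, not the scoring rule ExCyTIn-Bench uses: the function `stepwise_feedback`, the equal weighting of steps, and the sample evidence items are all hypothetical.

```python
def stepwise_feedback(expected_steps, agent_findings):
    """Report, per expected step, whether the agent surfaced it.

    Returns (feedback, score), where feedback is a list of (step, found)
    pairs and score is the fraction of expected steps found.
    Equal weighting per step is an assumption made for this sketch.
    """
    if not expected_steps:
        raise ValueError("expected_steps must not be empty")
    findings = set(agent_findings)
    feedback = [(step, step in findings) for step in expected_steps]
    score = sum(found for _, found in feedback) / len(expected_steps)
    return feedback, score


# Hypothetical investigation path and what an agent actually reported.
expected = ["user:alice", "host:ws-01", "ip:203.0.113.7"]
reported = ["user:alice", "ip:203.0.113.7"]

feedback, score = stepwise_feedback(expected, reported)
for step, found in feedback:
    print(f"{step}: {'found' if found else 'missed'}")
print(f"partial credit: {score:.2f}")
```

A binary grader would mark this run as a failure because one step is missing. The per-step view shows which part of the investigation the agent skipped.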

Microsoft applies ExCyTIn-Bench internally to test AI-driven security features and identify detection or workflow gaps in its own models.

The company also uses it to evaluate integrations with Microsoft Security Copilot, Microsoft Sentinel, and Microsoft Defender, tracking both model performance and associated operational costs.

The framework aims to offer chief information security officers (CISOs), IT leaders, and buyers a consistent means of evaluating AI capabilities in security contexts.

By capturing how AI agents decompose investigative goals, interact with tools, and synthesise evidence, ExCyTIn-Bench addresses the limitations seen in benchmarks based on static evidence or trivia-style questioning.

Microsoft points out that even recent industry efforts such as CyberSOCEval do not fully capture the requirement for agents to interact with live, noisy data in a controlled SOC environment.

ExCyTIn-Bench is available as an open-source resource on GitHub, with Microsoft inviting participation from model developers and security teams.

The company indicated that future updates would include options for tailoring benchmarks to specific threat scenarios at the customer tenant level.

In September 2025, Microsoft integrated Anthropic’s Claude models into Copilot Studio, adding to its existing support for OpenAI’s large language models.

The rollout has started for early release customers and will be available in preview across all environments within two weeks, with full production deployment expected by the end of 2025.
