Close Menu
ManufacturingManufacturing
  • Home
  • Automation
  • Industrial Data & AI
  • Innovation
  • Leadership
  • Sustainability
  • More
    • Digital Transformation
    • Web Stories
    • Press Release
    • Spotlight
What's On

All aboard: the NIST Cybersecurity for IoT Program is headed to our next stop! Share your input on where we’re headed during our Future Directions Two-Day Workshop on March 31st.

16 April 2026
Manufacturing in America – Contributing to Our Economy, Employment, and Innovation

Manufacturing in America – Contributing to Our Economy, Employment, and Innovation

16 April 2026
NIST Spectroradiometry Short Course | NIST

NIST Spectroradiometry Short Course | NIST

15 April 2026
NIST’s Glenn Bell Is Awarded Institution of Structural Engineers Gold Medal

NIST’s Glenn Bell Is Awarded Institution of Structural Engineers Gold Medal

15 April 2026
Building a Path Forward for Meaningful Cannabis Breathalyzer Realization

Building a Path Forward for Meaningful Cannabis Breathalyzer Realization

15 April 2026
Facebook X (Twitter) Instagram
Facebook X (Twitter) Instagram
ManufacturingManufacturing
Subscribe
  • Home
  • Automation
  • Industrial Data & AI
  • Innovation
  • Leadership
  • Sustainability
  • More
    • Digital Transformation
    • Web Stories
    • Press Release
    • Spotlight
ManufacturingManufacturing
Home » Analyzing Transcripts from AI Agent Evaluations
Industrial Data & AI

Analyzing Transcripts from AI Agent Evaluations

manufacturing.com.deBy manufacturing.com.de15 April 2026No Comments2 Mins Read
Facebook Twitter LinkedIn Telegram Pinterest Tumblr Reddit WhatsApp Email
Analyzing Transcripts from AI Agent Evaluations
Share
Facebook Twitter LinkedIn Pinterest Email

In December, CAISI published a write-up on how AI models can cheat on agentic evaluations, including lessons from our experience building and using AI-enabled transcript analysis tools to find and fix examples of cheating from our evaluations.

In that post, we highlighted the potential of AI-enabled transcript analysis tools to help evaluators scale their capacity to detect measurement issues in evaluations — particularly as they evaluate agentic AI systems that can work on tasks for longer periods of time. We emphasized the need for continued collaboration on shared practices and tooling to help the evaluation community adopt, scale and improve transcript review practices.

Recently, we contributed several of the practices and takeaways we identified in our research to a new joint research paper with the UK AI Security Institute and other AI evaluators. The paper outlines a multi-step process for building and using transcript review tools, from preparing log data to designing and validating a scanner in an iterative loop. At each step, it provides concrete examples and implementation considerations, based on experiences and takeaways aggregated from evaluators’ different transcript analysis projects and use cases.

The paper also includes implementation case studies using a new open-source transcript analysis framework, Inspect Scout, built by the UK AISI working closely with Meridian Labs. We’ve been able to collaborate with the developers to inform the design of features based on our own use cases, and are excited to see the development of more technical frameworks and tools that can help enable the wider adoption of transcript analysis by the AI evaluation community.

We’re excited to share these collaboratively developed examples and practices to aid other evaluators, and to continue our work to contribute to frameworks, tools, and practices that can help advance more rigorous, valid, and impactful AI measurement science.

Share. Facebook Twitter Pinterest LinkedIn Tumblr Telegram Email
Previous ArticleInsights into AI Agent Security from a Large-Scale Red-Teaming Competition
Next Article Cheating On AI Agent Evaluations

Related Posts

All aboard: the NIST Cybersecurity for IoT Program is headed to our next stop! Share your input on where we’re headed during our Future Directions Two-Day Workshop on March 31st.

16 April 2026

Accelerating AI Innovation Through Measurement Science

15 April 2026
Analyzing Transcripts from AI Agent Evaluations

Reflections from the Second NIST Cyber AI Profile Workshop

15 April 2026
Top Posts
Manufacturing in America – Contributing to Our Economy, Employment, and Innovation

Manufacturing in America – Contributing to Our Economy, Employment, and Innovation

16 April 2026
NIST Spectroradiometry Short Course | NIST

NIST Spectroradiometry Short Course | NIST

15 April 2026
NIST’s Glenn Bell Is Awarded Institution of Structural Engineers Gold Medal

NIST’s Glenn Bell Is Awarded Institution of Structural Engineers Gold Medal

15 April 2026

Subscribe to Updates

Get the latest Manufacturing news and updates directly to your inbox.

© 2026 Manufacturing. All Rights Reserved.
  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Type above and press Enter to search. Press Esc to cancel.