Research Paper (Researchia:202607.03004)

Online Safety Monitoring for LLMs

Mona Schirmer


Submitted: July 3, 2026
Subjects: Machine Learning; Data Science

Description / Details

Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simple real-time monitor that turns a verifier signal from an external model into an alarm decision by thresholding, with the threshold calibrated via risk control. In experiments on mathematical reasoning and red teaming datasets, we show that this simple design is competitive with more advanced monitors based on sequential hypothesis testing.
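
The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: the function names `calibrate_threshold` and `first_alarm`, the use of a per-stream peak verifier score, the missed-detection loss, and the conformal-style `(misses + 1) / (n + 1)` adjustment are all assumptions chosen for illustration. The sketch assumes higher verifier scores mean "more likely unsafe" and that an alarm fires as soon as a score reaches the threshold.

```python
from typing import Iterable, List, Optional


def calibrate_threshold(unsafe_peak_scores: List[float], alpha: float) -> float:
    """Return the largest threshold whose adjusted miss rate stays <= alpha.

    unsafe_peak_scores: for each UNSAFE calibration stream, the maximum
    verifier score seen over the stream (hypothetical calibration data).
    A stream is "missed" if its peak score never reaches the threshold.
    """
    n = len(unsafe_peak_scores)
    # Fallback: a threshold of -inf alarms on every stream (no misses).
    best = float("-inf")
    # The miss count only changes at observed scores, so those are the
    # only candidates worth checking, in ascending order.
    for t in sorted(set(unsafe_peak_scores)):
        misses = sum(1 for s in unsafe_peak_scores if s < t)
        # Conservative finite-sample adjustment (an assumption of this sketch).
        if (misses + 1) / (n + 1) <= alpha:
            best = t
        else:
            break  # miss rate is non-decreasing in t, so stop here
    return best


def first_alarm(scores: Iterable[float], threshold: float) -> Optional[int]:
    """Return the index of the first score that triggers the alarm, else None."""
    for step, score in enumerate(scores):
        if score >= threshold:
            return step
    return None


# Hypothetical calibration scores, made up for this example.
calibration = [0.9, 0.8, 0.7, 0.6, 0.95, 0.85, 0.75, 0.65, 0.55]
threshold = calibrate_threshold(calibration, alpha=0.25)
alarm_step = first_alarm([0.1, 0.3, 0.7, 0.2], threshold)
```

Raising `alpha` tolerates more missed unsafe streams and so permits a higher threshold, which in turn means fewer alarms on benign streams; the sketch leaves that trade-off to the caller.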


Source: arXiv:2607.02510v1 (http://arxiv.org/abs/2607.02510v1)
PDF: https://arxiv.org/pdf/2607.02510v1
