ExplorerComputational LinguisticsNLP
Research PaperResearchia:202602.10026

Paradox of De-identification: A Critique of HIPAA Safe Harbour in the Age of LLMs

Lavender Y. Jiang

Abstract

Privacy is a human right that sustains patient-provider trust. Clinical notes capture a patient's private vulnerability and individuality, which are used for care coordination and research. Under HIPAA Safe Harbor, these notes are de-identified to protect patient privacy. However, Safe Harbor was designed for an era of categorical tabular data, focusing on the removal of explicit identifiers while ignoring the latent information found in correlations between identity and quasi-identifiers, which...

Submitted: February 10, 2026Subjects: NLP; Computational Linguistics

Description / Details

Privacy is a human right that sustains patient-provider trust. Clinical notes capture a patient's private vulnerability and individuality, which are used for care coordination and research. Under HIPAA Safe Harbor, these notes are de-identified to protect patient privacy. However, Safe Harbor was designed for an era of categorical tabular data, focusing on the removal of explicit identifiers while ignoring the latent information found in correlations between identity and quasi-identifiers, which can be captured by modern LLMs. We first formalize these correlations using a causal graph, then validate it empirically through individual re-identification of patients from scrubbed notes. The paradox of de-identification is further shown through a diagnosis ablation: even when all other information is removed, the model can predict the patient's neighborhood based on diagnosis alone. This position paper raises the question of how we can act as a community to uphold patient-provider trust when de-identification is inherently imperfect. We aim to raise awareness and discuss actionable recommendations.


Source: arXiv:2602.08997v1 - http://arxiv.org/abs/2602.08997v1 PDF: https://arxiv.org/pdf/2602.08997v1 Original Link: http://arxiv.org/abs/2602.08997v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Feb 10, 2026
Topic:
Computational Linguistics
Area:
NLP
Comments:
0
Bookmark
Paradox of De-identification: A Critique of HIPAA Safe Harbour in the Age of LLMs | Researchia