Research Paper (Researchia:202605.23049)

Lumberjack: Better Differentially Private Random Forests through Heavy Hitter Detection in Trees

Christian Janos Lebeda


Submitted: May 23, 2026
Subjects: Machine Learning; Data Science

Description / Details

Random forests are widely used in fields involving sensitive tabular data, but existing approaches to enforcing differential privacy (DP) typically degrade performance to the point of impracticality. In this paper, we introduce Lumberjack, a differentially private random forest algorithm that achieves substantially higher utility by constructing large random decision trees and then applying aggressive, privacy-preserving pruning to retain only sufficiently populated nodes. A key component of our approach is a novel $(\varepsilon,\delta)$-DP heavy hitter detection algorithm for hierarchical data, whose error is $O_{\varepsilon,\delta}(\sqrt{\log h})$ for trees of height $h$ and may be of independent interest. This favorable scaling enables the use of significantly deeper trees than in prior work, leading to improved expressiveness under privacy constraints. Our empirical evaluation on benchmark datasets shows that Lumberjack consistently outperforms prior DP random forest methods, establishing a new state of the art. In particular, our approach yields substantial improvements in the privacy-utility trade-off for practical privacy budgets. Our findings suggest that carefully designed DP random forests can close much of the utility gap, highlighting a promising and underexplored direction for future research.
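The abstract does not spell out Lumberjack's pruning procedure, so the following is only a minimal Python sketch of a simple baseline for the same "build a large random tree, then privately prune sparse nodes" idea. It is not the paper's heavy hitter algorithm. It assumes features scaled to [0, 1], add/remove-one-record neighbouring datasets, and the classic Gaussian mechanism (which requires $0 < \varepsilon < 1$). All function names (`build_random_tree`, `node_counts`, `gaussian_sigma`, `private_prune`) and the `threshold` parameter are hypothetical. Splits are drawn without looking at the data, every node count receives Gaussian noise, and a node survives only if its noisy count clears the threshold and its parent survived. Because each record touches $h+1$ counts, the noise in this baseline grows like $\sqrt{h+1}$, whereas the abstract reports error $O_{\varepsilon,\delta}(\sqrt{\log h})$ for its method.

```python
import math
import random


def build_random_tree(height, n_features, rng):
    """Draw every split without looking at the data (hypothetical helper).

    Heap indexing: the root is node 1 and node i has children 2i and 2i + 1.
    Features are assumed to be scaled to [0, 1]."""
    splits = {}
    boxes = {1: [(0.0, 1.0)] * n_features}       # axis-aligned cell of each node
    for node in range(1, 2 ** height):           # internal nodes only
        box = boxes.pop(node)
        f = rng.randrange(n_features)
        lo, hi = box[f]
        t = rng.uniform(lo, hi)                  # threshold inside the node's cell
        splits[node] = (f, t)
        left, right = list(box), list(box)
        left[f], right[f] = (lo, t), (t, hi)
        boxes[2 * node], boxes[2 * node + 1] = left, right
    return splits


def node_counts(data, splits, height):
    """Exact count for every node: each record adds 1 to each of the
    height + 1 nodes on its root-to-leaf path."""
    counts = [0] * (2 ** (height + 1))           # index 0 is unused
    for x in data:
        node = 1
        counts[node] += 1
        while node < 2 ** height:                # descend until a leaf
            f, t = splits[node]
            node = 2 * node + (1 if x[f] >= t else 0)
            counts[node] += 1
    return counts


def gaussian_sigma(height, epsilon, delta):
    """Classic Gaussian mechanism, valid only for 0 < epsilon < 1.

    Adding or removing one record changes height + 1 counts by 1, so the
    L2 sensitivity of the whole count vector is sqrt(height + 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("classic Gaussian mechanism needs 0 < epsilon < 1")
    return math.sqrt(height + 1) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def private_prune(data, splits, height, epsilon, delta, threshold, rng):
    """Add Gaussian noise to every node count, then keep a node only if its
    noisy count reaches `threshold` and its parent was kept. The pruning
    step reads only noisy counts, so it is post-processing."""
    sigma = gaussian_sigma(height, epsilon, delta)
    counts = node_counts(data, splits, height)
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    kept = {1}                                   # always keep the root
    for node in range(2, 2 ** (height + 1)):     # parents come before children
        if node // 2 in kept and noisy[node] >= threshold:
            kept.add(node)
    return kept, noisy


if __name__ == "__main__":
    rng = random.Random(0)
    splits = build_random_tree(height=3, n_features=2, rng=rng)
    # 5000 identical records populate one path; 50 uniform records are scattered.
    data = [[0.3, 0.7]] * 5000 + [[rng.random(), rng.random()] for _ in range(50)]
    kept, _ = private_prune(data, splits, 3, epsilon=0.5, delta=1e-5,
                            threshold=100.0, rng=rng)
    print(sorted(kept))
```

Since the tree structure is fixed before any data is touched, releasing all noisy counts is a single Gaussian-mechanism query, and pruning costs no extra privacy budget. A production implementation would also need a secure, floating-point-safe noise sampler rather than Python's `random` module.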


Source: arXiv:2605.22756v1 (http://arxiv.org/abs/2605.22756v1)
PDF: https://arxiv.org/pdf/2605.22756v1
