Research Paper (Researchia:202606.18005)

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

V. Samuel Pérez-Díaz

Submitted: June 18, 2026
Subjects: Machine Learning; Data Science

Description / Details

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k sources, of which plausible multiple counterparts are found for ~7k. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.
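
The abstract describes three outcomes per X-ray source: one counterpart, a counterpart plus alternative matches, or no counterpart. A minimal sketch of that bookkeeping step follows. It assumes a classifier has already assigned a match probability to each candidate pair. The function name `resolve_counterparts`, the `accept` threshold of 0.5, the source identifiers, and the scores are all hypothetical illustrations, not taken from the paper, whose actual selection criteria may differ.

```python
from collections import defaultdict


def resolve_counterparts(scored_pairs, accept=0.5):
    """Group scored candidate pairs by X-ray source and pick counterparts.

    scored_pairs: iterable of (xray_id, gaia_id, probability) triples, where
    the probability is assumed to come from a trained classifier.
    accept: hypothetical probability threshold for a plausible match.
    """
    by_source = defaultdict(list)
    for xray_id, gaia_id, prob in scored_pairs:
        by_source[xray_id].append((prob, gaia_id))

    result = {}
    for xray_id, candidates in by_source.items():
        # Keep only candidates at or above the threshold, highest score first.
        plausible = sorted((c for c in candidates if c[0] >= accept), reverse=True)
        if not plausible:
            # No candidate passes: treat the source as having no counterpart.
            result[xray_id] = {"best": None, "alternatives": []}
        else:
            result[xray_id] = {
                "best": plausible[0][1],
                "alternatives": [gaia_id for _, gaia_id in plausible[1:]],
            }
    return result


# Hypothetical scores for three X-ray sources and four Gaia candidates.
pairs = [
    ("cxo_A", "gaia_1", 0.93),
    ("cxo_A", "gaia_2", 0.61),
    ("cxo_B", "gaia_3", 0.12),
    ("cxo_C", "gaia_4", 0.88),
]
catalog = resolve_counterparts(pairs)
```

In this toy input, `cxo_A` keeps `gaia_1` as its best match with `gaia_2` as an alternative, `cxo_B` ends up with no counterpart, and `cxo_C` has a single unambiguous match.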


Source: arXiv:2606.19329v1 - http://arxiv.org/abs/2606.19329v1
PDF: https://arxiv.org/pdf/2606.19329v1