A machine learning framework can distinguish molecules made by biological processes from those formed through non-biological processes and could be used to analyze samples returned by current and future planetary missions.

Source: Mutney

Wall-e graffiti by Spray Saint on hoardings at the former Civic Centre, Plymouth, Devon - November 2023

José C. Aponte, Amirali Aghazadeh, and colleagues analyzed eight carbonaceous meteorites and ten terrestrial geologic samples using two-dimensional gas chromatography coupled with high-resolution time-of-flight mass spectrometry.

Using this data, the authors developed LifeTracer, a computational framework that processes mass spectrometry data and applies machine learning to identify patterns distinguishing abiotic from biotic origins.

Biosignature detection

A logistic regression model trained on compound-level features achieved over 87% accuracy in classifying samples as meteoritic or terrestrial. The analysis identified 9,475 peaks in meteorite samples and 9,070 in terrestrial samples, with statistically significant differences between the two sample types in molecular weight distributions and retention times. Retention time describes how long a compound takes to move through the chromatograph’s two columns.
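To make the classification step concrete, here is a minimal sketch of logistic regression on compound-level presence features. It is not the LifeTracer code. The feature names, the toy presence/absence values, and the helper functions `train_logistic` and `predict` are all assumptions made for illustration, and the data are invented rather than taken from the study.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # 1 = meteoritic (abiotic), 0 = terrestrial.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical features: 1 if a compound (or class) was detected in a sample.
feature_names = ["naphthalene", "alkylated_PAH", "hypothetical_biotic_marker"]

# Invented toy samples, NOT data from the study.
meteoritic = [[1, 1, 0]] * 5 + [[1, 0, 0]] * 2 + [[0, 1, 0]]
terrestrial = [[0, 0, 1]] * 7 + [[0, 1, 1]] * 2 + [[1, 0, 1]]
X = meteoritic + terrestrial
y = [1] * len(meteoritic) + [0] * len(terrestrial)

w, b = train_logistic(X, y)
weights = dict(zip(feature_names, w))
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)

# Positive coefficients push a sample toward the meteoritic class,
# negative coefficients toward the terrestrial class.
for name, coef in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {coef:+.2f}")
print(f"training accuracy: {accuracy:.2f}")
```

In a sketch like this, the sign and size of each fitted coefficient indicate which compounds pull a sample toward one class or the other, which is the kind of reading that lets a model of this type flag individual compounds as predictive. The accuracy here is measured on the training data itself, whereas a real evaluation would use held-out samples.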

Source: Saeedi et al.

Visualization of the distribution of compounds in meteoritic samples and terrestrial geologic samples and the regression coefficients of the logistic regression model trained in LifeTracer.

Organic compounds in meteorite samples showed significantly lower retention times, consistent with higher volatility in abiotically formed materials. The framework identified polycyclic aromatic hydrocarbons and alkylated variants as key predictive features, with naphthalene emerging as the most predictive compound for abiotic samples.

According to the authors, the approach enables scalable, unbiased biosignature detection and could be a powerful tool for interpreting complex organic mixtures that will be returned by current and future planetary sample return missions.

The paper is published in PNAS Nexus.