Abstract
Geologic carbon storage (GCS) is a promising pathway to mitigate greenhouse gas emissions. Injecting readily
available, impure CO2 from emission sources can lower costs, but even minor impurity fractions can alter
trapping dynamics and compromise storage performance. This study develops and validates machine-learning (ML)
models to predict CO2 trapping behavior in saline aquifers and quantify the impact of common impurities on
storage outcomes. We combine compositional reservoir simulations with laboratory evidence to identify the key
controls on storage efficiency and to benchmark ML predictions against physics-based simulators. Exploratory
data analysis (EDA) guided feature engineering for key predictors (e.g., injection rate, impurity
fraction, temperature, salinity, permeability, and porosity), and a Pearson correlation heatmap was used to
detect collinearity among input variables. Trend analyses across impurities highlighted systematic shifts in
trapping mechanisms (structural, residual, solubility, mineral), indicating the need to account for interaction
effects between time and impurity fraction. Four supervised ML models, namely Random Forest (RF), Extreme
Gradient Boosting (XGB), Gradient Boosting (GB), and Adaptive Gradient Boosting (AGB), were trained on 126,525
data points generated from 723 simulation cases, each recorded over 175 time steps, to forecast CO2 trapping
behavior.
The models achieve high fidelity (R² ≥ 0.99) in predicting trapping metrics and reproduce the sensitivity of
storage performance to impurity levels. Numerical experiments indicate that co-injecting CO2 with impurities
such as N2, H2S, and CH4 can change both trapping efficiency and plume migration. For example, adding 10% CH4
reduces solubility trapping by ∼1 Mt of CO2, while adding 10% N2 increases the horizontal migration distance
by ∼23% after 30 years of injection. Impurities also affect geochemistry, influencing pH, mineral dissolution,
and precipitation. Among the mixtures studied, the CO2–CH4 mixture yields the highest structural trapping and
the lowest solubility trapping relative to pure CO2 and the CO2–H2S mixture. The ML framework delivers rapid,
accurate forecasts at a fraction of the computational cost of traditional simulators, enabling more efficient
screening and optimization of GCS operations where CO2 purification is economically burdensome. These results
provide actionable guidance for designing cost-effective and reliable GCS with impure CO2 streams.
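
The dataset layout and the collinearity screening described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the column names, the value ranges, and the random placeholder inputs are assumptions made only for illustration, and no simulation output is reproduced. Only the case count (723) and the number of time steps (175) come from the abstract. The sketch builds one row per case and time step, adds a hypothetical time × impurity-fraction interaction feature, and computes the pairwise Pearson coefficients that a correlation heatmap would display.

```python
import math
import random

N_CASES = 723   # simulation cases, as stated in the abstract
N_STEPS = 175   # time steps recorded per case, as stated in the abstract


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def build_table(n_cases, n_steps, seed=0):
    """Build placeholder columns with one entry per (case, time step).

    All inputs are random stand-ins (assumed ranges), not simulator data.
    """
    rng = random.Random(seed)
    table = {"time": [], "impurity_fraction": [], "porosity": [],
             "time_x_impurity": []}
    for _ in range(n_cases):
        impurity = rng.uniform(0.0, 0.10)   # hypothetical impurity range
        porosity = rng.uniform(0.10, 0.30)  # hypothetical porosity range
        for step in range(1, n_steps + 1):
            t = float(step)
            table["time"].append(t)
            table["impurity_fraction"].append(impurity)
            table["porosity"].append(porosity)
            # Interaction feature between time and impurity fraction.
            table["time_x_impurity"].append(t * impurity)
    return table


def correlation_matrix(table):
    """Pairwise Pearson coefficients, keyed by (column_a, column_b)."""
    names = list(table)
    return {(a, b): pearson(table[a], table[b]) for a in names for b in names}


table = build_table(N_CASES, N_STEPS)
corr = correlation_matrix(table)

print("rows:", len(table["time"]))
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>18} vs {b:<18} r = {r:+.3f}")
```

In this toy table the interaction feature is, by construction, correlated with both of its parent columns, which is the kind of dependence a collinearity screen is meant to surface before predictors are passed to tree-ensemble models.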