Empirical Investigation of the Riemann Hypothesis Using Machine Learning: A Falsifiability-Oriented Approach
Shianghau Wu
TLDR
A machine learning–based framework combining classification, explainability, contradiction testing, and generative modeling provides empirical evidence consistent with the Riemann Hypothesis, demonstrating that machine learning and explainable AI not only reproduce the known statistical properties of zeta zeros but also reinforce the absence of contradictions to RH under extended empirical exploration.
Abstract
The Riemann Hypothesis (RH) asserts that all non-trivial zeros of the Riemann zeta function lie on the critical line Re(s) = 0.5, yet no general proof exists despite extensive numerical verification. This study introduces a machine learning–based framework that combines classification, explainability, contradiction testing, and generative modeling to provide empirical evidence consistent with RH. First, discriminative models augmented with SHAP analysis reveal that the real and imaginary parts of ζ(s) contribute stable explanatory signals exclusively along the critical line, while off-line regions exhibit negligible attributions. Second, a contradiction-test framework, constructed from systematically sampled off-line points, shows no indication of spurious zero-like behavior. Finally, a mixture-density variational autoencoder (MDN-VAE) trained on 10,000 zero spacings produces synthetic distributions that closely match the empirical spacing law; a Kolmogorov–Smirnov test (KS = 0.041, p = 0.075) fails to reject the null hypothesis that the synthetic and empirical spacings come from the same distribution. Together, these findings demonstrate that machine learning and explainable AI not only reproduce the known statistical properties of zeta zeros but also reinforce the absence of contradictions to RH under extended empirical exploration. While this framework does not constitute a formal proof, it offers a falsifiability-oriented, data-driven methodology for exploring deep mathematical conjectures. The empirical evaluation used a baseline of 12,005 points, including 5 known zeros, together with 15 off-line test points; the Random Forest classifier achieved high accuracy in distinguishing critical-line zeros and consistently rejected off-line points, and the generative models corroborated these findings.
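The abstract's Kolmogorov–Smirnov comparison of synthetic and empirical spacings can be sketched as follows. This is a minimal illustration of the methodology, not the authors' code: the GUE Wigner surmise stands in for the empirical zero-spacing law, and a simple rejection sampler stands in for the MDN-VAE's synthetic spacings.

```python
# Sketch (assumptions, not the paper's pipeline): Wigner surmise replaces the
# empirical spacing law; rejection sampling replaces the MDN-VAE generator.
import numpy as np
from scipy.special import erf
from scipy.stats import kstest

def wigner_pdf(s):
    # GUE Wigner surmise for normalized nearest-neighbour spacings
    return (32.0 / np.pi**2) * s**2 * np.exp(-4.0 * s**2 / np.pi)

def wigner_cdf(s):
    # Closed-form antiderivative of wigner_pdf, with F(0) = 0 and F(inf) = 1
    return erf(2.0 * s / np.sqrt(np.pi)) - (4.0 * s / np.pi) * np.exp(-4.0 * s**2 / np.pi)

def sample_wigner(n, rng):
    # Rejection sampling: the pdf peaks near 0.94 < 1, so uniform points
    # (x, y) in [0, 4] x [0, 1] are accepted when y falls under the curve.
    samples = np.empty(0)
    while samples.size < n:
        x = rng.uniform(0.0, 4.0, size=2 * n)
        y = rng.uniform(0.0, 1.0, size=2 * n)
        samples = np.concatenate([samples, x[y <= wigner_pdf(x)]])
    return samples[:n]

rng = np.random.default_rng(42)
synthetic = sample_wigner(2000, rng)          # stand-in for generated spacings
stat, pvalue = kstest(synthetic, wigner_cdf)  # one-sample KS against the law
print(f"KS = {stat:.3f}, p = {pvalue:.3f}")
```

A large p-value here, as with the paper's reported KS = 0.041, p = 0.075, means the test fails to reject the hypothesis that the two sets of spacings share a distribution; it does not positively prove they are identical.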
