Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a HEP dataset, an execution framework, and a corpus of prior experimental literature, we find that Claude Code succeeds in automating all stages of a typical analysis: event selection, background estimation, uncertainty quantification, statistical inference, and paper drafting. We argue that the experimental HEP community is underestimating the current capabilities of these systems, and that most proposed agentic workflows are too narrowly scoped or scaffolded to specific analysis structures.
We present a proof-of-concept framework, Just Furnish Context (JFC), that integrates autonomous analysis agents with literature-based knowledge retrieval and multi-agent review, and show that this is sufficient to plan, execute, and document a credible high energy physics analysis. We demonstrate this by conducting analyses on open data from ALEPH, DELPHI, and CMS to perform electroweak, QCD, and Higgs boson measurements. We present two of those results in condensed short-paper form: a CMS Run 1 Open Data $H \to \tau^+\tau^-$ measurement, which demonstrates performance on a well-established result, and the first Lund plane measurement on LEP data, a genuinely novel result and, to our knowledge, the first produced autonomously by an AI agent.
Rather than replacing physicists, these tools promise to offload the repetitive technical burden of analysis code development, freeing researchers to focus on physics insight, truly novel method development, and rigorous validation. Given these developments, we advocate for new strategies for how the community trains students, organizes analysis efforts, and allocates human expertise.
Figure 1: Diagram of how an AI-agent workflow (JFC) mirrors the typical high-energy physics analysis pipeline.
Measure the properties of the Z boson — its mass, total width, and hadronic peak cross section — from a lineshape scan, together with the number of light neutrino species extracted from the invisible width. Events are classified into hadronic and leptonic final states, and luminosity-independent counting ratios constrain the partial widths.
Perform a high-precision measurement of the primary Lund jet plane density using archival $e^+e^-$ collision data collected at the Z pole. The observable isolates fundamental properties of the QCD radiation pattern by mapping emissions in the kinematic phase space. Results are fully unfolded to correct for detector effects and compared against leading Monte Carlo event generators.
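As a rough illustration of the Lund-plane coordinates mentioned above, the following sketch computes $(\ln 1/\theta, \ln k_t)$ for a single splitting from the 3-momenta of its harder and softer branches. It assumes the common $e^+e^-$ convention $k_t = |\vec{p}_\text{soft}| \sin\theta$; the function name and the example momenta are hypothetical, and the definition used in the actual analysis may differ.

```python
import numpy as np

def lund_coordinates(p_hard, p_soft):
    """Return (ln 1/theta, ln k_t) for one splitting.

    p_hard, p_soft: 3-momenta (px, py, pz) of the harder and softer branch.
    Assumed convention: k_t = |p_soft| * sin(theta), theta = opening angle.
    """
    p_hard = np.asarray(p_hard, dtype=float)
    p_soft = np.asarray(p_soft, dtype=float)
    cos_theta = np.dot(p_hard, p_soft) / (
        np.linalg.norm(p_hard) * np.linalg.norm(p_soft)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    k_t = np.linalg.norm(p_soft) * np.sin(theta)
    return np.log(1.0 / theta), np.log(k_t)

# Hypothetical splitting: hard branch along z, soft branch at 45 degrees.
ln_inv_theta, ln_kt = lund_coordinates([0.0, 0.0, 10.0], [1.0, 0.0, 1.0])
```

In a full measurement these coordinates would be filled into a two-dimensional histogram for every step of the declustering sequence and then unfolded; the sketch covers only the per-splitting kinematics.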
Measure the two- and three-point energy-energy correlators (E2C, E3C), their ratio, and the energy-energy correlator asymmetry (AEEC) in hadronic Z decays. These energy-weighted angular correlations probe collinear and back-to-back QCD dynamics and the transition to hadronization, with the E3C/E2C ratio further suppressing normalization and hadronization sensitivity. Detector-level data are unfolded to particle level and compared against precision calculations and modern parton-shower generators.
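To make the two-point correlator concrete, here is a minimal sketch of the energy-weighted pair sum for a single event. It assumes the convention of summing over ordered pairs $i \neq j$ with weights $E_i E_j / E_\text{vis}^2$; the function name, the toy event, and the binning are illustrative assumptions, not the analysis code.

```python
import numpy as np

def eec_pairs(energies, directions):
    """Return (cos_chi, weight) for all ordered pairs i != j in one event.

    energies:   particle energies, shape (N,)
    directions: particle 3-momenta or direction vectors, shape (N, 3)
    Weights are E_i * E_j / E_vis^2, with E_vis the summed energy.
    """
    e = np.asarray(energies, dtype=float)
    n = np.asarray(directions, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)  # unit vectors
    z = e / e.sum()                                   # energy fractions
    cos_chi = np.clip(n @ n.T, -1.0, 1.0)             # pairwise cosines
    weights = np.outer(z, z)
    off_diagonal = ~np.eye(len(e), dtype=bool)        # drop self-pairs
    return cos_chi[off_diagonal], weights[off_diagonal]

# Toy event: two equal-energy, back-to-back particles.
cos_chi, w = eec_pairs([1.0, 1.0], [[0, 0, 1], [0, 0, -1]])

# Detector-level E2C histogram for this event (binning is arbitrary here).
hist, edges = np.histogram(cos_chi, bins=10, range=(-1.0, 1.0), weights=w)
```

Summing such histograms over events and dividing by the number of events and the bin width gives a detector-level distribution, which would still need to be unfolded to particle level as described above.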
Perform a simultaneous measurement of the heavy-flavor partial decay widths ($R_b$, $R_c$) and the forward-backward asymmetry ($A_\text{FB}^b$) of the Z boson. To reliably isolate bottom and charm quark decays, the analysis relies on an impact-parameter tagging algorithm that identifies displaced secondary vertices. This constitutes a precision test of the Standard Model electroweak sector.
Determine the strong coupling $\alpha_s(M_Z)$ from six classic event-shape distributions — thrust, heavy jet mass, the wide and total jet broadenings, the $C$-parameter, and the Durham $y_{23}$ resolution — in hadronic Z decays. The distributions are unfolded to particle level and fit with NNLO+NLL QCD predictions, extracting $\alpha_s$ together with the non-perturbative power corrections and comparing against the world average.
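Of the six event shapes, the $C$-parameter is the simplest to write down. The sketch below uses the standard definition $C = 3(\lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_3\lambda_1)$, where $\lambda_i$ are the eigenvalues of the linearized momentum tensor. The function name and toy events are hypothetical, and particles are assumed to have non-zero momentum.

```python
import numpy as np

def c_parameter(momenta):
    """C-parameter of one event from particle 3-momenta, shape (N, 3).

    Builds Theta^{ab} = sum_i p_i^a p_i^b / |p_i| / sum_i |p_i| and returns
    3 * (l1*l2 + l2*l3 + l3*l1) for its eigenvalues l1, l2, l3.
    """
    p = np.asarray(momenta, dtype=float)
    norms = np.linalg.norm(p, axis=1)
    theta = np.einsum("ia,ib,i->ab", p, p, 1.0 / norms) / norms.sum()
    lam = np.linalg.eigvalsh(theta)  # symmetric matrix, real eigenvalues
    return 3.0 * (lam[0] * lam[1] + lam[1] * lam[2] + lam[2] * lam[0])

# Toy events: a pencil-like two-particle event and a fully symmetric one.
c_two_jet = c_parameter([[0, 0, 1], [0, 0, -1]])
c_symmetric = c_parameter(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
)
```

The two toy events probe the limits of the observable: $C$ vanishes for a perfectly back-to-back configuration and reaches 1 for an isotropic one.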
Earlier exploratory runs produced with Claude Opus 4.6. These illustrate the framework's reach but are not presented as validated measurements in the paper; their repositories are archived and not public.
Determine the number of light neutrino generations ($N_\nu$) by measuring the invisible decay width of the Z boson. The analysis subtracts the visible hadronic and leptonic partial widths from the total Z width obtained via lineshape fits. The final extracted value tests the fundamental structure of the Standard Model by confirming the existence of exactly three active neutrino families.
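The subtraction described above can be sketched in a few lines. The example assumes lepton universality, so the visible leptonic width is $3\,\Gamma_{\ell\ell}$, and converts the invisible width into $N_\nu$ using the Standard Model ratio $\Gamma_{\nu\nu}/\Gamma_{\ell\ell}$. The numerical inputs are illustrative values close to the published LEP combination, not results of this analysis, and the function name is hypothetical.

```python
def number_of_neutrinos(gamma_z, gamma_had, gamma_ll, sm_ratio_nunu_ll):
    """Estimate N_nu from Z widths (all widths in the same units).

    Gamma_inv = Gamma_Z - Gamma_had - 3 * Gamma_ll   (lepton universality)
    N_nu      = (Gamma_inv / Gamma_ll) / (Gamma_nunu / Gamma_ll)_SM
    """
    gamma_inv = gamma_z - gamma_had - 3.0 * gamma_ll
    return (gamma_inv / gamma_ll) / sm_ratio_nunu_ll

# Illustrative inputs in MeV (assumed, approximately the LEP combined values).
n_nu = number_of_neutrinos(
    gamma_z=2495.2,
    gamma_had=1744.4,
    gamma_ll=83.984,
    sm_ratio_nunu_ll=1.991,
)
```

Taking the ratio to $\Gamma_{\ell\ell}$ rather than using $\Gamma_\text{inv}$ directly reduces the sensitivity to common theoretical uncertainties, which is why the sketch is organized around that ratio.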
An independent measurement of the primary Lund jet plane density using the DELPHI dataset, serving as a cross-check against the ALEPH results. The analysis constructs the coordinates of partonic emissions to map the structure of QCD splittings in a model-independent way. The unfolded results illustrate that AI agents can replicate complex, high-dimensional measurements on datasets from different collaborations.
Characterize the geometric flow of hadronic events using six well-established event shape variables: Thrust, Heavy Jet Mass, Total Broadening, Wide Jet Broadening, C-parameter, and the Jet Resolution Parameter. The distributions are corrected for acceptance and hadronization effects. The strong coupling constant $\alpha_s(M_Z)$ is then determined from fits to NLO+NLL theoretical predictions.
Investigate the internal composition of jets originating from light quarks and heavy flavor decays using modern grooming techniques like Soft Drop. The measurements target essential substructure observables such as jet mass and $k_T$ splitting scales. This provides crucial insight into non-perturbative QCD phenomena and helps tune modern parton shower models.
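For readers unfamiliar with Soft Drop, the grooming condition can be sketched as follows. The code uses the standard criterion $z > z_\text{cut}\,(\Delta R / R_0)^\beta$ with $z = \min(p_{T1}, p_{T2}) / (p_{T1} + p_{T2})$; an $e^+e^-$ analysis would typically use energies and opening angles instead. The default parameter values, function names, and toy declustering sequence are assumptions for illustration only.

```python
def soft_drop_passes(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.4):
    """Return True if a splitting satisfies the Soft Drop condition."""
    z = min(pt1, pt2) / (pt1 + pt2)  # momentum fraction of the softer branch
    return z > z_cut * (delta_r / r0) ** beta

def first_passing_splitting(splittings, **kwargs):
    """Index of the first splitting that passes, or None if none does.

    splittings: (pt1, pt2, delta_r) tuples ordered from the widest-angle
    declustering step inward, following the harder branch.
    """
    for index, (pt1, pt2, delta_r) in enumerate(splittings):
        if soft_drop_passes(pt1, pt2, delta_r, **kwargs):
            return index
    return None

# Toy sequence: a soft wide-angle emission, then a more balanced splitting.
toy_splittings = [(95.0, 5.0, 0.3), (60.0, 35.0, 0.1)]
groomed_index = first_passing_splitting(toy_splittings)
```

The groomed jet mass and $k_T$ scale would then be computed from the two branches of the splitting at `groomed_index`, with everything removed before it discarded as soft wide-angle radiation.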
Directly probe the Yukawa coupling of the Higgs boson to fermions by measuring its signal strength in the $H \to \tau\tau$ decay channel. The analysis specifically targets the semi-leptonic $\mu\tau_h$ final state utilizing the 8 TeV CMS Open Data release. A comprehensive profile likelihood fit is performed to establish a robust measurement of this critical Standard Model parameter.
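The idea of a profile likelihood fit for a signal strength $\mu$ can be illustrated with a toy counting model, which is far simpler than the fit used in the analysis. It assumes a signal region with expectation $\mu s + b$ and a control region with expectation $\tau b$, and profiles the background $b$ analytically for each $\mu$. All names and numbers below are hypothetical.

```python
import numpy as np

def nll(mu, b, n, m, s, tau):
    """Negative log-likelihood (up to constants) of two Poisson counts."""
    lam_sr = mu * s + b  # signal-region expectation
    lam_cr = tau * b     # control-region expectation
    return (lam_sr - n * np.log(lam_sr)) + (lam_cr - m * np.log(lam_cr))

def profiled_b(mu, n, m, s, tau):
    """Background that minimizes the NLL at fixed mu (positive root of
    (1+tau) b^2 + [(1+tau) mu s - n - m] b - m mu s = 0)."""
    lin = (1.0 + tau) * mu * s - n - m
    disc = lin * lin + 4.0 * (1.0 + tau) * m * mu * s
    return (-lin + np.sqrt(disc)) / (2.0 * (1.0 + tau))

def q_mu(mu, n, m, s, tau):
    """Profile likelihood ratio test statistic -2 ln lambda(mu)."""
    mu_hat, b_hat = (n - m / tau) / s, m / tau  # global best fit
    b_prof = profiled_b(mu, n, m, s, tau)
    return 2.0 * (nll(mu, b_prof, n, m, s, tau) - nll(mu_hat, b_hat, n, m, s, tau))

# Toy inputs: observed counts n (signal region), m (control region),
# expected signal yield s at mu = 1, and transfer factor tau.
toy = dict(n=120, m=200, s=25.0, tau=2.0)
mu_hat = (toy["n"] - toy["m"] / toy["tau"]) / toy["s"]
q_at_best_fit = q_mu(mu_hat, **toy)
q_at_zero = q_mu(0.0, **toy)
```

Scanning `q_mu` over a range of $\mu$ and reading off where it crosses 1 gives an approximate 68% interval in the asymptotic regime. A realistic fit would have many bins and nuisance parameters and would rely on a numerical minimizer rather than a closed-form profile.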
Indicative cross-model comparison: the identical $H \to \tau\tau$ prompt and framework re-run with two other driving models (Claude Opus 4.8 is used throughout the paper), recovering compatible but distinct results.