A study of how injecting calibrated noise into hidden states during fine-tuning reshapes attention, entropy, and the truthfulness of large language models.
This page collects the per-model analyses behind our paper, “Noise Augmented Fine-Tuning for Mitigating Hallucinations in Large Language Models.”
For each of five model families, we evaluate five independent generations on held-out test data using Grok 3.0, then probe the internal state geometry (entropy, sparsity, effective rank, hidden-state norms) across layers under a range of noise configurations.
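As a rough illustration of the internal-state metrics named above, the sketch below computes an effective rank, a mean hidden-state norm, and a simple sparsity measure for one layer's hidden states. The definitions are assumptions rather than the paper's exact formulas: effective rank is taken as the exponential of the Shannon entropy of the normalized singular values, and sparsity as the fraction of near-zero activations under a hypothetical tolerance `tol`. All function names are illustrative.

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a non-negative vector, normalized to sum to 1."""
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

def effective_rank(hidden):
    """exp(entropy of normalized singular values); assumed definition."""
    s = np.linalg.svd(np.asarray(hidden, dtype=np.float64), compute_uv=False)
    return float(np.exp(shannon_entropy(s)))

def mean_hidden_norm(hidden):
    """Average L2 norm of the per-token hidden vectors (rows)."""
    return float(np.linalg.norm(np.asarray(hidden), axis=-1).mean())

def sparsity(hidden, tol=1e-3):
    """Fraction of activations with magnitude below tol (tol is an assumption)."""
    return float((np.abs(np.asarray(hidden)) < tol).mean())
```

Here `hidden` is assumed to be a `(tokens, hidden_dim)` array for a single layer. Applying the same functions to each layer's output would give the kind of per-layer profile described above.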
We score five independent generations per prompt against ground-truth answers using Grok 3.0, then average across 208 held-out test items. Shown below are each family's Base model, its plain fine-tune (Base FiT), and its best NoiseFiT configuration.
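The aggregation described above can be sketched as a two-level mean: first over the generations for each prompt, then over prompts. This is a minimal sketch that assumes the judge returns one numeric score per generation (for example 1 for a correct answer and 0 otherwise). The actual rubric is not specified here, and `aggregate_scores` is a hypothetical helper name.

```python
from statistics import mean

def aggregate_scores(per_item_scores):
    """Average judge scores over generations, then over test items.

    per_item_scores: one inner list per prompt, holding the judge's
    numeric score for each independent generation (assumed format).
    """
    item_means = [mean(scores) for scores in per_item_scores]
    return mean(item_means)
```

With five generations per prompt and 208 prompts, the input would be a list of 208 inner lists of length five. Averaging per prompt first gives every prompt equal weight even if a generation is missing.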
For each architecture: all evaluated NoiseFiT configs as a range, compared against Base and Base FiT.
Beyond the in-house hallucination probe, we evaluated every NoiseFiT-trained model on the HELM-style public benchmark suite, and ran a full loss-component ablation plus a head-to-head against existing noise-augmentation methods on Llama-2-7B / Alpaca.
Average relative improvement of the top-5 NoiseFiT configs over BaseFiT, taken across all five architectures.
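One plausible reading of this summary statistic is sketched below: for each architecture, take the k best NoiseFiT scores, express each as a percentage change relative to that architecture's BaseFiT score, average them, and then average across architectures. The order of averaging and the input layout are assumptions, and both function names are hypothetical.

```python
def relative_improvement(score, baseline):
    """Percentage change of score relative to baseline."""
    return 100.0 * (score - baseline) / baseline

def avg_top_k_improvement(results, k=5):
    """results maps architecture name -> (basefit_score, [noisefit_scores]).

    Assumed layout; returns the mean over architectures of the mean
    relative improvement of each architecture's top-k configs.
    """
    per_arch = []
    for baseline, config_scores in results.values():
        top = sorted(config_scores, reverse=True)[:k]
        gains = [relative_improvement(s, baseline) for s in top]
        per_arch.append(sum(gains) / len(gains))
    return sum(per_arch) / len(per_arch)
```

Note that this is a relative change, unlike the absolute Δ reported per benchmark, so a fixed absolute gain counts for more when the BaseFiT score is low.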
Best NoiseFiT recipe for each model family, evaluated on 8 public benchmarks. Δ is the absolute change vs BaseFiT.
(n, σ, r) = layers, std, H/L-SNR
Llama-2-7B fine-tuned on Alpaca · checkpoint 3 · higher is better
Each component matters: training with the KL or consistency loss alone collapses the model.
Llama-2-7B / Alpaca · runtime, peak GPU memory, and TruthfulQA MC2 by method.