Authors:
Vimalkumar Kumaresan
Addresses:
Department of Data Science, Tredence Inc., San Jose, California, United States of America.
This paper investigates self-calibrating foundation models for high-stakes decision-making, where incorrect decisions are prohibitively expensive. The central aim is to reduce the hallucinations and spurious confidence sometimes observed in large language models, especially in safety-critical applications such as healthcare and finance. The model incorporates an uncertainty-aware feed-forward loop that adjusts its confidence thresholds in situ based on entropy-based signals. The proposed system was evaluated on a carefully prepared dataset of 459 fully anonymized high-stakes cases from two domains: financial credit risk assessment and medical triage. The experiment was implemented using the PyTorch and Hugging Face Transformers libraries, with the Llama-2 architecture as the base model. For uncertainty estimation, software-based methods were used, including Monte Carlo dropout and Deep Ensembles. The results show that adding both terms improves calibration quality rather than overall performance. Uncertain outputs can also be flagged for inspection by human experts, thus providing a hybrid intelligence approach. The paper presents the architectural modifications required for self-calibration, along with a rigorous account of its advantages over non-calibrated baselines, which supports its potential applicability in demanding real-world applications.
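As an illustration of the entropy-based flagging summarized in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes that each case yields several class-probability vectors (for example, from repeated Monte Carlo dropout passes or from Deep Ensemble members), that these are averaged, and that a case is routed to a human expert when the entropy of the averaged distribution exceeds a threshold. The function names, the example probabilities, and the fixed threshold of 0.5 nats are hypothetical; the adaptive, in situ threshold adjustment described in the abstract is not modeled here.

```python
import math
from typing import List, Sequence


def mean_distribution(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over stochastic passes or ensemble members."""
    n = len(samples)
    k = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(k)]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_human_review(samples: Sequence[Sequence[float]], threshold: float) -> bool:
    """Flag a case when the averaged prediction is too uncertain (assumed rule)."""
    return predictive_entropy(mean_distribution(samples)) > threshold


# Hypothetical probabilities for a two-class decision (e.g. approve / decline).
confident_case = [[0.98, 0.02], [0.97, 0.03]]   # members agree
disputed_case = [[0.90, 0.10], [0.10, 0.90]]    # members disagree

THRESHOLD = 0.5  # hypothetical fixed value, in nats

print(needs_human_review(confident_case, THRESHOLD))
print(needs_human_review(disputed_case, THRESHOLD))
```

In this sketch, disagreement between members flattens the averaged distribution and raises its entropy, so a case can be flagged even when each individual member is confident.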
Keywords: Uncertainty Quantification; Foundation Models; Feedback Loops; Risk Assessment; Deep Ensembles; Large Language Models; Hybrid Intelligence Approach; Calibration Quality.
Received on: 12/05/2025, Revised on: 03/07/2025, Accepted on: 24/09/2025, Published on: 07/03/2026
DOI: 10.69888/FTSFDS.2026.000620
FMDB Transactions on Sustainable Finance and Data Science, 2026 Vol. 1 No. 1, Pages: 1–9