Natural language explanations can potentially be easy to follow and unlimited in expressivity, but their faithfulness is typically questionable: the simple answer-then-explain setting, for instance, tends to lead models to confabulate their explanations. Moreover, it is questionable whether LLMs produce their outputs through a thought process that is in any way related to human reasoning, as they are in essence mere enhancements of traditional n-gram models. Chain-of-thought reasoning is one notable improvement to the decision process, but it is too computationally intensive for ubiquitous use.
In this paper, we propose to ground natural language explanations, as well as the answers themselves, in a suitable resource-efficient LLM reasoning process. When converted into a sequence of tokens, the result of this reasoning process becomes part of the context observed by the model when it produces its final answer or explanation.
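The pipeline can be summarised in a minimal sketch, assuming a generic token-level interface; the names `reason`, `build_context`, `GroundedContext`, and `generate` are illustrative placeholders rather than part of the proposal:

```python
from dataclasses import dataclass
from typing import Callable, List

Token = int  # placeholder: any tokenisation scheme works here

@dataclass
class GroundedContext:
    """The input question plus the tokenised result of the grounding reasoning step."""
    question: List[Token]
    grounding: List[Token]

def build_context(question: List[Token],
                  reason: Callable[[List[Token]], List[Token]]) -> GroundedContext:
    # Run the resource-efficient reasoning process once and keep its tokenised result.
    return GroundedContext(question=question, grounding=reason(question))

def generate(lm: Callable[[List[Token]], List[Token]],
             ctx: GroundedContext,
             prompt: List[Token]) -> List[Token]:
    # The grounding sequence is part of the context the model observes
    # when it produces its final answer or explanation.
    return lm(ctx.question + ctx.grounding + prompt)
```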
The grounding reasoning sequence does not have to be directly human-readable; it merely has to encode the explanation together with the answer. This information can then simply be decoded from the reasoning sequence into natural language when the model generates the final answer or explanation. For the explanations to be credible, a joint predict-and-explain setting can be used, in which the answer and the explanation are each inferred from the grounding sequence independently of each other.
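Continuing the sketch above (again with purely illustrative names), the joint predict-and-explain setting amounts to decoding the answer and the explanation from the same grounding sequence in two separate passes, so that neither output conditions on the other's text:

```python
def predict_and_explain(lm: Callable[[List[Token]], List[Token]],
                        ctx: GroundedContext,
                        answer_prompt: List[Token],
                        explain_prompt: List[Token]):
    # Both outputs are decoded from the same grounding sequence, but the
    # explanation never sees the generated answer (and vice versa).
    answer = generate(lm, ctx, answer_prompt)
    explanation = generate(lm, ctx, explain_prompt)
    return answer, explanation
```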