Featured Analysis: [R] LLMs are Locally Linear Mappings: Qwen 3, Gemma 3 and Llama 3 can be converted to exactly equivalent locally linear systems for interpretability

This post is a summary of and commentary on **[R] LLMs are Locally Linear Mappings: Qwen 3, Gemma 3 and Llama 3 can be converted to exactly equivalent locally linear systems for interpretability** (source: Reddit r/MachineLearning (Hot)), a notable recent article in the AI field.

Original Summary:

This research proposes a novel approach to LLM interpretability by demonstrating that large language models (LLMs) such as Qwen-3, Gemma-3, and Llama-3 can be represented, at a given input, as exactly equivalent locally linear systems. The authors achieve this by identifying a “linear path” through the transformer architecture: nonlinear components are detached from the gradient computation, and the Jacobian with respect to the input embeddings is computed. This “detached Jacobian” then acts as a single linear transformation that maps input embeddings to output embeddings with near-exact fidelity (reconstruction error ≈ 10⁻⁶). The linear representation enables precise token attribution, a significant improvement over existing approximate methods. Because the method alters neither the LLM’s weights nor its outputs, it is a practical way to probe LLM decision-making, and it is implemented and evaluated on several state-of-the-art models.
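
To make the mechanism concrete, here is a minimal PyTorch sketch of the detached-Jacobian idea on a single bias-free gated MLP block (the building block of Llama/Qwen/Gemma-style models). This is not the authors’ code: the dimensions and weights are made up, and the paper applies the idea across every nonlinearity in the full model, not just one block.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, h = 16, 64  # hypothetical embedding and hidden sizes

# Bias-free SwiGLU-style MLP weights
W_gate = torch.randn(h, d) / d**0.5
W_up   = torch.randn(h, d) / d**0.5
W_down = torch.randn(d, h) / h**0.5

def mlp(x):
    """Ordinary nonlinear gated MLP: down(silu(gate(x)) * up(x))."""
    return W_down @ (F.silu(W_gate @ x) * (W_up @ x))

def mlp_detached(x, x0):
    """Same computation, but with the nonlinear gate frozen (detached)
    at the operating point x0, leaving a map that is exactly linear in x."""
    gate = F.silu(W_gate @ x0).detach()  # treated as a constant w.r.t. x
    return W_down @ (gate * (W_up @ x))

x0 = torch.randn(d)

# The Jacobian of the detached function at x0 is a single (d, d) matrix.
J = torch.autograd.functional.jacobian(lambda x: mlp_detached(x, x0), x0)

# Because the detached map is linear and bias-free, J @ x0 reproduces the
# original nonlinear output at x0 up to floating-point precision.
err = (J @ x0 - mlp(x0)).abs().max().item()
print(f"max reconstruction error: {err:.2e}")  # ~1e-6 or below in float32
```

The key point is that freezing the gate makes the block linear and homogeneous in x, so its Jacobian *is* the block rather than an approximation of it; this is what yields the near-exact reconstruction error reported in the paper.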

Our Commentary:

The finding that LLMs, despite their deeply nonlinear architecture, can be captured by locally linear mappings at inference time is a significant contribution to LLM interpretability. The linearization yields a drastically simpler object for analysis, replacing dozens of layers of nonlinear computation with a single matrix multiplication, with clear implications for understanding how LLMs arrive at their predictions. Precise token attribution, enabled by this method, can identify which input tokens drive an output and potentially reveal biases or flaws in the model’s reasoning. The extremely low reconstruction error suggests that local linearity is a robust property rather than a loose approximation. However, the “local” qualifier deserves scrutiny: how large a neighborhood of the input space a given linearization remains valid over has yet to be characterized. Future work could explore using the linear representation for model compression, debugging, or adversarial-attack detection. Overall, this is a promising step toward making the “black box” of LLMs more transparent and understandable.
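
The “precise token attribution” claim follows directly from linearity: a bias-free linear map over a sequence of input embeddings decomposes the output into an exact additive contribution from each token. Below is a hypothetical sketch of that decomposition using random stand-in tensors; in practice J would be the detached Jacobian of a full model and X its actual input embeddings.

```python
import torch

torch.manual_seed(0)
seq_len, d = 8, 16
J = torch.randn(d, seq_len, d)  # stand-in for a real detached Jacobian
X = torch.randn(seq_len, d)     # stand-in input embeddings

# Linearity (no bias) means the output decomposes exactly:
#   y = sum_t  J[:, t, :] @ X[t]
contrib = torch.einsum('dtk,tk->td', J, X)  # (seq_len, d): one term per token
y = contrib.sum(dim=0)                      # the full output embedding

# Attribution score per token: magnitude of its exact additive contribution.
scores = contrib.norm(dim=1)
print(scores / scores.sum())  # normalized per-token attribution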

This post was compiled primarily from the following source:

https://www.reddit.com/r/MachineLearning/comments/1l4rpe2/r_llms_are_locally_linear_mappings_qwen_3_gemma_3/
