Language models are injective and hence invertible
Best AI papers explained - A podcast by Enoch H. Kang
The academic paper argues that decoder-only Transformer language models, such as GPTs, are almost surely injective: distinct input prompts map to distinct internal hidden states, so no input information is lost. This contrasts with the common assumption that non-linear components make these models lossy. The authors prove mathematically that injectivity is a structural property established at initialization and preserved by standard training procedures such as gradient descent. To exploit this finding, the paper introduces SIPIT (Sequential Inverse Prompt via ITerative updates), an algorithm that exactly reconstructs the original input text from the model's hidden activations, achieving 100% accuracy in linear time in empirical tests on state-of-the-art models. Ultimately, the work establishes invertibility as a foundational and exploitable property of these models, with implications for interpretability and safety.
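To make the inversion idea concrete, below is a minimal sketch of sequential prompt recovery in the spirit of SIPIT, assuming access to the model's last-layer hidden states. The model choice (gpt2), the layer used, and the naive per-position vocabulary scan are illustrative assumptions, not the paper's exact procedure. The key property it relies on is causality: the hidden state at position t depends only on tokens up to t, so tokens can be recovered one at a time.

```python
# Sketch of sequential prompt inversion in the spirit of SIPIT.
# Assumptions (not from the paper): model = gpt2, last-layer states,
# brute-force candidate scan instead of the paper's iterative updates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the demo
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def hidden_states(ids):
    """Last-layer hidden states for a 1 x T tensor of token ids."""
    out = model(ids, output_hidden_states=True)
    return out.hidden_states[-1][0]  # shape (T, d_model)

@torch.no_grad()
def invert(target_h, vocab_size=None):
    """Recover token ids whose hidden states match target_h (T, d_model).
    If the map is injective, exactly one candidate matches at each step."""
    vocab_size = vocab_size or model.config.vocab_size
    recovered = []
    for t in range(target_h.shape[0]):
        best_id, best_dist = None, float("inf")
        # Naive scan over the vocabulary; very slow, purely illustrative.
        for v in range(vocab_size):
            ids = torch.tensor([recovered + [v]])
            # Causal masking means the state at position t from this prefix
            # equals the state at position t from the full sequence.
            h_t = hidden_states(ids)[t]
            dist = torch.norm(h_t - target_h[t]).item()
            if dist < best_dist:
                best_id, best_dist = v, dist
        recovered.append(best_id)
    return recovered

prompt_ids = tok("hello world", return_tensors="pt").input_ids
h = hidden_states(prompt_ids)
print(tok.decode(invert(h)))  # should reproduce "hello world"
```

The scan over all ~50k tokens per position makes this sketch impractical at scale (SIPIT's iterative updates avoid it), but it illustrates why injectivity suffices for exact recovery: each observed hidden state has a unique matching token given the already-recovered prefix.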
