How do Large Language Models integrate concepts/words/patterns? Existing theory assumes they store a large dictionary of features and combine entries from that dictionary with simple computation (linear addition). I present multiple lines of evidence that LLMs also store and use complex integrations (non-linear relationships) between concepts/words/patterns, and I reveal this ability with a relatively simple mechanistic interpretability method, highlighting an area that the field of mechanistic interpretability has so far overlooked.
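To make the distinction concrete, here is a minimal toy sketch (hypothetical, not the paper's actual method): if a target quantity depends on the *interaction* of two features, a purely linear "dictionary plus addition" model cannot recover it, while adding a single non-linear interaction term fits it exactly. All names and data here are illustrative.

```python
import numpy as np

# Two toy "feature" activations and a target that is their non-linear
# integration (a multiplicative interaction, i.e. not a linear sum).
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
b = rng.standard_normal(200)
y = a * b

# Linear-addition view: y ~ w1*a + w2*b + c. Residual stays large.
X_lin = np.column_stack([a, b, np.ones_like(a)])
res_lin = np.linalg.lstsq(X_lin, y, rcond=None)[1][0]

# Allowing a non-linear interaction term recovers the target exactly.
X_int = np.column_stack([a, b, a * b, np.ones_like(a)])
res_int = np.linalg.lstsq(X_int, y, rcond=None)[1][0]

print(f"linear-only residual sum of squares: {res_lin:.3f}")
print(f"with interaction term:               {res_int:.3e}")
```

The gap between the two residuals is the intuition behind looking for feature *integration* rather than only feature *addition* in a model's representations.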

Blog post on the project: https://omarclaflin.com/llm-interpretability-project-dual-encoding-in-neural-network-representations/

arXiv preprint: https://arxiv.org/abs/2507.00269


Quick presentation for the AAAI 2026 conference


Poster:



AAAI 2026 Website link:

https://underline.io/lecture/139523-422-feature-integration-spaces-joint-training-reveals-dual-encoding-in-neural-network-representations

https://assets.underline.io/lecture/139523/paper/6db90bf3068d670029c14f41637d480e.pdf?Expires=1769793431&Signature=Q4-S4xw4TONwyFKpKkiwN5xWwxZuCfDcT1bfRe6ahuElErN~QgA1ruPLuS4B3FXCZpeXq1jEPRFaqhB4X6F2R5P1MZEgiuXG-Gmn-8YUbqTSVmEZxzTCXkLyLVOg5O~LSbL225N7mVLvkbQ8VoeCrWYiQn2kPiONsRaCZW37xy~T9m-Bjx2DnMprFzWc88DGP7DlTvzL3OB8b-fikE30n4pGC-jLVnlGg4k4zKILtnV~Qr~nnC2UxQUtGfWnBPz4SEuDxa0y8KfO3QLUJCHW89RZPOseli0V~PS-PwOscjD-HWXrhGxxgUoyHYscH1C71B0WUmex3Cfmg8ycS1~vJQ__&Key-Pair-Id=K2CNXR0DE4O7J0
