Category: LLM Feature Integrations
-
I’ve been trying to catch up on the interpretability field these last few weeks in my free time, and have been going back over materials that I had either skimmed or only referenced from another source. One such interesting paper/post was: https://www.alignmentforum.org/posts/rZPiuFxESMxCDHe4B/sae-reconstruction-errors-are-empirically-pathological (Gurnee, W. “SAE reconstruction errors are (empirically) pathological.” AI Alignment Forum, March…)
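To make the paper's core comparison concrete, here is a rough sketch (my paraphrase, not Gurnee's code): substitute an activation with either its SAE reconstruction or a random point at the same L2 distance, and compare the KL divergence each substitution induces in the model's next-token distribution. The reconstruction consistently doing worse than the random baseline is what makes the errors "pathological". `logits_fn` is a hypothetical stand-in for running the rest of the model forward from the patched activation.

```python
import torch

def substitution_kl(logits_fn, x, x_sub):
    """KL(p || q) between the next-token distributions induced by the
    original activation x and by the substituted activation x_sub."""
    log_p = torch.log_softmax(logits_fn(x), dim=-1)
    log_q = torch.log_softmax(logits_fn(x_sub), dim=-1)
    return torch.sum(log_p.exp() * (log_p - log_q), dim=-1)

def epsilon_random_baseline(x, x_hat):
    """A random point at the same L2 distance from x as the SAE
    reconstruction x_hat: the epsilon-random control."""
    eps = torch.norm(x_hat - x)
    noise = torch.randn_like(x)
    return x + eps * noise / torch.norm(noise)

# Pathology check (sketch): substitution_kl(logits_fn, x, x_hat) being
# systematically larger than
# substitution_kl(logits_fn, x, epsilon_random_baseline(x, x_hat))
# is the result the post's title refers to.
```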
-
(This is a continuation of the previous posts: https://omarclaflin.com/2025/06/19/updated-nfm-approach-methodology/; intro to the idea: https://omarclaflin.com/2025/06/14/information-space-contains-computations-not-just-feature/.) Paper: Feature Integration Beyond Sparse Coding: Evidence for Non-Linear Computation Spaces in Neural Networks. Background: LLMs are commonly decomposed via SAEs (or other encoders) into linearly separable parts, which makes those features surprisingly interpretable. This project aims to explore the non-linear…
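As background for readers new to this setup, here is a minimal sketch of the kind of SAE decomposition referred to above, assuming the common ReLU-encoder / linear-decoder recipe (a generic illustration, not necessarily the encoders used in the paper):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: an activation vector is encoded into an
    overcomplete, sparse feature basis and reconstructed as a linear
    combination of decoder directions (the 'linearly separable parts')."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # linear reconstruction
        return x_hat, f

# Typical training objective (sketch): reconstruction error plus an
# L1 penalty that encourages sparsity in f:
#   loss = ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```

The linearity of the decoder is what makes individual feature directions interpretable; the non-linear computation spaces explored in this project ask what that linear decomposition leaves out.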
-
This post is an update on: https://omarclaflin.com/2025/06/14/information-space-contains-computations-not-just-features/ and relates to this repo: https://github.com/omarclaflin/LLM_Intrepretability_Integration_Neurons. This post covers NFM tricks and tips applied to LLMs. I will add the new repo link here when I make my next post. Summary: Can we model feature integrations in a scalable and interpretable way? We assume feature integrations are feature interactions…
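Assuming NFM here means a Neural-Factorization-Machine-style interaction layer (the post's actual implementation lives in the linked repo), here is a minimal sketch of the bi-interaction pooling trick that makes modelling all pairwise feature interactions scalable:

```python
import torch
import torch.nn as nn

class BiInteractionPooling(nn.Module):
    """NFM-style bi-interaction pooling. Uses the identity
        sum_{i<j} (v_i x_i) * (v_j x_j)
          = 0.5 * [(sum_i v_i x_i)^2 - sum_i (v_i x_i)^2]
    (elementwise over the embedding dimension) to compute all pairwise
    interactions in time linear, not quadratic, in the feature count."""

    def forward(self, vx: torch.Tensor) -> torch.Tensor:
        # vx: (batch, num_features, embed_dim), i.e. per-feature
        # embeddings already scaled by the feature activations x_i
        square_of_sum = vx.sum(dim=1) ** 2
        sum_of_squares = (vx ** 2).sum(dim=1)
        return 0.5 * (square_of_sum - sum_of_squares)  # (batch, embed_dim)
```

That linear-time identity is what would make an interaction model of this shape "scalable" in the sense asked above: pairwise feature interactions are captured without ever being enumerated explicitly.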