Industry defectors are becoming increasingly vocal that LLMs are not the path forward to human-level intelligence. What structure in current AI is missing, from a human intelligence point of view?
In the last couple of weeks, a couple of prominent AI researchers have voiced their view that LLMs are not the path forward to human-level AI and remain at a sub-intern usefulness level (despite the recently reassuring economic suicide pact of a trillion-plus dollars of interlocking deals, along with the more recent teasing phylactery proposal to bind the US government tax base to the death pact, since, you know, our 401ks/retirement packages are already bonded to the Lich anyway).
We can address the economics in a future post, but I want to focus on the tech. In summary, the main point of Andrej Karpathy (OpenAI, Tesla) and Yann LeCun (Meta) is roughly:
- The current approach of simply ingesting a unidimensional output of human intelligence (language), with the transformer architecture (as impressive as it is), isn’t enough to produce moderate or human-level intelligence.
My summaries of each are below (if you don’t want to watch):
- Andrej Karpathy
- we might need to cause declarative memory degeneration
- reproduce evolutionary pressures rather than educational processes
- make RL higher dimensional
- manage working memory better
- “we’re summoning ghosts, not building animals” with current approaches
- Yann LeCun
- need to focus on inputting sensory (non-text) information
- removing auto-regression/generative approaches
- inference by optimization rather than feedforward
- focus on hierarchical world model building
As most of the LLM-savvy audience knows already, modern chatbots are a stitched-together amalgamation of brain pieces: LLMs (language centers), CNNs for PDF and image parsing (occipital cortex), separate RNNs/transformers providing audio capabilities (temporal cortex), and MCPs/tools/‘skills’ (premotor cortex). Andrej argues, effectively in my view, that transformers represent a general cortical-tissue equivalent, but perhaps lack the deeper brain structures.
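To make the “stitched-together” point concrete, here is roughly what that amalgamation looks like as plumbing. This is a hypothetical sketch, not any vendor’s actual API: the class and method names (`Chatbot`, `Message`, the injected `llm`, `vision_encoder`, `audio_encoder`, `tools`) are invented stand-ins for the separately trained pieces being glued together.

```python
# Hypothetical sketch of a modern chatbot as glued-together specialist models.
# None of these names correspond to a real library; they stand in for the
# separately trained pieces (LLM, vision encoder, audio encoder, tool calls).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    text: Optional[str] = None
    image_bytes: Optional[bytes] = None
    audio_bytes: Optional[bytes] = None

class Chatbot:
    def __init__(self, llm, vision_encoder, audio_encoder, tools):
        self.llm = llm                # "Broca/Wernicke": the language model
        self.vision = vision_encoder  # "occipital cortex": image/PDF parsing
        self.audio = audio_encoder    # "temporal cortex": speech-to-text
        self.tools = tools            # "premotor cortex": MCP/tool registry

    def respond(self, msg: Message) -> str:
        # Each modality is handled by its own model, then flattened into text
        # so the language model can consume it.
        context = []
        if msg.image_bytes:
            context.append(self.vision.describe(msg.image_bytes))
        if msg.audio_bytes:
            context.append(self.audio.transcribe(msg.audio_bytes))
        if msg.text:
            context.append(msg.text)
        draft = self.llm.generate("\n".join(context))
        # Tool use is bolted on: the LLM emits a tool name, glue code runs it,
        # and the result is fed back through the LLM.
        if draft.startswith("CALL_TOOL:"):
            tool_name, arg = draft.removeprefix("CALL_TOOL:").split("|", 1)
            return self.llm.generate(self.tools[tool_name](arg))
        return draft
```

The point of the sketch is the seams: every hand-off between components is lossy text passed through the language model, which is exactly where the stitched-together feel comes from.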
While there have certainly been advances in scaling and tricks (mixture-of-experts, different types of caching, chain-of-‘reasoning’, etc.), and far more curated banks of question-answer pairs (used for fine-tuning) developed in-house at Anthropic, OpenAI, etc., the essential reality seems the same as when I reviewed it two years ago (I wouldn’t recommend watching the video, but around 6 minutes in, I “diagnose” chatGPT):
- identify the symptomatology of neurological deficits, including confabulation, anosognosia, anterograde (and retrograde) amnesia, lack of insight, etc.,
- which leads me to “diagnose” it with Wernicke-Korsakoff syndrome, an Axis I thought disorder, etc.
From that, analogously, you could presume:
- a lack of hippocampal circuitry, dorsomedial thalamic-prefrontal connections, brainstem areas, periventricular areas, etc
- and, by analogy, connect those to the missing computational functions we need to develop, though that might be carrying the metaphor too far, or overly relying on a ‘bio-inspired’ model of intelligence.
However, the takeaway is that it still comes off as an industrially scaled-up Broca/Wernicke area, devoid of most of the rest of the critical areas of the brain, kept in warehouse-sized server factory-farms in rural America like mass-produced headless chickens, hyperfattened so that its corpus sprawls across a couple hundred GPUs, overfed on declarative internet content. Ironically, the corporate lich lords frighten us about its collective superintelligence, when what they have actually built is an impressively mass-produced but witless echolalic.
I will say briefly, as an AI user, it’s great as:
- a natural language layer between different coding environments/tools
- simple code generation, code bug finding
- a replacement for internet search
- human language translation
- document summarization, and even Socratic Q&As about it
- making customer service even more annoying
However, once you stray into agentic workflows tackling anything of moderate difficulty, even with copious agent usage, these factory-farmed Broca-Wernicke zomboids quickly become overstrained. Even with tons of manually crafted instructions specifically trying to constrain and prop up each piece with supplementary intelligence (before LLM acolytes message me: I am well aware of ‘auto evaluators’, meta-loops of prompt editing, agentic frameworks, etc.), each chunk of brain pulp holding up its part of a rickety mechanical Turk, usually for leadership proofs-of-concept delivering on a very small part of the overall ask, it doesn’t scale into intelligence with any real takeoff velocity. At least in my experience.
As Andrej (and many others) have pointed out, its lack of reasoning (despite the marketing language), inability to form new memories (again), other cognitive deficits, and even its mismanaged working memory (presumably from poor executive control, not just capacity) make it difficult to reasonably compare this to human intelligence.
Putting the comprehensive list of Yann and Andrej’s suggestions aside, I find a couple of them worth commenting on. Pushing for an evolutionary search for a general intelligence algorithm makes sense on paper, and doubly so, I would add, given the amount of compute we’re building. This might be a sensible strategic switch once we’re sitting around with overexpanded compute infrastructure post-hype/post-collapse.
DeepMind recently published an interesting paper on an automated search for a better RL algorithm. As with many of their papers, I’m sure there are large caveats limiting the generalizability of the approach, but it seems to be in a similar vein to what Andrej is suggesting (and what I’ve written in failed proposals). That is, can we build processes which automatically meta-search for an intelligence architecture? Today, that search is done manually by teams of human AI scientists.
When I was younger, I read books on genetic and swarm approaches, and assumed that directed evolutionary approaches, enabled by expectedly large compute (given Moore’s law), would be the primary principles we’d be leveraging by this decade. Our acephalic political-industry-government leadership may yet end up accidentally stumbling toward that vision, once the LLM-economic bubbles start to leak and compute markets saturate.
Hopefully, sooner rather than later, instead of ensnaring industry and public money further into a singular LLM intelligence strategy, we will diversify. Without diving too deep into that in this article, here is some of my thinking (overlapping, I’m sure, with others’ thoughts) on some paths forward:
- design pure cognition tests — we need to spend more effort on sophisticated cognition tests: tasks as separable from memorization as possible, with lower memory requirements than even current IQ or ARC tests (which are brute-force solvable), and tasks which are unsolvable by mass memorization, such as pure RL tasks to min/max entropy with hidden, changing rules (a toy sketch follows this list)
- evolve cognitive architectures through meta-learning — generate variant reasoning-producing architectures using generated environments (different RL algorithms, training procedures, etc.), with better cognition tests serving as the evolutionary assessments that select among architectures. Essentially, if we’ve proven cognitive circuits can “grow” automatically in a transformer architecture given training data, how can we explore architectures in a more automated way? Is one path forward using genetic algorithms and swarms to probe for architectures? (A rough outer-loop sketch is below.)
- force circuit evolution on reasoning — a thought borrowed from Andrej, and I’m still unsure whether intelligence works in a way where memory and reasoning are entirely disentangle-able, but it’s an interesting one; even if only partially true, forcing active degradation of memorization pathways might accelerate the development of separate reasoning circuits (a crude sketch below). The rationale is partly based on the neuroscience insight that memory fragmentation is a deliberate feature enabling flexible cognition (memory recall in humans is a reconstructed simulation). From a computer science point of view, we’d like the cost/loss function to emphasize reasoning more than memory in the old IQ formula (intelligence = memory + reasoning).
- better circuit mapping tools — to disentangle declarative knowledge retrieval from implicit memory processes, with better visualization/characterization tools granting more rapid transparency into newly generated reasoning architectures. Mechanistic interpretability is already revealing, within the neural code of LLMs, where information is stored and where the parallel, dynamic circuits that transform inputs live inside current transformer architectures. Can we generalize these tools, make them robust, and tease apart reasoning? That would give production researchers a robust lens to monitor a future evolutionary search for intelligence architectures (one basic primitive is sketched below).
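To make the first bullet (pure cognition tests) concrete, here is a toy sketch of a task with a hidden, changing rule, where memorizing past episodes buys nothing and the only way to score well is to re-infer the rule online. The environment and the agent interface (`act`/`observe`) are invented for illustration; a real test battery would be far richer.

```python
# A toy "pure cognition" probe: the reward rule is hidden and silently
# re-drawn every `switch_every` steps, so memorizing past episodes is
# useless; scoring well requires inferring the current rule online.
import random

class HiddenRuleBandit:
    def __init__(self, n_arms=4, switch_every=50, seed=0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.switch_every = switch_every
        self.t = 0
        self._redraw_rule()

    def _redraw_rule(self):
        # The "rule" is which arm currently pays out; the agent is never told.
        self.good_arm = self.rng.randrange(self.n_arms)

    def step(self, arm: int) -> float:
        self.t += 1
        if self.t % self.switch_every == 0:
            self._redraw_rule()
        # Noisy reward, so the rule must be inferred rather than read off one sample.
        return 1.0 if (arm == self.good_arm and self.rng.random() > 0.1) else 0.0

def evaluate(agent, env, steps=1000) -> float:
    """Score an agent by average reward; higher means faster rule re-inference."""
    total = 0.0
    for _ in range(steps):
        arm = agent.act()
        reward = env.step(arm)
        agent.observe(arm, reward)
        total += reward
    return total / steps
```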
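For the meta-learning bullet, here is a rough sketch of the evolutionary outer loop: genomes encode architectural choices, and fitness is the score on cognition tests like the one above. The genome fields are arbitrary examples, and `build_model` (train a candidate) and `cognition_score` (run the test battery) are placeholders for the expensive parts, which is where all the real difficulty lives.

```python
# Minimal sketch of an evolutionary outer loop over architecture "genomes",
# using cognition-test scores as fitness.
import random

rng = random.Random(0)

def random_genome():
    # A genome here is just a dict of architectural choices; a real search
    # space would include wiring, memory modules, learning rules, etc.
    return {
        "n_layers": rng.randint(2, 12),
        "memory_slots": rng.choice([0, 16, 64]),
        "recurrence": rng.choice([True, False]),
    }

def mutate(genome):
    child = dict(genome)
    key = rng.choice(list(child))
    child[key] = random_genome()[key]  # resample one architectural choice
    return child

def evolve(build_model, cognition_score, population_size=20, generations=10):
    """Select architecture genomes by their scores on cognition tests.

    `build_model(genome)` trains a candidate; `cognition_score(model)` runs
    the test battery (e.g. wrapping `evaluate` from the previous sketch).
    """
    population = [random_genome() for _ in range(population_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((cognition_score(build_model(g)), g) for g in population),
                        key=lambda pair: pair[0], reverse=True)
        best = scored[0]
        survivors = [g for _, g in scored[: population_size // 4]]
        # Refill the population with mutated copies of the survivors.
        population = survivors + [mutate(rng.choice(survivors))
                                  for _ in range(population_size - len(survivors))]
    return best
```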
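For forcing circuit evolution on reasoning, one crude, hypothetical lever is to apply heavier weight decay (forced forgetting) to parameters suspected of doing rote memorization while training the rest normally. Treating MLP blocks as the recall-heavy component, selected here by a naive name filter, is itself an assumption; identifying those parameters for real is exactly what the circuit-mapping tools in the last bullet would need to provide.

```python
# Sketch: "forced forgetting" via heavier weight decay on parameter groups
# suspected of rote memorization, while everything else trains normally.
# The name filter is a stand-in assumption, not an identified circuit.
import torch

def make_optimizer(model, memorization_keywords=("mlp",), lr=3e-4,
                   base_decay=0.01, forget_decay=0.2):
    memo_params, other_params = [], []
    for name, p in model.named_parameters():
        if any(k in name for k in memorization_keywords):
            memo_params.append(p)    # suspected recall-heavy weights
        else:
            other_params.append(p)   # everything else
    return torch.optim.AdamW(
        [{"params": other_params, "weight_decay": base_decay},
         {"params": memo_params, "weight_decay": forget_decay}],  # degrade recall pathways faster
        lr=lr,
    )
```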
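And for circuit mapping, a skeleton of one basic mechanistic-interpretability primitive, activation patching: record a layer's activation on a "clean" prompt, splice it into a run on a "corrupted" prompt, and measure how much of the correct behaviour returns. This assumes a plain PyTorch model whose forward call returns logits and inputs of matching shape; real tooling sweeps this systematically across layers, heads, and token positions.

```python
# Skeleton of activation patching with PyTorch forward hooks.
import torch

@torch.no_grad()
def patch_layer(model, layer_module, clean_ids, corrupted_ids):
    cache = {}

    def save_hook(module, inputs, output):
        cache["clean"] = output          # remember the clean activation

    def splice_hook(module, inputs, output):
        return cache["clean"]            # overwrite with the clean activation

    handle = layer_module.register_forward_hook(save_hook)
    model(clean_ids)                     # first pass: record
    handle.remove()

    handle = layer_module.register_forward_hook(splice_hook)
    patched_logits = model(corrupted_ids)  # second pass: patched run
    handle.remove()
    return patched_logits
```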
While a ton of engineering effort has gone into transformer scaling and inference cost reduction over the last couple of years, along with, undoubtedly, lots of directed effort into curated datasets specifically to fine-tune and prepare models for the public metrics they’re tested against, current LLMs (which I use almost daily) still feel akin to picking the Oracle of Delphi off the ground, linguistically fluid and decent with metaphor, rehabilitating and dressing her up to make her more presentable, and then realizing that, even sober, she has advanced dementia and no real connected intuition about the world.
My optimism about the eventual resting place for LLMs is that they will be used for what they are good at. I believe future AI will contain more kinds of architecture, and LLMs will end up being a critical laboratory lens for examining it: potentially operating as an interface layer between components, but definitely as a human-artificial interface layer within a larger, more sophisticated automated intelligence scheme.
For now, I’ll enjoy my writing aid (alternate metaphors that I declined: Mad Hatter, Jabberwocky, Hitchhiker’s babel fish) and cooking recipe tool, and watch the direction of intelligence research.