Beyond Broca


What if we’ve got one of the most important things in our understanding of who we are, and what makes us intelligent, utterly wrong?


Chris von Csefalvay


15 October 2023

There’s something special about language. It is ‘our own’, it is ‘us’, in a profound way and, quite surprisingly, more so than art. I was deeply struck by this when I first saw reactions to large generative language models that created realistic, human-ish prose. Notably, those mature enough to reach a non-professional audience – ChatGPT, based on GPT-3.5 and later GPT-4 – came quite some time after models that could create fairly acceptable visual ‘art’.1 The appearance of synthetic language-like products (SLLPs), as I like to call the output of such generative models, came well after the appearance of synthetic simulacra of visual art,2 which had elicited much less fervent responses.

1 I don’t mean to insinuate that what Stable Diffusion and DALL·E produce are ‘art’ in the sense we understand that concept. However, neither is what GPT produces ‘language’. They are both simulators of outcomes based on stochastic approximations over a sufficiently large training set to be able to approximate the outcome of human activities we know as ‘art’ and ‘language’, respectively.

2 For a given value of “well after”. Time, in this discipline, moves with an unsettling alacrity.

This was quite striking.


All of which, of course, leads us to the key question: what if we got one of the most deeply enshrined beliefs about language, intelligence and the relationship between the two utterly, dreadfully wrong?

Our precious words

A large language model (LLM) is, essentially, a very simple machine that knows a large number of conditional probabilities. Given a sequence of tokens \(k_0, k_1, \cdots, k_n\), it associates every possible token \(k^{\prime}\) with a probability \(p(k_{n + 1} = k^{\prime} | k_0, k_1, \cdots, k_n)\) – or, in other words, given a token sequence \(k_0, k_1, \cdots, k_n\), it assigns to every point in a probability space the conditional likelihood that that point’s corresponding token will be the \((n+1)\)-th token, \(k_{n+1}\). Or, using my preferred formulation, which looks at it the other way around: given the token sequence, the model creates a probability distribution over the next token and draws from it stochastically, weighted by token likelihood, so that a draw from the region of highest probability is most likely.
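
To make this concrete, here is a minimal sketch of that stochastic draw in Python. The four-token vocabulary and the probabilities are invented stand-ins for what an actual model would compute, not anything a real LLM produced:

```python
# Toy illustration of next-token sampling: given the conditional distribution
# p(k_{n+1} = k' | k_0, ..., k_n), draw the next token stochastically,
# weighted by token likelihood. Vocabulary and probabilities are invented.
import numpy as np

rng = np.random.default_rng(42)

vocab = ["rain", "sun", "snow", "fog"]
p_next = [0.70, 0.20, 0.07, 0.03]  # in a real LLM, the model computes these

# A draw from the region of highest probability is the most likely outcome,
# but lower-probability tokens can still be chosen.
next_token = rng.choice(vocab, p=p_next)
print(next_token)  # most often 'rain', occasionally one of the others
```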

It turns out that if the model’s understanding of these conditional probabilities is sufficiently good, it can simulate knowledge quite well, a point I belaboured elsewhere. This is not overly surprising. If a model knew the conditional probability of rain on day \(d\) – let’s call this \(p_r(d)\) – given a vector \(\theta_n\) of length \(n\) that tells us whether it rained on days \(d-1\), \(d-2\), …, \(d-n\), we’d trust it to tell us whether we’d need our raincoat on that given day. All it would have to do for that is learn the conditional probability \(p_r(d) | \theta_n\), which of course it could easily do by representing \(p_r(d) | \theta_n\) as \(f(d, \theta_n)\), then learning the parameters of that function so as to minimise a loss function \(J(f(d, \theta_n), r(d), \theta_n(d))\), where \(r(d)\) is of course whether it rained on day \(d\) and \(\theta_n(d)\) is the \(\theta_n\) history vector for day \(d\). Iterate this often enough (over not single values of \(r(d)\) and \(\theta_n(d)\) but vectors thereof), and you can learn a pretty decent conditional probability function. The model would know no more about rain or shine than LLMs know about language or the subject matters of language, but simulating tokens gets you quite a long way towards being useful as a simulacrum of knowledge.
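
As a sketch of what that learning loop might look like – with logistic regression standing in for \(f\) and the log loss for \(J\), choices of mine rather than anything the argument depends on, and entirely synthetic weather data – consider:

```python
# A toy version of the rain model above: learn p_r(d) | theta_n from
# (history, outcome) pairs by minimising a loss J. Logistic regression
# stands in for f, the log loss for J; the 'weather' is synthetic, with
# persistence (rain tends to follow rain) baked in so that there is
# conditional structure to learn.
import numpy as np

rng = np.random.default_rng(0)
n, days = 7, 5000                        # history length, days simulated

rained = np.zeros(days)
for d in range(1, days):
    p = 0.6 if rained[d - 1] else 0.2    # sticky, synthetic weather
    rained[d] = float(rng.random() < p)

# Assemble the training pairs (theta_n(d), r(d)).
X = np.array([rained[d - n:d] for d in range(n, days)])
y = rained[n:]

# f(d, theta_n) = sigmoid(theta_n . w + b), fitted by gradient descent
# on the mean log loss.
w, b = np.zeros(n), 0.0
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p_hat - y                     # derivative of log loss w.r.t. logit
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

def p_rain(history):
    return 1.0 / (1.0 + np.exp(-(history @ w + b)))

print(p_rain(np.ones(n)))    # after a wet week: roughly 0.6
print(p_rain(np.zeros(n)))   # after a dry week: roughly 0.2
```

The model recovers the conditional structure of the simulated weather without ‘knowing’ anything about rain, which is exactly the point being made.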

Indeed, this works so well that what comes out of such a model might well appear human-like: modern GPT implementations can produce prose that is a little stilted at times, but often distinguishable from human prose only by the conspicuous absence of grammatical and spelling errors. What is interesting is how this was perceived: almost immediately, it was connected to a kind of intelligence that was almost human, or indeed at times better than human. People suddenly started to worry about a dumb token simulator taking over their jobs.

Clearly, language hit a nerve.

The medium is the message

JARVIS. Siri. Alexa. WOPR. The AIs of fiction and of our everyday lives have one thing in common: they use language as the presentation layer. This is deceptive, because none of these systems is, well, particularly smart. Compared to models that can, say, quantitatively infer the activity of a small-molecule drug from its structure (QSAR models, on which see Karpov, Godin, and Tetko 2020; see also Zhong and Guan 2023; Guntuboina et al. 2023), Siri is pretty pathetic. However, it has something QSAR models and other very impressive applications of machine learning don’t: the human presentation layer, i.e. language.

Karpov, Pavel, Guillaume Godin, and Igor V Tetko. 2020. ‘Transformer-CNN: Swiss Knife for QSAR Modeling and Interpretation’. Journal of Cheminformatics 12 (1): 1–12.
Zhong, Shifa, and Xiaohong Guan. 2023. ‘Developing Quantitative Structure–Activity Relationship (QSAR) Models for Water Contaminants’ Activities/Properties by Fine-Tuning GPT-3 Models’. Environmental Science & Technology Letters.
Guntuboina, Chakradhar, Adrita Das, Parisa Mollaei, Seongwon Kim, and Amir Barati Farimani. 2023. ‘PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction’. arXiv Preprint arXiv:2309.03099.
Endicott, Timothy, Joshua Getzler, and Edwin Peel. 2006. ‘Properties of Law: Essays in Honour of Jim Harris’.

How we treat a system seems to be conditional on how it talks to us. In that sense, the medium profoundly shapes how we treat the message. To use the terminology of J.W. Harris’s writings on human rights, we take the ‘right’ to be considered to be intrinsically connected to the capacity for engaging in human discourse (see Endicott, Getzler, and Peel 2006). And that, of course, means language.

This is not overly surprising, either. Our understanding of language has been that of a watershed moment in evolution. Humans became what they are when they learned to use language. Tool use is great, but tool use at best makes a human. What makes humans, plural, is language. This is intrinsically connected, of course, to society. Language is not an arbitrarily selected activity, nor is it necessarily the kind of evolutionary game-changer that tool use is. Rather, it is the tool, the sine qua non, the cornerstone and the absolutely fundamental instrument of social interaction. Language creates society. Society recognises human individuals and gives that recognition a meaning. The fact that I am a human being, and recognised as such (I hope), has a meaning different from my recognising that my dog is an individual of the species Canis lupus familiaris, because it acknowledges me not merely as being of a certain species, but as being a certain kind of agent, capable not only of having rights but also of speaking for them. Language is how all of that happens (e.g. Budwig 2000; but see Browning 2023).

Budwig, Nancy. 2000. ‘Language, Practices and the Construction of Personhood’. Theory & Psychology 10 (6): 769–86.
Browning, Jacob. 2023. ‘Personhood and AI: Why Large Language Models Don’t Understand Us’. AI & SOCIETY, 1–8.

The language of intelligence (or vice versa)

What, then, if we got one of the most important things about humanity, and human intelligence, dreadfully wrong altogether? What if language is not a product of intelligence (as we understand it in the human context) but rather a necessary instrument thereof?

The evolution of something as crucial as language remains shrouded in mystery to this day. What we know is that at some point, roughly 50,000–100,000 years ago, something happened that gave rise to language. We don’t quite know what it was, or how specifically it transpired. Indeed, despite advances in our understanding of cognitive neuroscience, we haven’t found evidence of the ‘language faculty’ proposed by Hauser, Chomsky, and Fitch (2002; but see the criticisms by Jackendoff and Pinker 2005).3 The genetics of language production – which these days centres on FOXP2 (see Enard et al. 2002; Enard 2011) – hasn’t gotten us much further, and there are far too many edge cases (dissociations, as the term in evolutionary neuroscience goes), in which either a significant intellectual deficit coexists with preserved language ability (Williams syndrome being the textbook example, viz. Bellugi et al. 2013) or the inverse obtains (e.g. Developmental Verbal Dyspraxia, where language production is impaired but overall intellect is not, viz. Vargha-Khadem et al. 2005), for us to be able to confidently make this connection on an individual level.

Hauser, Marc D, Noam Chomsky, and W Tecumseh Fitch. 2002. ‘The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?’ Science 298 (5598): 1569–79.
Jackendoff, Ray, and Steven Pinker. 2005. ‘The Nature of the Language Faculty and Its Implications for Evolution of Language (Reply to Fitch, Hauser, and Chomsky)’. Cognition 97 (2): 211–25.

3 Not to be confused with the brain areas responsible for speech, which, perplexingly, are part, but not the whole, of the language faculty.

Enard, Wolfgang, Molly Przeworski, Simon E Fisher, Cecilia SL Lai, Victor Wiebe, Takashi Kitano, Anthony P Monaco, and Svante Pääbo. 2002. ‘Molecular Evolution of FOXP2, a Gene Involved in Speech and Language’. Nature 418 (6900): 869–72.
Enard, Wolfgang. 2011. ‘FOXP2 and the Role of Cortico-Basal Ganglia Circuits in Speech and Language Evolution’. Current Opinion in Neurobiology 21 (3): 415–24.
Bellugi, Ursula, Shelly Marks, Amy Bihrle, and Helene Sabo. 2013. ‘Dissociation Between Language and Cognitive Functions in Williams Syndrome’. In Language Development in Exceptional Circumstances, 177–89. Psychology Press.
Vargha-Khadem, Faraneh, David G Gadian, Andrew Copp, and Mortimer Mishkin. 2005. ‘FOXP2 and the Neuroanatomy of Speech and Language’. Nature Reviews Neuroscience 6 (2): 131–38.

On the other hand, at a broader level, it is hard to discount the relationship between language and intelligence. What is more complex is the direction of that relationship. There are, really, three possible scenarios:

  1. Language is a consequence of human intelligence. The kind of intelligence we associate with modern human cognitive capabilities necessarily gives rise, absent some marginal exceptions, to language.
  2. Language is an epiphenomenon of human intelligence. It evolved in parallel, but neither requires human intelligence (see Williams syndrome) nor does human intelligence require it (see Developmental Verbal Dyspraxia).
  3. Human intelligence is largely a consequence of language, which is its necessary but not sufficient condition. It is the evolutionarily most stable representation layer for information, and allows reasoning through complexity.

While the second of these is a convenient way to hand-wave away the entire question and account for the edge cases I discussed above, I find the third much more compelling. It is not defeated by either of the edge cases: not by intact language despite intellectual deficits, because it does not assume that language is sufficient, merely that it is necessary; and not by the inverse, because it permits a small number of deviations. Language is not the only possible representational layer that could underpin intelligence. It is, however, vastly more evolutionarily advantageous through its efficiency – so much stronger and so much more efficient that it can be considered almost absolutely dominant, which indeed accounts for the fact that disorders of language with preserved intellectual functioning are vanishingly rare. If the efficiency of language as the ‘operating system’ of intelligence weren’t so strongly dominant, such disorders would not be disorders at all, but alternate modes of cognitive existence that are equally evolutionarily stable.


BibTeX citation:

@misc{voncsefalvay2023,
  author = {Chris von Csefalvay},
  title = {Beyond {Broca}},
  date = {2023-10-15},
  url = {},
  doi = {10.59350/vynvf-0k137},
  langid = {en-GB}
}
For attribution, please cite this work as:
Chris von Csefalvay. 2023. “Beyond Broca.”