AI.7 Create The First Draft Of The Essay
Add supporting sentences to each paragraph to convert the outline into a draft.
The paragraphs at the end of Section AI.6 represent my essay’s outline. The outline contains topic sentence/concluding sentence pairs, strung together in a narrative order. The outline includes several notes reminding me of additional points or material to insert.
The Section AI.6 outline began as a set of index cards; each card had a paragraph topic on one side and a pair of sentences on the other. The completed set of index cards marked the end of my work with the Chapter 2 scaffold. I next entered the index card information into a word processor, creating the paragraph sequence which ended Section AI.6.
To create my essay, my next step was to convert the outline into a first draft. I used the word processor to add supporting sentences between each topic sentence/concluding sentence pair (Section 4.2). When supporting sentences were added to every paragraph, the first draft was complete.
The outline provided a scaffold which constrained the supporting sentences I added to each paragraph. As noted in Chapter 4, every paragraph has a logical structure; its topic sentence, supporting sentences and concluding sentence are all related. I created my first draft by reading a paragraph’s existing material in the outline and then immediately adding supporting sentences to flesh out the paragraph. I did not expect the supporting sentences to be perfect; after all, I didn’t expect even the existing topic and concluding sentences to be perfect! I planned to take several editing passes to revise and polish the draft (Sections 4.3 and 4.4).
Nevertheless, adding supporting sentences struck me as more serious writing than outlining. At a local level – within each paragraph – I tried to create sentences which worked together to communicate a topic. I did not worry about the whole manuscript’s narrative structure when adding supporting sentences, because I was confident I had paid sufficient attention to its design during my earlier outlining stages.
When all supporting sentences were added, I had created my first draft. Writers expect first drafts to be so terrible they should never be shared (Becker, 2020; Koch, 2003; Lamott, 1995). However, to illustrate my process for creating the Section AI.2 essay, I provide my (terrible) first draft below:
We live in an artificial intelligence (AI) revolution fueled by a new invention called a large language model (LLM). LLMs are built from deep neural networks, which are artificial neural networks capable of learning to perform complicated tasks because they contain many layers of intermediate processors called hidden units (LeCun et al., 2015). LLMs differ from traditional deep networks by including additional architectural properties which aid their ability to learn and to process language (Dong et al., 2023). LLMs, trained on a huge amount of text taken from the internet, learn to predict which words should follow a stimulus sentence. “Give them a human language description or several examples of what one wants them to do, and they can perform tasks for which they were never trained” (Manning, 2022, p. 132).
LLMs are revolutionary because they can generate long, detailed, meaningful responses to short text prompts. LLMs are now commonly used to accomplish a variety of complex tasks, including editing scientific manuscripts, writing or checking programming code, and brainstorming ideas (Mitchell & Krakauer, 2023; Stokel-Walker & Van Noorden, 2023). OpenAI reports its most recent LLM, GPT-4, can pass a number of professional and academic benchmarks. For example, GPT-4’s score on a simulated bar exam placed it in the top 10% of test takers. “What is clear is that these models use language in a way that is remarkably human” (Piantadosi, 2023, p. 4, his italics).
LLMs’ performance has generated many questions in both the popular press and scholarly journals (Mitchell & Krakauer, 2023). A recent headline in the New York Times read “Microsoft Says New A.I. Shows Signs of Human Reasoning.” Do LLMs understand language? Are LLMs intelligent? Are LLMs sentient or conscious? Such questions are very polarizing; Mitchell and Krakauer report that a survey of language researchers split almost evenly, with 51% agreeing such models could understand language.
Speaking as a cognitive scientist, I feel such questions miss the key point. I am interested in a different question: ‘Can LLMs inform cognitive science?’ Below, I argue LLMs may indeed be able to inform cognitive science – but only if researchers expend considerable effort to study the internal structure of LLMs in order to discover how LLMs produce their amazing behavior. LLMs may provide new theories to cognitive science, but only if researchers look inside them to pull theories out.
Modern AI’s excitement and controversy come from an LLM’s ability to generate paragraphs of meaningful sentences in response to short prompts or questions. For example, the third-year students in my ‘History of Modern Psychology’ class recently wrote a two-page essay in response to a broad final exam question. I explored how OpenAI’s ChatGPT would respond if I used only an exam question as a prompt. I tested ChatGPT with six different possible questions. For each question, ChatGPT generated six paragraphs of well-written prose whose sentences were clearly related to the question’s theme. LLMs consistently generate well-written, interpretable and surprising responses to short, vague prompts.
Cognitive science has studied human language for decades. Cognitive science’s most influential account, generative grammar (Chomsky, 1965, 1966, 1995), proposes human language involves specialized rules or processes manipulating complex mental representations of sentences. In general, generative grammar represents a sentence as a tree-like structure, called a phrase marker, which encodes the order of words in the sentence, the parts of speech to which words belong, and the hierarchical structure which organizes the sentence. Rules, called transformations, convert one phrase marker into a different phrase marker; for instance, to convert a statement into a question. By focusing on symbols and rules (i.e., phrase markers and transformations), Chomsky’s generative grammar not only transformed linguistics but also inspired theories in cognitive science for many decades.
Cognitive scientists who believe human language is the rule-governed manipulation of symbols do not believe LLMs inform cognitive science (Chomsky et al., 2023; Veres, 2022). For example, in a recent New York Times opinion piece, Chomsky et al. argue: “We know from the science of linguistics and the philosophy of knowledge that [LLMs] differ profoundly from how humans reason and use language. These differences place significant limitations on what these programs can do, encoding them with ineradicable defects.” Cognitive scientists have, for decades, followed the motto ‘no cognition without computation’. The motto claims we can explain cognition only by appealing to symbols and rules, which cognitive scientists assume are core properties of computation (Dawson, 2013, 2022).
Others believe the success of LLMs suggests alternatives to generative grammar, like statistical language learners, are worthy of cognitive science’s interest (Contreras Kallens et al., 2023). UC Berkeley psychologist Steven Piantadosi agrees with Chomsky et al. (2023) that LLMs do not use grammatical rules (Piantadosi, 2023). However, he then argues an LLM’s high-level performance without using rules refutes Chomskyan linguistics. “The success of large language models is a failure for generative theories because it goes against virtually all of the principles these theories have espoused. In fact, none of the principles and innate biases that Chomsky and those who work in his tradition have long claimed necessary needed to be built into these models” (Piantadosi, 2023, pp. 14-15, his italics).
My own research examines cognitive science’s foundations, focusing on relations between theories based on rules and symbols and theories based on artificial neural networks. I therefore recognize a historical precedent for Piantadosi’s position on Chomskyan theory, a precedent relevant to the question of whether LLMs can inform cognitive science.
In the mid-1980s, cognitive science found itself in the midst of what is now called its connectionist revolution. New artificial neural networks, called multilayer perceptrons, were powerful enough to serve as theories about human cognitive phenomena. The power of the new networks arose from their layer of hidden units; with enough hidden units, a multilayer perceptron could in principle learn any mapping between stimuli and responses (Lippmann, 1989).
The rise of multilayer perceptrons caused a revolution in cognitive science because proponents of artificial neural networks attacked traditional theories which appealed to the rule-governed manipulation of symbols. For example, Rumelhart and McClelland (1986) trained one network to convert present-tense verbs into their past-tense form. They proposed their past-tense network performed this linguistic task without using grammatical rules like those proposed by Chomsky: “We suggest that lawful behavior and judgements may be produced by a mechanism in which there is no explicit representation of the rule” (p. 217).
My interest in the connectionist revolution focused on a curious aspect of the revolutionaries’ argument: they assumed networks abandoned symbols and rules, but never provided evidence to support their assumption, or to show what their networks used to replace symbols and rules. I call their approach gee whiz connectionism (Dawson, 2009). I tried to distance myself from gee whiz connectionism by training multilayer perceptrons on various tasks, and by conducting detailed analyses of the internal structure of my trained networks.
When I looked inside my trained networks, I discovered structures which resembled theories based on symbols and rules. For example, my students and I trained one network to solve a number of different logic problems. When we looked inside the network, we discovered formal rules of logic of the sort philosophy students would learn in an introductory logic course (Berkeley et al., 1995). In another study, my students and I trained networks to classify mushrooms as being edible or poisonous. When we looked inside, we found we could translate network states into a traditional symbol/rule system called a production system (Dawson et al., 1997). Such results reveal surprising similarities between network models and symbolic models, blurring the distinctions between the two approaches (Dawson, 1998, 2004, 2013, 2018).
Importantly, my students and I did not usually find network structures which replicated existing formal theories. Instead, we usually found new structures which could inform a cognitive science based on symbols and rules. For instance, my recent work on interpreting artificial neural networks trained to make musical judgements finds structures strongly related to traditional music theory (e.g., preference for particular musical intervals) or to the formal set theory of music (e.g., Fourier representations of musical sets) (Dawson, 2009, 2018; Dawson et al., 2020; Perez et al., 2023). However, I often discover the formal properties of networks depart in surprising ways from traditional music theory. For example, music theory usually represents Western music as consisting of twelve different pitch-classes (C, C#, D, and so on). In contrast, my musical networks generate a formal theory which consists of only six different pitch-classes, and which treats pitch-classes which are six semitones apart in traditional theory (such as C and F#) as being identical. In short, when I looked inside my networks, I found new kinds of formal structures for cognitive science to explore.
My own research makes me suspect LLMs will inform cognitive science only when researchers abandon mere assumptions about what makes LLMs different from rule and symbol models, and instead seek evidence about both the similarities and the differences between the two types of models.
Why must we look inside LLMs to inform cognitive science? Cognitive scientists have long known psychologically plausible performance can be produced by methods completely unrelated to the processes of human cognition. One famous example was the conversational program ELIZA, which carried out convincing conversations with human participants (Weizenbaum, 1966). ELIZA’s performance deliberately did not require the program to understand language. “ELIZA shows, if nothing else, how easy it is to create and maintain the illusion of understanding, hence perhaps of judgment deserving credibility. A certain danger exists there” (Weizenbaum, 1966, pp. 42-43). (When ELIZA’s danger, and Weizenbaum’s intent in creating ELIZA, were ignored, Weizenbaum abandoned artificial intelligence research altogether; see Weizenbaum, 1976.) Examples like ELIZA show why cognitive scientists are more concerned with comparing processes than with comparing performance.
I strongly suspect LLMs use methods radically different from those used by humans because they represent stimuli and responses with encodings unrelated to any proposed by cognitive scientists. For instance, while LLMs process sentences, they do not represent sentences as sentences, or even as collections of words. First, they break words into smaller components, called tokens. Then they encode each token as a long vector of numbers, in a scheme which assigns similar vectors to similar tokens. In one LLM, BERT, each token is represented by a 768-dimensional vector (Manning et al., 2020). To my knowledge, no cognitive scientist has proposed representing text using such high-dimensional codes or by decomposing words into smaller components. If LLM representations are unrelated to human cognition, then LLMs do not refute Chomsky’s approach. Instead, they refute the applicability of Chomsky’s approach to the explanation of LLMs!
LLM proponents recognize LLMs use some potentially novel method to produce their remarkable facility with language. “The theory is definitely in there” (Piantadosi, 2023, p. 8, his italics). To inform cognitive science, and to defend claims like ‘LLMs refute Chomsky’, researchers must do the hard work of discovering what methods LLMs use, and of comparing the discovered methods to those revealed by research on human cognition.
However, understanding how LLMs convert stimuli into responses is extremely challenging, because LLMs are intimidatingly large and complex systems. For example, OpenAI’s ChatGPT is reported to have approximately 175 billion parameters which can be adjusted by learning, and to have been trained on text consisting of approximately 300 billion tokens. Another LLM, BERT, consists of twelve different layers of intermediate processors. Mitchell and Krakauer (2023, p. 1) note “the inner workings of these networks are largely opaque; even the researchers building them have limited intuitions about systems of such scale.” Piantadosi (2023, p. 8, his italics) concurs: “In fact, we don’t deeply understand how the representations these models create work.”
Fortunately, researchers recognize the need to extract potentially novel theories or representations from LLMs and are developing new techniques to understand an LLM’s internal structure. For example, consider the work of Stanford linguist and computer scientist Christopher Manning (Manning, 2022; Manning et al., 2020). Manning and his colleagues have developed methods which probe the internal structure of an LLM in an attempt to determine whether the network represents structures found in generative grammar.
One of Manning’s studies (Manning et al., 2020) examines components of an LLM called attention heads. An attention head determines the relevance of one word in a sentence presented to an LLM to the other words in that sentence, or to different words in the output being generated by the LLM. The more related two words are, the more attention is assigned to them. Manning et al. discovered that the attention assigned to word pairs captured, in part, linguistic relationships between words. For instance, the amount of attention assigned linked objects to appropriate verbs, linked prepositions to appropriate objects, linked noun premodifiers to appropriate nouns, and so on. Importantly, such relationships are linguistic (represented in the hierarchical structure of a phrase marker) rather than merely sequential, because two words may be far apart in a sentence but may still be linguistically related.
Manning et al. (2020) also describe a structural probe method which they use to detect phrase marker trees represented in the activity of an LLM’s layers of hidden units. The method involves measuring the distances between the vectors (representing tokens) which the network produces. With an appropriate distance metric, items whose vectors are close together in the LLM’s space are also close together in the phrase marker structure representing the words of a complete sentence. Manning et al. report they can use their distance metric to reconstruct a phrase marker from network properties. In short, “these models learn and represent the syntactic structure of a sentence” (Manning, 2022, p. 131).
I hope more work of this sort is on the horizon. As researchers explore LLM representations, as well as how the representations are used to generate responses, we move closer to relating LLMs to human cognition.
Piantadosi (2023, p. 30) claims “large language models rewrite the philosophy of approaches to language.” Do LLMs refute Chomsky’s approach? Do LLMs represent a new connectionist revolution for cognitive science? I believe we can’t answer such questions – yet. Answering such questions requires researchers to discover the nature of an LLM’s representations, as well as how its representations are used to generate responses.