Recently, on a listserv for professionals interested in spelling, some participants were bothered by my suggestion that Reading “Science” gets some things deeply wrong about language. Some participants argued for the “value” of telling children “little white lies” about language, and for the common phonocentric understanding of English orthography that reduces it to little more than a flawed representation of sound.
While the listserv participants don’t hesitate to call on “linguistic information” and “facts” in their own posts, some of them objected to my efforts to do the same. And since my understanding of the linguistic facts of English orthography posed a bit of a threat to the commercial interests of the listserv moderator, she removed me from the list with no personal comment, but the following public one:
“Students learn best when taught in a manner consistent with the way the brain processes and organizes linguistic information. Consider this…the system which is biologically hard-wired is perfect (the phonemes of language are a constant); the artificial system of written language which was created by humans…not so much.”
Another participant concurred: “The fact that different dialects of English vary in their surface representations of those constant phonemes does not diminish the perfection!”
My broader conversations about these claims about the perfection of phonemes sparked the interest of my fellow University of Chicago alum, Dr. Alexander Francis, who is an associate professor in the Department of Speech, Language, and Hearing Sciences at Purdue University, a Program Faculty member in Linguistics, and the holder of a courtesy appointment in Psychological Sciences. In other words, Dr. Francis is a bona fide expert in “the way the brain processes and organizes linguistic information” — particularly sound.
So, in the interest of LEX’s stated purpose of providing a meaningful exchange between linguistics and education, I am delighted to depart briefly from my orthographic investigations to offer this first guest post, in which Dr. Francis addresses the above perspectives about phonology and the brain that pervade the field of Reading (and Spelling) “Science”.
Gina, the claim that “the system which is biologically hard-wired is perfect (the phonemes of language are a constant)” makes sense from the perspective of early theories of speech perception. However, these older theories have been almost completely replaced on the basis of a wide variety of new evidence about how infants learn their first language, and how adults learn and/or recognize the sounds of languages beyond their first one. In fact, our understanding of the human capability for speech has changed so radically in the past 30 or 40 years that, by now, most speech researchers would probably consider every part of the statement to be completely false.
Here, I’ll go through each part of the statement, and discuss (in very broad terms) evidence that suggests that it is false. I’ll also try to name some of the key researchers who have contributed to the literature in each area. And, in keeping with the idea that this is an Exchange, I’d be happy to respond to questions and/or requests for more information from you or anyone reading this.
First, the perception and production of speech sounds are not biologically hard-wired. Human infants develop language-specific perceptual abilities in the first few months of life but, at birth, infants seem to be “pluripotent” – capable of learning the sounds of any, or every, language they might be exposed to (see, especially, work by Pat Kuhl, Janet Werker, Linda Polka, Amanda Seidl, Peter Jusczyk, and many, many others). It is quite likely (though not universally accepted and not yet scientifically proven) that human infants are biologically predisposed to learn human speech, but there is no biological predisposition to learn English as opposed to, for example, !Kung (with its 5 different places of click articulation!). A child born of English-speaking parents but raised in a !Kung environment would grow up to become a perfectly fluent !Kung speaker, and vice versa. So, there’s no hardwiring of speech sounds – they’re acquired through exposure to the ambient language(s) during infancy.
There is some evidence that long-term exposure to the use of specific acoustic properties in a linguistically relevant manner may have measurable effects on brain organization – my colleagues Ravi Krishnan and Jack Gandour have (with their students) shown that native speakers of Chinese, which is a tonal language, process the acoustic property of fundamental frequency, related to the perception of pitch, more accurately than do monolingual native speakers of English. But this is a very subtle effect, and it’s not clear whether the same kind of effect might also appear in other kinds of experienced listeners, such as musicians (see work by Patrick Wong, Nina Kraus, Aniruddh Patel, Robert Zatorre, and many others). In other words, there may be some development of “firmware” through auditory experience, but it’s not yet clear to what degree this is a purely linguistic phenomenon.
More importantly, even though there may be some possibly permanent changes in neurophysiology as a consequence of early language experience, the adult brain remains remarkably plastic – able to adapt in response to linguistic input. This plasticity plays out in a variety of ways. Multilingualism is by far the dominant pattern of language use among humans, and the ability to switch between one language, with one system of sounds and sound-meaning correspondences, and another (often within a single sentence, in the case of “code mixing”) is probably fundamental to human speech. Indeed, recent research (by Lori Holt and colleagues, among others) is starting to show that even monolingual listeners can adapt the way they use the speech signal over the course of just a few seconds. Work by Howard Nusbaum and colleagues (including some work in my lab) has shown that longer-term changes (i.e. lasting over 6 months) in the way listeners understand speech sounds can be introduced with just a few hours of exposure to a new talker. Thus, many modern speech scientists conceptualize phonemes as the output of a highly flexible, cognitive system involving both bottom-up (signal-dependent) and top-down (knowledge-based) processing. There are certainly researchers who emphasize the bottom-up aspect of the process more than others do, but nearly everyone agrees that adult phoneme recognition is flexible and manifestly *not* hardwired. (Reference: Kluender, K. R., and Alexander, J. M. (2007). “Perception of speech sounds,” in P. Dallos and D. Oertel (Eds.), Handbook of the Senses: Audition. Elsevier: London.)
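To make the idea of a flexible, adapting system concrete, here is a toy sketch in Python (entirely made up for illustration, not any published model): a listener classifies an ambiguous stop consonant as /d/ or /t/ from its voice onset time (VOT), and nudges the category boundary toward a new talker's productions when context reveals what was actually said.

```python
# A toy sketch (not any published model) of rapid perceptual adaptation:
# a listener classifies /d/ vs /t/ by voice onset time (VOT, in ms) and
# shifts the category boundary toward a new talker's productions.

def make_listener(boundary=30.0, rate=0.5):
    """Return (classify, adapt) closures sharing one mutable boundary."""
    state = {"boundary": boundary}

    def classify(vot):
        # Bottom-up decision: longer VOT than the boundary sounds like /t/.
        return "t" if vot > state["boundary"] else "d"

    def adapt(vot, heard_as):
        # Top-down recalibration: if context (e.g. the word the sound
        # occurred in) says this token was /t/, move the boundary so that
        # similar tokens are more likely to be heard as /t/ next time.
        target = vot - 5.0 if heard_as == "t" else vot + 5.0
        state["boundary"] += rate * (target - state["boundary"])

    return classify, adapt

classify, adapt = make_listener()
print(classify(28.0))  # below the default boundary -> heard as 'd'
adapt(28.0, "t")       # lexical context says it was actually /t/
adapt(27.0, "t")       # a couple more exposures to this talker
print(classify(28.0))  # the very same token is now heard as 't'
```

After just two context-disambiguated exposures, the same ambiguous token flips from being heard as /d/ to being heard as /t/ — a cartoon version of the rapid recalibration described above.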
Second, under normal circumstances, phoneme recognition is not perfect. Speech is often produced in contexts of background noise, including the presence of competing speech (the so-called “cocktail party problem”), and our perceptual/cognitive system learns, over many years, to cope with the variability, ambiguity, distortion, and simple lack of sufficient acoustic information introduced by these problems. Listeners *constantly* make “slips of the ear,” which are then typically corrected by higher-level understanding (though, again, this correction is not always perfect). This is especially true under poor listening conditions, such as in the presence of background noise, but it happens under perfect listening conditions as well. For example, consider the famous phrase “How to wreck a nice beach.” (Say it out loud, in a casual way.) If you heard this while looking at a picture of a huge bulldozer pushing piles of sand around, or perhaps a picture of a row of tacky beach houses, you might assume that you heard it correctly (as written). But if you heard the exact same sequence of sounds in a lecture on speech technology, you would almost certainly interpret it as the phrase “how to recognize speech.” Ambiguous sound sequences abound, in every language, and are the source of some of our greatest anecdotes and humor: “Why did the three brothers name the cattle ranch they inherited from their father ‘Focus’? Because that’s where the sons raise meat.” (Think about it.) [Editor’s note: These phrasal phenomena are called holorimes.]
Sound ambiguities are, of course, also the source of frequent, frustrating confusion that should be familiar to everyone. But, again, our brains are very good (though not perfect) at compensating. The next time you speak with someone on a cell phone, try replacing all the “th” sounds (as in “think” and “thank you,” etc.) with [f] (i.e. say “fank you very much”). They won’t be able to hear the difference, because the cell phone simply doesn’t transmit the frequencies that distinguish a [θ] from an [f]. But they will *think* that they hear a [θ] – it’s not that they think “I don’t know what that sound is, so I’ll guess.” They actually *hear* the correct sound. Some researchers have even proposed that this is a kind of hallucination, though I don’t know of any evidence that would prove that. The fact is, our auditory system simply cannot depend on the speech signal to be unambiguous. In fluent speech, words are almost invariably distorted by their context such that, when heard in isolation, they are significantly less recognizable than when heard in context (this was first shown in the 1950s by George Miller and colleagues). But this is not a result of errors or sloppiness; it is an unavoidable consequence of the biology and physics of speech production, in a phenomenon called “coarticulation.” It is nearly impossible, and demonstrably detrimental to communication, to try to pronounce every sound of every word in a sentence as it would be produced in isolation. So adult listeners have very sophisticated mechanisms for deciphering ambiguous sounds, but the process is definitely not perfect – in fact, it can be tricked quite easily.
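The point about the phone not transmitting the distinguishing frequencies can be illustrated with a toy calculation (the spectra below are invented for illustration, but the roughly 300-3400 Hz telephone passband is real): zeroing out the frequency bands a narrowband phone channel discards leaves [f] and [θ] looking identical.

```python
# Toy illustration: the spectral detail that separates [f] from [θ] sits
# largely above what a narrowband telephone channel (~300-3400 Hz) transmits,
# so the filtered versions of the two sounds come out nearly identical.
# The per-band energies are made up; only the passband limits are real.

BANDS = [500, 1500, 2500, 3500, 4500, 5500, 6500, 7500]  # band centers, Hz

# Hypothetical relative frication energy per band for each fricative;
# the two differ mainly in the high-frequency bands.
f_spectrum     = [1, 1, 2, 3, 5, 7, 6, 4]
theta_spectrum = [1, 1, 2, 3, 3, 4, 3, 2]

def telephone(spectrum):
    """Zero out energy outside the ~300-3400 Hz telephone passband."""
    return [e if 300 <= hz <= 3400 else 0 for e, hz in zip(spectrum, BANDS)]

def difference(a, b):
    """Total absolute difference between two band-energy profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(difference(f_spectrum, theta_spectrum))  # full band: clearly different
print(difference(telephone(f_spectrum),
                 telephone(theta_spectrum)))   # phone band: 0, identical here
```

With the high bands removed, nothing in the signal itself tells the listener which fricative was produced; the percept has to come from knowledge of the language, which is exactly why the listener “hears” a [θ] in “fank you.”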
In a well-known phenomenon called the “Ganong effect” (named for W. F. Ganong, who first wrote about it), a speech sound that is intentionally ambiguous between [t] and [d] can be shown to be heard more often as a [t] when followed by “-ask” (where hearing it as a [t] creates the English word “task”), while the exact same sound is heard more often as a [d] when followed by “-ash” (where hearing it as a [d] creates the English word “dash”). In other words, listeners’ brains make up for the ambiguity by making the relatively reasonable assumption that they are hearing a real word from their language (i.e. “task” or “dash”) rather than some made-up word (i.e. “dask” or “tash”). There is nothing perfect about hearing (or producing) phonemes – it is accomplished by a very sophisticated mechanism that is no less complex, and no less dependent upon experience and cognitive processing, than the one involved in reading.
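The Ganong effect is often described as exactly this kind of combination of bottom-up signal and top-down knowledge, and a tiny Bayesian sketch (with illustrative numbers, not fitted to any actual experiment) shows how the same ambiguous acoustics can yield opposite percepts:

```python
# Toy Bayesian sketch of the Ganong effect (illustrative numbers only):
# the same ambiguous acoustic evidence for /t/ vs /d/ is resolved
# differently depending on which continuation makes a real English word.

def categorize(p_acoustic_t, frame):
    """P(heard as /t/ | sound, frame): acoustics mixed with a lexical prior."""
    # Lexical prior: real words ("task", "dash") are far more expected
    # than non-words ("dask", "tash").
    lexical_prior_t = {"-ask": 0.9, "-ash": 0.1}[frame]
    num = p_acoustic_t * lexical_prior_t
    den = num + (1 - p_acoustic_t) * (1 - lexical_prior_t)
    return num / den

ambiguous = 0.5  # acoustics alone cannot decide between /t/ and /d/
print(round(categorize(ambiguous, "-ask"), 2))  # 0.9 -> mostly heard as "task"
print(round(categorize(ambiguous, "-ash"), 2))  # 0.1 -> mostly heard as "dash"
```

The acoustic evidence is identical (0.5 either way); only the lexical context differs, and that is enough to flip the percept — the same logic as the “wreck a nice beach” example, compressed into one equation.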
Finally, the phonemes of a language are by no means constant. Obviously there are historical changes (such as the one that shifted some, but not all, “oo” words from the vowel [u] (as in “food”) to the vowel [ʊ] (as in “hood” or “good”)), but even if we simply consider the language as it’s spoken at this precise moment in time, there is considerable variability across dialects (consider the different phonemes in a southern American English production of the word “fire” (more like “fahr”) vs. a Midwestern pronunciation). And even within speakers of a single dialect, it’s not clear whether the mental representations of speech sounds (i.e. the phonemes) that are used by one listener are the same as those used by another listener. Research in the past 20-30 years has shown that speech perception is highly influenced by experience, not just experience in infancy but also subsequently. One of the currently dominant theories of speech perception, Exemplar Theory, proposes that every individual exemplar of a speech sound that is ever heard contributes to the unique representation of that sound in the brain of that specific listener (see work by David Pisoni, Keith Johnson, and colleagues). There is a nice analogy to word meanings: When I hear the word “dog” it evokes, among many other things, the image of my own dog. Presumably, as you have never met my dog, it does not evoke that same image for you. Your concept of the meaning of “dog” is subtly, but fundamentally, different from mine, by virtue of the differences in our respective experiences with dogs. In the same way, according to this theory, my best friend growing up, a guy who watched a lot of Monty Python, probably has a very different conceptualization of the sound [r], because he’s heard British (r-less) pronunciations much more often than I have.
Again, the distinction is probably quite subtle, but studies by David Pisoni, Lynne Nygaard, and colleagues have shown that listeners are better at recognizing a word that they’ve heard before if it’s spoken by the same person they heard it from previously – the sounds of that word are subtly different for that listener simply by virtue of having heard that person say it before. Thus, the phonemes of a language are not only not constant over time, they are not really constant across different speakers of the same language.
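Exemplar Theory's core claim — that recognition works by similarity to stored traces of every token ever heard — can be sketched in a few lines (a minimal illustration in the spirit of such models, not any specific published implementation):

```python
# Minimal exemplar-style sketch (illustrative, not a published model):
# every token a listener hears is stored as an episodic trace, and a new
# token is recognized by its summed similarity to the stored traces.
# A word repeated by a familiar voice matches memory better than the
# same word from a new voice.

import math

memory = []  # each exemplar: (word, acoustic_value)

def hear(word, acoustic_value):
    """Store an episodic trace of this specific production."""
    memory.append((word, acoustic_value))

def recognition_score(word, acoustic_value):
    """Summed similarity of an incoming token to stored traces of `word`."""
    return sum(
        math.exp(-abs(acoustic_value - stored))  # closer traces count more
        for w, stored in memory
        if w == word
    )

# One talker's idiosyncratic "dog" (acoustic value 1.0) is heard a few times.
for _ in range(3):
    hear("dog", 1.0)

same_talker = recognition_score("dog", 1.0)  # token matches stored traces
new_talker = recognition_score("dog", 2.0)   # different voice, weaker match
print(same_talker > new_talker)  # True: a familiar-voice advantage
```

On this view there is no fixed, “perfect” phoneme stored anywhere: the category just is the cloud of remembered tokens, which differs from listener to listener because no two listeners have heard the same tokens.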
I don’t know if this really gets at what you’re working on with respect to spelling, but, in general, I guess I’d say it doesn’t really help to think of speech perception (or production) as a model of a flawless system. Because it’s not. It’s pretty amazing, but it’s not by any means perfect.