I’m saying goodbye to 2016 in appropriate fashion: spending time with my family, eating a lot, fighting a cold, and studying word things.
Over the years that I’ve been at this word study and teaching and training thing, I’ve encountered references to a 1966 study known as The Stanford Spelling Survey, by Hanna, Hanna, Hodges, and Rudorf, four professors of education who analyzed 17,310 English words and wrote up their research in an article that’s cited over and over and over. From this analysis of less than 2% of English words and a lot of number crunching, Hanna et al. concluded that English is 67% “regular.” That study has been used as the foundation of so much of modern phonics, including pedagogical decisions based on what patterns are considered “regular,” “common,” and “exceptions.”
This 50-year-old phonocentric study was brought to my attention again while I was working on my dissertation this past week, and also by a comment on my last post, which I did not publish out of deference to the writer, who, like me, is a business owner with a public profile. Unlike me, she runs a phonics center that trains people in Wilson and LETRS and other shopkeeping packages that I’ve countered with linguistic evidence many times before. She wrote a comment to argue that the “frequency of occurrence with regard to nonsense words” matters, and cited a table from a 2010 book (which I have) that was copied from a 1976 book (which I also have), which itself was citing an article from 1966 (which I also have), that was in turn built on one author’s question from 1949 (yes, I have that too).
Paul Hanna’s 1949 question was “regarding the correspondences [of graphemes and phonemes] and their consistency in spelling,” as explained in the 1966 article. Twice I was directed to that 1966 article in my studies this week; there are no coincidences. As I said, I run into citations of that study frequently. It’s common. But this week’s two encounters were louder in my head than usual. My email response to the LETRS Lady was clear and direct: I explained that the “frequency of occurrence” of nonsense words is zero, and the “frequency of occurrence” of actual phonemes and graphemes in nonsense words is zero. The only evidence she had given me at all was a citation of a book citing another book citing an article, right? So I decided to trace it back to its source.
That table (which can be googled) was first published by Elsie D. Smelt in 1972 and has been cited widely since; her figures are taken from the 1966 Stanford Study. Smelt’s table says that “the most common way of writing each vowel sound is with one letter,” and this claim is attributed to the Stanford study as well. But what exactly do we mean by “common” or “frequent,” and how does that knowledge help readers and spellers? While single-letter vowel spellings may be the default grapheme for “long” and “short” vowel phonemes, proficient readers do not base their spelling and reading strategies on statistical calculations. Moreover, while we have only 6 single-letter vowel graphemes, we have more than 30 vowel digraphs and trigraphs, a ratio that troubles the notion of single letters being the “most common” spelling. Let’s see what Hanna et al. actually say.
Here’s the basic framework they offer:
“These structural components of oral language include: (A) the phonetic reservoir from which a phonemic code is selected, (B) the phonemic base, (C) the morphological base, that is, the arrangement of phonemes into speech units which minimally express meaning, (D) the syntactic and grammatical base, that is, the arrangement of morphemes into syntactic patterns, and (E) the semantic base, which conveys meanings in terms of the conceptual system of a language community.” [I’ve replaced their numbers with letters to make this post easier to write.]
Two things struck me right away: first, that these educators at least acknowledge a distinction between phonetic and phonemic concerns, which is more than I can say for many present-day phonics resources; and second, that they — and everyone who has followed in their formidable footsteps — have the way a language works totally backwards. Now, they’re talking about oral language rather than written, but the point is the same: you don’t start with phonetics and end up in meaning; rather, you start with meaning and from there, can analyze words (lexemes) into their sublexical (smaller-than-word) structures, including morphemes, phonemes, and the graphemes that pinpoint and reveal them.
In the word study I’m engaged in, we ask four questions:
(1) What does it mean?
(2) How is it built?
(3) What are its relatives?
(4) What segments and features of pronunciation matter to meaning? These segments are the only ones that are revealed in the spelling.
Question 1 has to be first — there’s no point in knowing how to write a word whose meaning you don’t know. And Question 4 has to be last — you can’t figure out the orthographic phonology until you have evidence for the other pieces. But Questions 2 and 3 can and do toggle considerably in any investigation. So you start with meaning, and you stay rooted in meaning all the way through. What does it mean? And even Question 4, which deals with pronunciation, only concerns itself with aspects of pronunciation that matter to the meaning. So it’s the Stanford Study’s fifth and final concern — semantics, “the conceptual system of a language community” — which is where we actually need to start.
Our second question, How is it built?, is captured more or less in the Study’s third and fourth concerns, in which “the morphological base” and “the arrangement of morphemes” are considered. They define morphology as “the arrangement of phonemes into speech units which minimally express meaning.”
Oh if only there were some way to make those “speech units” that we use to “express meaning” visible!
Working backwards still, the Study’s second concern is phonology, the “phonemic base.” The only reason there’s a fifth piece at all (the “phonetic reservoir,” which they list first) is that they’re talking about oral language, so phonetics is a thing because it’s actually spoken. And although they differentiate phonetics from phonemics, nothing in the article suggests they realize that phonetics has nothing to do with orthography.
Of course, the Stanford Spelling Study doesn’t even mention etymological relatives, because it has no idea about the etymological governance of graphemes. It can tell you that 10% of the 17,000 words that have /i:/ are spelled with <ee>, and 10% are spelled with <ea>, but it can’t tell you why <beech> and <beach> make sense. This study knows nothing about etymological markers or why words have a single, final, non-syllabic <e>. We know better now, so why is 21st-century so-called reading research still so married to a half-century-old, roundly debunked understanding of graphemes?
Seriously, professionals need to stop embarrassing themselves by clinging to these relics.
I also took a look at the numbers and at the phonemic and graphemic inventories used by this seminal study. It’s a bloodbath. I am not exaggerating. The phonemic inventory is lifted directly from the Merriam-Webster Dictionary, which is important, because even if dictionaries were actually right about everything (they’re not), we’re still talking about a dictionary that has been updated and changed multiple times, including with regard to its pronunciation key, over the past 50 years. So the “research” that people want me to consider is based on a 50-year-old dictionary, interpreted by 50-year-old research, cited 40 years ago, and then re-cited in very recent years, none of which is evidence of anything at all about the language other than what cruddy research practices we have in literacy education.
The authors themselves “readily admit that this pronunciation key [from the Merriam-Webster Dictionary] has several critical weaknesses.” They also acknowledge that linguists don’t always agree about everything, and that their graphemic inventory (which was all about how easily a computer could process 17,310 words) was also flawed: “Unfortunately, complete consistency with this criterion could not be maintained, and so some exceptions to this general rule will be found among the list.” So we’re in exception-land, which is really not science. They do ask questions like “Is <I> a part of the graphemic option <TI> or <IO> in nation? In conscience, is <I> a part of the graphemic option <SCI> or <IE>?”, and they conclude that “Again linguists disagree upon this point.”
Well, folks, linguists may have disagreed on that point a half century ago, but orthographic linguists don’t disagree about it now. I already laid out proof in another post that there’s no <ie> in conscience, no matter that Louisa Moats says there is as though she proved it (she didn’t). Linguistics is a science, and we know more now about these kinds of questions. We have better tools now than we had 50 years ago, like the lexical word matrix, the orthographic word sum, the mini matrix maker, and the Online Etymology Dictionary, and better, faster ways of disseminating and discussing investigations and new information (in real-time online classes, on editable websites, and on social media). We don’t have to carry around some dusty old misunderstanding like it’s our last keepsake from our long lost Pappy.
For reals, why are professionals (researchers and educators, of all people) clinging to 50-year-old research that didn’t even conceive of today’s scientific tools? Can you imagine if a surgeon or a rocket scientist did that? Mayhem. Can you imagine if we elected as President someone who ignored and denied modern climate science? Oh, wait… Sigh.
Science matters. Understanding the difference between factual, physical evidence, scientific consensus, and the repeated sub-letting of citations from, uh, wherever, something sciency-sounding, is just so critical to everything.
Among the lettery circus freaks that the Stanford Study offers in its admittedly troubled graphemic inventory are a *<bt> in debt, a *<ua> in guard and a *<cc> in occur. In real life, the <b> in debt is an etymological marker (debit); the <u> in guard, guarantee, guerilla, guest, etc., is part of a <gu> digraph that can mark an etymological relationship to cognates with a <w>: guard~warden, guarantee~warrantee, guerilla~war, guide~guise~guywire~wit~witness (‘to see’), guile~wily. And as any regular reader already knows, the two <c>s in <occur> are each in separate morphemes. That’s like saying that there’s an <ea> in react or a <th> in hothouse. Big fat can of graphemic nope.
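For anyone who likes to see the logic spelled out, here is a minimal Python sketch of the <occur> point: a letter string can only be a candidate grapheme if it sits entirely inside one morpheme. The function names are hypothetical, and the word sums fed to it (<oc + cur>, <re + act>, <hot + house>) are my illustrative segmentations, supplied by hand; the code does not discover morphemes on its own. It also says nothing about etymological markers like the <b> in debt, which live inside a single morpheme and still aren't part of a digraph.

```python
from typing import List, Set


def boundary_positions(morphemes: List[str]) -> Set[int]:
    """Return the character offsets where one morpheme ends and the next begins."""
    positions = set()
    offset = 0
    for morpheme in morphemes[:-1]:
        offset += len(morpheme)
        positions.add(offset)
    return positions


def stays_inside_one_morpheme(morphemes: List[str], letters: str) -> bool:
    """True if `letters` occurs at least once without crossing a morpheme boundary.

    This is only a necessary condition for being a grapheme in that word,
    not a sufficient one.
    """
    word = "".join(morphemes)
    cuts = boundary_positions(morphemes)
    start = word.find(letters)
    while start != -1:
        end = start + len(letters)
        # The occurrence straddles a boundary if a cut falls strictly inside it.
        if not any(start < cut < end for cut in cuts):
            return True
        start = word.find(letters, start + 1)
    return False


# Hand-written word sums (assumed segmentations, for illustration only).
print(stays_inside_one_morpheme(["oc", "cur"], "cc"))     # crosses the boundary
print(stays_inside_one_morpheme(["re", "act"], "ea"))     # crosses the boundary
print(stays_inside_one_morpheme(["hot", "house"], "th"))  # crosses the boundary
print(stays_inside_one_morpheme(["beach"], "ea"))         # inside one morpheme
```

The point of the sketch is the order of operations: the word sum has to come first, and only then can anyone ask which letter strings are even eligible to be graphemes. A tally that skips the word sum will happily count a *<cc> in occur.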
I could go on and on and on and on, but I’m gonna go hang out with my kid and watch a ball drop on this crazy calendar year. I’m not much for resolutions, but I’d welcome resolve to move into 2017 not clinging to antiquated phonics research like it’s a bible or a gun and something evil is after you.
I’m sorry that modern phonics is built on a rickety, outdated, dismantled, misguided, misquoted old study. I’m not sorry for pointing it out, and I’m not sorry for yelling a little. If you were clinging to a life raft of the same age and quality and I had a new speedboat, I’d be yelling just as loudly to save your life as I am now.