Common Knowledge

I’m saying goodbye to 2016 in appropriate fashion: spending time with my family, eating a lot, fighting a cold, and studying word things.

Over the years that I’ve been at this word study and teaching and training thing, I’ve encountered references to a 1966 study known as The Stanford Spelling Survey, by Hanna, Hanna, Hodges, and Rudorf, four professors of education who analyzed 17,310 English words and wrote up their research in an article that’s cited over and over and over.  From this analysis of less than 2% of English words and a lot of number crunching, Hanna et al. concluded that English is 67% “regular.” That study has been used as the foundation of so much of modern phonics, including pedagogical decisions based on what patterns are considered “regular,” “common,” and “exceptions.”

This 50-year-old phonocentric study was brought to my attention again while I was working on my dissertation this past week, and also by a comment on my last post which I did not publish out of deference to the writer, who, like me, is a business owner with a public profile; unlike me, she runs a phonics center that trains people in Wilson and LETRs and other shopkeeping packages that I’ve countered with linguistic evidence many times before.  She wrote a comment to argue that the “frequency of occurrence with regard to nonsense words” matters, and cited a table from a 2010 book (which I have) that was copied from a 1976 book (which I also have), which itself was citing an article from 1966 (which I also have), that was in turn built on one author’s question from 1949 (yes, I have that too).

Paul Hanna’s 1949 question was “regarding the correspondences [of graphemes and phonemes] and their consistency in spelling,” as explained in the 1966 article. Twice I was directed to that 1966 article in my studies this week; there are no coincidences. As I said, I run into citations of that study frequently. It’s common. But this week’s two encounters were louder in my head than usual. My email response to the LETRs Lady was clear and direct: I explained that the “frequency of occurrence” of nonsense words is zero, and the “frequency of occurrence” of actual phonemes and graphemes in nonsense words is zero. The only evidence she had given me at all was a citation of a book citing another book citing an article, right? So I decided to trace it back to its source.

That table (which can be googled) was first published by Elsie D. Smelt in 1972 and has been cited widely since; her figures are taken from the 1966 Stanford Study. Smelt’s table says that “the most common way of writing each vowel sound is with one letter,” and this claim is attributed to the Stanford study as well. But what exactly do we mean by “common” or “frequent,” and how does that knowledge help readers and spellers? While single-letter vowel spellings may be the default grapheme for “long” and “short” vowel phonemes, spelling and reading strategies are not based on statistical calculations by proficient readers. Moreover, while we have only 6 single-letter vowel graphemes, we have more than 30 vowel digraphs and trigraphs, a ratio that troubles the notion of single letters being the “most common” spelling.  Let’s see what Hanna et al. actually say.

Here’s the basic framework they offer:

“These structural components of oral language include: (A) the phonetic reservoir from which a phonemic code is selected, (B) the phonemic base, (C) the morphological base, that is, the arrangement of phonemes into speech units which minimally express meaning, (D) the syntactic and grammatical base, that is, the arrangement of morphemes into syntactic patterns, and (E) the semantic base, which conveys meanings in terms of the conceptual system of a language community.” [I’m substituting his numbers with letters to make this post easier to write.]

Two things struck me right away: first, that these educators at least acknowledge a distinction between phonetic and phonemic concerns, which is more than I can say for many present-day phonics resources; and second, that they — and everyone who has followed in their formidable footsteps — have the way a language works totally backwards. Now, they’re talking about oral language rather than written, but the point is the same: you don’t start with phonetics and end up in meaning; rather, you start with meaning and from there, can analyze words (lexemes) into their sublexical (smaller-than-word) structures, including morphemes, phonemes, and the graphemes that pinpoint and reveal them.

In the word study I’m engaged in, we ask four questions:
(1) What does it mean?
(2) How is it built?
(3) What are its relatives?
(4) What segments and features of pronunciation matter to meaning? These segments are the only ones that are revealed in the spelling.

Question 1 has to be first — there’s no point in knowing how to write a word whose meaning you don’t know.  And Question 4 has to be last — you can’t figure out the orthographic phonology until you have evidence for the other pieces. But Questions 2 and 3 can and do toggle considerably in any investigation. So you start with meaning, and you stay rooted in meaning all the way through. What does it mean?  And even Question 4, which deals with pronunciation, only concerns itself with aspects of pronunciation that matter to the meaning. So it’s the Stanford Study’s fifth and final concern — semantics, “the conceptual system of a language community” — which is where we actually need to start.

Our second question, How is it built?, is captured more or less in the Study’s third and fourth concerns, in which “the morphological base” and “the arrangement of morphemes” are considered. They define morphology as “the arrangement of phonemes into speech units which minimally express meaning.”

Oh if only there were some way to make those “speech units” that we use to “express meaning” visible!

Working backwards still, the Study’s second concern is phonology, the “phonemic base.” The reason there’s any fifth piece is that they’re talking about oral language: phonetics is a thing because the language is actually spoken. And although they differentiate phonetics from phonemics, they don’t seem to have any idea in the article that phonetics has nothing to do with orthography.

Of course, the Stanford Spelling Study doesn’t even mention etymological relatives, because it has no idea about the etymological governance of graphemes. It can tell you that 10% of the 17,000 words  that have /i:/ are spelled with <ee>, and 10% are spelled with <ea>, but it can’t tell you why <beech> and <beach> make sense. This study knows nothing about etymological markers or why words have a single, final, non-syllabic <e>. We know better now, so why is 21st-century so-called reading research still so married to a half-century-old, roundly debunked understanding of graphemes?

Seriously, professionals need to stop embarrassing themselves by clinging to these relics.

I also took a look at the numbers and at the phonemic and graphemic inventories used by this seminal study. It’s a bloodbath. I am not exaggerating. The phonemic inventory is lifted directly from the Merriam-Webster Dictionary, which is important, because even if dictionaries were actually right about everything (they’re not), we’re still talking about a dictionary that has been updated and changed multiple times, including with regard to its pronunciation key, over the past 50 years. So the “research” that people want me to consider is based on a 50-year-old dictionary, interpreted by 50-year-old research, cited 40 years ago, and then re-cited in very recent years, none of which is evidence of anything at all about the language other than what cruddy research practices we have in literacy education.

The authors themselves “readily admit[] that this pronunciation key [from the Merriam-Webster Dictionary] has several critical weaknesses.” They also acknowledge that linguists don’t always agree about everything, and that their graphemic inventory (which was all about how easily a computer could process 17,310 words) was also flawed: “Unfortunately, complete consistency with this criterion could not be maintained, and so some exceptions to this general rule will be found among the list.” So we’re in exception-land, which is really not science. They do ask questions like “Is <I> a part of the graphemic option <TI> or <IO> in nation? In conscience, is <I> a part of the graphemic option <SCI> or <IE>?”, and they conclude that “Again linguists disagree upon this point.”

Well, folks, linguists may have disagreed on that point a half century ago, but orthographic linguists don’t disagree about it now. I already laid out proof in another post that there’s no <ie> in conscience, no matter that Louisa Moats says there is as though she proved it (she didn’t). Linguistics is a science, and we know more now about these kinds of questions. We have better tools now than we had 50 years ago, like the lexical word matrix, the orthographic word sum, the mini matrix maker, and the Online Etymology Dictionary, and better, faster ways of disseminating and discussing investigations and new information (in real-time online classes, on editable websites and social media). We don’t have to carry around some dusty old misunderstanding like it’s our last keepsake from our long lost Pappy.

For reals, why are professionals — researchers and educators, of all people — clinging to 50-year-old research that didn’t even conceive of today’s scientific tools? Can you imagine if a surgeon or a rocket scientist did that? Mayhem. Can you imagine if we elected someone who ignored and denied modern climate science as President? Oh, wait… Sigh.

Science matters. Understanding the difference between factual, physical evidence, scientific consensus, and the repeated sub-letting of citations from, uh, wherever, something sciency-sounding, is just so critical to everything.

Among the lettery circus freaks that the Stanford Study offers in its admittedly troubled graphemic inventory are a *<bt> in debt, a *<ua> in guard and a *<cc> in occur. In real life, the <b> in debt is an etymological marker (debit); the <u> in guard, guarantee, guerilla, guest, etc., is part of a <gu> digraph that can mark an etymological relationship to cognates with a <w>: guard~warden, guarantee~warrantee, guerilla~war, guide~guise~guywire~wit~witness (‘to see’), guile~wily. And as any regular reader already knows, the two <c>s in <occur> are each in separate morphemes. Positing a *<cc> there is like saying that there’s an <ea> in react or a <th> in hothouse. Big fat can of graphemic nope.

I could go on and on and on and on, but I’m gonna go hang out with my kid and watch a ball drop on this crazy calendar year. I’m not much for resolutions, but I’d welcome resolve to move into 2017 not clinging to antiquated phonics research like it’s a bible or a gun and something evil is after you.

I’m sorry that modern phonics is built on a rickety, outdated, dismantled, misguided, misquoted old study. I’m not sorry for pointing it out, and I’m not sorry for yelling a little. If you were clinging to a life raft of the same age and quality and I had a new speedboat, I’d be yelling just as loudly to save your life as I am now.


  1. Tom Berend says:

    Arguing against phonics runs into the ‘false news’ barrier: people don’t accept facts that conflict with their world view. Contrary evidence simply causes people to retreat into their beliefs even more strongly.

    Thanks to the internet, each of us has access to the widest pool of knowledge ever assembled. Anyone can hunt down the Hanna paper if they want to confirm your claim. And you would think that educators and curriculum designers would want to, but they don’t and won’t.

    An extreme case of false-news is the vaccination-autism link. It is based on a single study that was quickly retracted and shown to be outright fraud, yet we all treat that link as a real thing, something that one could have an opinion on.

    There’s another falsehood closely related to your sermon on phonics – the link between reading and phonological awareness. That idea is the foundation of the entire dyslexia-disability industry, and is mostly based on the uber-famous paper: Liberman, Shankweiler & Liberman (1989), “The Alphabetic Principle and Learning to Read”. The famous dowel-tapping experiment.

    Everything in that paper was wrong. The conceptual basis was flawed, the experimental results were discredited, the arguments were not supported by anything that could be called ‘science’. But the conclusion – that some children have particular difficulty in learning to read because of a general deficiency in their phonological skills – sticks to us like a burr.

    So go spread the good word. But don’t be surprised if phonics wins, if 2017 becomes “The Year of Phonics”.

  2. Pete says:

    Thanks for digging into that old research and showing how some parts of it just get passed down and down and then re-presented as though it is currently relevant without any critical analysis whatever.

    Reading your story here made me think of an odd kind of irony that I hadn’t considered before. I’m not sure if I’ve nailed down what I’m thinking, but I’ll give it a go…

    Research like what you describe here enters the “common knowledge” through an uncritical accepting of statements from some research source that conforms to existing assumptions. It’s a kind of inertia of ideas with no friction offered by the scientific/educational community. But then, when ideas arise that counter the long-standing assumptions, that same research that was received uncritically is actively drawn out to critically challenge the alternate hypothesis.

    In scientific inquiry, EVERY new hypothesis should be challenged vigorously. But there is a kind of unscientific irony when the mantle of science is claimed to challenge new hypotheses by drawing on old hypotheses which were accepted with zero challenge in the first place.

    Revealing that the genesis of a chart that is pulled out for a current scientific debate is actually 50 years old, with many obvious false analyses (e.g. a *bt digraph in debt!), offers a very important illustration of the kind of scientific dialogue that played a role in bringing us an education system in which the standard “research based” literacy instruction demonstrably misrepresents the basic function of our writing system.

    We need hypotheses about how our writing system works and literacy instruction to receive “friction” from the research and educational communities. But that friction needs to come from demonstrable evidence, not from untested assumptions passed down and down that happen to have the word “research” attached to them.

    • Pete, this comment is beautifully concise and clear; it certainly reads to me like the work of someone who has “nailed down” their thinking.

      This is the best part, for me: “[T]here is a kind of unscientific irony when the mantle of science is claimed to challenge new hypotheses by drawing on old hypotheses which were accepted with zero challenge in the first place.”

  3. rebeccamarsh says:

    Thank you so much, Gina, for the time you take to shine your light into the shadows and far dark corners of our ignorance and complacency. Thanks for introducing friction, as Pete analogized, to the slippery slopes down which edubabblers have long been joyriding, without regard for the damage in their wake.

    By the way, your blog clock thinks a.m. is p.m.; your recent posts and attendant comments appear 12 hours before their timestamps.

    Looking forward to “Zero Allophone” (pretty soon, I hope).

    Best wishes for a much better than we anticipate 2017! Rebecca Marsh


  4. Marlene says:

    Have you by any chance read any of Robert Port’s work?

    • [UPDATE: I do plan to read the article, and will post it if I think it’s helpful — and it very well may be. My reaction below was defensive and I’m sorry to Marlene and to others. I misapprehended her post as more Phonics apologia and I was wrong to do so.]

      Why no, Jelly and Bean Phonics Lady, I have not. And I’m not publishing the link you sent here, unless and until you can answer the following questions:

      1. Have you ever read anything I’ve written about the difference between citing other people’s work without explanation versus being able to articulate your own understanding of something, with actual evidence? Citation is only evidence of your ability to cite and to pass English class, not of your ability to think or of your understanding.

      2. What do YOU understand Port to be saying, and what’s the evidence for that? Why is that understanding germane to this discussion?

      3. If you and I were colleagues at a university, and you had an article you wanted me to read, would you walk into my office, put it on my desk, and walk out? Or would you say, “I think you’ll enjoy this because ____________”? I’d like to think it’s the latter, because of course we are both grown-ups and we both know that we both already have way too many things to do, think, and read, so you’d better have a good reason for insisting that this make the top of my pile.

      So, why should I read what you sent me? Why should I publish that link here?

  5. Benita B Belsley says:

    Thank you, Gina! Your holiday sounds similar to mine…cold included. Hope you’re feeling better. Happy, healthy 2017!

  6. Marlene says:

    Professor Robert Port, of Indiana University, has been writing since 2005 that phones and phonemes are not cognitive units, and that they are letter/speech blends. He writes that we do not need them for spoken language, and it is only when we learn to read and write and we have to link sounds to letters in real words that we need them.

    I have been arguing against synthetic phonic teaching and the Phonic Screening Check here in the UK, for over 5 years. I have found Robert Port’s work to be central to this debate. If I had a means of contacting you about him, other than a comment on your blog, I may have done so.

    However, it is statutory to teach children to read using synthetic phonics in England. Teachers cannot avoid teaching children phonics, GPCs, in isolation in daily discrete lessons in their first two years at school (Reception Year and Year 1), and all children have to take the Phonics Screening Check (20 real words and 20 pseudo words) at the end of Year 1.

    Your blog from yesterday was the first I had come across. I have tweeted a link to it.

    I hope I have answered your questions. However, my intention was to make sure you knew about the work of Robert Port. The link was for you.


    • Thanks, Marlene, and please forgive me for coming on strong. I have had an absolute onslaught of phonics pholks sending me links to read with no explanation, and with a great deal of passive aggression. I apologize for being in defense mode reflexively, and I am pleased to learn more about you. I have the PDF you linked and I’ll have a look at it. Thanks for engaging and for your patience. You can contact me privately through my website at the link that says “contact me.”

      I know about statutes and mandates in education. It’s one of many reasons I’m self-employed: no one pays me to care about any of that stuff anymore. I really have no interest in what British or Australian or American governments mandate about education; I work with people all over the world, of all ages, and of a variety of educational experiences, not just children in British primary schools.

      Thanks again for weighing in and I’ll read the Port article when I can.

      • Pat Stone says:

        I have had many ‘discussions’ with phonics fanatics in UK and Australia – most of them are selling their own wares. I understand you don’t want to get involved in other countries’ Govt nonsense, but these ppl in Aus and NZ and USA are pushing for the phonics screening check, used already in England, and all the crap that goes with it. I don’t think you’ll be best pleased if it gets a foothold in America or amongst your students. It is an abomination. These people do not listen to reason or accept bona fide research. They have their own ‘evidence’ with which to bamboozle the uninitiated.

  7. lynnheasley says:

    Hi Gina,
    I took your training in Bangor, Maine a couple of years ago and really enjoyed hearing you share your passion. I will admit that I do not have an ounce of your understanding of the English Language and find the process slightly daunting, though I have poked and stabbed at it with my son and with a couple of clients. I hope to begin this work in my Read 180 program I currently teach at a local high school.

    I am currently on the training committee of a local Adult Literacy Volunteer organization here in Maine and we are looking heavily at assessments. Do you have any thoughts on assessments we could use with adults to get a sense of their current reading to be used as initial, formative, and post assessment material?

    • Hi Lynn. I totally remember you from Bangor! We ate lunch together and you had relocated there not too much before then… I never expect anyone to know everything I know; I do expect that people will take ownership for what they do understand, though, and not just point to some class they took or some article they read or some national mandate.

      I’m a linguist, not a psychometrician. I do not recommend specific assessments; in my experience, every standardized language assessment I’ve ever encountered has some serious design flaws with regard to the language, which is what they’re trying to measure knowledge of. It’s a mess. The best way to assess what any student knows about language is for teachers to actually understand how the language works. I can think of no single investment that matters even half as much. I have no idea how much your committee is going to spend on assessments and on training people to administer them, and having worked in the non-profit world on both ends (funder and fundee), I know how important everyone thinks assessments are. But I am a self-employed linguist now and no one is paying me to study assessments or to have well-informed recommendations to make. I personally and professionally think they’re mostly hogwash, based on my experience with them in both my personal and professional life. But anyhow, my studies are focused on the language itself, not on assessments, and I’m 100% clear about that.

      I have worked in adult literacy and family literacy as well and it remains some of the hardest and most rewarding work I’ve ever done. I wish you Godspeed.

  8. Pat Stone says:


  9. Bill Keeney (from DVFS) says:

    I noted in this blog your saying, “This study knows nothing about etymological markers or why words have a single, final, non-syllabic <e>. We know better now. . .” I am interested in this topic and want to know what it is we know. I searched your blogs for the non-syllabic e and found some uses of it (“Come Home” was fascinating), but no comprehensive discussion of this topic. Could you point me in the direction of that knowledge? Thanks. –Bill

    • My LEX Grapheme Deck, for starters, has a final E card, and also discusses markers.
      My LEX InSight Word Decks have information on etymological markers in specific words and word families.
      My TED videos all talk about markers and the final E.
      My LEXinars are rich with discussions with other scholars about these very topics, as well as a zeroed grapheme or zero allophone.

      If you are asking me for discussions beyond my own circle of scholarship, I can’t help you. I am still waiting for everyone else to catch up. Maybe you can tell me how you searched the blog, and what for, so I can be of better help.

      • Bill Keeney (from DVFS) says:

        No, that’s fine. I will look at your grapheme deck. I think I know now what you were referring to (its multiple uses, such as after an s, v, and z), and I think I was asking or thinking about a different question that has been running around in my head, which is why we sometimes use the final, non-syllabic e to mark “long” vowels and sometimes use vowel digraphs instead. In fact, I sort of think of VC-e as a “split digraph.” But I was hoping there might be an explanation as to why this convention developed and if there were any pattern or principles governing its use (or the converse, the use of a digraph instead)–especially when there are no homonyms that would explain a different spelling.

        I just love finding these things out. I remember, for example, that Melvyn had an explanation about when and when would have been the preferred spelling. It had something to do with which vowel letter from the Futhark was used to spell the word in Old English. (I can’t quite remember what it was, but could find it in my notes.) So, when I saw your comment I was hoping there was some reason why the final, non-syllabic e as a “long” vowel marker happened.

        I also got kind of obsessed at one point with AI and why it only seems to appear before certain letters.

        • Hey, Bill, I had to edit your comment. You can’t use angle brackets in a comment because the site reads it as HTML and your words disappear.

          I do not think of vowel-consonant-e patterns as a “split digraph” any more than I think of them as unicorns. Split digraphs aren’t a thing. A final e can be a marker, and one of the things it can mark is the phonology of a preceding vowel. To say that when there’s a “long vowel” [sic] all of a sudden the e is no longer a marker, but part of some new, made-up thing? Nothing elegant about that explanation, as I see it.

          I read a lot of phonocentrism in your questions and I think I can help. You’re privileging “long vowels” (I prefer to differentiate between tense and lax vowels, if I have to categorize vowels; short & long are pedagogical terms, not linguistic. I know I know I’m a giant snob.) If you want to suggest that “long” vowels mean that a final e is part of some so-called “split digraph,” then why don’t you say the following words have “split trigraphs”: please, waive, deuce, etc.? Then you can make up two new things instead of just one.

          Just look at words like babe/baby, craze/crazy, whine/whiny: are you suggesting that one word in each pair has a “split digraph” and the other one has a 2nd vowel that somehow replaces half of the split digraph? Instead of just a final E being replaced by a vowel suffix… See how messy that way of thinking gets when you follow its tail?

          Why make things up at all? Why not just observe what’s actually happening in the system, catalog it, analyze it, and don’t privilege the phonology? That’s what I did when I made the LEX Grapheme Deck. It’s cool that you checked the word-searcher for words with AI. That’s a good place to look, but that’s not the (only) place to look if you want to understand WHY an AI rather than just that it’s an AI. You ended up with a small list of letters that can follow AI. You’re spinning your wheels in phonocentrism, trying to study phonemes and graphemes absent of any meaningful context. Just strings of letters.

          Why waive and wave? Start with meaning and look for structure and relatives. Then try to make sense of the graphemes.

          Waive is related to waif — both of these denote something unclaimed, abandoned.
          Wave is related to weave — like ate to eat and mate to meat and fact to feat.

          Why pain and pane? Pain is related to subpoena and penal; pane is related to panel.
          Main and mane? Mane is related to manilla, and I like a synchronic connection to mantle, mantilla. Main, like many words with an AI digraph, has relatives with a G, C, or Y: might, may, machine, mechanic (‘power’)
          rail~regal~royal (something that rules)

          There is no aigh or aign. There’s an a.igh and an or ai.g.n.

          Words of Old English origin, like name or stone or ride, typically had vocalic inflections on the end. Latinate monosyllables like sane or face or rite also had something vocalic where that e is now. That final e is no longer syllabic, but it’s there for a reason, often several.

          I wrote this in another comment, and it’s critical: Grapheme choice is governed by a series of competing influences and constraints; the optimal form is what surfaces (just like in a super-simplified version of Optimality Theory in phonology, for any linguists reading). It is the survival of what fits. There is a reason that whole has a WH, for example (think holistically), and it’s not because of its root (OE hāl). It’s because HOLE already means something else.

  10. Michelle Montali says:

    Will you point me to the entry where you discuss the “ie” letter sequence in “conscience”?
