All About Readability
By Cheryl Stephens
Why are we looking at readability tests?
The use of readability tests in the plain language process is a controversial topic. Now that readability scores are easy to obtain by using computerized grammar and style checking software programs, there is new pressure to adopt them. While some people use readability tests to help them make their writing plainer, other people are fervently opposed to their use.
For example, ten years ago the International Reading Association and the U.S. National Council of Teachers of English were advising members against uncritical use of readability tests to assess educational materials. At about the same time, two government reports in England validated the accuracy and reliability of the tests. Some of the disputes about readability tests arise because people use them for different purposes, and for purposes different from those that lay behind the development of the tests.
We want to look at the original reasons for development of readability tests, the historical development of the tests and the purposes to which they are now put. From there, we can discuss how they ought to be used in the plain language process.
What is Readability?
Readability describes the ease with which a document can be read. Readability tests, which are mathematical formulas, were designed to assess the suitability of books for students at particular grade levels or ages.
The tests were intended to help educators, librarians and publishers make decisions about the purchase and sale of books. They were also meant to save time - because before the formulas were used, those decisions were made on the recommendations of educators and librarians who read the books. These people took books already written and determined the appropriate reading groups for them.
Webster's defines "readable" as:
Obviously, readability formulas cannot measure features like interest and enjoyment. Also, when we ask whether a text is understood by its reader, we are questioning its "comprehensibility". Readability formulas cannot measure how comprehensible a text is. And they cannot measure whether a text is suitable for particular readers' needs.
A brief historical overview
The First Formulas
Readability formulas were first developed in the 1920s in the United States. From the earliest efforts to today, readability tests have been designed as mathematical equations which correlate measurable elements of writing - such as the number of personal pronouns in the text, the average number of syllables in words, or the number of words in sentences - with reading difficulty.
Factors like these are usually described as "semantic" if they concern the words used and "syntactic" if they concern the length or structure of sentences. Both semantic and syntactic elements are surface-level features of the text, and do not take into account the nature of the topic or the characteristics of the readers.
Designers of one early formula began with 289 elements of content, style of expression and presentation, format and organization, and reduced them to the five style factors which could be counted most reliably and would be most relevant to the needs of adults with limited reading skills. Four of the factors were:
How and Why Were They Developed
The very first readability study was a response to demands by junior high school science teachers to provide them with books which let them teach scientific facts and methods rather than get bogged down in teaching the science vocabulary necessary to understand the texts. The earliest investigations of readability were conducted by asking students, librarians, and teachers what seemed to make texts readable.
The publication in 1921 of The Teacher's Word Book by Thorndike provided a means for measuring the difficulty of words and permitted the development of mathematical formulas. Thorndike tabulated words according to the frequency of their use in general literature. Later, other word lists and reading lessons were adopted to measure word difficulty. It was assumed that words encountered frequently by readers were less difficult to understand than words that appeared rarely. Familiarity breeds understanding. There is some soundness to this. There are today more than 490,000 words in the English language and another 300,000 technical terms. It is unlikely that an individual will use more than 60,000, and the average person probably encounters between 5,000 and 10,000 words in a lifetime.
Readability formulas today
How Do They Work?
Readability formulas measure certain features of text which can be subjected to mathematical calculations. Not all features that promote readability can be measured mathematically. And these mathematical equations cannot measure comprehension directly. Readers can be questioned or tested on material they have read, and the material can be tested with formulas. The readers' success in understanding the material, as measured on an exam, can be correlated to the readability score of the text itself. This is one method of validating the formulas.
Other features of a document are just as important as word and sentence length in determining reading ease. Other aspects of language, sentence structure, and organization of ideas are significant to comprehension. Physical aspects of the document are also important: type styles, layout, design, use of graphics and so on.
Other features of clear writing are:
So readability formulas are considered to be predictions of reading ease but not the only method for determining readability. And they do not help us evaluate how well the reader will understand the ideas in the text.
What Factors Do They Measure?
Today readability formulas are usually based on one semantic factor (the difficulty of words) and one syntactic factor (the difficulty of sentences). Studies have confirmed that the inclusion of other factors in a formula adds more work than it improves the results. Put another way, counting more things does not make the formula any more predictive of reading ease but takes a lot more effort.
Words are either measured against a frequency list or are measured according to their length in characters or syllables. Sentences are measured for the average length in characters or words.
Graphs, Charts and Computer Functions
Readability tests can be performed manually by counting and doing a mathematical calculation, or by referring to a chart or graph. Readability tests can also be performed by computer. Most grammar or editing software today can perform several readability tests.
The Fog Index is computed this way:
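As a minimal sketch, the standard published Gunning Fog computation is 0.4 times the sum of the average sentence length (in words) and the percentage of words with three or more syllables. The crude vowel-run syllable counter below is an illustrative assumption, not part of the formula; real tools use more careful syllable rules.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    """Gunning Fog Index: 0.4 * (avg sentence length + % complex words).

    A "complex" word is one with three or more syllables. The result
    estimates the years of schooling needed to read the text easily.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)
```

A short, one-syllable passage scores near the bottom of the scale, while long sentences packed with polysyllabic words push the index toward college-level years of education.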
The Flesch Scale
The Flesch Reading Ease Scale is the most widely used formula outside of educational circles. It is the easiest formula to use, and it makes adjustments for the higher end of the scale. It measures reading ease from 100 (easy to read) to 0 (very difficult to read). A zero score indicates that the text averages more than 37 words per sentence and that the average word has more than two syllables. Flesch identified 65 as the Plain English Score. In response to demand, Flesch also provided an interpretation table to convert the scale to estimated reading grade and estimated school grade completed.
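The Flesch scale can be sketched in code. The constants (206.835, 1.015, 84.6) are from Flesch's published formula; the vowel-run syllable counter is a rough assumption for illustration, and a real implementation would use dictionary-based syllable counts.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease score.

    Formula: 206.835 - 1.015 * (words / sentences)
                     - 84.6  * (syllables / words)
    Scores near 100 are very easy; scores near 0 are very difficult.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Note that very simple text can score above 100 and very dense text can score below 0; the published interpretation table only covers the 0-100 band.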
In 1963 Fry published his readability graph which was easier than manual computations. The graph was revised in 1977 and then became the most widely used formula. A hand-held calculator was developed to do the Fry test, and now it is incorporated in computer programs.
Also in 1963, the first computerized readability formula was developed, and many others have been devised since. Some computer formulas are based on characters per word and characters per sentence while others measure syllables. The differences between computerized measures today depend on the developers' decisions about how to measure sentences or words. For example, some programs treat a period, colon, or semi-colon as the sign of the end of a "sentence". This is in keeping with some research which concludes that the sentence is not the unit for measure. Rather, the "sousphrase", which we might consider to be a clause, represents the unit of thought for measure because it is the cognitive decoding unit.
Today most grammar software programmes provide more than one readability measure as well as comparisons to well-known writing. In addition to word, sentence and paragraph statistics, Grammatik IV gives the Flesch Readability Scale, Gunning's Fog Index in years of education, and the Flesch-Kincaid Reading Grade Level. In addition to a qualitative assessment of the writing, Stylewriter, a plain-English editorial program, provides word and sentence statistics with an index percentage of the passive verbs used, as well as a count of words in various categories: complex, jargon, abstract, legal, tautologies and so on.
What is Cloze procedure?
The "cloze" procedure for testing your writing is often treated as a readability test because a formula exists for translating the data from "cloze tests" into numerical results. The name "Cloze" comes from the word "closure". In this procedure, words are deleted from the text and readers are asked to fill in the blanks. By constructing the meaning from the available words and completing the text, the reader achieves "closure". (elaboration below)
In 1953 the "cloze procedure" was developed, and later, after 1965, formulas were developed for its use. It became a popular method for measuring the suitability of text for a particular audience. It was popular because its scoring was objective; it was easy to use and analyze; it used the text itself for analysis; and it yielded high correlations to other formulas.
The cloze technique does not predict whether the material is comprehensible; it is an actual try-out of the material. It tells you whether a particular audience group can comprehend the writing well enough to complete the cloze test.
Cloze procedure consists of deleting words in a text and asking the reader to fill in the appropriate or a similar word. Usually every fifth word is deleted. Cloze is thought to offer a better index of comprehensibility than the statistical formulas. The ability to identify the missing word or to insert a satisfactory substitute for the original word indicates that the reader comprehends the content of the text.
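The procedure described above can be sketched as a small script: every fifth word is replaced with a blank and kept in an answer key. The blank marker, function names, and exact-match scoring are illustrative choices, not part of any standard cloze protocol.

```python
def make_cloze(text, nth=5):
    """Replace every nth word with a blank.

    Returns the gapped passage and the answer key of deleted words.
    """
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % nth == 0:
            answers.append(word)
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

def score_cloze(answers, responses):
    """Fraction of blanks filled with the original word (case-insensitive)."""
    hits = sum(1 for a, r in zip(answers, responses) if a.lower() == r.lower())
    return hits / len(answers)
```

In practice, scoring schemes vary: some accept only the exact deleted word, while others credit acceptable synonyms, which is one source of the disagreement over what counts as successful completion.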
Cloze testing has been called a "rubber yardstick" because cloze scores reflect both the difficulty of the text and the reader's abilities or resources. Like any readability test, the problem arises over what is considered a successful completion of the text: inserting 50% of missing words, 75% or 100%. Today educators recognize that cloze procedures are more suitable for assessing readers' abilities than for measuring the readability of text. Critics have pointed out that cloze can operate on the basis of measuring redundancy -- that in some texts it measures the number of redundant words rather than implicit words.
In particular, critics suggest that cloze is inappropriate for measuring text or readers' abilities in languages other than their native language. The results of cloze testing reflect the reader's basic intuition about the structure and vocabulary of the target language -- and that intuition does not exist for the language student.
Cloze testing is widely used now to assess the abilities of readers, but is usually combined with other tests measuring grammar skills and writing ability. One educator comments:
"The underlying assumption in cloze testing is that a close relationship exists between reading comprehension and writing skill. The test measures the student's ability to select appropriate words if occasional gaps occur in a passage, based on their ability to infer meaning from context and cultural experience. The word cloze is related to the concept of closure, the human tendency to complete a partly finished pattern, to pick out key words and rely on language repetition in English discourse. The theory originated in Gestalt psychology and assumes that in figuring out the missing word, the mind goes through a process of sampling, predicting, testing, and confirming the appropriate word choice. The argument is that this process involves both recognition skills (required in discrete formal testing) and the production of a significant content (required in written passages). In theory at least, the cloze test is an integrated rather than a formal test, but the advantage is that it can be marked efficiently and objectively." ("Assessment Report, Communications Discipline", by Roslyn Dixon, Communications Assessment Coordinator, Douglas College, June 1, 1989)
One critic discussed cloze in the context of its use in languages other than English:
"There is controversy regarding the use of cloze procedure in determining the readability of written materials. This controversy is based on the fact that cloze is a subjective evaluation that mirrors the language ability and background of information of the person taking the test. Also, some researchers feel that multiple cloze passages should be developed from each piece of material for the results to be valid. For example, a test deleting every fifth word should be prepared in five versions, omitting a different word each time. Though these views are shared by other countries, for want of a better technique, cloze procedure is widely used." (Annette T. Rabin, "Determining Difficulty Levels of Text Written in Languages Other than English" in Zakaluk and Samuels, p.46-76)
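The multiple-version suggestion in the passage above can be sketched as follows: for every-fifth-word deletion, five versions are prepared, each starting its deletions at a different offset, so that across the five versions every word is blanked exactly once. The function name and blank marker are illustrative assumptions.

```python
def cloze_versions(text, nth=5):
    """Prepare nth versions of a cloze passage.

    Version k blanks the words at positions k, k+nth, k+2*nth, ...,
    so that across all nth versions every word is deleted exactly once.
    """
    words = text.split()
    versions = []
    for offset in range(nth):
        gapped = ["_____" if i % nth == offset else w
                  for i, w in enumerate(words)]
        versions.append(" ".join(gapped))
    return versions
```

Averaging scores across the versions reduces the chance that the result depends on which particular words happened to be deleted.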
Should you use readability formulas?
Some say that readability formulas measure word length or frequency and sentence length. In using the formulas we accept that these features affect readability in a significant way.
Yet it can be argued that long sentences and difficult words are merely signals that the text is not written for ease of understanding. Some say difficult text often contains difficult words because it discusses abstract ideas, while easy text uses common words because it discusses concrete experiences. Choosing smaller words and shorter sentences may not be as much help as reconstructing the sentences and using familiar vocabulary.
The Delegates Assembly of the International Reading Association resolved against using grade-level scores in 1981. And the (U.S.) National Council of Teachers of English advised against uncritical use of readability formulas in assessing text for school use. After 1981, the College Entrance Examination Board decided not to use grade-level measures to ascertain the reading abilities of college applicants.
In recent years, researchers have emphasized that readability tests can only measure the surface characteristics of text. Qualitative factors like vocabulary difficulty, composition, sentence structure, concreteness and abstractness, obscurity and incoherence cannot be measured mathematically. They have pointed out that material which receives a low grade-level score may still be incomprehensible to the target audience. As an example, they suggest that you consider what happens if you scramble the words in a sentence or, on a larger scale, randomly rearrange the sentences in a whole text. The readability score could be low, but comprehension would be lacking.
example: Fall Humpty had Dumpty great a.
Things They Can Do
Things They Can't Tell You and Why
Because readability formulas are based on measuring words and sentences, they cannot take into account the variety of resources available to different readers. Reader resources include word recognition skills, interest in the subject, and prior knowledge of the topic. The formulas cannot measure the circumstances in which the reader will be using the text or form - both the psychological and the physical situations. The formulas cannot adjust for the needs of people for whom the text is written in a second or additional language.
Studies have shown that readability, interest and prior knowledge in the reader are equally important factors in comprehension and retention of information. The ease of reading that the reader experiences is also directly influenced by the writer's use of physical, syntactic, semantic and contextual cues which cannot be measured by these tests. Such cues include the use of personal pronouns, the layout and design of the text, the typography (use of highlighting and italics, etc.), the use of signal words (now, then, but, later) and so on.
Readability tests cannot tell you whether the information in the text is written in a way to interest the reader, nor can they tell you whether the reader has sufficient background information to appreciate the new information provided in the text.
How to use readability tests
Researchers have been critical of using readability tests on readers of an additional language. They point out that these tests cannot take into account that we mentally process our first language differently than we do additional languages we have acquired. Therefore a reader does not approach the text with the same or similar intuition for the language existing among native users. This is important when using cloze testing on text intended for people reading in an additional language. It is also significant when designing the testing groups for cloze tests or try-outs of the material. A population which meets the same criteria for first language must be used to accurately assess the readability of material written in a second or additional language.
Keep the readability formula out of the writing process itself.
Follow other guidelines to writing. If you like to work with guidelines in checklists, use the Document Design Centre's Guidelines, the CBA/CBA Guidelines, the CLIC Red Alert Editing System or Fry's Writeability Checklist.
Use the formulas for feedback only:
Remember that the readability test is only a screen and offers only a prediction. Remember that the score is only a prediction that the text is suitable for a particular reading grade. Remember that the formulas do not take into account other features which contribute to comprehension so they may underestimate or overestimate the suitability of the material.
Bear in mind that at higher grade levels the scores are not reliable because background and content knowledge become more significant than style variables.
Consider again the purpose of the text. Material which is intended for training readers can be more challenging to their resources than material whose purpose is to inform or entertain. As well, higher motivation in the readers may keep them reading challenging material which they might otherwise abandon out of frustration.
Pick a formula that works best for you and for the task at hand. Choose one that is easy to use. It should contain two variables, whether words and sentences or characters per sentence and characters per word. For significant projects, use more than one test and expect slightly different grade-level scores.
Test a large sample of the text or the whole text if using a computer program. By hand, test at least 3 sections of 100 words to arrive at an average score. Be cautious of doing so if there are great differences between sections of the text.
Combine the use of formulas with other methods of testing
There are other methods for assessing the suitability of text for readers. You can devise a document audit instrument which takes into account other characteristics that formulas cannot predict. Prepare a questionnaire to use when reviewing the document, seeking out features known to make reading easier.
Or use experts. In education it is common to use teachers and librarians to review material and assign an appropriate grade level for the use of the text. In other fields, find experts who will know the needs and characteristics of your audience and get their expert opinions.
Or use "protocol-aided revisions" as a method. These are "try-outs" on individuals or small groups who match your audience's key characteristics. Formal testing with focus-groups is often beyond the budget and capabilities of those preparing materials. But informal, or casual, testing of materials with readers is very effective even on a small scale.
Readability formulas are not guides to writing well. The notion of "writing to formula" has been condemned by formula designers from the beginning. They call it "cheating" and compare it to holding a match under a thermometer to warm a room. Klare has said that formulas can play a useful screening role in the prediction of readability, where only index variables in language are needed. But formulas cannot be used in the production of readable writing, because index variables are insufficient for the purpose. For producing readable writing, more variables must be considered in both the text and the reader. (Klare, "A Second Look at the Validity of Readability Formulas," Journal of Reading Behavior, 1976, 8, 129-152)
Readability: Its Past, Present, and Future, Beverly L. Zakaluk and S. Jay Samuels, editors, International Reading Association, Newark, Delaware, 1988.
Small Claims Court Materials: Can They Be Read? Can They Be Understood?, by Richard Darville and Marilyn Hiebert, Canadian Law Information Council, CLIC Papers on PLEI, no. 7, 1985.
Â© 2000 Cheryl Stephens. All rights reserved.