Readability Assessment of Health Information on the Internet
Purpose
The Internet has the potential to reduce barriers in access to information for patients, but only if the online material can be read and understood by many types of individuals. Most studies estimate that more than half of US residents read at the 9th grade level or lower (Davis, 2000; Root and Stableford, 1999). Health-related information has been shown to be more difficult to comprehend than most other types of information (Root, et al., 1999).
The reading ability of patients varies widely and is generally lower than the level of school they have completed. One study of English-speaking diabetic patients found that while 60% could understand information written at the 6th grade level, only 21% could understand information written at the 9th grade level (Overland, et al., 1993). Other studies have found a median reading level of 9th to 10th grade in emergency department patients (Williams, et al., 1996) and a median reading level of 7th to 8th grade in cancer patients (Foltz and Sullivan, 1996), patients in urban clinics (Wilson, 1995), and parents of pediatric patients at a University Hospital (Davis, et al., 1994). In one study of hospitalized patients, only 7% could comprehend information written at the 5th grade level, and just 30% could comprehend material written at the 9th grade level (Estey, et al., 1994).
Limited reading skills may be more prevalent among certain patient and population sub-groups. For example, according to the 1993 National Adult Literacy Survey, 75% of welfare recipients (including but not limited to Medicaid beneficiaries) read at or below the eighth grade level and 50% read at or below the fifth grade level (Kirsch, et al., 1993). Immigrants and refugees from less-developed countries may be even more likely than their U.S.-born counterparts to have low educational attainment and, as a result, limited reading skills. Among recent Central American immigrants and refugees from El Salvador and Guatemala, only slightly more than 20% reported having completed high school (Lopez, 1996). Among foreign-born Hispanics living in greater Los Angeles, 10% reported no schooling, 3% reported elementary school attendance, 21% reported at least some high school (but no college) and only 5% reported a college degree (Cheng and Yang, 1996).
In this study, we assess the readability of written information from 19 English- and 7 Spanish-language health sites. Specifically, we wanted to know: What grade level reading ability is required to understand health information regarding four common medical conditions on English- and Spanish-language health Web sites?
Methods
Assessing Readability
There are several methods of assessing the readability of a document. The most direct way to measure readability is to administer a comprehension test based on the document of interest to a group of readers of known reading ability. Readability can also be measured by the judgment of a literacy expert. A third approach uses reading formulas, rather than experts or test subjects. Reading formulas are mathematical equations that estimate the reading level of a document based on the words that are used and the lengths of sentences.
The methods using test subjects and experts are more costly and time-consuming, but also more precise than reading-level formula methods (Klare, 1974). The reading-level formulas can be thought of as automatic approximations of the other methods. In this study, time and resource constraints dictated the use of formula-based methods.
Readability Formulas Employed
We conducted a literature search in order to identify reading formulas appropriate for Spanish- and English-language documents. Although we found references to numerous readability formulas, few were appropriate for both English and Spanish documents.
Three readability assessment methods were applied to the text from the Spanish and English Web sites: the Fry Readability Graph (FRG), the SMOG grading formula, and the newer Lexile Framework®. The first and third methods are applicable to both English and Spanish documents; only the third is currently implemented in software.
The FRG has been validated for Spanish and English-language documents (Gilliam, et al., 1980; Fry, 1969, 1977). The FRG uses three sample passages of text, each exactly 100 words in length, from the beginning, middle, and end of the source document. The grade level is computed as a function of the number of sentences and syllables contained in the three samples of text. Application of the FRG to Spanish-language documents is similar to its application to English-language documents, with the exception of syllable counting. In Spanish, an adjustment compensates for the fact that Spanish text contains more syllables per word than English text of the same reading level (Gilliam, Peña and Mountain, 1980).
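The counting step behind the FRG can be illustrated with a minimal sketch. It assumes the three 100-word passages have already been extracted, and it uses a crude vowel-group heuristic for English syllables (`count_syllables`, `fry_inputs`, and `fry_averages` are hypothetical names introduced here, not part of any published tool). The Spanish syllable adjustment is not implemented, and the final grade still has to be read from the Fry graph itself.

```python
import re

def count_syllables(word):
    """Rough English syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' (as in "made") as non-syllabic, but keep "-le".
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def fry_inputs(passage):
    """Return (sentence count, syllable count) for one sampled passage."""
    words = re.findall(r"[A-Za-z']+", passage)
    sentences = len(re.findall(r"[.!?]+", passage))
    syllables = sum(count_syllables(w) for w in words)
    return sentences, syllables

def fry_averages(passages):
    """Average the sentence and syllable counts over the sampled passages."""
    pairs = [fry_inputs(p) for p in passages]
    n = len(pairs)
    mean_sentences = sum(s for s, _ in pairs) / n
    mean_syllables = sum(y for _, y in pairs) / n
    return mean_sentences, mean_syllables
```

The two averages returned by `fry_averages` are the coordinates that would be plotted on the graph to read off a grade level.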
Unlike the FRG, the SMOG grading formula is only applicable to English-language documents.
The SMOG uses three passages of 10 sentences each from the beginning, middle and end of the source. The reading level is a function only of the number of polysyllabic words (words with three or more syllables) in the sampled text, with more polysyllabic words corresponding to higher reading levels.[4] The SMOG grading formula has been used widely and has been adopted by the National Cancer Institute as the preferred method for assessing the readability of patient communications after a comprehensive review of advantages and disadvantages of alternative readability formulas (Romano, 1979).
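As an illustration of how a polysyllable count translates into a grade, the sketch below uses the commonly cited simplified form of McLaughlin's SMOG rule (3 plus the square root of the polysyllable count in 30 sentences). This is an assumption for illustration and may differ in detail from the formula given in the footnote; the syllable heuristic and the function names are likewise hypothetical.

```python
import math
import re

def count_syllables(word):
    """Rough English syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic, but keep "-le" endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def count_polysyllables(sentences):
    """Count words with three or more (estimated) syllables."""
    return sum(
        1
        for sentence in sentences
        for word in re.findall(r"[A-Za-z']+", sentence)
        if count_syllables(word) >= 3
    )

def smog_grade(polysyllable_count, sentence_count=30):
    """Simplified SMOG estimate: 3 + sqrt(polysyllables per 30 sentences).

    Assumption: counts from samples that are not exactly 30 sentences long
    are rescaled to a 30-sentence basis.
    """
    scaled = polysyllable_count * 30 / sentence_count
    return 3 + math.sqrt(scaled)
```

For example, 36 polysyllabic words across the 30 sampled sentences would give `smog_grade(36) == 9.0`, that is, a 9th grade estimate.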
The Lexile Framework® is a relatively new software program that estimates the readability level of a document based on two factors: average sentence length and word familiarity[5]. Passages consisting of shorter sentences are assumed to be easier to read than passages consisting of longer sentences. Passages consisting of familiar (commonly used) words are assumed to be easier to read than passages consisting of unfamiliar words (Wright and Stenner, 1998). Word familiarity is measured by the frequency with which a given word is used in written United States school texts of various grade levels (Carroll, et al., 1971). In this study, the Lexile Framework® software was applied to three 10-sentence sample passages drawn from the beginning, middle and end of the source documents.
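The two text features described above can be sketched as follows. This is not the Lexile software or its scoring formula; it only computes a log mean sentence length and a mean log word frequency from a word-frequency table. The table `TOY_FREQUENCIES`, the function name, the default frequency for unlisted words, and the choice to average the logs of the frequencies are all assumptions made for illustration.

```python
import math
import re

# Hypothetical frequencies (occurrences per million words); not real corpus data.
TOY_FREQUENCIES = {
    "the": 60000.0,
    "doctor": 120.0,
    "checks": 15.0,
    "your": 3000.0,
    "blood": 150.0,
    "pressure": 90.0,
}

def lexile_style_inputs(sentences, frequencies, default_frequency=1.0):
    """Return (log mean sentence length, mean log word frequency)."""
    words_per_sentence = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    all_words = [w for words in words_per_sentence for w in words]
    mean_sentence_length = len(all_words) / len(sentences)
    # Words missing from the table fall back to the assumed default frequency.
    mean_log_frequency = sum(
        math.log(frequencies.get(w, default_frequency)) for w in all_words
    ) / len(all_words)
    return math.log(mean_sentence_length), mean_log_frequency
```

Longer sentences raise the first value and rarer words lower the second, which matches the direction of the assumptions stated above.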
Selection of Abstracted Web Site Material
As noted in Chapter 3, some Web sites were searched and abstracted for more than one condition, and all Web sites were abstracted by two different searchers, resulting in multiple abstraction documents for a given site. For each site, a single abstraction document was randomly selected among all available documents for readability analysis (see Table 3.1).
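The selection step can be sketched as below, assuming the abstraction documents are grouped by site in a dictionary. The function name, the data layout, and the optional seed are hypothetical; the source does not describe how the random draw was carried out.

```python
import random

def select_one_per_site(documents_by_site, seed=None):
    """Pick one abstraction document at random for each site."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return {
        site: rng.choice(documents)
        for site, documents in sorted(documents_by_site.items())
    }
```

Each site contributes exactly one document to the readability analysis, regardless of how many conditions or searchers produced abstractions for it.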
Results
Readability of English-Language Web Sites
For the English-language Web sites, the mean FRG reading grade level was 13.2 (SD=2.1), ranging from 10 to 17 (Table 4.1). The mean SMOG reading grade level for English-language Web sites was 13.6 (SD=0.9), and ranged from 12 to 15. The mean Lexile Framework® reading grade level was 11.7 (SD=1.0), and ranged from 10 to 14. Among English-language Web sites, the correlation was 0.61 (p<0.05) between the FRG and SMOG grading formula, 0.54 (p<0.05) between the FRG and the Lexile Framework®, and 0.32 (p=0.18) between the SMOG grading formula and the Lexile Framework®.
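The agreement between two formulas can be quantified with a correlation coefficient computed over per-site grade levels. The sketch below assumes a Pearson product-moment correlation, which the text does not state explicitly, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of grade levels."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)
```

A value near 1 would mean the two formulas rank the sites almost identically; the moderate values reported here indicate that the formulas agree only partially.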
Readability of Spanish-Language Web Sites
For the Spanish-language Web sites, the mean FRG reading grade level was 9.9 (SD=2.5) and ranged from 7 to 13 (Table 4.1). The mean Lexile Framework® reading grade level was 10.0 (SD=2.6) and ranged from 6 to 13. The correlation between the FRG and the Lexile Framework® among Spanish Web sites was 0.49.
The mean reading grade level for the English-language Web sites was higher (more difficult) than for Spanish-language Web sites, as measured by the FRG (p<0.05) (Table 4.1).
Discussion
This analysis shows that much of the health information available on the Internet is beyond the comprehension of many consumers. All of the English Web site documents assessed had material that required at least a 9th grade reading level, and more than half presented material at the college level[6]. Four of seven Spanish-language sites presented material at the 9th grade reading level or higher[7]. Studies of (English-speaking) patients in various clinical settings suggest a 9th grade reading level is too high for most patients. The US Department of Health and Human Services recommends that patient education materials not exceed a 6th grade reading level (US Department of Health and Human Services, 1999). The mismatch between the reading ability of patients and the readability of health-related information on the Internet suggests that, for the Internet to become a more effective medium for patient education, the readability of its materials must be improved.
This is the first study to examine the reading level of Spanish-language health-related information on the Internet. This aspect of the study has special significance because Spanish-speaking patients face greater barriers to traditional sources of health information than English-speaking patients do (Ginzberg, 1991; Mayberry, Mili, and Ofili, 2000). Surveys indicate that the number of Spanish-speaking persons accessing the Internet for health information is increasing. Further efforts to reduce racial/ethnic disparities in access to the Internet (e.g., The Digital Divide) through strategies such as Community Access Centers will probably bring Internet access to greater numbers of Spanish speakers in the near future (US Department of Commerce, 1999).
Limitations of Readability Assessments by Readability Formulas
It is widely acknowledged that reading is an interactive process that occurs between the text and the reader. In fact, research shows that readers use experiences, knowledge, and information processing skills to comprehend text (Johnston, 1983).
Readability formulas, being strictly text-based, do not address the interactive nature of the reading process. Most reading formulas, including those used in this study, employ syntactic and semantic factors and do not directly address factors related to communicating meaning. For instance, readability formulas do not distinguish between written discourse and nonsensical combinations of words (Dreyer, 1984). Moreover, formulas cannot assess other critical factors such as the reader's interest, experience, knowledge or motivation, all of which may influence the reader's ability to comprehend the cognitive task asked by a survey (Duffy, 1985). Other factors related to readability and not assessed by a readability formula include typographical and temporal factors (e.g., time allotted to complete the reading task), the cultural appropriateness of materials intended for racial/ethnic and linguistic minority groups, and factors related to the unique nature of the Internet.
Based on the findings of this report and recent research on the reading ability of patients, one thing is clear: much work is needed to provide English- and Spanish-speaking patients with health-related information on the Internet that is accessible. Currently, the reading level of health-related information provided on the Internet is too high for most English- and Spanish-speaking patients.
[4] The reading level is estimated by a formula [equation not recoverable] in PSC, where PSC is the average polysyllable count per ten sentences.
[5] The average sentence length and average word frequency are combined to obtain a Lexile scale score using a formula [equation not recoverable] whose two inputs are the log mean sentence length and the log mean word frequency. The Lexile score is then translated to a grade level reading difficulty.
[6] Using the FRG.
[7] Using the FRG.