A Website for Young Writers



I have my doubts that this website has as much value as some might want it to. It does nothing, for example, to improve students' editing or revision, or to help them understand the difference between the two. Still...

Transcript of A Website for Young Writers

A Website for Young Writers: http://www.analyzemywriting.com/

Keep in mind that the reliability and validity of readability scores (see below) are matters for debate. Moreover, the information provided does nothing to further the causes of editing, revision, or teaching students to understand the difference between the two. The analyzed text comes (admittedly, without permission) from "Final draft of Indiana academic standards replacing Common Core released," Eric Weddle and Stephanie Wang, [email protected], 8:37 a.m. EDT April 17, 2014. Bear in mind, when looking at the readability scores below, that journalists generally direct their writing toward eighth- or ninth-grade readers. That they might do well to lower their sights is beyond the scope of this article (pun intended!).

What is a Readability Score?

A readability score (or index) is a measure, or rather an estimation, of how difficult a text is to read. The scores (indices) included on this website are ones which can, more or less, be implemented with a computer. We will discuss the limitations of these scores and how we can interpret them.

What Do The Numbers Mean?

Each score gives an estimated grade level (United States) required to read and comprehend a text without difficulty. For those of you outside the United States, this grade level can be considered the number of years of formal education (conducted in English) needed in order to read and understand a text. Thus, the lower the number, the easier the text is to read; conversely, the higher the number, the more difficult the text is to read.

As 12th grade marks the conclusion of compulsory education in the U.S., any score greater than 12 can be considered a thoroughly adult level. Technical and professional literature found in scholarly journals generally scores above 12. Accordingly, texts which are meant to be accessible to a wide adult audience should fall somewhere between 8 and 12. As many of you will see by using this site, many classic 20th-century novels by authors such as Hemingway and Steinbeck score around the 8th or 9th grade level. Advertising copy is written at anywhere from a 5th to an 8th grade level, depending on the market.

We underscore that readability is a lower bound on the level required to read the text. Thus, if a text is rated at an 8th grade level, it does not mean that the text is necessarily for 8th graders; it simply means that approximately 8th grade is the minimal level of education required to easily comprehend the text.

Finally, the astute reader will notice that some scores can be negative. We may interpret this to mean that the text being considered is either written at a preschool level (some preschoolers can read before starting public school) or is simply written at the most basic level.

We summarize the above with a table:

Readability Score | Level | Examples
3 and below | Emergent and Early Readers | Picture and early reader books.
3-5 | Children's | Chapter books.
5-8 | Young Adult | Advertising copy, Young Adult literature, some news articles.
8-12 | General Adult | Novels, news articles, blog posts, political speeches.
12-16 | Undergraduate | College textbooks.
16 and above | Graduate, Post-Graduate, Professional | Scholarly journal and technical articles.
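Although this website does all of the computation for you, the character-based indices are simple enough to sketch directly. The Python snippet below is our own illustration, not the site's code: it computes the Automated Readability Index and the Coleman-Liau index, whose published formulas need only letter, word, and sentence counts, and it uses a deliberately crude regular-expression tokenization that a real analyzer would refine.

    import re

    def _words_and_sentences(text):
        """Very crude tokenization: words are runs of letters, digits, or
        apostrophes; sentences end at '.', '!' or '?'."""
        words = re.findall(r"[A-Za-z0-9']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return words, sentences

    def automated_readability_index(text):
        """ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43,
        where characters are letters and digits."""
        words, sentences = _words_and_sentences(text)
        chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
        return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

    def coleman_liau_index(text):
        """CLI = 0.0588*L - 0.296*S - 15.8, where L is letters per 100 words
        and S is sentences per 100 words."""
        words, sentences = _words_and_sentences(text)
        letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
        L = 100.0 * letters / len(words)
        S = 100.0 * len(sentences) / len(words)
        return 0.0588 * L - 0.296 * S - 15.8

    sample = "The quick brown fox jumped swiftly over the lazy dog. The dog did not notice."
    print(round(automated_readability_index(sample), 2))
    print(round(coleman_liau_index(sample), 2))

These two indices are shown here because, unlike the Gunning fog, Flesch-Kincaid, and SMOG indices discussed below, they do not require syllable counts.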
Assumptions and Limitations

First of all, the readability scores this website uses were developed with English in mind. So those of you eager to see how readable your non-English text is should be warned that these scores are not designed for you (although it might be fun to play and experiment).

Secondly, these indices assume that a text is written in grammatically correct, properly punctuated English. If this requirement is not met and you feed a nonsense text into the program, the famous computing adage applies: "garbage in, garbage out." To put it plainly, if your text is a random collection of symbols, gibberish, or otherwise, the readability results will not be valid.

Additionally, it has been noted that intentionally "writing to the formula" (artificially changing word and sentence lengths or other parameters readability formulas use) in order to make a text "simpler" can in practice have the opposite effect. So if you're writing copy, be sure that it looks and sounds good before feeding it into a readability analyzer.

It is also necessary to point out that some readability scores require a fairly substantial sample of text in order to be considered a valid measure of difficulty. In particular, the SMOG index requires at least 30 sentences to be considered valid, that is, to actually measure what it claims to measure.

Finally, we must disclose that in some cases, particularly the Gunning fog, Flesch-Kincaid, and SMOG indices, we can only estimate the value of the index (this is true of most online readability analyzers). The reason is that counting the syllables of an English word from its spelling is not easy to do perfectly with a computer. Therefore, there may be some discrepancy between the value given here and the true value of the index as it is defined on paper. In any case, our implementation gives a good estimation of the syllable count and, consequently, a value that is quite close to the true value of the index, and hence a good estimation of the difficulty level of a text.
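To make the syllable problem concrete, here is one common style of heuristic: count runs of vowels and apply a small correction for a trailing silent "e". Both the heuristic and the Flesch-Kincaid estimate built on top of it are a minimal sketch of how such an analyzer might work, not this website's actual implementation, and the heuristic miscounts plenty of words, which is precisely why the syllable-based indices can only be estimated.

    import re

    def estimate_syllables(word):
        """Rough syllable estimate: count runs of vowels (a, e, i, o, u, y)
        and drop one for a trailing silent 'e'. The heuristic miscounts many
        words (e.g. it gives 'apple' one syllable)."""
        word = word.lower()
        runs = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and runs > 1:
            runs -= 1
        return max(runs, 1)

    def flesch_kincaid_grade(text):
        """Flesch-Kincaid Grade Level:
        0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
        words = re.findall(r"[A-Za-z']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        syllables = sum(estimate_syllables(w) for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words) - 15.59)

    print(round(flesch_kincaid_grade(
        "The quick brown fox jumped swiftly over the lazy dog."), 2))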
An Example

The best way to illustrate how to interpret the readability scores is with an example which compares two texts. We shall compare "Green Eggs and Ham" with Federalist Paper No. 10. But before we do, let's ask the question: which of these well-known writings would you estimate is more difficult? Good. Now let's look at the results.

The scores for "Green Eggs and Ham":

Index | Value
Gunning fog | 3.22
Flesch-Kincaid | -1.05
SMOG | 5.28
Coleman-Liau | -0.44
Automated | -2.06

The scores for Federalist Paper No. 10:

Index | Value
Gunning fog | 20.95
Flesch-Kincaid | 16.96
SMOG | 17.31
Coleman-Liau | 13.05
Automated | 19.64

We see that the scores for the first text are distinctly lower than those for the second. Is this generally what you expected? We believe the answer is yes.

Now, one might quibble over the fact that there is quite a range of values for a text like "Green Eggs and Ham" and conclude that these numbers don't measure anything. Perhaps. But let's look at it this way: it's impossible to pinpoint with military accuracy how difficult a text will be to read. Indeed, if we look at the overall results, for the most part we are offered a range of grade levels. In the case of "Green Eggs and Ham," all the scores point to a grade level anywhere from preschool to 5th grade. Interpreting these numbers as a kind of spectrum certainly seems reasonable: some of us have known precocious preschoolers who could read more difficult texts than Dr. Seuss, and we all know that there are 5th graders (and sadly even some adults) for whom "Green Eggs and Ham" would be a perfect fit for their reading level.

On the other end of the spectrum we have Federalist Paper No. 10. Again, we are given a range of values which indicate anything from early undergraduate to graduate level. The same idea applies: a well-prepared college freshman might have no trouble reading and understanding this document, yet there are scholarly journals devoted to the sole aim of interpreting the Federalist Papers.

Other Indices Used On This Site

This website also includes the Fry and Raygor readability indices.

Conclusions

The readability scores (indices) used by this website essentially offer a range of "honest opinions" which enable you, the discerning reader, to draw your own conclusions about a text. Although the numbers aren't a perfect description of the difficulty of a text, they do at least give us an objective sense of it.

We shall leave the reader with some quantitative wisdom:

Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalised, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. -Sir Francis Galton

Links

Wikipedia article on readability, surveying various indices.
Wikipedia articles on each separate index: Gunning fog, Flesch-Kincaid, SMOG, Coleman-Liau, Automated, Fry, Raygor.
Article on this website: The Fry Readability Formula.
Article on this website: The Raygor Readability Estimate.

Lexical Density

Lexical density is defined as the number of lexical (or content) words divided by the total number of words [1],[2]. Lexical words are the kind that carry information or meaning. More precisely, lexical words are simply (omitting some minor exceptions and technicalities) nouns, adjectives, verbs, and adverbs. Nouns tell us the subject, adjectives tell us more about the subject, verbs tell us what they do, and adverbs tell us how they do it. Other kinds of words, such as articles (a, the), prepositions (on, at, in), conjunctions (and, or, but), and so forth, are more grammatical in nature and, by themselves, give little or no information about what a text is about. All the different kinds of words, such as adjectives, prepositions, nouns, and so forth, are called parts of speech. With the above in mind, lexical density is simply the percentage of words in written or spoken language which give us information about what is being communicated.

A Simple Example

We shall first determine the lexical density of an ideal example. Consider the following sentence:

The quick brown fox jumped swiftly over the lazy dog.

When this website calculates lexical density, it identifies each word as either a lexical word or not. The lexical words (nouns, adjectives, verbs, and adverbs) are colored green on the site; here they are quick, brown, fox, jumped, swiftly, lazy, and dog. There are precisely 7 lexical words out of 10 total words. The lexical density of the above passage is therefore 7/10, or 70%.
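As a rough illustration of how such a calculation can be automated, here is a minimal Python sketch built on the NLTK library's part-of-speech tagger. The choice of NLTK, the function names, and the exact set of Penn Treebank tags treated as "lexical" are our own assumptions for this sketch, not necessarily what analyzemywriting.com does, and, as the limitations discussed below explain, no tagger makes these decisions perfectly.

    import nltk  # requires the NLTK data for its tokenizer and POS tagger (see nltk.download)

    # Penn Treebank tag prefixes for nouns, adjectives, verbs, and adverbs --
    # the "lexical" parts of speech described above.
    LEXICAL_TAG_PREFIXES = ("NN", "JJ", "VB", "RB")

    def lexical_density(text):
        """Return (density, lexical_words) for an English text.

        Density is the share of word tokens tagged as nouns, adjectives,
        verbs, or adverbs; punctuation tokens are excluded from the count.
        """
        tokens = nltk.word_tokenize(text)
        words = [t for t in tokens if any(ch.isalpha() for ch in t)]
        tagged = nltk.pos_tag(words)
        lexical = [w for w, tag in tagged if tag.startswith(LEXICAL_TAG_PREFIXES)]
        return len(lexical) / len(words), lexical

    density, gist = lexical_density(
        "The quick brown fox jumped swiftly over the lazy dog.")
    print(round(100 * density, 2), gist)

Because the returned list contains exactly the words judged to be lexical, printing it also gives the kind of "gist" discussed in the next example.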
A More Complicated Example

As lexical words give meaning to the language being used, reading only the lexical words in a text can give us a "gist" of what the text is about. Let us consider another example. We shall give only the lexical words from a passage and ask the reader to try to guess what the text is about from these words alone:

moment growing economy shrinking deficits bustling industry booming energy production have risen recession freer write own future other nation Earth It's now choose want be next years decades come

Using the button below, you may read the entire passage. How does the "gist" of the passage compare with the full meaning of the text? Certainly, the lexical words do not give every detail, but by themselves they help us identify the general idea, while the role of the grammatical, non-lexical words is to help us piece them together to form the whole.

As for calculating the lexical density, the above passage contains 29 lexical words out of 53 total words, which gives a lexical density of 29/53, or, stated as a percentage, 54.72%.

Some Notes on Interpreting Lexical Density

Lexical density has generally been observed to be higher in written language than in spoken language [1],[2],[4]. This is not surprising, as written text is generally more expository in nature and will naturally contain more information-bearing, lexical words, thereby increasing lexical density. Moreover, spoken language relies upon other non-verbal cues and can be highly context-dependent, which reduces the number of lexical words required to communicate an idea.

We emphasize that, in the case of written texts, lexical density is not a measure of the complexity or readability of a text, but rather of the amount of information the text tries to convey. Naturally, then, expository texts tend to have higher lexical densities, usually in the range of 60% to 70%. For example, the Wikipedia article on inflation hedges (excluding tables, captions, citation marks, and peripheral text) has a lexical density of 66.89%. Thus, journal, technical, and informative articles, as well as instruction manuals, tend to have higher lexical densities than other kinds of literature.

For completeness, we note that the highest lexical densities are achieved by lists of items. For example, the shopping list

bananas, coffee, floss, toothpaste, milk

has a lexical density of 100%, as every item is a lexical word. On the other end of the spectrum, low lexical densities are observed, for instance, when the language relies on vague pronouns. For example, the text

She told him that she loved him.

has a lexical density of 2/7, or 28.57%. It is worth noting the vagueness of this passage and the multiple ways it can be interpreted. The passage contains very little information, and its lexical density is a good indicator of that. In summary, lexical density is a way to measure how informative a text is.

Limitations and Assumptions

We must first note that our calculation of lexical density assumes that a text is written in English. Secondly, the astute reader will be quick to point out that not all of the words in the sample passage of the previous example are correctly identified as lexical. We intentionally chose a passage that would illustrate this point. In particular, the word "It's" was improperly identified as a lexical word. This is because we use a computer algorithm to distinguish between the different parts of speech, and so far no computer algorithm can do this task perfectly. Thus, any online application which computes lexical density can only offer a close approximation. The true lexical density of the passage is 28/53, or 52.83%.

We also point out that how to classify certain words can be a point of debate [3]. For example, an aeroplane takes off. Do we classify "take off" as a verb and a separate preposition, or as a single phrasal verb? What is more, do we count the word "It's" as two words, it is (which would yield a lexical density of 28/54), or as a single word? The software used by this site treats contractions as single words.
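To see how much such bookkeeping choices matter, the short sketch below (a toy illustration with an invented sentence, not this website's code) counts the words of a contraction-bearing sentence in two ways: a plain whitespace split, which keeps "It's" as a single word, and NLTK's Penn-Treebank-style tokenizer, which splits it into "It" and "'s". The denominator of the lexical density, and therefore the final percentage, shifts accordingly.

    import nltk  # requires the NLTK tokenizer data (see nltk.download)

    # A toy sentence containing a contraction (invented for illustration).
    sentence = "It's a future we can choose."

    # Convention 1: split on whitespace, so "It's" stays a single word.
    whitespace_words = [w.strip(".,!?") for w in sentence.split()]

    # Convention 2: Penn-Treebank-style tokenization splits "It's" into
    # "It" and "'s"; pure punctuation tokens are dropped.
    treebank_words = [t for t in nltk.word_tokenize(sentence)
                      if any(ch.isalpha() for ch in t)]

    print(len(whitespace_words), whitespace_words)
    print(len(treebank_words), treebank_words)

    # Dividing the same 28 lexical words by 53 rather than 54 total words is
    # the difference between a density of 52.83% and 51.85% -- exactly the
    # bookkeeping choice discussed above.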
The above illustrates some of the ambiguities which can arise and the assumptions which must be made in order to perform the calculation. We must also mention that, even if a perfectly consistent and objective classification scheme were available, humans make mistakes too. Thus, any figure obtained can only be an estimate. Despite such ambiguities, the reader will see from the previous example that computers can do a decent job for the most part. This is especially true if we want to estimate the lexical density of a large passage.

Conclusions

We have made our best attempt to explain the concept of lexical density and how this website calculates it. Moreover, we have tried to illustrate how lexical density can be interpreted as "how informative a text is" by using examples with high and low lexical densities. Finally, we have also attempted to make the reader aware of the limitations and assumptions that must be made in order to estimate lexical density.

If questions remain, we encourage the reader to delve deeper into the topic by consulting the references and links at the bottom of the page. There, the reader will find an abundance of other links and references from which to begin their own investigation of the topic, if they are so inclined. And, of course, if the reader has any suggestions for improving this article, please submit them here.

Links and References

[1] Halliday, M. A. K. (1985). Spoken and Written Language. 1st ed. Waurn Ponds, Vic.: Deakin University.
[2] Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers 53, 61-79.
[3] To, Vinh. Lexical Density and Readability: A Case Study of English Textbooks. Presentation given at the University of Tasmania.
[4] Ure, J. (1971). Lexical density and register differentiation. In G. Perren and J. L. M. Trim (eds), Applications of Linguistics. London: Cambridge University Press, 443-452.