Wednesday, April 24, 2019

Nine million MNREAD sentences

"My father takes me to school every day in his big green car." At first blush this sentence might not seem particularly special. But it is one of only 95 sentences that appear on the English versions of the MNREAD chart. These sentences use a limited vocabulary of high-frequency words, and they are matched for length (each has exactly 60 characters) and for physical layout (each is printed on three lines of left-right justified text with minimal padding of the white space between words). Writing sentences that meet all these constraints is quite a puzzle.

In addition to using exactly 60 characters, we also need to keep track of the physical width of the words to make sure they will fit onto each line. This is tricky because different letters have different widths: for example, ‘little’ and ‘common’ both have six letters, but ‘common’ is almost twice as wide. Perhaps it is not surprising that we had fewer than 100 MNREAD sentences for the first 25 years of the MNREAD chart.
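To make the length and width constraints more concrete, here is a minimal Python sketch of a layout check. It first applies the 60-character rule, then packs the words greedily onto three lines and requires every line to be nearly full, so that justification only adds a little padding between words. The letter widths, `SPACE_WIDTH`, the line width and the `min_fill` threshold are all hypothetical stand-ins (real values would come from the font metrics of the printed chart), and greedy line-breaking is an assumption rather than a description of how the chart is actually laid out.

```python
# Hypothetical relative letter widths: a typical letter is 1.0 unit wide.
NARROW = set("fijlt")  # assumed to be about half a typical letter
WIDE = set("mw")       # assumed to be about one and a half typical letters
SPACE_WIDTH = 0.5      # assumed width of an unpadded inter-word space


def word_width(word: str) -> float:
    """Approximate printed width of a word, using the made-up widths above."""
    total = 0.0
    for ch in word.lower():
        if ch in NARROW:
            total += 0.5
        elif ch in WIDE:
            total += 1.5
        else:
            total += 1.0  # all other letters and punctuation
    return total


def natural_width(line: list[str]) -> float:
    """Width of a line before any justification padding is added."""
    return sum(word_width(w) for w in line) + SPACE_WIDTH * (len(line) - 1)


def wrap(words: list[str], line_width: float) -> list[list[str]]:
    """Greedily pack words onto lines no wider than line_width."""
    lines, current = [], []
    for word in words:
        if current and natural_width(current + [word]) > line_width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def fits_mnread_layout(sentence: str, line_width: float = 18.0,
                       n_lines: int = 3, min_fill: float = 0.85) -> bool:
    """Check the 60-character rule, then require exactly n_lines lines that
    are each at least min_fill of the full line width."""
    if len(sentence) != 60:
        return False
    lines = wrap(sentence.split(), line_width)
    if len(lines) != n_lines:
        return False
    return all(natural_width(line) >= min_fill * line_width for line in lines)


example = "My father takes me to school every day in his big green car."
print(word_width("little"), word_width("common"))  # 3.5 7.0
print(fits_mnread_layout(example))                 # True
```

With these made-up widths, ‘common’ comes out twice as wide as ‘little’, and the example sentence packs onto three nearly full lines. Dropping the final period makes it 59 characters, so it fails the length test before the width check is even reached.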

We needed more. Other researchers have requested a larger number of sentences for their research, and we need more sentences for our own studies too. To address this, we have developed a computer algorithm that generates MNREAD sentences. Full details of the sentence generator are given in our recent publication (Mansfield, Atilgan, Lewis, and Legge, 2019). In short, the generator composes sentences using words from the limited MNREAD vocabulary, and these sentences are then filtered to select just the small proportion that fit the MNREAD length and layout constraints. When we first ran the generator, it was a thrill to see it produce as many sentences in a few minutes as had previously taken years to write! Now, after generating sentences for the past two summers, we have a yield of over nine million sentences.
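The generate-then-filter idea can be illustrated with a toy Python sketch. The vocabulary and the fixed six-slot template below are invented for illustration; they are not the MNREAD vocabulary, and they are not how the published generator composes sentences. Only the 60-character length test is applied here; a real filter would also check the physical layout.

```python
import random

# Toy phrases grouped by template slot (hypothetical; NOT the real MNREAD
# vocabulary or the published generator's grammar).
SLOTS = {
    "subject": ["My father", "My uncle", "The old man", "Our teacher"],
    "verb": ["takes", "drives", "walks"],
    "object": ["me", "us", "my sister"],
    "place": ["to school", "to the park", "home"],
    "time": ["every day", "after lunch", "on Sunday"],
    "extra": ["in his big green car", "with his dog", "in the rain"],
}
TEMPLATE = ["subject", "verb", "object", "place", "time", "extra"]


def compose(rng: random.Random) -> str:
    """Fill each template slot with a random phrase and end with a period."""
    return " ".join(rng.choice(SLOTS[slot]) for slot in TEMPLATE) + "."


def generate(n_candidates: int, accept, seed: int = 0) -> set:
    """Compose n_candidates sentences and keep the distinct ones that pass accept()."""
    rng = random.Random(seed)
    kept = set()
    for _ in range(n_candidates):
        sentence = compose(rng)
        if accept(sentence):
            kept.add(sentence)
    return kept


def is_60_chars(sentence: str) -> bool:
    """The MNREAD length constraint: exactly 60 characters, spaces included."""
    return len(sentence) == 60


matches = generate(10_000, is_60_chars)
for sentence in sorted(matches)[:5]:
    print(sentence)
```

Because the toy candidates range from 46 to 74 characters, most random combinations miss the 60-character target and are discarded, which is why generation and filtering are kept as separate steps.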

Here are some examples:

[Figure: examples of computer-generated MNREAD sentences]

Reading performance with these new sentences is similar to that obtained with the original MNREAD sentences. We have found no difference between them in reading acuity or critical print size. However, the sentences are not a perfect match for the originals: maximum reading speeds with the computer-generated sentences are slightly lower (Mansfield, Atilgan, Lewis, and Legge, 2019).

It is unlikely that any study will ever need to use all nine million sentences: if we read them at 200 words per minute for 10 hours a day, it would take over two years to get through them all! However, having so many sentences allows us to pick and choose them as required by the design of a study. For example, a study might need sentences with exactly ten words, or with exactly twelve syllables, or without the letter ‘e’. Typically, such subsets still contain hundreds or thousands of sentences, which is more than enough for most studies.
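Picking a subset amounts to chaining simple predicates over the sentence list. This Python sketch shows a word-count filter and a letter-exclusion filter, and also checks the reading-time arithmetic, assuming roughly ten words per sentence (an assumption based on 60 characters at about six characters per word). Apart from the sentence quoted at the top of this post, the sample sentences are made up and are not 60-character MNREAD sentences. Counting syllables reliably would need a pronouncing dictionary, so that filter is left out.

```python
def word_count(sentence: str) -> int:
    """Number of space-separated words in a sentence."""
    return len(sentence.split())


def has_n_words(n: int):
    """Predicate: the sentence has exactly n words."""
    return lambda s: word_count(s) == n


def lacks_letter(letter: str):
    """Predicate: the sentence never uses the given letter (case-insensitive)."""
    return lambda s: letter.lower() not in s.lower()


def subset(sentences, *predicates):
    """Keep only the sentences that satisfy every predicate."""
    return [s for s in sentences if all(p(s) for p in predicates)]


def reading_time_years(n_sentences, words_per_sentence=10, wpm=200, hours_per_day=10):
    """Years needed to read n_sentences at a fixed rate.
    words_per_sentence=10 is an assumed average, not a measured value."""
    minutes = n_sentences * words_per_sentence / wpm
    days = minutes / 60 / hours_per_day
    return days / 365


samples = [
    "My father takes me to school every day in his big green car.",
    "A small cat sat on a warm mat.",                # illustrative only
    "We all went to play in the yard after lunch.",  # illustrative only
]
print(subset(samples, has_n_words(10)))
print(subset(samples, lacks_letter("e")))
print(round(reading_time_years(9_000_000), 2))  # 2.05
```

Nine million sentences at ten words each is 90 million words, or 450,000 minutes at 200 words per minute; at 10 hours a day that is 750 days, which is just over two years.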

These sentences are freely available from github.com/SteveMansfield/MNREAD-sentences