Now that our Zooniverse project has been running for a few weeks, we thought it would be useful to take stock and report back to you all on the excellent work you have been doing, both to thank you for everything so far and to show how useful your transcriptions have been.
Between the launch on the 15th October and midday on the 7th November, you have checked a very impressive 103,721 lines of text and transcribed a further 19,104 lines from scratch. This translates into over 30,000 lines of text that have each been either checked three times or transcribed five times. This is a fantastic contribution to our project: well done on deciphering so many fuzzy images, coping with odd spellings and interpreting difficult handwriting.
This contribution has allowed us to correct a substantial amount of the text produced by the handwritten text recognition (HTR) model, as Figure 1 shows. To create this figure, we took every line from the HTR model, compared it with each of your checks of that line and then counted the number of words that differed. The figure shows the number of checks in which a given number of words were different. In 18,000 cases no changes were made to the HTR output, but in the vast majority of cases you supplied at least one corrected word. In 28 per cent of checked transcriptions you changed one word, in 31 per cent you changed two and in 18 per cent you changed three. There are far fewer cases where more than three words were changed: in only five per cent of checks did you supply four or more corrected words (the numbers for six to ten changes are too small to appear on this graph). This is reassuring, as it implies that the HTR model is not doing a completely awful job, but also that you are supplying meaningful changes that we can use to improve the model substantially.
Figure 1, Number of lines checked by the number of corrected words.
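For readers curious about the comparison behind Figure 1, here is a minimal Python sketch; it is illustrative only, not the project's own code. It assumes that "number of different words" means a word-level edit distance, i.e. the fewest word insertions, deletions and substitutions needed to turn the HTR line into a volunteer's checked line; the project may have counted differences another way, and the function names are hypothetical. The comparison is case-sensitive, so changing "i" to "I" counts as one corrected word.

```python
from collections import Counter


def word_edit_distance(htr_line: str, checked_line: str) -> int:
    """Fewest word insertions, deletions and substitutions that turn
    the HTR line into the volunteer's checked line (case-sensitive)."""
    a, b = htr_line.split(), checked_line.split()
    # prev[j] = distance between the first i-1 HTR words and the first j checked words
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # drop a word from the HTR line
                curr[j - 1] + 1,     # add a word the HTR line missed
                prev[j - 1] + cost,  # keep (0) or correct (1) a word
            )
        prev = curr
    return prev[len(b)]


def changed_word_histogram(pairs):
    """Count how many checks changed 0, 1, 2, ... words.

    `pairs` holds one (htr_line, checked_line) tuple per check, so an
    HTR line that was checked three times appears three times."""
    return Counter(word_edit_distance(htr, check) for htr, check in pairs)


# Example: two Badland lines treated as single checks, plus a made-up unchanged line.
pairs = [
    ("wl wiill waepetatls wol god wmeell", "in the name of god amen"),
    ("snbm haallad of kington in the county of ereford",
     "Wm Badland of kington in the county of Hereford"),
    ("in the county of Hereford", "in the county of Hereford"),
]
print(sorted(changed_word_histogram(pairs).items()))  # [(0, 1), (3, 1), (5, 1)]
```

Each check is compared with the HTR line on its own, so the histogram counts checks rather than lines, in line with the description of Figure 1 above.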
It is important to remember that every corrected word you supply makes a real contribution to the overall project. We can see this in the following lines from the will of William Badland of Kington in Herefordshire, who died in 1786. Figure 2 shows a short extract from the start of his will.
Figure 2, Extract from the will of William Badland, 1786.
The HTR output for the first four lines of his will was as follows:
wl wiill waepetatls wol god wmeell
snbm haallad of kington in the county of ereford
alinghty sood i give and beque aith to myloving
waife an the ffrlmtins in the roos ooed the suitther
Each of these lines has now been transcribed three times, and we have used words from each of those checked transcriptions to correct the decidedly inaccurate HTR output as follows, with each changed word indicated by italics.
in the name of god amen
Wm Badland of kington in the county of Hereford
almighty God I give and bequeath to my loving
wife all the ffurniture in the room and the kitchen
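The post does not say exactly how words from the three checks are combined. One simple possibility is a word-by-word majority vote, sketched below in Python as a hypothetical illustration rather than the project's method. It assumes every check of a line has the same number of words, so that word i in one check lines up with word i in the others; checks of different lengths would first need aligning, which this sketch does not attempt.

```python
from collections import Counter


def combine_checks(checks: list[str]) -> str:
    """Hypothetical word-level majority vote across several checks of one line.

    Assumes all checks have the same number of words."""
    split_checks = [check.split() for check in checks]
    if len({len(words) for words in split_checks}) != 1:
        raise ValueError("checks differ in length; they would need aligning first")
    combined = []
    for position_words in zip(*split_checks):
        # most_common(1) returns [(word, count)]; on a tie, the word
        # seen first (i.e. from the earliest check) wins
        word, _count = Counter(position_words).most_common(1)[0]
        combined.append(word)
    return " ".join(combined)


# Three invented checks of the last Badland line, each with one slip.
checks = [
    "wife all the ffurniture in the room and the kitchen",
    "wife all the furniture in the room and the kitchen",
    "wife all the ffurniture in the roon and the kitchen",
]
print(combine_checks(checks))  # wife all the ffurniture in the room and the kitchen
```

With three checks, a single volunteer's slip is outvoted by the other two; when all three disagree, this sketch simply keeps the first check's word, which is one of several reasonable tie-breaking rules.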
This was a particularly poor HTR transcription, and it highlights some of the common difficulties the model faces. For example, it struggles with the large, ornate lettering at the start of the will, and it has problems with the noise on this image, particularly in the final line. However, it is not always clear why the model performs well in some places and poorly in others. This makes your input all the more important, as every correction helps us improve the final version. So even if there is a word or letter you are unsure about, know that any corrections you can make will be of use!
We have recently trained a new HTR model using the corrections you have supplied to 10,000 lines from the 1720s and 1780s wills. Your corrections have substantially improved the model: whereas the initial model we used got between 11 and 16 per cent of characters wrong, this new, volunteer-enhanced model gets just 6 per cent of characters wrong. That is a huge improvement, and it comes from only 10,000 corrected lines. We are now training a model on all 30,000 lines corrected so far, which should improve it even more. However, these models only perform well on the eighteenth-century wills, so we need plenty more seventeenth-century corrections and, eventually, sixteenth-century ones as well. With those, we should be able to produce a set of models that will accurately transcribe wills from across all three centuries.
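Figures like "6 per cent of characters wrong" are usually reported as a character error rate (CER): the number of character insertions, deletions and substitutions needed to turn the model's output into the corrected text, divided by the number of characters in the corrected text. The Python sketch below assumes that is the measure meant here; it is an illustration with hypothetical function names, not the evaluation code used for these models.

```python
def char_edit_distance(predicted: str, reference: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(reference) + 1))
    for i, p_char in enumerate(predicted, start=1):
        curr = [i]
        for j, r_char in enumerate(reference, start=1):
            curr.append(min(
                prev[j] + 1,                       # extra character in the HTR output
                curr[j - 1] + 1,                   # character the HTR output missed
                prev[j - 1] + (p_char != r_char),  # match (0) or wrong character (1)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(predictions: list[str], references: list[str]) -> float:
    """Total character edits divided by total characters in the corrected lines."""
    if len(predictions) != len(references):
        raise ValueError("need one HTR line per corrected line")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("corrected text is empty")
    total_edits = sum(char_edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


# Two words from the Badland will: one character wrong in each 4-letter word.
print(character_error_rate(["waife", "roos"], ["wife", "room"]))  # 0.25
```

This version sums the edits over all lines before dividing, so long lines carry more weight than short ones; averaging a separate rate for each line would generally give a somewhat different figure.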
We have not yet started to analyse the data we can extract from the finished transcriptions, but we can report on a wide array of interesting objects and topics that volunteers have raised on our Talk Boards so far. We have seen secret wills (Francis Alsieu), wills written as letters (Edward Daniel), extravagant funeral requests (Susanna Lee) and evidence of family feuds (George Mackenzie). Hopefully in the near future we can report back with some numbers and maps of these bequests, and start to show you some preliminary answers to the questions we are asking about attitudes towards goods and how these changed between the sixteenth and eighteenth centuries.