Transcription process and accuracy levels

There have been a few questions on transcription accuracy and our policy towards certain aspects of transcribing the records. We hope this post answers them!

The transcription accuracy of the 1911census.co.uk website at launch is in excess of 98.5% according to recent tests - this threshold is set as a requirement by the National Archives.

Transcribing the census is a massive exercise - every single digitised document has to be read and transcribed and this process results in over 7 billion keystrokes over the course of the project. Naturally in this volume of keystrokes, more than a few errors will be made.

However, during the transcription process, we do apply a number of processes (developed during our many years’ experience of digitising censuses and other historical documents) to correct the most obvious errors and keep inaccuracy to a minimum.

The 1911 census in particular poses specific problems. Because the household summaries are the core documents rather than enumerators’ books, the variety of handwriting is significantly wider: in fact there are 8 million different hands writing returns, making interpretation of the handwriting a much more challenging task!

Now some good news - the 98.5% accuracy at launch will improve over time.

The first way that it will be improved is by users of 1911census.co.uk reporting errors to us. Each report is reviewed by hand by the transcription team and, if approved, the change is incorporated into the search results, usually within a month (when the next data upload is made to the website).

Our policy is to accept changes only if they match what is on the original page (i.e. the household form). So if your ancestor made spelling mistakes on the original page, they will be carried through into the transcript. This is actually more common than you might think, so please be sure to check the original page before you assume that there is an error, rather than an accurate transcription of the original document.

The second way that we improve the quality of the transcription over time is by applying ‘data standardisation’ processes. This is basically a set of rules that we develop as we identify errors and then apply to the data. A basic standardisation that we apply, for example, is converting “Geo” to “George” and listing records from Kent, Surrey and Middlesex as “London” if they fall within the metropolitan London area. We are developing and applying more data standardisations over time to eliminate more of the current transcription errors and to make searching easier, but some of these processes are much easier to apply once the data is complete.
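As a rough illustration of what such a rule set could look like, here is a minimal Python sketch. It is not our production code: the rule tables, the field names and the `standardise` function are all hypothetical, and only the “Geo” to “George” rule and the London grouping come from the description above. The other table entries are illustrative assumptions.

```python
# Hypothetical rule tables. Only "Geo" -> "George" is taken from the post;
# the other entries are illustrative assumptions.
FORENAME_ABBREVIATIONS = {"Geo": "George", "Wm": "William", "Thos": "Thomas"}

# Illustrative (county, district) pairs treated as metropolitan London.
METROPOLITAN_DISTRICTS = {
    ("Surrey", "Lambeth"),
    ("Kent", "Greenwich"),
    ("Middlesex", "Islington"),
}


def standardise(record: dict) -> dict:
    """Return a copy of the record with standardised search fields.

    The transcribed county is left untouched; the London grouping is
    written to a separate 'search_county' field.
    """
    out = dict(record)

    # Expand a known abbreviation, tolerating a trailing full stop ("Geo.").
    forename = record["forename"].rstrip(".")
    out["forename"] = FORENAME_ABBREVIATIONS.get(forename, record["forename"])

    # List metropolitan districts under "London" for searching.
    if (record["county"], record["district"]) in METROPOLITAN_DISTRICTS:
        out["search_county"] = "London"
    else:
        out["search_county"] = record["county"]
    return out


print(standardise({"forename": "Geo", "county": "Surrey", "district": "Lambeth"}))
```

Writing the London grouping to a separate field is a design choice made for this sketch: it keeps the county as transcribed while still letting a search on “London” find the record.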

All of our transcriptions undergo thorough batch sampling, by the transcription house, by The National Archives and by our in-house Quality Control team. Any batch failing to meet the required level of accuracy is rejected and rekeyed.
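The accept-or-reject decision for a batch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample size, the `is_correct` check (standing in for a human comparing an entry with the original image) and the function names are hypothetical, and the 98.5% figure is simply the launch threshold quoted above.

```python
import random

REQUIRED_ACCURACY = 0.985  # launch threshold quoted in this post


def sample_accuracy(batch, is_correct, sample_size, seed=0):
    """Estimate batch accuracy from a random sample of its entries.

    'is_correct' is a hypothetical callback standing in for a manual
    check of one entry against the original document.
    """
    if not batch:
        raise ValueError("cannot sample an empty batch")
    rng = random.Random(seed)
    sample = rng.sample(batch, min(sample_size, len(batch)))
    correct = sum(1 for entry in sample if is_correct(entry))
    return correct / len(sample)


def batch_passes(batch, is_correct, sample_size=200, seed=0):
    """Return True if the sampled accuracy meets the required level."""
    return sample_accuracy(batch, is_correct, sample_size, seed) >= REQUIRED_ACCURACY
```

A batch for which `batch_passes` returns False would be the one sent back for rekeying.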

One way of reducing transcription errors is by ‘double-keying’ every entry - this basically means getting the transcriptions done twice (by different people) and then comparing the two versions and eliminating differences by hand. However, doing this naturally doubles the transcription cost and would not improve the accuracy rate by a hugely significant degree (you can never reach 100%). The extra cost would also have had to be passed on to the public, resulting in higher prices for the census service.
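For readers unfamiliar with double-keying, the comparison step might look something like the Python sketch below. The field names and the `compare_keyings` function are hypothetical; the point is only that fields on which the two keyings agree are accepted, and the rest are queued for a person to resolve against the original.

```python
def compare_keyings(first: dict, second: dict):
    """Compare two independent keyings of the same form.

    Returns (agreed, disputed): 'agreed' maps field -> value where both
    keyings match, and 'disputed' maps field -> (first value, second value)
    where they differ and need resolving by hand.
    """
    agreed, disputed = {}, {}
    for field in first.keys() | second.keys():
        a, b = first.get(field), second.get(field)
        if a == b:
            agreed[field] = a
        else:
            disputed[field] = (a, b)
    return agreed, disputed


agreed, disputed = compare_keyings(
    {"forename": "Ephraim", "surname": "Warr"},
    {"forename": "Epraim", "surname": "Warr"},
)
print(disputed)  # {'forename': ('Ephraim', 'Epraim')}
```

Note that this only catches errors where the two keyers disagree; if both misread a word in the same way, the error passes through, which is one reason accuracy never reaches 100%.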

We could also have taken the route of transcribing fewer fields – just a name index, like the old pre-digital booklets – but feel that this would have resulted in fewer people being able to find their ancestors as it would narrow the number of fields you can search on. It would also have made the transcription much less useful for academic study, which is one of the uses to which the 1911 census will be put when it is completed.

It is important to remember that the transcription is designed as a finding aid for the original documents, which should be viewed as the “source of truth”; happily most users are able to find their ancestors despite the inevitable errors that creep in.

We have also provided very flexible search options (using wildcards, for example), which, with some lateral thinking, can also help you track down those who do not appear on the first search. The search options had to be constrained at launch to allow for the volumes of people searching, but we have been unlocking these features as the week has worn on, and there is more to come (see other blog posts).
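To show why wildcards help with mis-keyed names, here is a small Python sketch using the standard `fnmatch` module. It assumes the common convention that `*` matches any run of characters and `?` matches exactly one; the exact wildcard syntax of the site's search form may differ, and `wildcard_search` is a hypothetical helper, not part of the site.

```python
from fnmatch import fnmatchcase


def wildcard_search(names, pattern):
    """Return the names matching a wildcard pattern, ignoring case.

    Assumes '*' matches any run of characters and '?' exactly one.
    """
    pattern = pattern.upper()
    return [name for name in names if fnmatchcase(name.upper(), pattern)]


# A loose pattern finds both the correct spelling and a keying error.
print(wildcard_search(["George", "Geroge", "Joseph"], "G*ge"))
# ['George', 'Geroge']
```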


36 Responses to “Transcription process and accuracy levels”

  1. John Avery Says:

    One of my ancestors is categorised both as being the son of the head of the household and Female (unlikely with a name like Reginald). Could you run a bulk test to sort out that wives and daughters are female and sons are male?

  2. iantester Says:

    Hello John,

    This is precisely the type of standardisation we will run across the whole database (and, in fact, it is a specific cleanup already planned). So Reginald’s ambivalence will be removed and his dignity restored.

  3. AC Says:

    Will you also be ‘cleaning up’ the errors that are keying errors like Geroge and Jospeh for George and Joseph, and so on?

  4. iantester Says:

    Yes, these are on the hitlist once the dataset is complete. Geroge is one of the old chestnuts.

  5. Marie Says:

    “Our policy is to accept changes only if they match what is on the original page (i.e the household form). So if your ancestor made spelling mistakes on the original page, they will be carried through into the transcript.”

    I still don’t understand why other corrections won’t be included in the index as well. I don’t mean that the transcript should be altered, but they could be flagged as “alternative name/spelling”, or something similar.

    The index is supposed to help the user find an entry.

  6. Susan Says:

    You mention “listing records from Kent, Surrey and Middlesex as “London” if they fall within the metropolitan London area.” Shouldn’t this also apply to parts of Essex? EG: West Ham is only listed under Essex and not under London, although it is part of the metropolitan London area.

  7. Jean Says:

    I have found a serious discrepancy between the names on the index and the names on the original schedule (which cost 30 credits). I do not want to pay a further 10 credits for the transcription (where I can report the errors). Can I report the discrepancy in any other way?

  8. Martin King Says:

    While doing searches I have noticed obvious transcription errors in the names of people who I am not interested in. It seems to me that the only way to report these errors is by paying 10 credits per person, which obviously I am not inclined to do.
    Will a solution to this problem be considered?

  9. Andy W Says:

    1st, thanks for the speedy solution promised for the correction of my GG Grandfather’s christian name on search.

    I have though to agree with other bloggers, (not just on this thread), when they complain of the poor transcriptions they are coming up against.

    Your reply about how accurate the transcription rate is sadly doesn’t wash, as I alone have reported 6 mistakes so far. When the original is seen, the mistake is clearly down to simple speed of transcription (WAN instead of WARR, and EPRAIM instead of EPHRAIM) and a poor grasp of comparative handwriting, not poor spelling on the original form.

    I have experience of transcribing census returns for various OPC sites, but unlike the transcribers who have done this census, I had the luxury of TIME, (you have not said, but were the transcribers paid per sheet done, which increases the pressure on them for speed?), without which, my results would be probably far worse than on this site.

    So well done for what you are doing right, but stop blaming our ancestors, PLEASE!!!!.

  10. Keith Flinders Says:

    If I purchase an image based on your index and your index is in error, will I get a refund ? I have a glaring example where a person is indexed with a different surname to that shown on the original 1911 form. The 1911 writing leaves no question as to the actual surname.

  11. Keith Flinders Says:

    I am the author of a One Name Study and including the usual mistranscriptions caused by source documentation I find that 42 people I hope to see on the 1911 index are not identifiable.

    Of these 20 are likely to be in English counties not yet indexed, and Wales. Of the others many are males aged 18 - 30 who well may have been in the armed forces or merchant navy, also not indexed yet. Most who emigrated I know about and have factored in.

    An excellent result leaving me with only about 4% of the people to locate after indexing is completed. This will include some who may have gone under a different surname at census time.

    The 1911 index is substantially more complete than 1901 is in respect of my interests, and enhancements announced will only make it a more useful tool.

  12. bren. Says:

    I paid £3 to view an original document, only to open a blank with “error” printed on it. May I have a refund please ?

  13. iantester Says:

    @bren:

    Yes, please contact Customer Support and let them know which image is malfunctioning. Even better, they might be able to get the original form for you.

  14. Noel Says:

    I paid 10 credits for the transcription for a vessel (the Lady Gwynfred). All I got for that was the name of the vessel, and the names and ages of the three men on board.

    As for the “place” of the census, I suppose that the parish description of “Faversham Within” means that the Lady Gwynfred was in the port of Faversham on the night of the census, or that the census form was collected there.

    No birthplaces of the crew are on the transcription, nor their status on board (Master, Mate, etc.)

    I’m reluctant to pay another 30 credits to see exactly what additional information might be on the original schedule.

    At the moment, I consider that to pay 10 credits for a “shipping transcription” is pretty steep for what you get.

  15. David Blake Says:

    One household whose transcription I looked at had two people with the same name and age. I didn’t know about a second person of that age in that family, so paid to look at the original page. But only one person of that name and age was on the original page, so in effect I was robbed of 30 credits!! I submitted a complaint but have not yet heard anything. I hope I will - the costs are high enough for a perfect product, let alone an inferior one.

  16. David Blake Says:

    Another standardisation which would be useful is to amend county names. A search for the surname BETTESWORTH, with variants, for HANTS gives 102 results. The same search for HAMPSHIRE gives 32. It would be better if one search could find both.

  17. David Says:

    Thanks for what is in general an excellent site! However, I’ve encountered an odd feature when searching under a named “Other member of household”. In my case this “Other member” was “Henry George Matthews”, whom I’d thought was single: however, the search came up with a wife and 3 children (I checked their “Relationship to head” in each case). But when I called up the original page, HGM was the *only* person on it - the search had given 4 false matches, perhaps the wife and children of the separate “Henry Matthews” in the same registration district. It was pretty annoying to waste 30 credits on this!

  18. Arnold Nicholson Says:

    I had problems locating my Grandmother, who was called Zillis. I eventually managed to locate her through Husband and Household: she had been entered as Eillis Quarmby.
    I have found the 1911 census to be extremely helpful and accurate.

  19. Jack Says:

    A problem that needs addressing is County and Registration District. Many Registration Districts cover several Counties. People who live in Ilkeston, Derbyshire, in the Registration District of Basford, cannot be found by searching in Derbyshire, since you have decided that Basford only covers Nottinghamshire.

  20. Jack Says:

    A problem that needs addressing is County and Registration District. Many registration districts cover several Counties. People living in the Derbyshire part of the Basford District, in Ilkeston for example, cannot be found by searching in Derbyshire, since you have decided that Basford Registration District only covers Nottinghamshire.

  21. Robin Says:

    Hi
    In terms of transcription errors, is there a simple way of logging them? I have searched the help pages and blog but could not locate any advice.
    As others have mentioned, simply paying £3.00 per original transcript to correct an error seems an unfair way of managing data. Perhaps if genuine errors are identified, a credit could be offered/given.
    The benefits of the latter approach would be to improve user satisfaction and increase hits.
    Robin

  22. iantester Says:

    @Robin - unfortunately there is no way of logging the transcription error without viewing the original, and this is quite deliberate - until you have viewed the original, you cannot be sure that what you are looking at is a transcription error rather than an error in the original.

  23. Marie Says:

    What you don’t seem to realise is that the index and the transcript should be two separate issues.

    You should be adding all corrections to the index to aid searching. By all means let the original stand in the index, but why not accept corrections from people who know what the correct names/places etc should be? These could still be added to the index to make it easier for users to find what they are looking for.

    Ancestry use this method - all corrections are added to the index, without anything being removed.

  24. Jack Says:

    Do I detect a cooking of the books? In December I reported 6 errors in one household involving 2 different surnames. 3 people named Guy were transcribed as Gulf and 2 people named Henderson were transcribed as Gulf. These transcripts were subsequently amended to the correct names but yesterday I received an email for each error report stating “After careful consideration we have decided that an amendment to the transcription is not required.” Can you explain this in any other way?

  25. iantester Says:

    @Jack: I am guessing that, in between your comments being submitted and being checked, we applied a data standardisation which cleaned up the earlier problems. We apply these standardisations gradually to improve the searchability of the site.

    Unfortunately the person reviewing your comments would not be able to see the previous version and will clean it against what they see on screen, thereby thinking you to have requested changes which do not have to be made. Our fault - obviously we need to apply some more joined up thinking in these cases! However, I’m happy that we managed to pick up and fix the errors as well as you.

    So less a cooking of the books, more a case of us finding some oddities and fixing them before we got to your error report!

  26. Marie Says:

    Still no response to my comment, I see.

  27. Noel Says:

    And I’d like to report that the transcript of the crew of the “Lady Gwynfred” (see my post above) that I now access using the facility “My Records” has had the name of the vessel removed!!

    That’s just not fair after I’ve paid the 10 credits.

    Will you please reinstate the name of the vessel at the earliest opportunity?

    Thank you.

  28. iantester Says:

    @Noel: we’re in the course of fixing these records based on your previous comments - please bear with us and soon you will be in ship nirvana.

  29. Don Osborne Says:

    “@Robin - unfortunately there is no way of logging the transcription error without viewing the original, and this is quite deliberate - until you have viewed the original, you cannot be sure that what you are looking at is a transcription error rather than an error in the original.”

    What about common sense? Would a mother describe herself as son? On the same page (fishing folk) someone is described as a notworker when others are correctly shown as networker.

    Also I wish I had seen Noel’s comments before paying for a shipping transcript. I got the version without even the ship’s name.

  30. Maureen Mathie Says:

    Now that I have had time to study my Census finds, I have discovered an anomaly.
    My Grandmother and her Sister (both married), living elsewhere in the same town (Hull), are entered twice in the count.
    Once on their parents’ census form (because they were asked how many children they have) and secondly with their Husbands and family at different addresses.
    So how accurate is this?
    I know the reason this has happened in my family is because my Great Grandparents were from Europe and probably did not understand the question.
    How I found this double entry is the fact that my grandmother is with my grandfather and 3 children at one address, her sister and husband are at another, and she and her sister have also been entered on the census form of their parents (my Great Grandparents) at a third address, along with 2 other siblings who still lived at home.
    So I have 3 census transcripts with different surnames all in the same town.

    I know it can’t be changed, but I would like to bring it to the notice of the 1911 site so that they can make some sort of remark on site so that others can check this. As this is the first time these questions appear on a census, you have to wonder how many people were caught out this way!
    Hope this makes sense.
    Also, my grandparents’ surname is incorrect, as it has been picked up as Brown when it is BROHM, and I hope that this part can be changed.
    Maureen

  31. iantester Says:

    @Maureen: interesting that this has happened - as in all censuses, there would always be a few people who by accident or otherwise could have been enumerated in two places, as well as those who managed not to get enumerated at all.

    Brohm - if it is simply incorrectly transcribed, you can report a transcription error. If it is on the original form itself, it has been naturalised either by your relatives or by the person filling out the form, without their permission!

  32. Fiona Gayther Says:

    I curse the National Archives every single day for giving the contract for the 1911 Census to Brightsolid. Ancestry, for all its faults, allows you to do as many searches as you like and their annual pricing is not too bad, but the 1911 Census and Scotland’s People are a huge money making racket. Reading this blog has just confirmed it.

    Talking of transcription errors: I have an Amy Wallace, 12 year old schoolgirl - relation to Head of Household - Father-in-Law!

  33. Don Osborne Says:

    @iantester “please bear with us and soon you will be in ship nirvana.”

    How soon is soon? My March transcript still has no ship’s name.

  34. Bill Bartmann Says:

    Excellent site, keep up the good work

  35. Bill Norton Says:

    I have purchased a subscription and while being pleased to have the 1911 census find it hard to believe the accuracy claims made.
    I am happy to report errors but feel that some recognition should be given to people who have reported and had significant numbers of errors accepted, thus improving the reliability of this census.
    Is it true that this census was transcribed by guests of H.M. Government?

  36. iantester Says:

    @Bill - it most definitely was not transcribed by prisoners! I believe that QinetiQ, who launched the original 1901 census, did use prisoners to perform some transcription work but the results were sub-par.

    We use dedicated transcription houses overseas and employ a variety of methodologies to ensure maximum accuracy. The accuracy figures we quote are part of a contractual requirement from the National Archives and as such are checked extremely rigorously before a piece is passed and allowed onto the site.
