Archive for the ‘Data’ Category

Final 3 English counties - preparing for loading

Monday, April 6th, 2009

We are quality checking the final three English counties at the moment (Northumberland, Cumberland, Westmorland), as well as the missing Gateshead data from Durham (which was not uploaded with the rest of the county owing to an error in the master data catalogue, which has now been rectified).

If no problems are found, this data should be available on the live site either tomorrow or Wednesday. Gateshead data will be searchable under the county of Durham, as it should be.

UPDATE Tuesday 16:50: we’re redeploying the data now. It should start appearing late this evening or early tomorrow morning if all goes smoothly.

Gateshead - released with Northumberland, rather than Durham

Wednesday, March 18th, 2009

Just to let you know that although Gateshead is in the county of Durham in 1911, for the purposes of the census it was enumerated and collected as part of Northumberland.

Hence records covering this area will be released with Northumberland records.

UPDATE (24/03): we have investigated this anomaly and it seems that the root of the problem is an error in the TNA data catalogue, which has now been addressed and corrected. Gateshead data will now appear when the Northumberland data is loaded in a few weeks’ time but will be searchable under the county of Durham, as it should be. Thanks for your patience!

Missing Yorkshire (West Riding) volumes now available on site

Wednesday, March 18th, 2009

2 volumes of Yorkshire (West Riding) were not put online at the same time as the rest of the county as they were damaged and had to spend some time in Conservation Care at The National Archives before they could be scanned. These 2 volumes related to Knaresborough and Doncaster.

We are happy to tell you that these 2 final volumes are now live and available on the site and that Yorkshire (West Riding) is now complete.

New counties added: Yorkshire North & East Ridings, Durham

Tuesday, March 17th, 2009

We have added another 3 counties: the keenly awaited Yorkshire North & East Ridings, and Durham. All three counties are searchable as of now.

The next data release will be the final 3 English counties: Cumberland, Northumberland, Westmorland. We estimate these are approximately a month away.

Enjoy - let us know what you find.

Next 3 English counties: preparing for loading

Tuesday, March 3rd, 2009

We are preparing the next counties for loading onto the website during March. Depending on the speed of the data load and any problems found, we anticipate they should be available in 2-3 weeks.

At a minimum we will load the 2 remaining Ridings of Yorkshire and Durham. We may be able to get one or two others in at the same time if all goes well.

Improved data now on site, including some enhancements

Tuesday, March 3rd, 2009

We have loaded a fresh version of the searchable data onto the website, with a number of enhancements:

  • Transcription errors reported up to February have been checked and corrected where necessary
  • Data standardisation has been applied to first names to correct common mistakes such as Geroge for George
  • Data standardisation has been applied to ages: for example, the various ways that householders may have written “months” have been standardised

These changes should further improve the accuracy of your search - many more are on the way!
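
For readers curious about what this kind of standardisation might look like, here is a minimal sketch in Python. It is not the process used on the site, which is not described here: the correction table, the list of “months” variants and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical correction table: only "Geroge" comes from the post above,
# the second entry is an invented example of the same kind of slip.
FIRST_NAME_FIXES = {"geroge": "George", "willaim": "William"}

# Hypothetical variants of "months" that a householder might have written.
MONTHS_PATTERN = re.compile(
    r"^\s*(\d+)\s*(m|mo|mos|mth|mths|month|months)\.?\s*$", re.IGNORECASE
)


def standardise_first_name(name: str) -> str:
    """Return the corrected spelling if the name is a known misspelling."""
    cleaned = name.strip()
    return FIRST_NAME_FIXES.get(cleaned.lower(), cleaned)


def standardise_age(age: str) -> str:
    """Rewrite ages given in months into one standard form; leave others alone."""
    match = MONTHS_PATTERN.match(age)
    if match:
        number = int(match.group(1))
        unit = "month" if number == 1 else "months"
        return f"{number} {unit}"
    return age.strip()


print(standardise_first_name("Geroge"))
print(standardise_age("6 mths"))
```

A lookup table only corrects the misspellings it already knows about, which is one reason such corrections tend to arrive in batches over time.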

Scanning of all English counties complete: 15.1m images

Saturday, February 28th, 2009

Just to let you know that we have now scanned all English counties and have started on Wales. We anticipate that the East and North Ridings of Yorkshire, and Durham will be available by the end of March. Happy St David’s day!

UPDATE 3/3/09: To the end of February, we have scanned 15.1 million images, 93% of the total. This leaves about 1 million (or 7%) to go. We should finish scanning at Kew in April. Compare this with the 1901 census at 1.5 million images - the 1911 census is over 10 times larger!

More address search tips

Wednesday, February 25th, 2009

Address searching often requires a degree of lateral thinking to get the best results. Here are a few extra tips and also some new features on the horizon which aim to make your searching easier. The post below is based on replies to questions from a customer searching in Dorking, Surrey, but the points apply equally to addresses across the country.

The address details on the census are taken from the original form filled in by the householder (this contrasts with previous censuses, where the forms were compiled by the enumerator, thus introducing some level of standardisation in recording). Unfortunately, several factors conspire to make finding addresses in the 1911 census returns problematic.

The first is that in 1911, the concept of a full postal address with a number and street was less evolved than it is today. Many houses simply carried names and householders would then place the town afterwards. To take an example, looking at modern-day Pixham Lane in Dorking, Surrey, the majority of the houses carried names but most householders simply included their postal address as “name of house, Dorking” and this is the information that we transcribe. Unfortunately this was compounded by the small space on the original form left for the address, meaning the householder would often abbreviate the address to make it fit. Have a look at an example of an Original Page to see how small the space was for your ancestors to enter their address. 

The second is that many householders used abbreviations for words (as we do today), such as “Rd” for “Road”. Again using an example, take Lincoln Road in Dorking (around the corner from Pixham Lane): if you search for “Lincoln” on its own in Dorking, Surrey, all 44 properties are returned sequentially, some listed as “Lincoln Road”, others as “Lincoln Rd”. Try searching for just the first part of the address and leaving off Lane, Road, Crescent etc, but narrow the search area by county and district first.
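
To illustrate why searching on the first part of the address works, and what standardising abbreviations could involve, here is a small Python sketch. The abbreviation table, the sample addresses and the function names are hypothetical and are not taken from the site.

```python
# Hypothetical table of common street-type abbreviations.
STREET_SUFFIXES = {
    "rd": "Road",
    "st": "Street",
    "ln": "Lane",
    "cres": "Crescent",
    "ave": "Avenue",
}


def expand_street_suffix(address: str) -> str:
    """Expand an abbreviated final word, e.g. 'Lincoln Rd' -> 'Lincoln Road'."""
    words = address.strip().split()
    if not words:
        return ""
    last = words[-1].rstrip(".").lower()
    if last in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[last]
    return " ".join(words)


def search_by_first_part(query: str, addresses: list[str]) -> list[str]:
    """Return addresses that start with the query, ignoring case."""
    prefix = query.strip().lower()
    return [a for a in addresses if a.lower().startswith(prefix)]


# Invented sample data in the style of the transcribed addresses.
addresses = ["Lincoln Road", "Lincoln Rd", "Pixham Lane"]
print(search_by_first_part("Lincoln", addresses))
print([expand_street_suffix(a) for a in addresses])
```

Searching on “Lincoln” alone finds both spellings because the prefix match never looks at the part of the address where householders disagreed.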

We will be applying many data enhancements and standardisation processes over the coming months to compensate for these common inconsistencies in the originals and to make the data more easily searchable. However, the transcriptions are in this case accurate based on the original documents. To get the best out of any historical document, a degree of lateral thinking often has to be applied. 

Thirdly, place names and spellings change: in the case above, Pixham had an alternative spelling of “Pixholme” and 35 properties are found in Dorking under this listing. If you can find contemporary maps of the area you are searching, either online or in local libraries and archives, these can prove useful as the name today may be utterly different.

Finally, with 8 million different sets of handwriting, deciphering becomes extremely difficult, and some entries that look like transcription errors (and in some cases are) will occur. Thus we found one property transcribed as “Pischolme”. However, when examining the householder’s writing, the awful way he had formed the X would lead any person to transcribe it this way.

We are working on a number of ways to make searching by address simpler in the face of the difficulties posed by the original records. The unique nature of the 1911 census means these methods have had to be worked out afresh, and the census is very much a work in progress, although to date hundreds of thousands of researchers have successfully used the service to identify the records they want to view.

As well as applying many enhancements to the data to attempt to smooth over the inconsistencies of our ancestors, we will also release the RG78 Enumerators’ Summary Books soon (current estimate is April), which list the households and heads in each area: this information is invaluable for identifying neighbouring houses when the address information left by our ancestors makes this hard to recover. If you have already paid to view a household image, you will be able to view the linked Enumerators’ images for free by returning to your saved records. You will not be required to make further payment to view these.

We will also be adding a wildcard search to the street field to allow you to search laterally and many more data standardisations will be applied over the coming months.
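
As a rough idea of how a wildcard search on the street field could behave, here is a sketch using Python's standard `fnmatch` module. The wildcard syntax the site will adopt is not stated above; this example assumes `*` stands for any run of characters, and the function name and sample streets are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_street_search(pattern: str, streets: list[str]) -> list[str]:
    """Return streets matching the pattern, where '*' matches any characters."""
    lowered = pattern.strip().lower()
    # Lower-case both sides so the match is case-insensitive on every platform.
    return [s for s in streets if fnmatchcase(s.lower(), lowered)]


# Invented sample data based on the spellings discussed in this post.
streets = ["Pixham Lane", "Pixholme", "Pischolme", "Lincoln Road", "Lincoln Rd"]
print(wildcard_street_search("pix*", streets))
print(wildcard_street_search("pi*holme", streets))
```

A pattern such as `pi*holme` would pick up both “Pixholme” and the “Pischolme” transcription, which is the kind of lateral search a wildcard makes possible.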

Order of scanning of remaining counties

Friday, February 6th, 2009

We thought we’d give you an update on the remaining counties and approximately when they will appear on the site. Please note that the dates are highly approximate, but we prefer to give you a rough guide rather than a concrete date that might not be accurate. 

The list below is the order of scanning the documents: this might not necessarily be the order the counties actually appear on the site but it should be reasonably close.

The remaining English counties should all be available on the site within two months - the best indication we can give for Wales, Islands and Military at present is summer 2009 but we hope this is more useful than saying nothing! As soon as we can give a closer indication on specific counties and a better steer on Wales, we will.

Order of scanning:

  • Yorkshire, East Riding (with York)
  • Yorkshire, North Riding
  • Durham
  • Northumberland
  • Cumberland
  • Westmorland
  • Monmouthshire
  • Glamorganshire
  • Carmarthenshire
  • Pembrokeshire
  • Cardiganshire
  • Brecknockshire
  • Radnorshire
  • Montgomeryshire
  • Flintshire
  • Denbighshire
  • Merionethshire
  • Carnarvonshire
  • Anglesey
  • Isle of Man
  • Guernsey
  • Jersey
  • Alderney
  • Sark
  • Royal Navy
  • Military

Fields transcribed from the original page

Wednesday, February 4th, 2009

A number of people have asked why there is slightly more information on the original page than there is on the transcript. 

When we transcribe the census, we transcribe everything on the original form except for the number of people in the house. The reason we do not transcribe the number of people in the house is that we do not believe that it is a particularly useful piece of data to include in the search engine (very few people would know this information, although arguably it could be useful for sociologists analysing the data in bulk). The reason for creating the transcriptions is simply to allow us to build a search engine which can analyse the most useful information provided in the original pages and provide results based on this to guide you to the original pages.

The only other information that is included on the original page but not on the transcription is the number of living children born to the marriage, the number who have died and the number of rooms in the house.

Again the reason we do not include this on the transcript is because we do not believe that this information is particularly useful as a search field and it is therefore excluded from the search options as well. All other fields are included on the transcript as they are all available as options in the advanced search.

The concept and purpose of the transcripts on the 1911census site (and indeed all findmypast.com historical records) is to act simply as a finding aid for the original page.

We always recommend that family historians (as all good historians should) rely on the original record wherever possible as the single definitive source of truth, and also the source of those extra details - not necessarily useful to search for as unlikely to be known in advance with anything approaching certainty, but potentially valuable for further research.