British Library begins web harvest

The British Library aims to 'harvest' the entire UK web domain to preserve the digital age for future generations
5 April 2013

The British Library will begin to preserve the digital age for future generations when new regulations come into force on Saturday.

It aims to "harvest" the entire UK web domain to document current events and record the country's burgeoning collection of online cultural and intellectual works.

Billions of web pages, blogs and e-books will now be amassed along with the books, magazines and newspapers which have been stored for several centuries. The library could eventually collect copies of every public tweet or Facebook page in the British web domain.

Lucie Burgess, who is leading the project at the British Library, said the unprecedented operation would provide a complete snapshot of life in the 21st century, which increasingly plays out online.

She said: "If you want a picture of what life is like today in the UK you have to look at the web. We have already lost a lot of material, particularly around events such as the 7/7 London bombings or the 2008 financial crisis. That material has fallen into the digital black hole of the 21st century because we haven't been able to capture it. Most of that material has already been lost or taken down. The social media reaction has gone."

The operation to "capture the digital universe" will begin with an automatic "web harvest" of an initial 4.8 million websites - or one billion web pages - from the UK domain, she said. This will start on Saturday and is expected to take three months. It will then take another two months to process the data.

Until now the British Library could only preserve a relatively small number of websites. The 2003 Legal Deposit Library Act paved the way for the information to be stored but copyright laws forced the library to seek permission each time it wanted to collect web content.

Under the new regulations - which extend to the Bodleian Library in Oxford, Cambridge University Library, the National Library of Scotland, the National Library of Wales and Trinity College Library in Dublin - it has the right to receive a copy of every UK electronic publication.

Roly Keating, chief executive of the British Library, said: "The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve - collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats."

Culture minister Ed Vaizey said: "Legal deposit arrangements remain vitally important. Preserving and maintaining a record of everything that has been published provides a priceless resource for the researchers of today and the future. So it's right that these long-standing arrangements have now been brought up to date for the 21st century, covering the UK's digital publications for the first time."
