It Takes a Community: The CLOCKSS Initiative

Program Description:

How will you ensure researchers have access to electronic content in the future? What happens when journals get sold or lost in the shuffle of a merger? These questions vex librarians and publishers alike, but CLOCKSS has answers. CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) is a community-wide endeavor built upon the widely-used LOCKSS system. Victoria Reich, Director of the LOCKSS Program at Stanford (CA) University, explains how they are working to guarantee long-term access to digital materials, regardless of ability to pay. The New England Technical Service Librarians section (NETSL) and the North American Serials Interest Group (NASIG) co-sponsor the program.

Monday, 8:30- 10


When libraries first started subscribing to more and more databases with licensed periodical content, I remember a lot of discussion about whether libraries should keep subscribing to the print version of a periodical. What would happen if the vendor stopped licensing the content from a periodical?  Many libraries have dramatically cut back on their serials subscriptions as they rely more heavily on the licensed content from their database vendors. But, if budget cuts make them curtail their database subscriptions or if a vendor severs a relationship with a publisher, that content is lost to the library. With the print subscriptions, that content remained with the library long after the subscription was canceled.

The LOCKSS and CLOCKSS initiatives have separate ways of addressing this issue. LOCKSS (Lots of Copies Keep Stuff Safe) tries to replicate the print serials subscription model by providing a way for libraries to store the content provided by a database vendor on a server, called a LOCKSS box. According to Reich, LOCKSS allows libraries to build local collections. They take local control of content from the Web and download it to a LOCKSS box. It’s preserved and you have 100% perpetual access.

CLOCKSS, on the other hand, is a dark archive of material built on the underlying LOCKSS technology. Nobody can access the content in a CLOCKSS box until it is no longer available through any publisher.

Reich said Stanford University and the other institutions involved are committed to these initiatives because they believe library collections are the key to democracy. Libraries are important to democracies, and collections are critical to libraries. “What keeps the group going at Stanford is the fact that we believe libraries situated in communities that have collections are central to core democracy.”

She started her presentation by talking about CLOCKSS.

CLOCKSS

Built on three principles:

  • CLOCKSS is different because it is owned by the community.
  • It is set up as an archive that is dirt cheap to operate.
  • Any access to the content is free and open access.

Three ways it’s cheap:

  • Donated time
  • Using LOCKSS technology built 10 years ago.
  • Putting the content into libraries. We’re not building new infrastructure. Libraries maintain archives. Libraries taking care of CLOCKSS are taking care of content they do and do not subscribe to.

There are 15 libraries participating in CLOCKSS including NYPL and Stanford. There are none from New England.

NYPL – takes all the content from Elsevier. They can do this because it’s a dark archive. Nobody can access it. It is content that needs to be preserved for future generations.

Question from audience: How is it possible for libraries to archive content to which they don’t subscribe?

The publishers and libraries all serve on the CLOCKSS Board together. This board wrote up the contract for itself. All of the publishers agreed to dump all of the content into the archive.

“The goal is to scatter CLOCKSS boxes all over the world.” All have different legal and administrative regimes.

CLOCKSS is looking for archive nodes in Canada, more in Europe, Africa, and the Middle East. Each node is responsible for archiving the entire content of all the participating publishers. Every CLOCKSS box costs about $7,000 and holds six terabytes of content.

The number 15 is based on how many replicas we want to spread across the world.

When is content available from the archive?

Some of the trigger events for making this material available are:

  • The publisher ceases operations and titles are not available from any sources.
  • The publisher removes back issues and they are unavailable elsewhere.
  • The delivery platform fails over a sustained period of time.

Once the material is available, Creative Commons Licensing is used for the content.

Question from audience: Who determines when this content is available?

Every organization that contributes to CLOCKSS has a role in the process. A vote of the board determines that the content can now be made available through CLOCKSS.

CLOCKSS has had two trigger events so far. The Sage titles Auto/Biography and Graft were removed. They were copied from the CLOCKSS archive and are now being posted at Stanford University and the University of Edinburgh.

CLOCKSS worked really hard to capture as much as possible what the content looked like. It was not difficult. It adds value and does not add much expense.

To date, CLOCKSS has

  • impacted policy decisions for who owns a DOI (Digital Object Identifier) after a trigger event,
  • made it clear what trigger events are,
  • and determined that Creative Commons Licensing works for this type of content.

Archives are beginning to maintain eBooks. Trigger events are different for eBooks. If a publisher doesn’t carry an eBook anymore, the rights of a book revert back to the author. You need to wait until the book is out of copyright before it can be made available.

Sustainability – it seems unwise to manage an archive on incoming revenue. CLOCKSS is using an endowment to keep it sustainable. The CLOCKSS business model is to try to raise an endowment that pays 80% of the ongoing costs for running an archive. The goal is to raise $10 million within five years.

Why is the endowment important? “You can’t collect digital content and not preserve it.” If you want to keep digital content static, you need to take action on it constantly.

Question from audience: How strong a contract do you have with these publishers that they will always contribute everything?

We don’t. It’s a three-year contract. It’s highly uneven what publishers are contributing to the archives. There is no promise that any publisher will continue. However, they are all committing to the fact that once content is in the archive, it stays in the archive. The CLOCKSS contract with the publishers was written by publishers themselves.

Question from audience: In all these nodes throughout the world, is there a regular refresh cycle?

There is an ongoing process of audit and repair of the bits.

LOCKSS

LOCKSS is 10 years old. It was born out of work at HighWire Press. Reich noticed that journals were starting to publish articles online that were not making it into print. Libraries could not add these articles to their collections. LOCKSS ensures continual access to the content even if the publisher site is down. There are 200 participating libraries and 380 participating publishers.

In the print world, copies are distributed to a lot of libraries. Sometimes a copy at a particular library is destroyed, but the content is still available through all of the other libraries that store that content. There are many copies of most things scattered around the world.

In the print world, it’s very hard to destroy or tamper with most material without people noticing. The notion of LOCKSS is to replicate this system. Centralized systems cannot do this.

Ten years ago, the business relationship between libraries and publishers changed. It was an accidental evolution that libraries stopped owning materials and started leasing materials instead. LOCKSS collects and preserves any content delivered on the Web. The library needs permission from the publisher to take local custody of it. And the content must have an authoritative version.

How does it work?

Configuring a LOCKSS Box Video:

The publisher has to give permission to the institutions. For example, all LOCKSS institutions with a subscription to the NE Journal of Medicine crawl the NE Journal of Medicine content and download the content to their LOCKSS boxes. It’s just like receiving your print serials subscription. You must be a subscriber.

The content is never perfectly ingested the first time. It takes about a week to establish an authoritative version, bit for bit, of what’s available on the publisher’s Web site. All of the LOCKSS boxes are communicating all the time to compare at the bit level that the articles are identical. If a bit is off, an article gets a repair from the other systems. The repair can come from the publisher’s Web site or from the other LOCKSS boxes.
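The audit-and-repair idea can be illustrated with a small sketch. This is not the actual LOCKSS protocol, which is more involved than what was described in the session. It is a simplified majority vote over content hashes, and the names (audit_and_repair, the box labels, the choice of SHA-256) are assumptions made up for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so boxes can compare content without shipping it."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes: dict[str, bytes]) -> list[str]:
    """Compare every box's copy and overwrite outliers with the majority copy.

    `boxes` maps a (hypothetical) box name to its stored bytes.
    Returns the names of the boxes that needed a repair.
    """
    digests = {name: digest(data) for name, data in boxes.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]

    # Assumption: a strict majority is required before any copy is trusted.
    if votes <= len(boxes) / 2:
        raise ValueError("no majority copy; cannot pick an authoritative version")

    good_copy = next(data for name, data in boxes.items() if digests[name] == winner)

    repaired = []
    for name in list(boxes):
        if digests[name] != winner:
            boxes[name] = good_copy  # repair from a box that agrees with the majority
            repaired.append(name)
    return repaired
```

With three boxes where one copy has a flipped character, the odd copy is replaced and the function reports which box was repaired. The more boxes hold the content, the harder it is for a single damaged or altered copy to win the vote.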

If a library discontinues a subscription, they still have access to the back issues. (Just like in print!)

From the perspective of the reader at the library, if they try to access an article and the publisher is available, they get the content from the publisher. If it isn’t available, even for just a minute, they get it from the LOCKSS box.
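That publisher-first, LOCKSS-box-second behavior can be sketched roughly as follows. The function and parameter names are hypothetical, and the two fetchers are passed in as plain callables so the sketch does not assume anything about how a real LOCKSS box or proxy is wired up.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns the bytes, or None if it has no copy.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_article(url: str, from_publisher: Fetcher, from_lockss_box: Fetcher) -> tuple[str, bytes]:
    """Try the publisher first; fall back to the local LOCKSS box.

    Returns a (source, content) pair so the caller can see where the copy came from.
    """
    try:
        content = from_publisher(url)
    except OSError:
        # Network failures (connection refused, timeouts) count as "not available".
        content = None

    if content is not None:
        return ("publisher", content)

    content = from_lockss_box(url)
    if content is None:
        raise LookupError(f"{url} not available from publisher or local LOCKSS box")
    return ("lockss", content)
```

The reader never has to know which source answered. The returned label is only there to make the fallback visible in the example.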

LOCKSS preserves both the source files and the presentation files. It’s probably important to historians to compare what the content looked like in 1990 and what it will look like in 2060.

There is an initiative working to re-implement the federal depository system using a private LOCKSS network. Government documents are moving to a centralized system under control of the federal government. LOCKSS puts it back into a system that is tamper evident. With LOCKSS, the federal government still owns the material, but they have to inform LOCKSS if they change something, just as they had previously told libraries when they changed something in the print government documents collections.
