words wash your mouth every time you say "buddha"

 


#access2009pei – Dan Chudnov – Chudnovian Stuff


Repository Development Group at LC: 30 people, various roles (including dedicated project managers), various backgrounds. The LC21 report guides LC strategies; the Office of Strategic Initiatives came out of this report.
– capturing digital artefacts
– making them available for copyright registration/deposit
– passing them along for inclusion in the collection
– subsequently processing them for cataloguing, indexing etc.

Scale is global: LC has a universal collection imperative. Capture at world scale, distribute at web scale. Example: wdl.org, with global partners, content from all over the world, and users everywhere as well. Launched April 2009; a big press release resulted in 9K requests/second on day 1. Relies entirely on open source software. Clean URIs, static pages, global edge caching with a “very well known” caching service.

Another example: Chronicling America, the digitization of local/regional newspapers. Approx. 140K US newspaper titles’ bib records, 1.4M pages of content. All freely available now. Scale is already over 100 TB from only 16 of 50+ states/territories, covering about 1850 to 1922. Similar software stack and design decisions to wdl.org.

Using the word “movage” more and more: preservation and storage, on a practical day-to-day level, is actually moving bits around. Capture artefacts using BagIt: think of it as a packing slip for data. It tells you what data should be inside, so you can check that it’s really there. The sender tells you what is being sent, the receiver checks that it really arrived. Oddly, this hasn’t really been solved previously. Works across space, systems, organizations, time. Also easy to make. Tools: md5deep, (bagit library?), bagger; free, open source releases from LC: sf.net/projects/loc-xferutils/. An open source release was very new for LC; lawyers got involved, but it got done!
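The packing-slip idea can be sketched in a few lines. This is a minimal illustration, not the LC tools or the BagIt library: the function names `write_manifest` and `verify_manifest` are made up for this example, and it assumes the BagIt-style layout of a `data/` payload directory plus a `manifest-md5.txt` listing one checksum and path per line. A complete bag needs more than this (a `bagit.txt` declaration, for instance).

```python
import hashlib
from pathlib import Path
from typing import List


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Sender side (hypothetical helper): list every file under data/ with its checksum."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            lines.append(f"{md5_of(path)}  {relative}")
    manifest = bag_dir / "manifest-md5.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> List[str]:
    """Receiver side (hypothetical helper): return problems found; empty list means OK."""
    problems = []
    manifest = bag_dir / "manifest-md5.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif md5_of(target) != expected:
            problems.append(f"checksum mismatch: {relative}")
    return problems
```

The sender runs `write_manifest` before shipping the directory; the receiver runs `verify_manifest` on arrival and gets back a list of missing or altered files. This sketch does not flag extra files that are present in `data/` but absent from the manifest.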

Challenge of managing communication among people: for every bit that moves around, there are human communications that have to take place. Need to improve transfer, inventory, workflows.

Chudnov really cares about incorporating digital objects in the collection. Traditionally this is done with catalogue records and exhibit sites. The cost of integrating everything this way is high: it is hard, expensive, and needs skilled people with time. The cost of updating everything is even higher. Good news: the cost of consistent web strategies (increasingly adopted) is low. Example: linked data (see the linked data design issues). In LC’s case, LC authorities on the web is a newly available example. The machine readable view is acceptable for people and bots, and the end point includes a clean, concise definition of what it means (mainly for humans, but bots can work with it too).

Visit a URI, get something that defines a concept with a precise meaning. This is a standard way to refer to a catalog heading. Never had that before. A healthy web of data. Available now, and can be downloaded and mashed up. Also new for LC to give headings away like this.

OAI-ORE aggregations to describe data. Look at the web, see a thing; OAI-ORE defines the constellation of things that make up that page. Each concept defined explicitly in the RDF. Interesting thing about all this work: the web itself is the API. Repeat that! No secret key, no custom interface.
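To make “the web itself is the API” a little more concrete, here is a minimal sketch that only builds plain HTTP requests and never sends them. It assumes the service honours ordinary HTTP content negotiation through the `Accept` header, which is an assumption on my part and not something stated in the talk. The URI is a made-up placeholder, and `linked_data_request` is a hypothetical helper name.

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a plain HTTP GET asking for one representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical concept URI, used only for illustration.
CONCEPT_URI = "http://example.org/authorities/concept-123"

# Same URI, two representations: one for bots, one for people.
for_bots = linked_data_request(CONCEPT_URI)
for_people = linked_data_request(CONCEPT_URI, "text/html")

print(for_bots.get_method(), for_bots.full_url, for_bots.get_header("Accept"))
print(for_people.get_method(), for_people.full_url, for_people.get_header("Accept"))
```

Nothing here needs an API key or a custom client: the request is an ordinary GET on the concept’s URI, and passing it to `urllib.request.urlopen` would fetch it.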

LC mission says make things available and useful. Idea for how to incorporate digital objects into the collection (“sustain and preserve a universal collection”): if we’re consistent about what we mean when we publish something, giving people links to follow, and everyone is consistent in the same way, we end up with distributed conceptual integration. The web is a universal collection. Let’s all incorporate all our artefacts into the universal web!

Posted by pzed on October 1, 2009 at 11.44am
Categories: access 2009, conferences, libraries, twitter
