words all fail the magic prize

 


#access2009pei – Richard Akerman – Will We Command Our Data?

The David Binkley Lecture.

(Akerman writes Science Library Pad.)

Issues around data use and management are not unlike those facing copyright.

How big is data? Although storage capacity has significantly improved, it takes about ten 2m tall racks to contain a petabyte. There is a physical aspect to data, and costs associated with it. At the petabyte scale, data must be close to computation because of bandwidth constraints.
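To put the bandwidth constraint in numbers, here is a rough back-of-the-envelope sketch (mine, not from the talk). It assumes a decimal petabyte (10^15 bytes) and a link running at full, sustained speed with no overhead; the two link speeds are hypothetical values picked for illustration.

```python
def transfer_days(num_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move num_bytes over a link at full, sustained speed."""
    seconds = (num_bytes * 8) / link_bits_per_second
    return seconds / 86_400  # seconds in a day


PETABYTE = 10**15  # decimal petabyte, in bytes (an assumption)

# Hypothetical link speeds, chosen for illustration only.
for label, bps in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    print(f"{label}: {transfer_days(PETABYTE, bps):.1f} days")
```

Under those assumptions a petabyte takes roughly three months to move over a 1 Gbit/s link, which is why it is usually cheaper to bring the computation to the data.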

Four sources of data: research data, government data, library data, personal data. Government data is being released a bit more freely, so there’s more of it and we might be in a position to push even more into the public realm.

Convergence of factors since 2000: value of sharing, ease of sharing, and level of sharing at the machine level. We see this as good, and it’s increasingly easy to do. We are increasingly able to expose raw data to machines and take advantage of the rote processing that machines do really well.

“OECD Principles and Guidelines for Access to Research Data from Public Funding” (April 2007). Fairly non-controversial principle that if the public funds research, the data should be released publicly. Publishers do not have a vested interest in becoming data publishers.

“The Toronto Statement on prepublication data sharing” (September 2009). Encouraging sharing of data before the long publication process.

OECD: “Open access to research data … easy, timely, user-friendly and preferably Internet-based”

Gov’t data: US Memorandum on Transparency and Open Government, US Memorandum on the FOIA; commitment to public release of gov’t information and the power of transparency. UK Power of Information Task Force: “public information held by for example the police, health bodies and local authorities is often not available. This is bad for democratic expression, the economy, and citizen customers.” US – data.gov; can librarians help governments learn to share this data more effectively? UK PM Brown meets with Tim Berners-Lee, announces UK wants to release gov’t data as linked data.

Library data: ILS Customer bill-of-rights (2005); Berkeley accord (2008).

Personal data: privacy risks, but potential power from the data in our lives. Wired cover feature “Living by numbers” (July 2009). Twitter will soon allow you to opt in to automatically recording your geographic position.

Why libraries? Advocates, exemplars, experts. Open up data in a sensible, productive, usable way. Unlike print, data is not self-describing. Examples: DataCite (“DOIs for data”); NRC-CISTI Gateway to (Canadian) Scientific Data Sets.

Canada hasn’t had the strong push from the PM/Pres level that other nations have, but there are significant projects. It’s actually really difficult to release government data under crown copyright. Can look at geogratis, DLI, and ODESI for examples of how it can be done.

Municipal efforts too: Vancouver; Toronto has plans; Ottawa is working on a policy.

Back to Library data: how do we connect library data to patrons in a similar way? Some examples: a million free covers from LibraryThing, the Open Library has pulled data from all over, TALIS Connected Commons specifically about linked data, MESUR (resolver data) – we have data in our resolver logs that we could use to build interesting tools, LCSH (see Dan Chudnov, later).

APIs vs raw data….

Personal data: Daytum, people can record almost anything about themselves.

Back to the peta-scale: Total Recall; only valuable if you can find stuff in that huge store of information. Libraries, as preservers of culture and its outputs, must think about how we record and preserve people’s lifestreams.

Posted by pzed on October 1, 2009 at 8.15am
Categories: access 2009, conferences, libraries, twitter
