#access2009pei – Peter Rukavina – Infinite Malleability
2 Oct 09
Going to talk about application vs capabilities instead.
When designing a system, you can build apps, or you can build capabilities. Flexible, multi-use systems for multi-talented generalists: farm example, architecture that supports multiple uses. Farmers do lots of stuff. Contrast with a factory, inflexible, purpose-built, made for specialists. Factory workers do one thing over and over. Industrialism changed the system design paradigm from capabilities to applications. The two overlap, but are different sensibilities. Apps are discrete, manageable, predictable, artificial; capabilities are interrelated, malleable, unpredictable and natural.
E.g. of the unix command line as providing capabilities. Contrast Royal Botanical Gardens–a nature application that has been installed in Hamilton–with the Charlottetown Boulder Park, a capability of the urban landscape. There are many other examples.
Library web site (Robertson UPEI) as an example of an application oriented design, vs Google (of course) which provides capability. Is the library’s mission to run a book and reading management application, or to extend society’s knowledge capabilities?
Posted by pzed on October 2, 2009 at 7.04am
#access2009pei – Stevan Harnad – Grasping what is already within immediate reach
1 Oct 09
Open Access means free, online, immediate, permanent access to reading, downloading, storing, printing, data crunching
Primary target is 2.5M articles written for academic journals, primarily author giveaways. Optionally can include books and other categories.
About 25K journals published worldwide. Most universities can only subscribe to a small fraction. Research is having only a fraction of its potential impact, achieving only a fraction of its productivity. OA provides a remedy. Free articles found to be cited > 3x as often (Lawrence 2001), with significant impact advantage. True in every field tested. Research that is freely accessible has 25-250% greater impact (Brody & Harnad 2004).
Two ways to do it: publishers convert to OA (the golden way), researchers deposit in IRs (the green way). However, only 15% of articles are being voluntarily submitted. Gold relies on publishers, whereas Green only requires the research community. USouthampton has created EPrints (which Harnad strongly recommends over DSpace).
Creating IRs is a necessary but not sufficient condition for creating 100% OA. Many repositories, but most are almost empty. Incentives are not sufficient to increase self-archiving. To guarantee 100% self-archiving, must make it an administrative requirement. USouthampton ECS repository virtually 100%. Why?
Publishing is mandated already (publish or perish), self-archiving mandate can be a natural extension. Surveys indicate 95% of researchers would comply, more than 80% willingly. Only those IRs with mandated deposit achieve anywhere near 100% self-archiving. There are currently 98 institutions worldwide with Green mandated deposits. That’s out of over 10,000 institutions. See ROARMAP. There are 57 university mandates so far. There are 41 research funder mandates. In Canada, only one departmental mandate, 8 funder mandates, one funder proposed (NSERC).
OA articles accelerate the research/access/use/citation cycle: OA articles are cited sooner. Time-course of citation/use cycle shows more citations means more downloads. Higher early downloads correlate with high citation rates later.
Mandates should be to
– deposit all articles
– in an IR
– immediately upon acceptance for publication; a compromise is the “immediate deposit – delayed access” mandate
63% of journals endorse immediate, Green OA self-archiving. For the remaining 37%, EPrints has an EPrint Request button. Any user on the web can still reach the metadata, but click “request a copy”, then send an email form that indicates the article is needed for research purposes. Email goes to the author, who can then click “OK” thereby sending a copy to the requestor.
EPrints has rich use metrics. Integrates with CiteBase. One of the rewards of self archiving mandates: authors are often interested in vanity searches. Also important in evaluating impact etc.
Posted by pzed on October 1, 2009 at 10.17pm
#access2009pei – Roy Tennant & Mike Rylander – ILS stuff
1 Oct 09
Roy Tennant
ILS in the Sky with Diamonds
Many of our systems and services are moving into the “cloud”. Moving library data/apps to the network level at web scale. “Cloud computing” opportunities. Computing tasks move from in-house servers to the net. Incorporates infrastructure, platform, and software as services. Low barriers to entry, pay as you go, no need for local server capacity, automatic software upgrades, saves staff. Drawbacks: lack of complete control, reliance on network connectivity, data held by a third party.
Amazon’s web-scale value proposition: most orgs spend 70% on infrastructure. Attempt to flip that: spend 30% on infrastructure, 70% on initiative.
One library example is OLE Project, ground-up new design for an ILS including workflows for all tasks envisioned in new system. Tennant not sure if this project has new funding, at the modelling stage. Another example is the Extensible Cataloging Project.
OCLC believes libraries have too many systems to support, too much invested in maintenance, a fragmented web presence, and lost opportunities for leveraging common data. There’s growing dissatisfaction with existing options, very few alternatives. Starting to see some real choices, finally!
OCLC is uniquely positioned to get into this stuff.
– > 1M libraries worldwide, > 5K transactions per second
– OCLC thinks this could all be done with a handful of commodity servers. Some nice graphics on simplifying the mess of siloed systems we deal with
– how much of our data could be shared, in what ways? vendor data, e.g.
A next-gen LMS
– supports all library management functions
– scalable and 100% web based
– reduced total cost of ownership
– unified platform for print/electronic
– flexible, customizable, but not unique
– network effects: application sharing, data registries
Responsive, scalable, fault tolerant, agile, suitable for public consumption (web services!), integrates with existing OCLC stuff
There’s a Web-scale management strategic steering advisory committee (or sumthin); anticipate rolling out a product in 2011.
Data: intent on making workflows more efficient, allow more intelligent CM decisions. The library that adds data owns the data. What goes in must come out. Leverage the power of WorldCat….
Mike Rylander
OS ILS
Without open software, the knowledge of how to read formats in which open data are stored will be forgotten. Even MARC will die!
OS ILS advantages:
– ROI – Evergreen an order of magnitude cheaper
– leverage with proprietary vendors
– control, over the code, the data, and the direction
– ideology a very good match with libraries
Next big things
– Cloud computing: using other people’s computers, learning not to waste computing resources
– Software as a service: hosting with a service provider, on virtual servers – BUT you don’t get to write the software
– Platform as a service: hosting with a service provider, on virtual servers – BUT you can only run apps targeted at a specific framework
Evergreen is SOA, SaaS(able), PaaS(ish)
Could we leverage the scaling features of Evergreen to build a community owned, run, and maintained PaaS cloud? Vision of each of us running one or two commodity servers talking to each other over the internet; together they would immediately become the largest Evergreen instance in the world. PINES has 20 servers, never go over 25% utilization.
Posted by pzed on October 1, 2009 at 10.15pm
#access2009pei – Donald Moses & Paul Pound – Islandlives
1 Oct 09
(Donald) Privately funded project to digitize PEI books, running mostly on OS software. At its heart is Islandora.
Community pieces: created the repository of bib records first, began local promotion through newspaper and others, building community support. Then needed to connect with rights holders. Contact authors for copyright-protected items; there are some challenges with handling orphan works. “Post-it note” method of tracking rights holders! Have a notice and takedown page through which individuals who believe they have rights in incorrectly categorized orphans can make contact. Allow community contributed metadata for photographs, which are cropped without metadata from books.
Developed a TEI viewer, page viewer. Used Google Forms for QA process.
(Paul) Islandora is a Drupal module. Drupal manages users, roles, permissions. Connects to Fedora Drupal filter. Can use Drupal’s LDAP without having to build into Islandora. Use Drupal permissions to manage permissions for data operations. Drupal also provides themes, easily customized. Some installs use out-of-the-box themes, others minor customizations. Islandora actually six modules; also leverages Lucene, Djatoka, OpenOffice.
– Fedora Repository module
– Scholar module
– Fedora-Attach module, extends Drupal’s file attach module
– Fedora Imageapi module, building on D’s imageapi to manipulate image streams
– Islandlives module: several book object datastreams
– that’s only 5, did I miss one?
Use Fedoragsearch to index MODS, TEI, and DC datastreams. Untokenized fields for lists (place names, etc.); tokenized fields indexed for searching.
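To illustrate the tokenized/untokenized distinction, here is a minimal sketch in plain Python. It is not Fedoragsearch or Lucene code; the function name `build_indexes` and the sample records are made up for the illustration. The untokenized index keeps each place name whole (good for browse lists), while the tokenized index splits it into lowercase terms (good for keyword search).

```python
import re
from collections import defaultdict


def build_indexes(records):
    """Index one field two ways: whole values and individual terms.

    `records` maps a record id to a place-name string (hypothetical data).
    """
    tokenized = defaultdict(set)    # term -> record ids, for searching
    untokenized = defaultdict(set)  # full value -> record ids, for lists
    for rec_id, place in records.items():
        untokenized[place].add(rec_id)
        for term in re.findall(r"\w+", place.lower()):
            tokenized[term].add(rec_id)
    return tokenized, untokenized


records = {
    "book1": "Prince Edward Island",
    "book2": "Prince County",
    "book3": "Charlottetown",
}
tokenized, untokenized = build_indexes(records)
```

A search for "prince" hits both book1 and book2 through the tokenized index, while the untokenized index still offers "Prince County" and "Prince Edward Island" as distinct entries in a place-name list.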
IslandLives JP2 viewer using djatoka, openlayers, jquery to view jpeg2000 images. djatoka is an OS jp2000 image server. Some issues with djatoka and Fedora.
IslandLives TEI Editor is a separate Drupal module. GUI to mark up TEI, lots of js
Tagging application: where to store tags? usually choose Fedora: have many Drupals hitting on Fedora. Tag xml can be indexed in Lucene.
For the future:
– use Drupal’s Solr module to combine Drupal/Fedora searching in one index.
– take advantage of Drupal hooks to sync data between Fedora and Drupal
There’s an Islandora google group, and FedoraCommons hosting. Islandora team is eight people.
Posted by pzed on October 1, 2009 at 12.25pm
#access2009pei – Dan Chudnov – Chudnovian Stuff
1 Oct 09
Repository Development Group at LC: 30 people, various roles (including dedicated project managers), various backgrounds. LC21 Report guiding LC strategies; from this report the Office of Strategic Initiatives came to be.
– capturing digital artefacts
– make them available for copyright registration/deposit
– pass along for inclusion in the collection
– subsequently processed for cataloguing, indexing etc.
Scale is global: LC universal collection imperative. Capture world scale, distribute web scale. E.g. of wdl.org – global partners, content from all over the world, users as well. Launched April 2009, big press release resulting in 9K requests/second on day 1. Entirely relying on open source software. Clean URIs, static pages: global edge caching with “very well known” caching service.
Another e.g.: Chronicling America; digitization of local/regional newspapers. Approx. 140K US newspaper titles’ bib records, 1.4M pages of content. All freely available now. Scale already over 100Tb from only 16 of 50+ states/territories from about 1850 to 1922. Similar software stack and design decisions to wdl.org
Using the word “movage” more and more: preservation and storage, on a practical day to day, is actually moving bits around. Capture artefacts using BagIt: think of it as a packing slip for data. Tells you what data should be inside, can then check to make sure it’s really there. Sender tells you what is being sent, receiver checks to make sure it really was. Oddly, this hasn’t really been solved previously. Works across space, systems, organizations, time. Also easy to make: tools: md5deep, (bagit library?), bagger; free, OS releases from LC: sf.net/projects/loc-xferutils/. OS release was very new for LC, lawyers got involved, but it got done!
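The packing-slip idea can be sketched in a few lines of Python. This is an illustration only, not the LC tools mentioned above: it assumes a bag directory with payload files under `data/` and a `manifest-md5.txt` listing one checksum and path per line, and it leaves out the other files a complete bag carries. The helper names (`write_manifest`, `verify_bag`, `demo`) are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag: Path) -> None:
    """Sender side: list every payload file under data/ with its checksum."""
    lines = []
    for f in sorted((bag / "data").rglob("*")):
        if f.is_file():
            lines.append(f"{md5_of(f)}  {f.relative_to(bag).as_posix()}")
    (bag / "manifest-md5.txt").write_text("\n".join(lines) + "\n")


def verify_bag(bag: Path) -> list[str]:
    """Receiver side: return a list of problems (empty means all is well)."""
    problems = []
    for line in (bag / "manifest-md5.txt").read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = bag / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif md5_of(target) != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny bag, verify it, damage one file, verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        (bag / "data").mkdir(parents=True)
        (bag / "data" / "page1.txt").write_text("first page")
        (bag / "data" / "page2.txt").write_text("second page")
        write_manifest(bag)
        clean = verify_bag(bag)
        (bag / "data" / "page2.txt").write_text("corrupted in transit")
        damaged = verify_bag(bag)
        return clean, damaged
```

Calling `demo()` returns an empty problem list for the intact bag and a single checksum mismatch for `data/page2.txt` after the simulated damage, which is the "receiver checks that it really arrived" step.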
Challenge of managing communication among people: for every bit that moves around, there are human communications that have to take place. Need to improve transfer, inventory, workflows.
Chudnov really cares about incorporating digital objects in the collection. Traditionally using catalogue records, exhibit sites. Cost of integrating everything in this way is high. Hard, expensive, need skilled people with time. Cost of updating everything is even higher. Good news: cost of consistent web strategies (increasingly adopted) is low. E.g. of linked data. linked data design issues. In LC’s case, LC authorities on the web is a newly available example. Machine readable view is acceptable for people and bots, and the end point includes a clean, concise definition of what it means (mainly for humans, but bots can work with it too).
Visit a URI, get something that defines a concept with a precise meaning. This is a standard way to refer to a catalog heading. Never had that before. A healthy web of data. Available now, and can download and mashup. Also new for LC to give headings away like this.
OAI-ORE aggregations to describe data. Look at the web, see a thing; OAI-ORE defines the constellation of things that make up that page. Each concept defined explicitly in the RDF. Interesting thing about all this work: the web itself is the API. Repeat that! No secret key, no custom interface.
LC mission says make things available and useful. Idea for how to incorporate digitally into the collection–“sustain and preserve a universal collection”–if we’re consistent about what we mean when we publish something, giving people links to follow, and everyone is consistent in the same way, end up with distributed conceptual integration. The web is a universal collection. Let’s all incorporate all our artefacts into the universal web!
Posted by pzed on October 1, 2009 at 11.44am
#access2009pei – Mark Jordan & Brian Owen – COPPUL stuff
1 Oct 09
Mark Jordan
COPPUL’s LOCKSS Private Network
LOCKSS preserves by making at least six copies of things. Does a preservation check to ensure copies do not become damaged. Private networks typically have mixed content, public network primarily ejournals.
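The "many copies plus a preservation check" idea can be illustrated with a toy majority vote in Python. This is a deliberately simplified sketch and not the actual LOCKSS polling protocol, which is considerably more elaborate; the peer names and the `audit`/`repair` functions are assumptions made for the example.

```python
import hashlib
from collections import Counter


def audit(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare replicas by digest.

    Returns the digest held by a strict majority of peers and the sorted
    names of peers whose copy disagrees with it.
    """
    digests = {peer: hashlib.sha256(data).hexdigest()
               for peer, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority; cannot tell which copy is good")
    damaged = sorted(peer for peer, d in digests.items() if d != winner)
    return winner, damaged


def repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Replace each damaged replica with the content that won the vote."""
    winner, damaged = audit(copies)
    good = next(data for data in copies.values()
                if hashlib.sha256(data).hexdigest() == winner)
    return {peer: (good if peer in damaged else data)
            for peer, data in copies.items()}


# Six hypothetical boxes, one of which has suffered bit rot.
copies = {f"box{i}": b"journal issue 1" for i in range(6)}
copies["box3"] = b"journal issue 1 (bit rot)"
_, damaged = audit(copies)
fixed = repair(copies)
```

With six copies, a single damaged box is outvoted five to one and can be restored from any of the agreeing peers, which is why the number of copies matters.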
How does something get into the LOCKSS network? On the public network, there’s a nomination/voting process. On the private, content is determined by whoever manages the private network. COPPUL includes collections of local interest, of greater than usual risk of being lost if not preserved by LOCKSS. Can be done on a low end server, storage about 1-4Tb. Storage is the big hurdle. Minimal staff needs to set up and run the machine.
COPPUL: OJS content, CONTENTdm, USask ETD database, local “staged” content.
To allow content to be harvested, must set up a manifest to tell LOCKSS crawler that it has permission to access. OJS supports this, not surprisingly.
One outstanding tech task: integrating LOCKSS private network into campus proxy.
Staged content: not public facing, packaged for programmatic exposure and retrieval; e.g. of CONTENTdm content, SFU Editorial Cartoons. Intended to be dbs/repository neutral and to facilitate long-term preservation. Archival units are folders containing zip files, manifest page links to file and says “yes, you can harvest”. BagIt specification identifies content/metadata in the directories. Much simpler than an XML packaging format. Relies heavily on checksums. Metadata itself will be XML.
Brian Owen
Software Lifecycles & Sustainability: a PKP and reSearcher Update
reSearcher: CUFTS, GODOT, dbWiz, Open Knowledgebase
PKP: OJS, OCS, Harvester, OMP, Lemon8-XML, PKP WAL
Projects are both open source (GPL), LAMP architecture.
Under development: Open Monograph Press (OMP); PKP Web Application Library (WAL). PKP user interface upgrade will be tried out first on OMP.
Open source not just about good code:
– community building
– sustainability strategies
Posted by pzed on October 1, 2009 at 9.35am
#access2009pei – Richard Akerman – Will We Command Our Data?
1 Oct 09
The David Binkley Lecture.
(Akerman writes Science Library Pad.)
Issues around data use and management are not unlike those facing copyright.
How big is data? Although storage capacity is significantly improved, it takes about ten 2m tall racks to contain a petabyte. There is a physical aspect to data, and costs associated with it. At the petabyte scale, data must be close to computation because of bandwidth constraints.
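A quick back-of-envelope calculation shows why bandwidth forces data and computation together at this scale. The link speeds below are assumptions chosen for illustration (they were not given in the talk), and the sketch ignores protocol overhead, so real transfers would be slower.

```python
PETABYTE_BYTES = 10**15  # decimal petabyte


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a fully utilized link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


days_at_1_gbps = transfer_days(PETABYTE_BYTES, 1e9)    # assumed 1 Gb/s link
days_at_10_gbps = transfer_days(PETABYTE_BYTES, 10e9)  # assumed 10 Gb/s link
```

Under these assumptions a petabyte takes roughly three months at a sustained 1 Gb/s and more than a week at 10 Gb/s, so shipping the computation to the data is usually the cheaper option.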
Four sources of data: research data, government data, library data, personal data. Government data is being released a bit more freely, so there’s more of it and we might be in a position to leverage even more into the public realm.
Convergence of factors since 2000: value of sharing, ease of sharing, and level of sharing at the machine level. We see this as good, and it’s increasingly easy to do. Are increasingly able to expose raw data to machines and take advantage of the rote activities in processing that machines do really well.
“OECD Principles and Guidelines for Access to Research Data from Public Funding” (April 2007). Fairly non-controversial principle that if the public funds research, the data should be released publicly. Publishers do not have a vested interest in becoming data publishers.
“The Toronto Statement on prepublication data sharing” (September 2009). Encouraging sharing of data before the long publication process.
OECD: “Open access to research data … easy, timely, user-friendly and preferably Internet-based”
Gov’t data: US Memorandum on Transparency and Open Government, US Memorandum on the FOIA; commitment to public release of gov’t information and the power of transparency. UK Power of Information Task Force: “public information held by for example the police, health bodies and local authorities is often not available. This is bad for democratic expression, the economy, and citizen customers.” US – data.gov; can librarians help governments learn to share this data more effectively? UK PM Brown meets with Tim Berners-Lee, announces UK wants to release gov’t data as linked data.
Library data: ILS Customer bill-of-rights (2005); Berkeley accord (2008).
Personal data: privacy risks, but potential power from the data in our lives. Wired cover feature “Living by numbers” (July 2009). Twitter will soon allow you to opt-in to automatically recording your geographic position.
Why libraries? Advocates, exemplars, experts. Open up data in a sensible, productive, usable way. Unlike print, data is not self-describing. E.g. of DataCite: “DOIs for data”; NRC-CISTI Gateway to (Canadian) Scientific Data Sets.
Canada hasn’t had the strong push from the PM/Pres level that other nations have, but there are significant projects. It’s actually really difficult to release government data under crown copyright. Can look at geogratis, DLI, and ODESI for examples of how it can be done.
Municipal efforts too: Vancouver, Toronto has plans, Ottawa working on a policy.
Back to Library data: how do we connect library data to patrons in a similar way? Some examples: a million free covers from LibraryThing, the Open Library has pulled data from all over, TALIS Connected Commons specifically about linked data, MESUR (resolver data) – we have data in our resolver logs that we could use to build interesting tools, LCSH (see Dan Chudnov, later).
APIs vs raw data….
Personal data: Daytum, people can record almost anything about themselves.
Back to the peta-scale: Total Recall; only valuable if you could find stuff in that huge store of information. Libraries as preserving culture and its outputs, must think about how we record and preserve people’s lifestreams.
Posted by pzed on October 1, 2009 at 8.15am
#access2009pei – Cory Doctorow – Copyright vs Universal Access
1 Oct 09
Tale of two networks: the one we thought we would get, delivering 500 channels of high-res tv! The network that would make us more socially normal (instead of infinitely weirder). David Eisenberg calls this the “smart” network.
Instead, we got a dumb network, in which the people in the middle don’t know what the tech is for or what people would do with it. Great advantage to this is that people at the edge can be very smart.
Surprisingly, dumb network delivered progressively lower resolution. Example of telephone, from high quality centrally controlled network, through introduction of crappy phones, to mobile, to skype. We trade quality for price, access, and customizability. Content isn’t king, conversation is.
Every exec thinks their industry is the most important thing ever, and they are regularly proven wrong by the cycle of creative destruction that is the market economy. Except when they have a regulatory monopoly.
Countries have formerly managed copyright in local, idiosyncratic ways. However, the current regime is governed by a harmonized approach developed through WTO etc. and these rules are written primarily by industry insiders, preferring rights of producers over rights of users.
The network is fundamentally a copying machine, with increasing capacity for storage. It just gets easier to copy. But copying is reified not as an act of an individual, but as an act of a company making copies on an industrial scale. The problem is it doesn’t take a giant, industrial machine to make a copy any more, but we trigger the same set of regulations that govern industry to govern the activities of private individuals. On the internet, we make copies simply by accessing material. We communicate, make plans; read for education, political engagement; work, fall in love… all governed by copyright.
UK study: Extending the term of copyright has a net negative effect economically. DRM doesn’t work. Policies are set without any recourse to evidence. Industrial revolution was not based on buying and selling machines, but on using and having access to them. Info revolution must also be based on access and use.
The punishment for infringement in many places is disconnection from the internet. Effectively, this is equivalent to the death penalty for citizenship. Future treaties may build surveillance and control into regulations, requiring hardware to be checked at borders, ISPs to inspect packets. These negotiations are entirely in secret, the Obama admin says its position papers are state secrets. Why? Because experience has shown (Hello, Sam Bulte) that when the public becomes aware of them, we rebel.
Copyright law should go on doing what it’s always done: regulate the way corporate entities interact with one another, not how we as individuals act. The point of copyright law can’t be to ensure that one group of people get to make a living for ever. Rather, its role should be to ensure that the greatest number of people can participate in culture. Libraries have an important role, as an unimpeachable moral authority.
Posted by pzed on October 1, 2009 at 7.16am
institutional repository – some definitions
18 Sep 09
ODLIS:
A set of services offered by a university or group of universities to members of its community for the management and dissemination of scholarly materials in digital format created by the institution and its community members, such as e-prints, technical reports, theses and dissertations, data sets, and teaching materials. Stewardship of such materials entails their organization in a cumulative, openly accessible database and a commitment to long-term preservation when appropriate. Some IRs are also used as electronic presses to publish e-journals and e-books. An institutional repository is distinguished from a subject-based repository by its institutionally defined scope. IRs are part of a growing effort to reform scholarly communication and break the monopoly of journal publishers by reasserting institutional control over the results of scholarship. An IR may also serve as an indicator of the scope and extent of the university’s research activities. (institutional repository (IR))
Wikipedia:
An Institutional Repository is an online locus for collecting, preserving, and disseminating — in digital form — the intellectual output of an institution, particularly a research institution.
For a university, this would include materials such as research journal articles, before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects. (Institutional repository)
CARL:
An institutional repository (IR) is a digital collection of an institution’s intellectual output. IRs are a key infrastructure component in the digital environment because they provide better access to our digital assets and they ensure that digital objects are managed appropriately. (Canadian Institutional Repositories)
For the record. . . .
Posted by pzed on September 18, 2009 at 3.13pm
institutional repositories
16 Sep 09
I’ve been planning, for a few months now, to start using this space to think more deeply about my job. Among other things, I’m Digital Initiatives Librarian at the University of Windsor’s Leddy Library. What a job title like “Digital Initiatives Librarian” might mean differs greatly from institution to institution; as in most things related to libraries, I think this is a by-product of the fact that we’re trying to figure out what it is we do anymore. I’m fortunate that I have a fair amount of leeway in deciding exactly what it is that our version of the Digital Initiatives Librarian will do.
But the position does come with some expectations, and one of these is to guide the development of our institutional repositories. The problem is, I’ve always been a little skeptical that libraries should dedicate resources to archiving copies of the published work of their faculty. Essentially, you end up with a large collection of disparate materials united only by the fact that at least one of their authors was affiliated with a specific institution. And nobody asks themselves, “Gee, I wonder what people at University X are doing in my discipline?” One of the reasons why academic journals exist is to collate research output by discipline, and if I want to stay on top of things in my field, I read those journals.
Not to say there aren’t useful things we can do under the rubric of institutional repositories. For instance, at Leddy we’ve been working on locally mounting digitized copies of UWindsor dissertations and theses. But generally, I don’t think creating additional copies of already published works and “exposing” their metadata is a particularly useful contribution to scholarship.
Now, it’s not like I haven’t read anything about this in the past, but I need to do a bit more focused reading over the next little while. And since I’m going to talk about it here, I figured it wouldn’t hurt to expose my prejudices first.
Posted by pzed on September 16, 2009 at 4.02pm
