#access2009pei – Cathy Hartman and Mark Phillips – The Portal to Texas History
3 Oct 09
Began mid 90s by pulling in a small, orphaned website: Advisory Commission on Intergovernmental Relations. Agency had 50+ years of publishing; became a burden for reference, docs, and ILL staff. Got some funding to outsource scanning of their periodical publications and started mounting them online as PDFs. This was the beginning of what migrated into the Portal project.
Wanted to help Texas libraries, museums, and cultural heritage organizations.
– Many small organizations wanted to put stuff online, but didn’t have the skills or resources.
– build online collections connecting like materials from many libraries/museums/archives.
– provide resources for educators: lesson plans and resources built on digitized history collection.
– provide single interface to heterogeneous collections.
– text and photographic materials, some video and audio
Knew external funding would be needed. Easier to obtain funding for site creation/improvement and content building when providing services to many organizations. Federal, State, and foundation funding was received.
More than 100 partners, 3 models:
– UNT does all the work (costs more!)
– Partners scan, create metadata; UNT puts online
– UNT scans, partners create metadata, UNT puts online
UNT then does quality assessment for models 2 and 3. Creating metadata is the most expensive part.
Infrastructure: IOGENE Project, an IMLS funded project. Rapid development framework for digi lib interfaces with genealogists as the target users. Focus on user centred interface design.
Wanted a lightweight public access system to digital content; easily scalable in content, number of requests, collections, partners, types of content. Using tools that have well-established communities, using other people’s code as much as possible, and writing only the library stuff. All digital content in one big pot (no silos), using different interfaces to brand and manipulate subsets.
METS, DC (locally qualified), Pairtree, BagIt, ARK identifiers for public access metadata management. Archive backend uses a different set of tools.
Defined a digital object model, serialize to METS, use same object model for all document types, works for everything so far. Object structure is mapped to URLs. ARK identifiers map well to beautiful URLs. URLs become the API. Designers can make significant changes to interface without interacting with developers. Metadata editor adds ‘edit’ subdomain, easy click to fix errors when discovered.
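A minimal sketch of how an ARK identifier could be turned into a Pairtree directory path, following the character mapping in the Pairtree specification. The talk did not show the Portal's actual layout, so the `pairtree_root` prefix and the function names are assumptions for illustration only; the sample identifier is the one used in the Pairtree spec.

```python
# Characters the Pairtree spec hex-encodes as ^hh (besides non-visible ASCII).
HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
SINGLE_CHAR = {"/": "=", ":": "+", ".": ","}


def clean_identifier(identifier: str) -> str:
    """Apply the Pairtree character mapping to an identifier."""
    out = []
    for ch in identifier:
        code = ord(ch)
        if ch in HEX_ENCODE or code < 0x21 or code > 0x7E:
            # Hex-encode each UTF-8 byte of the character as ^hh.
            out.extend(f"^{b:02x}" for b in ch.encode("utf-8"))
        else:
            out.append(SINGLE_CHAR.get(ch, ch))
    return "".join(out)


def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_identifier(identifier)
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root] + segments)


print(pairtree_path("ark:/13030/xt12t3"))
```

Because the path is computed from the identifier alone, no lookup table is needed to find an object on disk, which fits the "URLs become the API" idea.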
68K objects, 60K more in the queue, possible 700K records from State agencies, even more from possible newspaper sources.
Adding partner services
– brandable interfaces
– partner’s domain names
– SRU/OpenSearch target for each partner/collection
– OAI-PMH repository for each
– new services are developed, added to the stack, benefit everyone
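To make the per-partner OAI-PMH idea concrete, here is a rough sketch of building a `ListRecords` request scoped to one partner. The verb and the `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH protocol; the base URL and the `partner:ABC` set name are made up, since the talk did not give the Portal's endpoints or set naming.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # e.g. one partner or collection
    if from_date:
        params["from"] = from_date      # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
print(oai_list_records_url("https://example.org/oai", set_spec="partner:ABC"))
```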
UNT benefits
– all “digital library” content in the same system
– build rich research collections for students, researchers, the community
Working on interface overhaul, beta is here, somewhat cleaner interface, focus on usability improvements; help guides available to reroute questions if needed.
Posted by pzed on October 3, 2009 at 8.09am
#access2009pei – Mark Leggott – Virtual Research Environment, 2 Years Later
3 Oct 09
Islandora Takes Shape
Islandora is the combination of “Island” and “Dora the Explorer”.
Anticipating the death and rebirth of the repository. Drupal/Fedora combination: default ulterior motive to build capacity in the Library/Campus, keep the library at the forefront of all campus activities: teaching/learning, research, admin. Data stewardship a critical concept. Must store, transform, provide access, mutate, migrate data. Longevity and usefulness.
Stewardship is a must, curation is just one small part. It is a deluge, researchers are freaking that they have to manage this stuff. Recent reaction from scientists at conference: This is the first time someone has responded to the data challenges I am facing. Typical, IT draws the line at hardware. There is no more significant opportunity for academic libraries in the next few decades. Will also stimulate library’s research development, IR development, etc.
Two years later: vision is the same, have some experience, a good evolution of tools in the community, and the library is the foundation for data management in all three landscapes. Research is the core driver, but admin is a significant driver as well. Staff feels enabled in finding solutions to challenges.
It’s all about the local: Google can’t do local like we can. Every research project is multidisciplinary and multinational, so your local becomes global and international.
Focus is on OS and open data. Have about 7 staff 50% or more committed to the project, soon to be 12+. Research: have received about $150K in hardware; faculty are encouraged to leverage research grants to buy hardware add-ons to increase capacity rather than get standalone servers. Also about $200K recent funding for staff. Research fund tech staff grants rerouted to the library. Leveraging multiple pools of resources to build shared capacity both in hardware and staffing. Interest in using Islandora increasing at Admin level, e.g. Senate document management pilot: immediate storage and stewardship for institutional documents. Rather than throw money at adopting a document management system from scratch, are investing in building document management capabilities into Islandora.
The learning environment is the least active, e.g. learning object repositories haven’t really taken off, but focus is on plugging Islandora into learning management system. Currently over 50 research VREs from a broad range of disciplines, a number of others dedicated to admin areas. Probably looking at giving each major committee its own VRE. The look is pretty basic, starting to look more at improving; focus has been content.
Islandora external: 1st external contract by end of Sept. Implementors include U of North Texas, Georgia Inst of Tech, UNB, Carleton, UGuelph; like all OS projects, there may be more. Sloan-Kettering is interested. Working on DuraSpace partnership. Also working on FESL.
All data is stored in Fedora rather than Drupal: data, metadata, workflow, authentication all stored and maintained in the repository. Drupal is the collaborative layer. Three or four Drupal multi-site installations. Keep public installations separate from higher
Islandora is not in the Drupal contrib because some clean up is needed. It’s the glue that ties Fedora and Drupal together: D module, php and java apps, rule engine for flexible workflows, drop-in support for modules. Use a senior level comp sci class for development of components (e.g. of TEI editor shown earlier). Plug-in capability.
Ability to define and integrate complex digital workflows. A lot of science is drudge work, similar to the drudgery in digitizing books. Looking at integrating Taverna.
Solution packs: policies, disseminators, workflows, apps, data. First solution pack will be IRs.
Sun partnership: rapidly evolving to provide support to Sun resellers for selling hardware platform with Islandora pre-installed. Goal is to provide development and support contracts to the community generally.
Goal is to have a rich, fully defined community framework. There is effort that goes into customizing data management for different research groups. There will be a quarterly roadmap for code changes, each will have a new solution pack starting with IRs in January 2010. Planning Islandora/RIRI institute for Summer 2010.
Posted by pzed on October 3, 2009 at 7.18am
#access2009pei – Bess Sadler and Jon Jiras – Next Gen OPACs – part 2
2 Oct 09
Jon Jiras, Rochester Institute of Tech
eXtensible Catalog
XC is a modular set of OS tools, can use any part that’s useful. Facilitates the resource discovery piece, but also has core metadata management piece. V1 scheduled for January 2010 release.
faceted, frbrized user interface; customizable, a web app framework for library data. metadata tools. connectivity tools: OAI and NCIP
interface will be a Drupal module. Can include traditional, digital, and web resources. Can use interface as a platform for library website.
Metadata toolkit can aggregate data from many sources, normalize MARC and DC data for indexing using XC schema (RDA); there will be a web interface for staff to tweak, run reports, configure, etc.
Connectivity: XC OAI toolkit, can maintain a synchronised copy of repository metadata. XC NCIP toolkit allows ILS circulation features in XC and access authentication through ILS or LDAP.
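Keeping a synchronised copy over OAI-PMH generally means re-harvesting changes, honouring deleted records, and following resumption tokens. The sketch below parses one `ListIdentifiers` response to show those three pieces. It is not XC's code; the function name and the sample response are invented, and only the OAI-PMH namespace and element names are taken from the protocol.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier><datestamp>2009-10-01</datestamp></header>
    <header status="deleted"><identifier>oai:example.org:2</identifier><datestamp>2009-10-02</datestamp></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return (changed ids, deleted ids, resumption token or None)."""
    root = ET.fromstring(xml_text)
    changed, deleted = [], []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        if header.get("status") == "deleted":
            deleted.append(identifier)   # remove from the local copy
        else:
            changed.append(identifier)   # fetch or refresh locally
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return changed, deleted, (token or None)


print(parse_list_identifiers(SAMPLE))
```

A harvester would keep requesting with the returned token until it comes back empty, then remember the time of the run for the next incremental harvest.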
Posted by pzed on October 2, 2009 at 2.45pm
#access2009pei – Bess Sadler and Jon Jiras – Next Gen OPACs – part 1
2 Oct 09
Bess Sadler, U Virginia
Blacklight: Findability for your whole collection
Blacklight is a discovery layer, about creating a great user interface, increase serendipity: Dan Rubin “If your interface requires instructions, it needs to be redesigned”
Lack of relevance ranking, lack of permanent URLs, no RSS, siloing of collections, lack of object type appropriate behaviours (e.g. of DVD which doesn’t have an author), inability to respond to user requests and suggestions; these things are broken and we need to fix them ourselves.
One problem Blacklight is trying to solve is the variety of different siloed data sources: catalogue, IR, dissertations, Google books project, local digitization projects, licensed journals and databases, etc.
Solr is the anti-silo. Easy to use: download it tonight! Indexing is under our control, adding a new collection is as simple as adding a new xml output. We determine how to index our own data. E.g. of music collection, meticulously catalogued; number one reference question: flute/violin duets? Hasn’t been in our power to define indexing in order to answer that question. Can have different metadata profiles for different kinds of objects.
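As a rough illustration of "a new xml output", this sketch serialises records into Solr's XML update format (`<add><doc><field name="...">`). The field names, including the multi-valued `instrument` field that would make the flute/violin question answerable, are assumptions and not Blacklight's or UVa's actual schema.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(records):
    """Serialise a list of dicts as a Solr <add> update message."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # A list becomes repeated fields (multi-valued in Solr terms).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", {"name": name})
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical music record with instrumentation indexed explicitly.
score = {"id": "score-001", "title": "Six duets", "instrument": ["flute", "violin"]}
print(solr_add_xml([score]))
```

The resulting message would be posted to the Solr update handler; a different kind of object can simply emit a different set of fields.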
But how do you get good results? Because we index it ourselves, we have control over our relevance algorithms. In practice, they test with Cucumber, a testing language that plugs into Ruby and looks like English. Can further resolve conflicts by developing lightweight interfaces for specific user groups. E.g. music people get a different relevance algorithm: most relevant thing is rarely an exact title match.
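One way to picture "a different relevance algorithm per user group" is as different field boosts passed to Solr's dismax query parser through the `qf` parameter. The sketch below is only that: the profile names, fields, and weights are invented, and the talk did not say this is how Blacklight implements it.

```python
# Hypothetical per-audience boosts: music users rank instrumentation
# matches above title matches, everyone else favours titles.
PROFILES = {
    "default": {"title": 10, "author": 5, "subject": 2},
    "music": {"instrument": 10, "subject": 5, "title": 2},
}


def solr_query_params(user_query, audience="default"):
    """Build dismax query parameters for the given audience."""
    boosts = PROFILES.get(audience, PROFILES["default"])
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    return {"q": user_query, "defType": "dismax", "qf": qf}


print(solr_query_params("flute violin duets", audience="music"))
```

A relevance test in the Cucumber spirit would then assert that, for a given query and profile, a known record appears near the top of the results.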
Solr, Ruby on Rails; plugin structure allows for local customizations without forking.
Data should be stable, enduring; applications should be kept lightweight. Control of the app must be as close to the user as possible.
Becoming widely adopted: Stanford, Johns Hopkins, US Natl Agriculture Lib, U Wisconsin, others; working on commercial ILS integration, GIS stuff, and more.
Posted by pzed on October 2, 2009 at 1.35pm
#access2009pei – Roy Tennant – Inspecting the Elephant
2 Oct 09
Roy works in research and is somewhat apologetic about yesterday’s sales pitch.
The Hathi Trust is a shared digital repository that grew out of the Google digitization project. U Mich leads the effort, OCLC works on the service side. Lots of partners, but 82% of contributions from UM, 12% from UCal (ramping up quickly), Indiana and Wisconsin have small pieces as well.
When Hathi Trust web site was set up, also allowed download of all metadata describing volumes in the project; 13 elements mostly of little use outside the environment. Roy, for fun, grabbed the file, parsed it in XML (no standards, just because). Indexed it and created a search utility.
Then a colleague (Constance Malpas) at OCLC came up with a “cloud library” project. Shared digital and print repositories to create new operational efficiencies for research institutions. Requires new infrastructure for managing, monitoring, consuming shared services.
[insert Stan Rogers, "The White Collar Holler", here]
Downloaded HT metadata, enhanced with OCLC numbers, explode the data into millions of tiny xml files, indexed it, extracted unique OCLC numbers and sent to JT (a person) who extracts the WorldCat records. Then merge HT data into WC records, index, and extract info to simplify reporting. Perl/XML/Swish-e, XSLT, xsltproc
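Roy's pipeline was Perl/XSLT, but the "extract unique OCLC numbers" step is easy to sketch in Python. The input layout here is a guess (tab-delimited, volume id then OCLC number, with multiple numbers separated by commas); the real Hathi Trust file has 13 elements whose order was not given in the talk.

```python
import csv
import io


def unique_oclc_numbers(tsv_text, oclc_column=1):
    """Return OCLC numbers in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) <= oclc_column:
            continue  # row has no OCLC column at all
        # Assumption: several numbers may share one cell, comma-separated.
        for number in row[oclc_column].split(","):
            number = number.strip().lstrip("0")  # normalise leading zeros
            if number and number not in seen:
                seen.add(number)
                ordered.append(number)
    return ordered


# Invented sample rows, for illustration only.
sample = "vol.1\t00012345\nvol.2\t12345,678\nvol.3\t\n"
print(unique_oclc_numbers(sample))
```

The deduplicated list is what would be handed off to pull the matching WorldCat records.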
OCLC doesn’t really use MARC internally. Rather, have their own CDF (common data format?) which allows data to be extracted in more usable ways (e.g. of dates). Also inserted HT metadata. Also plan to insert metadata from libraries involved (holdings? I kinda missed that).
Some interesting reports. Murky buckets, a Lorcan Dempsey term. E.g. of weird dates. Roughly 16% as of Sept 2009 are public domain: 600K volumes, mostly pre-1922 and government documents. UM actually does proactive review of copyright status (unlike Google), trying to open up as much as possible. Subject distribution dominated by arts and humanities, esp. literature and history.
Now working with NYU: considering impact of collections overlaps among NYU, Hathi Trust, and ReCAP (a shared storage facility in NY).
Lessons
– identifiers are essential: OCLC number Roy thinks is best
– standards are great until they get in your way; have ignored both internal and external standards but it gets effective work done
– never underestimate the power of a prototype
Posted by pzed on October 2, 2009 at 12.27pm
#access2009pei – Dorothea Salo – Representing and Managing the Data Deluge
2 Oct 09
Grab a bucket, it’s raining data!
Potentially golden age of digital librarianship; digital research data is an entirely new form of research publication.
Salo admits she’s been described as the Cassandra of OA. She’s not against Open, but five years of running IRs…
– unclear goals
– insufficient means
– asking faculty for something, offering nothing
– IR view of digital universe is more narrow than the content we need to contain
– fit between user needs and system needs not good
sees similar trends in the early days of data curation, but it’s early days: no reason we have to make the same mistake again.
focus on the fit between content and container, with a human (rather than technology) lens
what do we know about research data?
– there’s a lot of it; do we have big enough buckets? CLOUD.
– data are there to be interacted with, we store it in order for people to do stuff with it; not “look but don’t touch” museum objects
– CC0 is about getting a legal barrier out of the way of data reuse; we must get rid of the tech barriers
– the data buckets must internalize and respect the affordances of different kinds of data
– data are diverse, as are their technical environments. E.g. can’t treat a book marked up in TEI in the same way you treat a book made up of page scans. Also can’t treat similar or related data as entirely separate entities in the manner of DSpace
– we often don’t control the technical environment (e.g. proprietary formats) our data live in
– if we’re lucky, we might be able to advise our researchers on how to store their data, but more often we will need to adapt to them; they’ve already created quite a lot of it, and they’re not always thinking very far ahead; nor do IT people often have as long-term a time horizon as we do
– researchers often have no idea we can help manage their data, sometimes don’t even trust us to; have to go out and rescue it
– and of course it’s also us creating a tonne of unsustainable digital silos; all that stuff is in danger
– a lot of data are analog but really want to be data: paper lab notebooks, linguistic field notes on paper, slides; can we scale up to that?
– data are project based: Exploring the Hype(r) a dissertation based on WordPress; how are we going to deal with this? Researchers are not above building an entirely new tech stack for every project
– data are sloppy; if we insist our repositories will only accept clean, pretty data, we have a problem
– data aren’t standardized, aren’t going to be
Our big bucket: the digital library. We already do big data. Another big bucket: the IR. Neither of these will magically solve the data problem. There is an impedance mismatch between DLs or IRs and data. We have developed lots of skills and tools that will help, but we need to rethink how to apply them.
U Wisconsin is rebranding “digital collections” – being digital is no longer what linguists call a “marked state”. Digital libraries carefully built and tended, careful selection of best materials, we then lavish a lot of effort on them. But how will our careful collection/development policies cope with what’s already out there?
Concern that data projects will follow the money, leaving arts and humanities behind; how do we decide what to archive? How will we rescue the sloppy data that’s out there, when our natural tendency is to keep things neat? How much and what kind of care can we give our data libraries? They can’t all look as good as our digital libraries?
We like to do things in a “Taylorist” way in production. In the DL context, tend to limit the kinds of work to what you can easily automate and train for. A DL will specialize in one or a few things. Specialize ourselves by data types. How will that serve us when we’re not in control of the data production process, when it doesn’t fit in our buckets? Will not have the luxury of specialization. How can we be efficient when the data don’t come in standardized form?
There will be technology structure mismatches. Choices? We can pull the data out of their environment and recreate it. Kind of like pinning a butterfly in a museum case. Lose things, like search functionality, when flattening a dynamic site into something like DSpace. Other choice? Can take on maintenance and “future proofing” of the site: not an efficient process. For every single new input, somebody has to figure out how it’s put together, how it works, how to move the old interpretation into the new one. That’s work.
Many digital libraries are project silos, built to solve a specific problem, but not built for the future, and not replicable. There’s a flood coming, none of us needs to reinvent the wheel. E.g. of Decameron Web: can’t build a “Dante Web” because the tech is completely hidden. DLs are often “cabinets of curiosities”; beautiful, but you can’t get in and play. Context is not the be-all and end-all of how an object must be presented; context is fluid, built and rebuilt. Digital objects need to be exposed so they can be recontextualized, that’s what researchers want to do with data.
Presentation is content specific, but it’s possible to go so far in the direction of content-specific presentation that the data become locked within an unworkable interface. We’ve already lost a lot of digital projects to project siloing. Most Mellon funded digital humanities projects are GONE. Must develop a coordinated rescue effort for project silos. If we can rescue our own projects, we will then know a lot about rescuing other people’s.
What about IRs? We are caged in our institutions. Salo must prove a link to someone in her institution to undertake an archiving project. A lot of data falls through the cracks. Collaboration, people moving around; institutional focus must be given up. Problem starts with scholarly publishers, who allow IR deposit, but not other kinds of web archiving. Research does not respect institutional boundaries.
The IR “we’ll take anything” promise is always broken. IRs are built for journal articles. Can only take stuff that is static and final; old news to a researcher. Model does not work with interactive data. We will lose data that’s already out there but not cleaned up and ready. Sometimes you think something’s final, but it’s not. DSpace and Fedora make it pretty hard to correct things. Our response is, we’ll take anything at all, but it has to be one file at a time: not practical for the data deluge. IR installs not as easily customized as promised.
We’ll take any metadata you want, but only in key-value pairs. That just doesn’t cut it for data. Running out of time! Content models are broken. Silos are both necessary and unacceptable. Will have to do a lot of content modelling. Run standardization processes on top of that. Lots of code to write, please share it ’cause we can’t do it alone. Social processes around DLs and IRs very fragmented.
Fedora looks like a big part of the future, but it needs to change. Replaceability and editability of objects weak.
Must get involved earlier in the research process. Can’t curate what you don’t have.
Loves Solr: lightweight tool that does heavyweight things. Would prefer to become the Clio of Data Curation!
Posted by pzed on October 2, 2009 at 11.55am
#access2009pei – thunder talks
2 Oct 09
Natalie Collins, CISTI Lab
– came out of a hackfest-like “innovation challenge”
– invited everyone across the organization to create something innovative
– proposal day, teams of up to 4 pitching their ideas
– 15 proposals, 8 accepted to go forward
– gave a week’s worth of free time, ended with a final presentation
– winner looked at the impact of news reporting of research on searching
Ali Sadaqain, York U
– web redesign in VuFind
– facets on left sidebar, call number issues, language tweaks
– multiple formats showing up
– wanted some “2.0” stuff, so added an “add to favorites” [sic] option; not shared
– also included click-throughs to journal articles, although I don’t see explicit SFX enabling
(sorry, missed presenter’s name) Jamaican libraries
– looking at revamping Jamaican libraries in Drupal
– most have single page websites
– most use UNISYS(?) from UNESCO
– few resources for most up-to-date software, and for digitization
– internet penetration is about 20%; 80% of businesses have access
– mobile phone use is very high
Anne Barrett, Dalhousie
– one month live with WorldCat Local as primary search interface
– adds relevance ranking, FRBR features, SFX integration, ILL support
– more modern presentation of information
– older search engine to Novanet still available
– 17,000+ searches in Sept, apparent calm is hopefully good news
– still some outstanding issues: large number of records with no OCLC match
– significant impact on cataloguing
– smaller issues, RefWorks isn’t integrating well off campus
– can FRBR algorithm be adapted to favour more academic institutions
– impact on doc delivery, because of links to items held world wide
Craig Deplace, Jamie O’Toole, School District 16 in NB
– teachers
– used Drupal to develop repository for student and staff generated video
– were publishing on Youtube, less than perfect for kids
– decided to create own Youtube, extended it to allow teachers to upload wide variety of resources: District 16 Media Centre
– close to 3000 pieces of content
– recognized potential to use Drupal elsewhere
– distance delivered media course wanted a way to publish content, originally a PDF newsletter
– The District 16 Report
– students log in, act as creator/editor for news content, can also maintain a personal blog
– there is a moderation queue
– brings together students from 5 sites
– in the process of moving the schools’ websites into Drupal
– much simpler than training teachers to use Dreamweaver
– spreads content maintenance across many teachers, Drupal makes it easy and changes what and how we publish
– one school (Gretna Green?) streams their announcements in video and then uploads to their website
Bess Sadler, U Virginia
– Scholars’ Lab – merger of geospatial/stat data centre with text centre
– had lots of GIS data with very little metadata
– generally had to talk to the “GIS guy”
– received a grant to buy ESRI(?) online mapping system, spent 6 weeks trying to install
– somebody said, why not try OS?
– OpenLayers on top of PostGIS
– simple search utility
– visualizations as well
– planning to load data sets into main catalogue
Jennifer Richard, Acadia
– digitized herbarium
– started with manageable collections, starting with rare/endangered species, then invasive
– applied CFI project for Canadensys, now up to 50K specimens
– improved quality control, search capabilities
– considering replacing proprietary image management with dJatoka
– also includes smaller collection from Cape Breton U, plans to work with St FX, UNB, UPEI
Cameron Metcalf, U of Ottawa
– have 250K air photos
– hoping to bring online, at least the indexes
– photos described by roll number and photo sequence
– very manual, somewhat tedious methods to find photo numbers
– starting to use MarkerClusterer to allow drilling down via web map index
Karen Hunt, U Manitoba
– chat reference
– have been using PHP live
– solution to bridge gap between culture of students and of librarians
– no real IM culture in library, much less text messaging
– now advertising a text messaging number that connects via Google Android phone to librarian interface
Posted by pzed on October 2, 2009 at 9.33am
#access2009pei – Cary Gordon – Drupal In Libraries
2 Oct 09
Drupal is free, OS, simple, based on blogging idiom. Beneath that is a content management framework that can be used as a rapid, web-based development platform.
Hook system, very flexible for inserting your own code. CVS used for versioning, pretty much anyone can build modules and add to the official repository.
Community designed, with about 800 contributors to core (thousands of contributors to modules), 25 maintainers, 2 core committers. By comparison, Mozilla has 50 contributors, 10 staff (Drupal has no staff). Most OS projects follow that more centralized model.
Drupal principles; D has its own lingo (nodes, blocks, etc.); modules and themes to extend capabilities and customize. “Core” means the basic Drupal install, “contrib” is everything else that can be plugged in.
Requires PHP, most installations on the LAMP stack. Drupal modules connect through the hook system. Use jQuery and PHPTemplate as presentation helpers.
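Drupal's hooks are PHP functions found by naming convention (`modulename_hookname`). As a language-neutral picture of that dispatch pattern, here is a small Python analogue. It is not Drupal's API; the module names, hook names, and the `invoke_all` helper are all made up for illustration.

```python
# Each "module" implements a hook by defining <module>_<hook>.
def catalog_menu():
    return ["catalog/search"]


def hours_menu():
    return ["hours"]


def hours_cron():
    return "hours refreshed"


# Stand-in for the global function table and the list of enabled modules.
FUNCTIONS = {f.__name__: f for f in (catalog_menu, hours_menu, hours_cron)}
ENABLED_MODULES = ["catalog", "hours"]


def invoke_all(hook, *args):
    """Call every enabled module's implementation of a hook, if it has one."""
    results = []
    for module in ENABLED_MODULES:
        implementation = FUNCTIONS.get(f"{module}_{hook}")
        if implementation is not None:
            results.append(implementation(*args))
    return results


print(invoke_all("menu"))
print(invoke_all("cron"))
```

The point of the pattern is that core never needs to know which modules exist; adding a module with the right function name is enough to extend behaviour, which is why hacking core is unnecessary.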
Best practices:
– plan your site before you build it, not after!
– plan for the future, don’t lock yourself in
– get involved in the D community
– back up your site
– test your php snippets: Drupal allows you to put PHP in your content, but there’s no safety net!
– observe Drupal Programming Best Practices
– use a version control system
– keep your site up-to-date; should be in the latest release (usually security patches)
Warning:
– Don’t use a Windows server and IIS
– Don’t hack core: cautionary tale about outsourcing to offshore developers….
Of the 4849 modules contributed, there’s often little info on what they do, most start as solutions to specific problems; strongly encouraged to contribute if we write modules that can be generalized. Easiest way to find out what modules are good is to get out in the community and ask. Gordon’s company has a collection of modules they make available to all sites they build; slightly different packages for different types of libraries.
Rather than get new copies when versions come out and overwrite the old, they use links in the root folder.
Examples
– Ann Arbor District Library; wanted the library site to be a social site.
– Darien Library; next version of the social opac (SOPAC), integrating catalogue data.
– Genesee Valley BOCES; mostly school libraries, again integrating catalogue data
– Idaho Commission of Libraries
– Troy Public Library; higher level of theming
– Benicia Public Library; good example of a library with few resources, had no website previously, not fully integrated but lots of features
– Camarena Memorial Library; bilingual, have a Spanish site
– North State Cooperative Library System; service site
The whole concept of using a blogging type system is that you have information that’s up-to-date, ideally stays that way.
Example Applications
– CSU San Marcos: digital repository, intranet, e-resources directory; some custom coding, but pretty much take advantage of D’s taxonomy stuff
– SFU Library Thesis submission system; students submit, staff can manage, grad office gets stats
– Cornell Mann Library room booking app
– Anchor Archive Zine Library
– William Hayes’s biometric data curation tool, import data from spreadsheets, with filtering/visualizations and curation tools
– McMaster subject guides
– Islandora (not added to Drupal contrib, yet….)
Brief update on Drupal 7: in code freeze, no fixed release date,
– D7CX pledge: trying to get contribs to commit to being ready for D7 live date
– changes include allowing users to cancel accounts, more semantic class/ID names, friendlier to CSS layout, jQuery 1.3, and more!
– more database support, SQLite, in theory could extend to Oracle
– Field API
– File API (files now first class nodes)
– Registry
Start with a clear idea of what you’re trying to build, keep your initial install simple
Cracking Drupal a must read
Drupal Library Group, Drupalib, Drupal4lib
DON’T HACK CORE
Posted by pzed on October 2, 2009 at 8.07am
#access2009pei – Peter Rukavina – Infinite Malleability
2 Oct 09
Going to talk about application vs capabilities instead.
When designing a system, you can build apps, or you can build capabilities. Flexible, multi-use systems for multi-talented generalists: farm example, architecture that supports multiple uses. Farmers do lots of stuff. Contrast with a factory, inflexible, purpose-built, made for specialists. Factory workers do one thing over and over. Industrialism changed the system design paradigm from capabilities to applications. The two overlap, but are different sensibilities. Apps are discrete, manageable, predictable, artificial; capabilities are interrelated, malleable, unpredictable and natural.
E.g. of the unix command line as providing capabilities. Contrast Royal Botanical Gardens–a nature application that has been installed in Hamilton–with the Charlottetown Boulder Park, a capability of the urban landscape. There are many other examples.
Library web site (Robertson UPEI) as an example of an application oriented design, vs Google (of course) which provides capability. Is the library’s mission to run a book and reading management application, or to extend society’s knowledge capabilities?
Posted by pzed on October 2, 2009 at 7.04am
#access2009pei – Stevan Harnad – Grasping what is already within immediate reach
1 Oct 09
Open Access means free, online, immediate, permanent access to reading, downloading, storing, printing, data crunching
Primary target is 2.5M articles written for academic journals, primarily author giveaways. Optionally can include books and other categories.
About 25K journals published worldwide. Most universities can only subscribe to a small fraction. Research is having only a fraction of its potential impact, achieving only a fraction of its productivity. OA provides a remedy. Free articles found to be cited > 3x as often (Lawrence 2001), with significant impact advantage. True in every field tested: research that is freely accessible has 25-250% greater impact (Brody & Harnad 2004).
Two ways to do it: publishers convert to OA (the golden way), researchers deposit in IRs (the green way). However, only 15% of articles are being voluntarily submitted. Gold relies on publishers, whereas Green only requires the research community. USouthampton has created EPrints (which Harnad strongly recommends over DSpace).
Creating IRs is a necessary but not sufficient condition for creating 100% OA. Many repositories, but most are almost empty. Incentives are not sufficient to increase self-archiving. To guarantee 100% self-archiving, must make it an administrative requirement. USouthampton ECS repository virtually 100%. Why?
Publishing is mandated already (publish or perish), self-archiving mandate can be a natural extension. Surveys indicate 95% of researchers would comply, more than 80% willingly. Only those IRs with mandated deposit achieve anywhere near 100% self-archiving. There are currently 98 institutions worldwide with Green mandated deposits. That’s out of over 10,000 institutions. See ROARMAP. There are 57 university mandates so far. There are 41 research funder mandates. In Canada, only one departmental mandate, 8 funder mandates, one funder proposed (NSERC).
OA articles accelerate the research/access/use/citation cycle: OA articles are cited sooner. Time-course of citation/use cycle shows more citations means more downloads. Higher early downloads correlate with high citation rates later.
Mandates should be to
– deposit all articles
– in an IR
– immediately upon acceptance for publication; a compromise is the “immediate deposit – delayed access” mandate
63% of journals endorse immediate, Green OA self-archiving. For the remaining 37%, EPrints has an EPrint Request button. Any user on the web can still reach the metadata, but click “request a copy”, then send an email form that indicates the article is needed for research purposes. Email goes to the author, who can then click “OK” thereby sending a copy to the requestor.
EPrints has rich use metrics. Integrates with CiteBase. One of the rewards of self archiving mandates: authors are often interested in vanity searches. Also important in evaluating impact etc.
Posted by pzed on October 1, 2009 at 10.17pm
