words we are all imprisoned by the dictionary

 

National Q&A with John Wilbanks on Digital Repositories and the Digital Commons

brought to you by CARL

any suggestions for libraries for demonstration projects?
– most common place to start is a thesis/dissertation IR; above that, more generally for scholarly works by faculty
– beyond that, far less standardized: storing data for faculty, e-science projects
– where are you at as an institution, do you have the resources in the library to support more ambitious efforts?
– danger is “if we build it, they will come” doesn’t work (ref Dorothea Salo: formerly here, now there)
– look at the software tools and staff resources, then look at faculty interests on campus

what are some ways in which rapid change can be encouraged?
– rapid, transformative change almost always comes from outside the organization
– simply let people use technology in unexpected ways
– can lower the resistance by providing the rights and the infrastructure to do interesting stuff
– training for librarians: semantic web camps, data linking camps etc – standards are still being built, aren’t many experts; in the short term, simply being a part of this conversation is a step
– pay attention, get training, build collaborations with faculty: change will come from users, not institutions

what might be strategies for realigning fiscal imperatives?
– open question
– institutions and funders are part of the same system
– can begin to make change by looking at past successes, e.g. genomics and proteomics: funders created a standardized data library through NLM, with requirements that publications reflect data in the same standardized ways
– when we fund the creation of data, it needs to live somewhere on the internet, with standard identifiers, in ways that others can access it
– yet to be decided who will do this, and how; but important role for libraries to provide storage facilities, naming standards

with respect to involving faculty, how can librarians most effectively bring faculty onside?
– faculty aren’t standardized
– try to find people who already agree with us, rather than expend the energy to convince people we’re right
– build understanding of what their role is and what they can achieve by working with librarians; how to do this will differ from one faculty member to the next

are IRs generally under the library?
– yes, don’t know of cases where it’s not

what are your hopes for Open Access Week?
– good way to get general knowledge built
– outside the open access movement, most people are completely unaware
– students get involved, librarians reach out to faculty, we’re all in this together: broader understanding that we can change the way we communicate knowledge; get the message out beyond the niche

genome project mechanisms?
– big, international project
– data went into NLM site, by default entered public domain legally; raw facts, not creative works, so from a publication perspective they were public domain from the outset
– in practice, the process was slower, many of the sequencing labs held on to their data waiting for publication: US government met with the scientists and developed a process to drop all sequencing data into the website within 24 hours, but retain the right to first publication on the sequences they had included
– also required all publication on genomics to reference database identifiers to sequences being used: became habitual that everyone simply started depositing all their sequencing in the database, and open data access became normal: a set of interlocking initiatives that created the ecosystem for open data

how might librarians acquire necessary skills?
– technologies are relatively difficult to pick up, but there are semantic web conferences, get-togethers, mailing lists: lots of chatter on the web where standards are being developed
– best way to get involved is volunteer and get going
– begin looking at your library’s metadata and see whether there are pieces that can be exposed, make your stuff usable and findable
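
One way to picture "exposing" a piece of library metadata is to write a catalogue record out as RDF triples that other tools can harvest. The sketch below is only an illustration, not something shown in the session: it assumes a record held as a plain dictionary whose keys happen to match Dublin Core term names, and the `example.org` address is a made-up placeholder. The output format is N-Triples.

```python
# Real Dublin Core terms namespace; the record and subject URL below are hypothetical.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_url, record):
    """Turn {field: value or [values]} into one N-Triples line per value."""
    lines = []
    for field, value in sorted(record.items()):
        values = value if isinstance(value, list) else [value]
        for item in values:
            literal = escape_literal(str(item))
            lines.append(f'<{subject_url}> <{DCTERMS}{field}> "{literal}" .')
    return "\n".join(lines)


# Hypothetical thesis record, for illustration only.
record = {"title": 'Notes on "open" data', "creator": ["Doe, Jane"]}
output = record_to_ntriples("http://example.org/thesis/1", record)
print(output)
```

Each printed line is a subject, a predicate and a value, which is the shape that semantic web tools expect; a real repository would more likely rely on its platform's own export features than on hand-rolled code like this.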

do other institutions play a similar role to libraries?
– libraries are important infrastructure to the data network: stability over the long term is difficult for institutions to achieve, libraries/universities tend to stick around longer.
– libraries providing storage infrastructure, metadata
– researchers must see a value in marking up their data, in reaching out to the library
– lastly, companies will come to see these open data sources as resources for creating value-added services that might be profitable
– libraries need to create the most usable, open layer; think of your content as a web 2.0 platform on which others can build applications

interoperability best practices and norms?
– differ widely, difficult to make a general statement
– even within the life sciences, very different between those who deal with people and those who don’t (privacy, etc.)
– need data that can be reused, funders should require reusable data
– universities need to look beyond the impact factor of journals towards the accessibility of data/information when evaluating faculty
– creators need to be rewarded for interoperable and reusable data

how to manage conflict between collaboration and competition?
– some people just like to share, others don’t
– by promoting to people who do like to share, over time universities who provide sharing tools will out-compete universities who don’t provide that platform
– expects a new set of impact factors that will reward scientists whose data leads to more data and more publications
– institutions who are ready with that platform will be in a position to recruit the best faculty: your job is not just to harvest data for publication, but to promote reuse in a web context, and we can support that better

are open access economic models useful for open data?
– big differences
– legally, it’s universally accepted that copyright applies to journal articles, copyright is comparatively harmonized internationally re articles; don’t have that with data: public domain in US, in UK there’s a “sweat of the brow” protection, crown copyright applies to govt data in commonwealth
– economically, without the harmonious legal framework, there are disincentives to overcome the issues of data accessibility: data needs to be processed to be understood, so it’s not just about publishing and dealing with copyrights, there’s an enormous cost to formatting data for human or machine reading
– no established industry of data promulgation: no entrenched industry fighting change, but also no established ways of doing things; have to bootstrap a data publishing system: registration, certification, dissemination, preservation
– no peer review system for certification, difficult to disseminate, how do we decide what to preserve and for how long?
– add additional layers specific to data, and it becomes really hard to envision open data

how to get large, commercial non-profit journals to switch to open access?
– easiest way to make this transition is for journal to allow authors to retain copyright
– economic issue is bigger, because commercial non-profit journals rely on subscription fees
– education around support fees, etc.
– a lot depends on the sector, the journal, the finances
– key is to remember that it’s usually an economic rather than a legal problem

some of the more successful approaches to encouraging researchers to follow good data management principles?
– astronomy; open, standardized system where experts and non-experts can post and share data
– incredible number of standards developed by the discipline
– model we should be looking to
– the people who cared about sharing data put in the time and effort to make their data open and interoperable
– researchers are motivated by solving problems they have: must work to create a culture in which the solution to your problem is an open solution
– incentive is to publish; must develop an infrastructure that makes shared data usable, useful, and valued; there is no scholarly punishment for bad data management

[missed one question, but the gist was that the web becomes the infrastructure for humanities and social science applications as well]
– the expertise will get paid for in the sciences before it gets into the humanities

how to address learning curve for researchers?
– starting to have more skills, rapid web prototyping changing the game
– outreach and training should be the universities’ job: wouldn’t it be nice if, every time someone did a PowerPoint, they could instead run a simple demo program?
– waiting: generational changes
– 15 years ago writing web pages was an arcane exercise
– today, prototyping systems make it easier for people to write code that acts on data

neurocommons project?
– initial focus in neuroscience, but it has forked into these three things:
1. a distribution of integrated, public domain data, which will soon have a name; it becomes the nucleus for distributing data in a package model similar to Linux distros
2. standards development effort: web ontology not far enough along
3. problem with persistent web identifiers: proposed semantically empty URLs that are persistent and allow mapping of names, binding them under unique identifiers; shared names can’t be taken private or changed, the community owns the names and the people who use the names are in charge of them
– future goals: RDF distribution adopted as standard communication protocol for data
– need to get more people involved and engaged
– really want shared name thing to become equivalent to DNS for data, prefer to have discussions around that in the community rather than just in Science Commons
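
The shared names idea in point 3 can be pictured as a small registry. This is a rough sketch under my own assumptions, not the Neurocommons design: the class name, the `example.org` namespace and the lab labels are all hypothetical. It mints identifiers that carry no meaning in their text, lets different local names be bound to the same identifier, and refuses to move a name once it has been bound.

```python
import uuid


class SharedNameRegistry:
    """Toy registry: opaque persistent identifiers with names bound to them."""

    def __init__(self, base="http://example.org/id/"):
        self.base = base        # hypothetical namespace, not a real service
        self._names = {}        # opaque URL -> set of local names
        self._by_name = {}      # local name -> opaque URL

    def mint(self):
        """Create a new identifier whose text says nothing about the thing named."""
        url = self.base + uuid.uuid4().hex
        self._names[url] = set()
        return url

    def bind(self, url, name):
        """Attach a local name to an identifier; an existing binding never moves."""
        if url not in self._names:
            raise KeyError(f"unknown identifier: {url}")
        existing = self._by_name.get(name)
        if existing is not None and existing != url:
            raise ValueError(f"{name!r} is already bound to {existing}")
        self._by_name[name] = url
        self._names[url].add(name)

    def resolve(self, name):
        """Look up the shared identifier behind a local name."""
        return self._by_name[name]

    def names(self, url):
        """List every local name bound to an identifier."""
        return sorted(self._names[url])


# Two labs use different labels for the same thing (labels are made up).
registry = SharedNameRegistry()
shared = registry.mint()
registry.bind(shared, "labA:sample-17")
registry.bind(shared, "labB:S17")
print(registry.resolve("labB:S17") == shared)
```

Because the identifier itself is meaningless, nothing about it needs to change when a lab renames its own records; only the bindings grow. A real service would also need persistence and governance, which this sketch leaves out.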

Posted by pzed on October 19, 2009 at 1.23pm
Categories: digital initiatives, libraries
