words what do you read, m'lord?

 

code4lib 2007 category archive

Obstacles to Agility, Joan Starr

Obstacles to Agility
Joan Starr
California Digital Library

“Agile”: see Agile Manifesto, 2001

What keeps us from agility?
– academic culture itself
– project management practices
– institutional hiring practices
– our own work practices

academic culture
– “non-participatory democracy”; don’t want to be involved, but want to say no at the end
– not representative; always want to be able to say no themselves
– difficult to identify a single subject-matter expert willing to represent a large base
– end up with large committees of members often representing other committees

project management
– typically do not have a single individual or committee with ownership
– major decisions often require input from very diffuse structure
– projects can drag on for years, sucking life. . .
– no one to adjudicate conflicts, meetings are widely spaced, lots of waiting, time passing
– solutions obsolete by the time they are delivered

hiring processes
– very slow
– bad fit for tech market
– some shops don’t have admin control over own openings
– grant-funded staff can’t be moved around
– project teams have difficulty getting help when and where needed, more time passes. . .

our own practices
– programmers value space and privacy over team work
– don’t want to share work in progress
– aren’t practised in time estimation
– don’t really measure progress
– lose time coordinating and communicating
– aim for perfection before getting feedback
– may not be working on what users need/expect

start small. . . with a new project. . . and introduce it as a pilot

Posted by pzed on March 1, 2007 at 12.15pm

The BibApp, Eric Larson and Nate Vack

The BibApp
Eric Larson and Nate Vack
Wendt Library, UW-Madison

Wendt serves College of Engineering
also Office of Scholarly Communication and Publishing
– ad hoc assistance
– digital publishing
– copyright assistance
– fund for open access publishing

libraries don’t know a lot about where people are publishing
– capture campus bibl for UW faculty pubs
– Eric is liaison for engineering, can use that
– capture citations talking to faculty, through refworks, through IEEE etc
– create a BibApp that creates an identity for each faculty member, lists their pubs, archives (eventually)
– also adds their identities to one or more groups (dept, institute, etc)

. . . lovely demo. . . 
– screen cast will be mounted on C4L page
– added a faculty member from UIUC from their LDAP server
– searched his work in Engineering Village
– export/import citations (does dedupe with bugs)

pilot project; not live, not even close
– tag clouds
– popular journals
– popular publishers
– “find an expert” : this is a feature that actually works; draws data from faculty member’s actual publications
– at individual level, can also see co-authors, journals, publishers, and specific citations
– rss feeds for people, groups
– funky keyword timeline tool: slider to scan through evolving keywords attached to a group by year
– visualization tool showing connections between co-authors

challenges
sherpa data not ideal
– author name collisions

http://code.google.com/p/bibapp/

more stable by August, 2007

Posted by pzed on March 1, 2007 at 11.58am

code4lib: Erik Hatcher Keynote

“When I woke up this morning, I heard a disturbin’ sound!”
– jingle-jangle of a thousand lost books. . . .

EH is a card-carrying library geek – loves books

Recommends Ambient Findability
– you can’t use what you can’t find

Timeline
Lucene in Action
Rossetti Archive Search
Collex
Windsor Lucene Summit
eIFL-FOSS
Solr Flare

Solr Flare
– all about sets; venn diagrams and data visualization
– tags/keywords create custom sets

facets: to show the entire universe (or subset thereof, same diff) of objects divided by attributes
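
A minimal sketch of the facet idea in Python. This is not Solr Flare code; the sample records and field names are made up for illustration. It counts objects per attribute value, then narrows to a subset and counts again (the "subset thereof" case).

```python
from collections import Counter, defaultdict

def facet_counts(objects, fields):
    """Count how many objects carry each value of each faceted attribute."""
    counts = defaultdict(Counter)
    for obj in objects:
        for field in fields:
            for value in obj.get(field, []):
                counts[field][value] += 1
    return counts

def narrow(objects, field, value):
    """Return the subset of objects that have the given facet value."""
    return [obj for obj in objects if value in obj.get(field, [])]

# Hypothetical sample records; field names are illustrative only.
books = [
    {"title": "Lucene in Action", "subject": ["search"], "format": ["book"]},
    {"title": "Ambient Findability", "subject": ["search", "design"], "format": ["book"]},
    {"title": "Some Album", "subject": ["music"], "format": ["audio"]},
]

counts = facet_counts(books, ["subject", "format"])
subset = narrow(books, "subject", "search")
subset_counts = facet_counts(subset, ["format"])
```

The same counting function works on the whole collection and on any narrowed set, which is why the universe and a subset are "same diff".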

Erik showed, live, a few examples of how totally Solr Flare rocks
– Chinese collection from UVa
– his personal collection, scanned using Delicious Library
– pledges by the end of the day to write an import filter for iTunes library

Future is bright
– still proof of concept
– availability needs to be incorporated
– saved searches were added at preconf
– facet visualizations of various types need developing (clouds, sortable lists, maps, timelines, set diagrams. . .)

Posted by pzed on March 1, 2007 at 11.06am

The XQuery Exposé, Kevin Clarke

The XQuery Exposé: Practical Experiences from a Digital Library
Kevin Clarke
Princeton University
http://diglib.princeton.edu/

XQuery
– an XML Query Language
– SQL of the XML world
– integrates XML data from a variety of sources
– focus on data, not dbs; higher level than SQL
– does not include fulltext searching or update
– not a full-fledged programming language
– currently working through Lucene; Solr very enticing

XQuery is a functional language built entirely on expressions
uses native XML types and XPath
can be “loosely typed”, e.g. can omit declaration of data types
– Elsevier has done some work on code conventions

let 10,000 FLWORs bloom!
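
For readers new to FLWOR (for, let, where, order by, return), here is a rough Python analogue. It is a sketch only, not anything shown in the talk; the sample XML and element names are invented. The equivalent XQuery appears in the comment for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are illustrative only.
doc = ET.fromstring("""
<collection>
  <image><title>Map of Paris</title><year>1850</year></image>
  <image><title>Harbor view</title><year>1790</year></image>
  <image><title>City plan</title><year>1905</year></image>
</collection>
""")

# XQuery, for comparison:
#   for $i in /collection/image
#   let $y := xs:integer($i/year)
#   where $y lt 1900
#   order by $y
#   return $i/title/text()
def titles_before(root, cutoff):
    rows = []
    for image in root.findall("image"):         # for
        year = int(image.findtext("year"))      # let
        if year < cutoff:                       # where
            rows.append((year, image.findtext("title")))
    rows.sort()                                 # order by
    return [title for _, title in rows]         # return

result = titles_before(doc, 1900)
```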

primarily an image-based site
some search, but mostly browsing

Posted by pzed on February 28, 2007 at 2.47pm

Forget the Lipstick, Fabien Tiburce, Peter Giansante, Beth Jefferson

Forget the Lipstick: This Pig Just Needs Social Skills
Fabien Tiburce, Peter Giansante, and Beth Jefferson
BiblioCommons

Beth Jefferson
extensive user research
BC, AB, ON provided proof of concept funding
today’s focus is on architecture and user experience
OPAC users not complaining, just going elsewhere
nextgen catalogue more than tech
social search is key
– discovery, relevance, connections to community and other users

Fabien Tiburce
tech – ajax widgets, rdbms mixed with xml repository
– object factories to aggregate representations of data
– decouple data in transit from data in storage
– rdbms to preserve most important data

Peter Giansante
data model

Beth Jefferson
personalized relevance (ratings, reviews, demographic filtering)
building trust in the social environment, adding other users to a network optionally based on a domain of expertise

Fabien Tiburce
social data coupled with the user
– user preference subsystem
– user generated data associated with bib record and other users

Peter Giansante
user prefs as part of data model

Beth Jefferson
thinking outside the box of the library
want to get away from “sorry, no match”
– opportunities to use ILL in local interface
– suggest to purchase
– suggest share from other user’s personal collection
– refer to community generated interests (groups, discussions, etc.)
engaging users to take us halfway
– use library metadata to structure discussions
– can then put community resources, events, etc. on the right pages
when a question is asked, discover who is likely to have the answer and present it to them
finally, offer an online answer service
– people in the OPAC
– users connecting with other users realtime in OPAC

Peter Giansante
more data model

Beth Jefferson
need critical mass of users
data must tie back to them
must engage a significant percentage of users

maximize breadth of implementation
– in the flow of existing activity
– approaching costless
– provide motivations to contribute

Fabien Tiburce
ILS integration with low footprint web services
will be open source
data flow diagram!

whew!

Posted by pzed on February 28, 2007 at 2.25pm

Library Data APIs Abound, Richard Wallis

Library Data APIs Abound!
Richard Wallis
Talis
http://www.talis.com/home/

libraries are used to obscure protocols, starting to use less obscure
all have an insular view of the world
– I have data, and you can have it if you use the right protocol
– mixing that data with other data is your problem

Talis platform uses Bigfoot
– easily queried
– can query bib data, pass search to augment search with holdings data, pass again to augment with deep linking to OPAC content and return results with links
– each “store” (bib, holdings, book jackets, etc.) has an API that can be searched or used to augment
– can also augment XML data from other sources by transforming into a stream that can talk to API augmentation
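
The query-then-augment chain can be sketched in Python as below. This does not use the real Bigfoot API, whose calls are not recorded in these notes; the stores are in-memory stand-ins, and every name, record and URL here is hypothetical.

```python
# In-memory stand-ins for separate "stores"; all data and names are hypothetical.
BIB = [
    {"id": "b1", "title": "Lucene in Action"},
    {"id": "b2", "title": "Ambient Findability"},
]
HOLDINGS = {"b1": ["Main Library"], "b2": []}
OPAC_URL = "http://opac.example.org/record/{id}"  # placeholder host

def search_bib(term):
    """Query the bib store; return copies so the store itself is untouched."""
    term = term.lower()
    return [dict(rec) for rec in BIB if term in rec["title"].lower()]

def augment_holdings(records):
    """Add holdings data to each record in the result stream."""
    for rec in records:
        rec["holdings"] = HOLDINGS.get(rec["id"], [])
    return records

def augment_links(records):
    """Add a deep link into the OPAC to each record."""
    for rec in records:
        rec["link"] = OPAC_URL.format(id=rec["id"])
    return records

def pipeline(term, *steps):
    """Search once, then pass the results through each augment step in turn."""
    records = search_bib(term)
    for step in steps:
        records = step(records)
    return records

results = pipeline("lucene", augment_holdings, augment_links)
```

Because each augment step takes and returns the same record stream, results from another source could be fed into the same steps once transformed into that shape.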

Bigfoot APIs
– Items
– Augment
– Facet
– OAI-PMH (coming)
– Config (coming)
– On-demand stores (coming)
– transform service (built in to item query, coming for others)

Possibilities
– transform output to WordPress compatible output

cf. the 20 minute union catalogue

Posted by pzed on February 28, 2007 at 2.10pm

Smart Subjects, Tito Sierra

Smart Subjects: Application Independent Subject Recommendations
Tito Sierra
NCSU Libraries

Input user search query and spit out a list of related library subjects
based on search log analysis done as part of QuickSearch development
– found lots of topical subject queries

NCSU also provides browsable subject portal
– locally developed classification, approx. 100 subject nodes in 12 top-level categories
– nodes influenced by local curriculum
– subject specialist mapped resources to the subject page

also based on OpenSearch

started with data from course catalogues to gather terms for mapping; also used text snippets from papers in faculty publications archive
– text extracts used to create search indexes
– run keyword search on the index; return, rank, dedupe results
– crosswalk to classification map
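
A toy version of that flow in Python, under heavy assumptions: the extracts, course codes and crosswalk are invented, and scoring is plain term counting, not whatever ranking NCSU actually uses. Deduping happens when several sources map to the same subject node.

```python
from collections import Counter

# Hypothetical text extracts, each keyed by the source it came from.
EXTRACTS = [
    ("CH 101", "introduction to organic chemistry"),
    ("CH 220", "polymer chemistry and organic synthesis"),
    ("TE 200", "polymer fibers and textile design"),
    ("CSC 316", "data structures and algorithms"),
]

# Hypothetical crosswalk from source to local subject node.
CROSSWALK = {
    "CH 101": "Chemistry",
    "CH 220": "Chemistry",
    "TE 200": "Textiles",
    "CSC 316": "Computer Science",
}

def recommend(query, limit=3):
    """Keyword-search the extracts, then rank and dedupe by subject node."""
    terms = query.lower().split()
    scores = Counter()
    for source, text in EXTRACTS:
        words = text.lower().split()
        hits = sum(words.count(term) for term in terms)
        if hits:
            # Crosswalk to a subject; repeated subjects merge into one score.
            scores[CROSSWALK[source]] += hits
    return [subject for subject, _ in scores.most_common(limit)]

subjects = recommend("organic polymer")
no_match = recommend("quantum")
```

The empty result for an unmatched query is the "zero hits" weakness listed below.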

strengths
– application and collection independent
– subject recommendations can be integrated into any library search application
– broader, serendipitous resource discovery

weaknesses
– false positives (bad recommendations)
– zero hits

future plans
– database advisor
– increase size of subject indices using article tables of contents and backlog of course descriptions
– gauge interest in a less-specialized release to the community

http://www.lib.ncsu.edu/dli/projects/smartsubjects

Posted by pzed on February 28, 2007 at 1.00pm

On the Herding of Cats, Mike Rylander

On the Herding of Cats
Mike Rylander
Georgia Public Library System

General overview of planning, etc.

Posted by pzed on February 28, 2007 at 12.40pm

Free the Data, Emily Lynema

Free the Data: Creating a Web Services Interface to the Online Catalog
Emily Lynema
NCSU Libraries

Architecture – front-end web application on an Endeca back end; considered an advantage: the “freedom” this gave to create an XML interface to the catalog

began as a web service to speak to other systems like WorldCat and return availability results

REST API for querying catalog
– looking for RSS feeds, particularly new books
– integrating catalogue results into website QuickSearch

. . . alphabet soup. . . 

wanted search results and facets

OpenSearch add-on; cf. A9 aggregator

also had to integrate with NCSU’s QuickSearch product

facet data
– used OpenSearch query role="subset" custom:facet="4617264627"
– using numbers for facet values makes for cleaner URLs
– do aggregators provide support for this query role? not sure
– multiple elements would slow down results
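
A sketch of building such a Query element in Python. The OpenSearch 1.1 namespace and the "subset" role come from the OpenSearch spec; the custom namespace URI is a placeholder, since the notes don't record the real one, and the search term is invented.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
# Placeholder: the real URI behind the "custom" prefix is not given in the notes.
CUSTOM_NS = "http://example.org/custom/"

ET.register_namespace("opensearch", OPENSEARCH_NS)
ET.register_namespace("custom", CUSTOM_NS)

def subset_query(search_terms, facet_id):
    """Build an OpenSearch Query element that narrows a search by one facet."""
    query = ET.Element(f"{{{OPENSEARCH_NS}}}Query")
    query.set("role", "subset")
    query.set("searchTerms", search_terms)
    # Extension attribute in a foreign namespace, carrying the numeric facet id.
    query.set(f"{{{CUSTOM_NS}}}facet", facet_id)
    return query

element = subset_query("deforestation", "4617264627")  # hypothetical search term
xml_text = ET.tostring(element, encoding="unicode")
```

An aggregator that doesn't know the custom namespace can still read the standard attributes, which is the usual argument for extending by attribute.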

http://www.lib.ncsu.edu/endeca
http://www.lib.ncsu.edu/catalog/ws/

Posted by pzed on February 28, 2007 at 12.19pm

MyResearch Portal, Andrew Nagy

MyResearch Portal: An XML based Catalog-Independent OPAC
Andrew Nagy
Villanova University

completely ILS agnostic portal for research activities
– catalog, DBs, digital library
– single search interface

most resources are in XML
– digital library: METS
– Metalib XServer: XML
– catalog: MARCXML
– web site: XHTML

data store advantages in XML
– native XML is easily stored, easily entered
– no need for RDBMS

eXist
– open source XML dbs
– full-text searching available
– platform independent: java backend; API through REST or SOAP
– inherent directory structure
– LDAP support
– however, not really meant for a library search type of system

Berkeley DB XML
– proven
– wide range of platforms
– good performance
– commercial backing, decent help
– no full text extensions, no inherent directories

commercial solutions? – too expensive, more complex

implementation challenges
– eXist inappropriate
– DB XML capable, but slow, even with some reconfiguration
– MARCXML very complex, lots of unhelpful data, field names actually in attributes

moving MARC field number into tag improved response times considerably
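
One way that transformation might look, sketched in Python. This is not Villanova's code. The "f" prefix is an assumption, needed because XML element names can't start with a digit; the sample record is invented, and only datafields are handled.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace

# Hypothetical one-field record for illustration.
SAMPLE = f"""
<record xmlns="{MARC_NS}">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Lucene in action</subfield>
  </datafield>
</record>
"""

def flatten(record):
    """Copy a MARCXML record, turning tag="245" into an element named f245.

    Datafields only; controlfields and indicators are omitted for brevity.
    """
    out = ET.Element("record")
    for field in record.findall(f"{{{MARC_NS}}}datafield"):
        # XML names cannot start with a digit, hence the "f" prefix (an assumption).
        new_field = ET.SubElement(out, "f" + field.get("tag"))
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            new_sub = ET.SubElement(new_field, "subfield", code=sub.get("code"))
            new_sub.text = sub.text
    return out

flat = flatten(ET.fromstring(SAMPLE))
title = flat.findtext("f245/subfield[@code='a']")
```

A query for titles can then address f245 by element name, with no attribute predicate on every datafield, which is plausibly where the speed-up came from.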

technologies don’t tend to understand the complexity of library data
queries not well optimized, if at all
– needed to develop very basic query processing

SOLR/Lucene looks like the answer

Posted by pzed on February 28, 2007 at 12.03pm