Open-Source Endeca in 250 Lines or Less, Casey Durfee
2 Mar 07
Open-Source Endeca in 250 Lines or Less
Casey Durfee
Seattle Public Library
#bugs ~ lines of code ^ 1.5
demo: http://catalog.spl.org/catalog/
code: http://extranet.spl.org/code/code4lib2007.zip
presentation: http://extranet.spl.org/talks/open_source_endeca/
Solr shortcuts
– results in Python format
– no database
– lucene search syntax
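A minimal sketch of the first and third shortcuts, assuming a Solr instance at a local URL and made-up field names (title, format); neither comes from the talk. Solr's wt=python response writer returns the result as a Python literal, so the response body can be read straight into a dict. ast.literal_eval is used here rather than eval so that nothing in the response is executed.

```python
import ast
from urllib.parse import urlencode

# Assumed address of a local Solr instance; not taken from the talk.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(lucene_query, rows=10):
    """Build a select URL that asks Solr for a Python-literal response."""
    params = {"q": lucene_query, "rows": rows, "wt": "python"}
    return SOLR_SELECT + "?" + urlencode(params)


def parse_python_response(body):
    """Read the response text into a dict without executing any code."""
    return ast.literal_eval(body)


# Hand-written sample in the general shape of a Solr response; not real data.
sample_body = (
    "{'responseHeader': {'status': 0, 'QTime': 3},"
    " 'response': {'numFound': 1, 'start': 0,"
    " 'docs': [{'id': 'b1000001', 'title': 'Example title'}]}}"
)

# The query string itself is plain Lucene syntax with hypothetical fields.
url = build_query_url('title:"moby dick" AND format:book')
result = parse_python_response(sample_body)
print(url)
print(result["response"]["numFound"])
```

In a running application the body would come from fetching the URL (for example with urllib.request.urlopen) instead of the sample string.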
Django
– faster than Rails, can handle concurrent users
– forces you to do things the right way; forces split between coding and design
– object-oriented templates
goal to keep the URL as simple as possible
– no bizarre numeric codes
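One way to read the "simple URL" goal: facet selections appear in the address as readable name=value pairs and are translated into Solr filter queries (fq) on the server. The sketch below is an illustration only; the parameter names, the field mapping and the URL layout are hypothetical and are not the catalogue's actual scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical mapping from readable URL parameters to Solr field names.
FACET_FIELDS = {
    "format": "format",
    "language": "language",
    "subject": "subject_facet",
}


def solr_params_from_url(url):
    """Translate a readable catalogue URL into a list of Solr parameters."""
    params = []
    for name, value in parse_qsl(urlsplit(url).query):
        if name == "q":
            params.append(("q", value))
        elif name in FACET_FIELDS:
            # Each facet selection becomes one filter query (fq).
            params.append(("fq", '%s:"%s"' % (FACET_FIELDS[name], value)))
        # Anything else in the URL is ignored in this sketch.
    return params


params = solr_params_from_url("/search/?q=whales&format=book&language=english")
print(urlencode(params))
```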
Solr performance tricks
– <optimize/>
– huge filterCache, very important for faceting; roughly equal to number of bib records in database
– some facets are faster than others; need to warm facets: query all records and do facets on every field (facetwarmer.py run every 10 minutes)
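The warming step can be sketched as a single request that matches every record and facets on every field. The contents of facetwarmer.py were not shown in these notes, so the field names, the Solr address and the use of the *:* match-all query below are assumptions.

```python
from urllib.parse import urlencode

# Assumed Solr address and hypothetical facet fields; not from the talk.
SOLR_SELECT = "http://localhost:8983/solr/select"
FACET_FIELDS = ["format", "language", "subject_facet"]


def build_warming_url(fields, base=SOLR_SELECT):
    """Build a query over all records that facets on every given field."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    # facet.field is repeated once per field to be warmed.
    params += [("facet.field", field) for field in fields]
    return base + "?" + urlencode(params)


warm_url = build_warming_url(FACET_FIELDS)
print(warm_url)
```

A job scheduled every 10 minutes (cron, for instance) would fetch this URL and discard the response; the point is only to get the facet filters into the cache before a user asks for them.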
Posted by pzed on March 2, 2007 at 11.47am
Intellectual Property Disclosure Process, Michael Doran
2 Mar 07
The Intellectual Property Disclosure Process: Releasing Open Source Software in Academia
Michael Doran
University of Texas, Arlington
developers working for institutions likely do not own the copyright on their software, and therefore don’t have the right to release it under an open source licence
university will have IP office
– intellectual property disclosure process
does this IP stuff apply to me? – very likely yes
does it apply to this particular software?
– use common sense or ask
– “easier to get forgiveness than permission” will only work once, if that
a couple cautionary tales: in the first, the software is released and then (oops) the IP disclosure process is undertaken retrospectively
in the second, the UTA IP committee decides not to allow the free release (no right of appeal), and the decision is supported by the provost; they initially attempt to license the software back to the ILS vendor, who doesn’t want it, then try to sell it to other institutions directly; MD then pleads with the provost to let him address the committee again, is able to explain open source in a way that they understand, and the software is finally released
advice: have a plan
– find out about the process beforehand
– understand the committee’s viewpoint, generally patent/profit oriented, can’t assume they know what open source is
– work out a strategy
– be clear about what you want and why
– add context
under questioning, Doran also suggested the inclusion of a “poison pill”, which I assume would be to include some code released under GPL or better yet CC non-commercial
Posted by pzed on March 2, 2007 at 11.06am
LibraryFind, Terry Reese
2 Mar 07
LibraryFind
Terry Reese
Oregon State University Libraries
hybrid federated search service
Ruby on Rails application
– a single Rails process only handles one request at a time
– in a metasearch environment, this is a problem
– clustering solution using Mongrel
– Ruby’s built-in XML support is not particularly good
LibraryFind
– metasearch tool: items harvested and federated (wanted to include access to IRs – built under the assumption that federated search isn’t a long-term solution, but that eventually everything will be harvested)
– openURL resolver/server
– web service
LF is a component of OSU’s vision of a library as platform
unique metasearch tool
– integrated openURL resolution
– both harvester/indexer (OAI and MARC repositories; 65 databases) and federated search tool
– metadata-based knowledge base, using abstract connection classes (e.g. can create one generic connector for all Z39.50 sites, another for all web service sites, another for OAI sites, etc)
– as a result, adding new resources and sharing the knowledge base with other institutions is possible
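The connector idea can be sketched as one generic class per protocol, with each resource reduced to a metadata record in the knowledge base. LibraryFind itself is a Rails application; Python is used here only as illustration, and every class name, field name and host below is hypothetical.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """One generic connector per protocol; each target is plain metadata."""

    def __init__(self, target):
        self.target = target  # one knowledge-base record (a dict)

    @abstractmethod
    def describe_search(self, query):
        """Describe the request this connector would make for a query."""


class Z3950Connector(Connector):
    def describe_search(self, query):
        return "Z39.50 search of %s:%s for %r" % (
            self.target["host"], self.target["port"], query)


class OAIConnector(Connector):
    def describe_search(self, query):
        return "index lookup for %r in records harvested from %s" % (
            query, self.target["base_url"])


# Protocol name in the metadata selects the connector class.
CONNECTORS = {"z3950": Z3950Connector, "oai": OAIConnector}

# Adding a resource means adding a record here, not writing new code.
KNOWLEDGE_BASE = [
    {"name": "Example catalogue", "protocol": "z3950",
     "host": "z3950.example.org", "port": 210},
    {"name": "Example repository", "protocol": "oai",
     "base_url": "http://repo.example.org/oai"},
]


def connectors_for(knowledge_base):
    """Instantiate the right connector for every knowledge-base record."""
    return [CONNECTORS[t["protocol"]](t) for t in knowledge_base]


for connector in connectors_for(KNOWLEDGE_BASE):
    print(connector.describe_search("salmon"))
```

Because the knowledge base is only data, it could in principle be exported and shared with another institution running the same connectors.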
caching
– daily cache of all searches held for 3 days
– top 15% of searches then stored to a permanent cache; considerably faster!
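A rough sketch of the promotion step, assuming "top 15%" means the most frequently repeated 15% of distinct queries in the short-term cache; the notes do not say how the ranking is actually done, and the sample log is invented.

```python
import math
from collections import Counter


def searches_to_promote(search_log, fraction=0.15):
    """Return the most frequent distinct queries, rounded up to a whole query."""
    counts = Counter(search_log)
    keep = math.ceil(len(counts) * fraction)
    return [query for query, _ in counts.most_common(keep)]


# Invented log of queries seen during the short-term cache window.
log = (["salmon"] * 5 + ["oregon history"] * 3
       + ["beavers", "wine", "tides", "hops", "owls"])

print(searches_to_promote(log))
```

With seven distinct queries, 15% rounds up to two, so the two most repeated searches would move to the permanent cache.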
three search types
– general
– images (repositories)
– books and more (includes local catalogue)
To do
– scaling; current production version supports 5 simultaneous connections
– harvesting potentially 100s of millions of records; foresee as many as 20 trillion (!?!)
– umlaut integration
– opensearch
– json
– coins
– improved installation (shooting for a WordPress-style 5-minute install)
no authentication needed to search the tool, only to connect to resources; so far, haven’t talked to any vendors who have a problem with that
Posted by pzed on March 2, 2007 at 10.46am
Library-in-a-Box, Bess Sadler
1 Mar 07
Library-in-a-Box
Bess Sadler
eIFL-FOSS
eIFL’s mission is to lower barriers to information in the developing world
– do not give out money
– negotiate with publishers
– library consortium building
– open access advocacy
– IP subgroup – represent developing countries at WIPO
– knowledge sharing
– OSS for libraries
Library-in-a-box
– follows the Tactical Technology Collective NGO-in-a-box model
– easily distributed, easily installed ILS
– need to build a community before you write a line of code
– sustainable model, implementing libraries rely on each other for support (source-camp model)
– leap-frog: better to create a high-quality, next-gen product
Benefits for eIFL libraries
– financial savings obvious; also keeping local currency local
– building in-house, in-country expertise
– for many, it’s their only option
Current activities
– business plan (Open Business Readiness Rating)
– recruiting country coordinators
– applying for funding, building partnerships/community
– choosing pilot sites
– later this year contracting for software development
will be returned to open source community, conceivably of benefit to global community; not aiming to be best of breed, but may contribute to that development
Posted by pzed on March 1, 2007 at 3.11pm
Atom Publishing Protocol Primer, Ed Summers
1 Mar 07
Atom Publishing Protocol Primer
Ed Summers
Library of Congress
What is Atom, and why you might be interested
Karen Schneider said people don’t care about standards; lots of standards propagation within libraries, tend not to look outside
– most think of Atom as a syndication feed
– looking at elements, can see metadata markup (title, date stamp, entries with titles, authors, etc)
Representational State Transfer (REST)
– application state and functionality are divided into resources
– resources are uniquely addressable (URI)
– uniform interface for the transfer of state between client and resource (HTTP)
Atom + REST = Atom Publishing Protocol (APP)
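In APP terms, creating a resource means POSTing an Atom entry to a collection URI; the server answers 201 Created with the new member's address in the Location header, and GET, PUT and DELETE on that address retrieve, update and remove it. The sketch below only builds the entry and the request without sending anything. The collection URI, the entry id and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)

# Hypothetical collection address; a real one comes from the service document.
COLLECTION_URI = "http://example.org/collection"


def build_entry(entry_id, title, author, updated, content):
    """Serialize a small Atom entry to UTF-8 bytes."""
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}id" % ATOM_NS).text = entry_id
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = updated
    author_el = ET.SubElement(entry, "{%s}author" % ATOM_NS)
    ET.SubElement(author_el, "{%s}name" % ATOM_NS).text = author
    content_el = ET.SubElement(entry, "{%s}content" % ATOM_NS, {"type": "text"})
    content_el.text = content
    return ET.tostring(entry, encoding="utf-8")


def build_create_request(collection_uri, entry_bytes):
    """Prepare (but do not send) the POST that would create a new member."""
    return Request(
        collection_uri,
        data=entry_bytes,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


body = build_entry(
    "urn:example:entry-1",          # placeholder id
    "A sample record",
    "Example Author",
    "2007-03-01T00:00:00Z",
    "Metadata carried as an Atom entry.",
)
request = build_create_request(COLLECTION_URI, body)
print(request.get_method(), request.full_url)
```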
Posted by pzed on March 1, 2007 at 2.51pm
Fun with ZeroConfMetaOpenSearch, Dan Chudnov
1 Mar 07
Fun with ZeroConfMetaOpenSearch
Dan Chudnov
Yale Center for Medical Informatics
Why doesn’t the library work like iTunes?
ZeroConf
– networking with no configuration needed
iTunes uses Apple’s DAAP protocol, built on zeroconf, to power iShare
DC created an in-house network called Library Clique and showed, as folks around the room connected and began uploading shared data (viewed through iStumbler), that anyone offering any service over zeroconf shows up: shared music in iTunes, web servers, ftp, http, other protocols, printers
in Libraries?
– Open URL COinS requires an install; nothing else comparable
OpenMetaSearch
– add OpenSearch to metasearch
– put your library in their browser
– what if metasearch and open url used same interface?
search and resolve aren’t that different – user won’t notice the difference!
so – ZeroConfOpenMetaSearch
– having merged in metasearch and added opensearch, advertise the interface using zeroconf AND find users’ resolvers using zeroconf: when somebody visits your network, they will immediately find your search and openURL resolver interfaces
– everyone who visits you finds your search interface
– everyone you visit finds your resolver
– no installation required
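The piece that would be advertised is an OpenSearch description document, which tells a client how to build a search URL from the user's terms. A minimal sketch of generating one follows; the library name and search URL are hypothetical, and the zeroconf advertisement itself is left out because it needs an operating-system or third-party library.

```python
import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("", OS_NS)


def build_description(short_name, description, template):
    """Return an OpenSearch 1.1 description document as a string."""
    root = ET.Element("{%s}OpenSearchDescription" % OS_NS)
    ET.SubElement(root, "{%s}ShortName" % OS_NS).text = short_name
    ET.SubElement(root, "{%s}Description" % OS_NS).text = description
    # {searchTerms} is the placeholder the client replaces with the query.
    ET.SubElement(root, "{%s}Url" % OS_NS,
                  {"type": "application/atom+xml", "template": template})
    return ET.tostring(root, encoding="unicode")


# Hypothetical library name and search address.
doc = build_description(
    "Library Search",
    "Search the library catalogue",
    "http://library.example.org/search?q={searchTerms}",
)
print(doc)
```

If search and resolve really do share an interface, the same kind of document could describe the OpenURL resolver, with a different template.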
Posted by pzed on March 1, 2007 at 2.28pm
Lightning Talks 2
1 Mar 07
[I skipped Lightning Talks 1 yesterday to meet up with J. . .]
All Lightning Talks are listed here
Karen Coombs: CMS on Steroids
Homegrown CMS built at U of Houston Libraries
– anybody can edit any page
– different kinds of content types on one page
– each page built out of modules (rss feeds, staff, databases, etc)
– easy to create portal-like pages for subject areas
– need to expand existing modules, expand the UI
– display order can be changed, would like to add position on the page
Aaron Krowne, Emory University: SouthComb
– meta-collection/DL/portal (harvesting)
– collection dev. rather than just presentation
– focus on Southern studies
– OAI harvesting to gather data
– also focused web crawling and library catalogs (only Emory so far)
– system consists of repository core, coll dev layer, admin layer
– portal environment either Java-based or custom (Rails)
– connect to data as web services (SOA)
– repository in Fedora
– content model, Fedora putting together CMDA spec
– use DBMS for some aspects
– coll dev curation tasks in OXF framework (OAI/OCKHAM Xforms)
– standardized XML configs and modularization
– presentation will be online
Posted by pzed on March 1, 2007 at 12.59pm
Get Groovy at Your Public Library, Amy Begg De Groff and Luis Salazar
1 Mar 07
Get Groovy at Your Public Library
Amy Begg De Groff and Luis Salazar
Howard County Library, Maryland
http://www.hclibrary.org/
Linux for public access computers one of the best things they’ve ever done
300 public computers
used Groovix flavour of Linux (built on Ubuntu)
early iteration in Lumix
– no longer need to be a geek to get it to work
browser-based
– no login needed
– rewrite firefox history etc. on close and reopen
– open documents in Open Office; originally no support for authoring own docs
– no time limits for users
q. regarding patron expectations for MS software
– Groovix runs Open Office 2.0; installed update overnight and patrons just came in and started using
– LS couldn’t help pointing out how much he hates MS for what they’ve done to computer science and the internet
– did about 6 months training for info desk people regarding OSS and Linux, but very few problems arise
Posted by pzed on March 1, 2007 at 12.41pm
Obstacles to Agility, Joan Starr
1 Mar 07
Obstacles to Agility
Joan Starr
California Digital Library
“Agile”: see Agile Manifesto, 2001
What keeps us from agility?
– academic culture itself
– project management practices
– institutional hiring practices
– our own work practices
academic culture
– “non-participatory democracy”; don’t want to be involved, but want to say no at the end
– not representative; always want to be able to say no themselves
– difficult to identify a single subject-matter expert willing to represent a large base
– end up with large committees of members often representing other committees
project management
– typically do not have a single individual or committee with ownership
– major decisions often require input from very diffuse structure
– projects can drag on for years, sucking life. . .
– no one to adjudicate conflicts, meetings are widely spaced, lots of waiting, time passing
– solutions obsolete by the time they are delivered
hiring processes
– very slow
– bad fit for tech market
– some shops don’t have admin control over own openings
– grant-funded staff can’t be moved around
– project teams have difficulty getting help when and where needed, more time passes. . .
our own practices
– programmers value space and privacy over team work
– don’t want to share work in progress
– aren’t practised in time estimation
– don’t really measure progress
– lose time coordinating and communicating
– aim for perfection before getting feedback
– may not be working on what users need/expect
start small. . . with a new project. . . and introduce it as a pilot
Posted by pzed on March 1, 2007 at 12.15pm
The BibApp, Eric Larson and Nate Vack
1 Mar 07
The BibApp
Eric Larson and Nate Vack
Wendt Library, UW-Madison
Wendt serves College of Engineering
also Office of Scholarly Communication and Publishing
– ad hoc assistance
– digital publishing
– copyright assistance
– fund for open access publishing
libraries don’t know a lot about where people are publishing
– capture campus bibl for UW faculty pubs
– Eric is liaison for engineering, can use that
– capture citations talking to faculty, through refworks, through IEEE etc
– create a BibApp that creates an identity for each faculty member, lists their pubs, archives (eventually)
– also adds their identities to one or more groups (dept, institute, etc)
. . . lovely demo. . .
– screen cast will be mounted on C4L page
– added a faculty member from UIUC from their LDAP server
– searched his work in Engineering Village
– export/import citations (does dedupe with bugs)
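How BibApp actually deduplicates was not shown, so the following is only one plausible approach: treat two imported citations as the same work when their normalized title and year match. The field names and sample records are invented.

```python
import re


def citation_key(citation):
    """Normalize title (case, punctuation, spacing) and pair it with the year."""
    title = re.sub(r"[^a-z0-9]+", " ", citation["title"].lower()).strip()
    return (title, citation.get("year"))


def dedupe(citations):
    """Keep the first citation seen for each normalized key."""
    seen = {}
    for citation in citations:
        seen.setdefault(citation_key(citation), citation)
    return list(seen.values())


# Invented records, as if imported from two different sources.
imported = [
    {"title": "Heat Transfer in Thin Films.", "year": 2005, "source": "refworks"},
    {"title": "Heat transfer in thin films", "year": 2005, "source": "ieee"},
    {"title": "Fatigue of Welded Joints", "year": 2003, "source": "ieee"},
]

for citation in dedupe(imported):
    print(citation["title"], citation["source"])
```

A key this simple will both miss true duplicates (retitled or misdated records) and merge distinct works that share a title and year, which is the kind of bug a real importer has to handle.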
pilot project; not live, not even close
– tag clouds
– popular journals
– popular publishers
– “find an expert” : this is a feature that actually works; draws data from faculty member’s actual publications
– at individual level, can also see co-authors, journals, publishers, and specific citations
– rss feeds for people, groups
– funky keyword timeline tool: slider to scan through evolving keywords attached to a group by year
– visualization tool showing connections between co-authors
challenges
– sherpa data not ideal
– author name collisions
http://code.google.com/p/bibapp/
more stable by August, 2007
Posted by pzed on March 1, 2007 at 11.58am
