Using Open Source Social Software as Digital Library Interface

D-Lib Magazine
March/April 2008

Volume 14 Number 3/4

ISSN 1082-9873

Erik Mitchell
IT Development Librarian
Z. Smith Reynolds Library
Wake Forest University
<mitcheet@wfu.edu>

Kevin Gilbertson
Web and Digital Projects Librarian
Z. Smith Reynolds Library
Wake Forest University
<gilberkm@wfu.edu>

Abstract

This article investigates the use of social software applications in digital library environments. It examines the use of blogging software as an interface to digital library content stored in a separate repository. The article begins with a definition of digital library approaches and features, examines ways in which open source and social software applications can serve to fill digital library roles, and presents a case study of the use of blogging software as a public interface to a project called Digital Forsyth, a grant-funded project involving three institutions in Forsyth County, NC. The article concludes with a review of positive and negative outcomes from this approach and makes recommendations for further research.

Introduction

A significant portion of digital library literature focuses on issues such as document/technology heterogeneity and the relationship of users and communities in digital libraries [Borgman, 1996; Lagoze, Krafft, & Payette, 2005; Renda & Straccia, 2005]. While applications such as MediaWiki and WordPress have been built primarily to support specific types of Web publishing, they also include a number of structures common to the digital library community, including storage, preservation, and access models. For this reason, MediaWiki and WordPress can be investigated as acceptable platforms for digital library applications. Some library applications have already been developed using these platforms, including Scriblio¹ from Plymouth State University [Plymouth State University Library, 2007] and the Polar Bear Expedition Digital Collections [Polar Bear Expedition Digital Collections, 2007]. Both of these sites demonstrate the benefits of incorporating user services into digital library software.

There are a number of possible approaches to using social software in digital library environments.
Downloadable applications such as MediaWiki or the WordPress blogging software lend themselves to data and interface customization. Other sites, such as Flickr, support data storage and application programming interface functions that could be used to create a digital library application. Through working with WordPress in the digital library application presented in this article, we discovered a number of challenges to using social software, including:

- The ability (or inability) to crosswalk data between the social software application and the primary repository
- A lack of emphasis on metadata by default
- A lack of searching algorithms by default

The remainder of this article presents how a specific social software application (WordPress) was used to implement a public interface to a specific digital library. The article looks at the advantages and disadvantages of this approach as well as its future potential.

Case Study Presentation

Overview

Digital Forsyth² is a grant-funded digital library project involving the Forsyth County Public Library, Winston-Salem State University, and Wake Forest University. The project is supported by grant funds from the Institute of Museum and Library Services³ under the provisions of the federal Library Services and Technology Act (LSTA), administered as a multi-year grant by the State Library of North Carolina,⁴ a division of the Department of Cultural Resources. The purpose of the grant is to create a digital repository for the historic photographs and collections of Forsyth County, NC institutions.

In the first two years, a primary focus of this project has been the digitization and collective categorization of photographs from the archives of each of the institutions. This process has involved the creation of a digital library using Qualified Dublin Core (QDC) as the metadata standard, with extensive description using a faceted categorization system.
Each photograph is described with QDC elements and has faceted analysis completed on the elements and context of the photograph. On average, each photograph has 11 faceted descriptors. The faceted taxonomy is developed from the 'bottom-up' using the photographs and archival information and is managed using Protégé,⁵ an open source ontology management system developed by Stanford University.

As the Digital Forsyth project evolved, community participation emerged as one of the goals of the digital library. The resulting definition of the user interface included goals of community participation in the content, description, and organization of this digital library. While the back-end of the digital library employs a commercial digital library system that enforces authority control and manages metadata well, it did not include a public interface that met the community goals of this project. For this reason, we began looking at alternative platforms that would both allow us to present and manage our data easily and provide a robust community experience. In short, the basic functional requirements of the software were to have the ability to:

- Store and display QDC metadata
- Store and display hierarchical facets
- Allow user-contributed content, including comments, description, and tags
- Permit the import and export of data in bulk
- Be customized easily both for user-interface and data management purposes

The project team considered several social software or Web-publishing applications including MediaWiki, WordPress, Flamenco, and Flickr. We chose blogging software as a platform because it fills many of the needs of the project, including photo representation, metadata support through use of special fields, category support, and embedded social tools. While the idea of hosted blogging software was discussed, we felt that a local solution was required in order to achieve the level of customization and data access needed.
We ultimately chose WordPress due to our prior experience with the software, which is hosted by one of the project partners for its own user community. The next section includes a discussion of some of the issues and solutions that we encountered.

Solutions

Using WordPress as a digital library application required the Digital Forsyth team to solve a number of problems related to metadata management, data migration, and user interface management.

Modifying WordPress for metadata representation

In order to work as a digital library interface, WordPress needed to support some level of detail for QDC metadata and provide the capacity to use our hierarchical facets as tags. To achieve this, we used custom fields in WordPress, post categories, and manipulation of the WordPress searching and browsing functions.

One of the strengths of WordPress is its support for custom fields that can be used to hold rudimentary metadata. We were able to configure these fields to match the Qualified Dublin Core metadata standard and move much of the content out of the blog posting text to these custom fields. In addition, we were able to load the Protégé-based taxonomy of the faceted categorization system into WordPress categories. This allowed us to preserve the relationships of the taxonomy while making terms easily usable in the blog interface.

For loading blog posts using both custom fields and categories, we modified the blog loader scripts to accommodate the new fields. We were able to use internal Protégé IDs to match on category assignments in the metadata records themselves, enabling permanent URLs to be created to point to different sections of the taxonomy. While this provides a unique, customizable link, it does not give the interface 'clean' URLs that would be easily remembered by a user.
The use of clean URLs is a popular feature in many social software platforms, including Flickr, Del.icio.us, and MediaWiki, and allows a user to navigate to different parts of the site using common words rather than cryptic strings.

In order to enable more granular searching of these new fields and categories, we used a pre-developed WordPress searching plug-in called AdvancedSearch. This gave us a suite of defined functions that supported category and custom field limits, which allowed the users of the interface to limit their search/browse experience by institution, date, or other metadata fields. While the plug-in was modified to support local needs, the existence of this function library allowed us to quickly develop search and limiting functionality.

While in some cases we altered specific components of the plug-ins, we avoided making modifications to the core WordPress software. This has allowed us to patch the software several times without negatively impacting local customizations. While future upgrades (particularly the implementation of WordPress 2.3) will require an evaluation of local modifications, the use of locally written plug-ins and minor modifications to third-party plug-ins is expected to minimize the development required for upgrades.

Metadata issues

While initially we had standardized on Qualified Dublin Core, moving forward with WordPress required us to make two substantial modifications, one to the metadata itself and the other to the WordPress functionality. First, although the libraries had followed the recommended QDC date format wherever possible (but had created non-standard dates when days/months/years were not known specifically), WordPress required a well-formed date value. Because of this, we had to zero-fill any unspecified values. This allowed WordPress to sort and place posts whose dates were only partially known.

The second major issue we faced occurred due to the size of some areas of the taxonomy.
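
Before turning to that second issue, the zero-filling of partial dates can be illustrated with a short sketch. It is written in Python for brevity rather than in the PHP that WordPress itself uses, and the function name `zero_fill_date`, the accepted input shapes (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`), and the output layout are all assumptions made for illustration, not the project's actual loader code.

```python
import re

# Hypothetical helper: pad a partial date (YYYY, YYYY-MM, or YYYY-MM-DD)
# with zeros so that every record carries a complete date-and-time string.
_PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")


def zero_fill_date(raw):
    match = _PARTIAL_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized date value: {raw!r}")
    year, month, day = match.groups()
    # An unknown month or day becomes "00" (the zero-fill step).
    month = (month or "0").zfill(2)
    day = (day or "0").zfill(2)
    return f"{year}-{month}-{day} 00:00:00"
```

Under these assumptions, a record dated only `1925` would be stored with zeros in the month and day positions, so it still carries a complete, sortable value while signalling that the month and day are unknown.
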
The standard functionality of WordPress categories includes a retrieval of all the related child categories when a user is interacting with the category browse interface. While this is preferable in some cases, database response times for our project were leading to time-out issues whenever category values exceeded several hundred. In order to solve this issue, the category retrieval functions had to be rewritten to make this process more efficient and to pull child values only when they were needed. WordPress allows local implementations to make modifications like this using a local functions file that is related to specific interface themes. While this will require us to re-examine these functions with each WordPress upgrade, it also allows us to continue expanding the use of the taxonomy to create new relationships between the images in our collection.

In general, we used the following four approaches to modify WordPress in the Digital Forsyth project:

First, the implementers downloaded pre-developed plug-ins. These plug-ins allowed us to quickly extend or modify the functionality of WordPress without significant effort.

Second, we modified existing functions to fit our local needs. While using pre-defined functions makes these modifications much simpler to do, the modifications can also cause problems when WordPress is upgraded.

Third, in some cases we chose to modify the core functionality of WordPress. This was required in special cases where the WordPress default behavior was not in sync with our expectations for Digital Forsyth. A simple example of this is the fact that blogging software by default returns posts in reverse chronological order. However, for our project we wanted posts to be returned in increasing chronological order; therefore, in such cases we modified basic WordPress functions accordingly.

Fourth, we created new local functions.
While this approach was not used extensively, in the case of category retrieval new functions were required to make data retrieval more efficient. This final approach is probably the most susceptible to issues related to system upgrades.

As the Digital Forsyth project moves into its third year, we are looking toward merging the library cataloger experience with public feedback and tagging experience. This may allow us to stop using a back-end digital library for record creation and simply use it for archiving, which will significantly simplify the processes of data creation, loading, user-tagging, and updating currently in place.

Creating a user interface using themes

We were able to use the theme platform in WordPress to easily create a custom look and feel for our digital library. Two of the major modifications required were modifying the theme to display hierarchical tags more prominently and including our custom metadata fields at the correct place in the blog posting. Other modifications included style and layout customization. Some high-level uses and benefits of themes are:

- Enabling multiple user/organization-centric views of the data
- Enabling rapid prototyping of interfaces during initial development
- Using splash pages to provide deep links to searching and browsing interfaces

In general, WordPress themes provide a simple platform for creating user interfaces, and involve approximately 8-10 files. Themes can be based on templates and specific pages where required.

In some cases, we found ourselves deciding whether to use external plug-ins for themes or to develop them locally. An example of this is the development of 'photoblogging' style functions without using the photoblogging plug-in that comes with WordPress. While we found the use of local functions was appropriate at the time we made our decision on this matter, being willing to re-investigate/re-develop this area as upgrades occur is an important component of using social software.
Because so many possible features exist and are developed in the blogging user community, deciding which features to include and use can be challenging. Here again, the presence of such a large community of developers and users means that we are able to embed and deploy new services quickly, without a large amount of local development.

We found that using the WordPress software allowed development to start at a much deeper level. For example, rather than writing a batch loader or indexer, we were able to make small modifications to the WordPress software using existing modules.

A prime example of modification is the revision to the tag-cloud plug-in. While traditional tag clouds emphasize the most popular terms, for our project we found that emphasizing the uncommon facet categories in the tag cloud using a Term Frequency/Inverse Document Frequency (TF/IDF) approach yielded a more relevant and usable interface. We accomplished this by creating a function that counted the number of times a facet was used in the corpus of blog posts and compared those counts against the facets in the individual post. The assigned weights are expressed as class definitions in the HTML, and the interface uses CSS (Cascading Style Sheets) to style the resulting tags accordingly.

Data migration

In order to load data from the digital library system to WordPress, we modified the existing RSS loader to support Qualified Dublin Core tags. The RSS loader was used because it already had a framework to support XML and employed a generic approach to the import functions. By adding control structures and modifying the mapping approach of data, we were able to map QDC fields to the appropriate custom fields and categories in WordPress. We also modified the loader in a few ways, such as excluding certain records and category matches in order to ensure that existing records are not overwritten and that duplicate categories are not created.
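
The two loader safeguards just described (leaving existing records alone and never creating a category twice) can be sketched roughly as follows. The sketch is in Python rather than the PHP of the actual loader, and every name in it is a hypothetical stand-in: `plan_import`, the `identifier` and `facet_ids` keys, and the in-memory sets that take the place of database lookups are not part of WordPress or of the project's scripts.

```python
def plan_import(records, existing_ids, existing_categories):
    """Decide which records to load and which categories to create.

    records: iterable of dicts with an "identifier" string and a
        "facet_ids" list (an assumed shape for a parsed QDC record).
    existing_ids: identifiers that already exist as posts.
    existing_categories: taxonomy IDs that already exist as categories.
    """
    to_load = []
    new_categories = []
    seen_ids = set(existing_ids)
    seen_categories = set(existing_categories)
    for record in records:
        identifier = record["identifier"]
        if identifier in seen_ids:
            # Existing posts are never overwritten by the loader.
            continue
        seen_ids.add(identifier)
        to_load.append(record)
        for facet_id in record.get("facet_ids", []):
            if facet_id not in seen_categories:
                # Each category is scheduled for creation at most once.
                seen_categories.add(facet_id)
                new_categories.append(facet_id)
    return to_load, new_categories
```

Matching on a stable identifier rather than on a title or label is the design choice that matters here: it keeps repeated loads idempotent even when descriptive metadata changes between runs.
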
In order to handle record updating, a separate update script was written.

As we begin to collect user-contributed data, data export and archiving is becoming a more significant problem. We are looking at different approaches for exporting and preserving this data, including extending the Qualified Dublin Core data in the back-end digital library system to support new blog-centric fields and storing data in archival packages.

Two fundamental questions arose while we worked on data loading and exporting.

First, while we were committed to using external standards to represent and store data, it became obvious that it might be just as useful to define a local standard and then rely on mapping applications to make data available to external systems. Using a locally defined standard could have allowed us to approach some fields (such as record state and cataloger) more specifically while preserving the standard descriptive fields. A related need to add qualifiers to different types of links could have been achieved via either a local standard or simply a different approach. As the project moves into its third year, we are considering different metadata standards or approaches that might allow more specificity at the administrative levels.

Second, we found that the blogging software had sufficient functionality to support all of the data creation, management, and social-contribution needs of the project. It became obvious that using a digital library system for archiving rather than data creation might be a more efficient approach.

While we adapted the WordPress import/export functionality to suit the needs of our digital library, the issue of scalability might make it necessary in the future to take different approaches with large metadata batches. While current loading practices for our project are acceptable, loading larger batches through direct database interaction could lead to more flexible system interoperability.
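
To make the field mapping described in this section more concrete, the following sketch parses one simplified record and sorts its elements into a title, categories, and custom fields. It is an illustration in Python, not the modified RSS loader itself. The two namespace URIs are the standard Dublin Core ones, but the flat record layout, the function name `map_record`, and the choice to treat `dc:subject` values as facet categories are assumptions.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def map_record(xml_text):
    """Map one simplified QDC record to a post-like dict (illustrative)."""
    root = ET.fromstring(xml_text)
    post = {"title": "", "custom_fields": {}, "categories": []}
    for element in root:
        # ElementTree reports namespaced tags as "{namespace}localname".
        namespace, _, name = element.tag[1:].partition("}")
        if namespace not in (DC, DCTERMS):
            continue  # skip anything that is not Dublin Core
        value = (element.text or "").strip()
        if not value:
            continue  # skip empty elements
        if name == "title":
            post["title"] = value
        elif name == "subject":
            # Assumption: facet terms arrive as dc:subject values.
            post["categories"].append(value)
        else:
            # All other elements become repeatable custom fields.
            post["custom_fields"].setdefault(name, []).append(value)
    return post
```

A fuller loader would also build the post body and excerpt; this sketch stops at the three-way split between title, categories, and custom fields.
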

The content of the blog post record

For our project, we use the blog record to represent an individual photograph. The QDC record title is mapped to the blog posting title. The post content contains the QDC identifier field, mapped to an image HTML element, and the QDC description field, mapped to a paragraph. The post excerpt contains an image element referencing a thumbnail representation of the photograph. The rest of the QDC elements are stored in custom fields. Making extensive use of WordPress's custom fields to store QDC metadata enhances both the interface and the search functionality.

Observations and Discussion

Many of the main benefits of using WordPress or other social software as a digital library interface have already been mentioned. Using a pre-developed system enables rapid and flexible system development, lets the project focus more on user-driven functions, and benefits from the development focus of a larger community.

Issues

Data management

One of the primary issues with the use of social software applications as a digital library interface is data management. While digital libraries have multiple goals (including creation, management, and preservation of digital objects and their metadata), the goals of social software are often focused on ease of content creation, user-friendly interfaces, and data presentation. Making sure that digital objects are accurately represented and displayed means:

- utilizing primary identifiers from the core digital library system;
- taking steps to ensure that metadata is accurately represented in the social software application to ensure no degradation of accuracy or specificity; and
- accurately managing metadata and digital object versions to ensure that the most accurate data is preserved.

While WordPress required us to manipulate some functions to ensure that these goals were met in our project, its flexibility as a content creation and display system has resulted in a significant reduction in the amount of work required to create and manage our user interface. Future developments in the use of WordPress as a content creation and management system are expected to save time and allow Digital Forsyth to employ systems that can focus on metadata management and preservation.

Technical skills

A common criticism of social software is the skill level required to implement and manage these applications. Using social software in a custom environment such as the one described in this article exacerbates this situation, as specific skills are required to make the software work in specific ways. In general, programming skills, XML skills, metadata skills, and digital object skills are essential for a project like ours. In addition, some specific skills are required to work with WordPress and common digital library applications, including skills using:

- PHP
- SQL
- CSS/HTML
- Web server software
- JavaScript

Two issues that took time for us to work through were:

- Lack of familiarity with the WordPress programming framework
- Lack of software documentation

In addition, while theme creation is relatively straightforward, customizing a theme beyond simple color and layout decisions required more extensive work. WordPress is rather complicated software, so finding the appropriate function can be difficult at times. Working with these functions can get confusing, regardless of the extent of documentation provided.

Sustainability

Project sustainability should be a key concern when selecting software for a digital library project. Commercial systems can have a host of sustainability issues, including vendor support, ease of system use and customization, and system flexibility.
Social software and open source software applications have other sustainability issues to consider as well. These include the inability to guarantee service stability and permanence. While commercial systems typically have market permanence or vendor-supplied migration options, open source and social software applications are often only as permanent as the user communities surrounding them. This means that an organization must be willing to find new solutions when required, and it also means that these organizations will have to carry the burden of system migration on their own.

Further, while there are many hosted low-cost and free options for digital libraries (including hosted blogs/wikis and hosted image sites such as Flickr), the system being considered should be compared against the needed stability and uptime of the system. In short, it does not make sense to base key parts of a digital library on a free, hosted service if the digital library cannot sustain itself outside of that service.

Conclusions and Recommendations

In choosing the approach we took for Digital Forsyth, we found that the implementation of the user interface was accomplished quickly and with community-centric features that serve as the centerpiece of the project site. Despite the issues that we experienced with metadata management, system assumptions/limitations, and interoperability, overall the use of WordPress lowered our development and maintenance costs. We are hopeful that the system will benefit from increased sustainability and portability as future releases of WordPress are installed, and as special functions are developed and deployed.

In selecting an existing open source application to serve as an interface to your particular digital library, some key questions to ask are:

- Will the system scale to the level that your data requires?
- Are the data models used in the system compliant with your digital library metadata structure?
- Is migrating data between the applications sustainable?
- What needs can be filled with these social software/open source applications?
- Is the software appropriate just for user interfaces or can it serve other functions of the digital library system?
- How will system sustainability be ensured with the selected options?
- Will developing the customized parts of the software be more manageable than other solutions?
- What core functional requirements exist, and is this application the best approach?

By choosing the solution we did, we hope to be able to incorporate a number of approaches to public interfaces while keeping our digital library applications centralized and focused on key functions of storage and preservation.

Works Cited

Borgman, C. L. (1996). Social aspects of digital libraries. Retrieved November 2, 2006, from <http://is.gseis.ucla.edu/research/dl/UCLA_DL_Report.html>.

Lagoze, C., Krafft, D., & Payette, S. (2005). What is a digital library anyway? Beyond search and access in the NSDL [Electronic version]. D-Lib Magazine, 11. Retrieved January 18, 2007, from <doi:10.1045/november2005-lagoze>.

Plymouth State University Library [Electronic version]. (2007). Plymouth State University. Retrieved October 27, 2007, from <http://www.plymouth.edu/library/>.

Polar Bear Expedition Digital Collections [Electronic version]. (2007). Bentley Historical Library. Retrieved October 28, 2007, from <http://polarbears.si.umich.edu/>.

Renda, M. E., & Straccia, U. (2005). A personalized collaborative digital library environment: A model and an application. Information Processing & Management, 41(1), 5.

Notes

1. Scriblio, <http://about.scriblio.net>.
2. Digital Forsyth, <http://www.digitalforsyth.org>.
3. Institute of Museum and Library Services, <http://www.imls.gov>.
4. State Library of North Carolina, <http://statelibrary.dcr.state.nc.us>.
5. Protégé, <http://protege.stanford.edu>.

Copyright © 2008 Erik Mitchell and Kevin Gilbertson

doi:10.1045/march2008-mitchell