Traditionally, researchers bound their data and stored it in physical form on a library shelf, where the only way to find it was to look up its ISBN. More recently, research data has often been kept on someone’s computer somewhere, accessible to only a few people. There is a growing need for long-term archiving that makes data available in a digital form others can share and reuse.
Recently, a team of developers at the Institute for Quantitative Social Science (IQSS) at Harvard University led an initiative to make such data available in digital form. The Dataverse Network Project (Figure 1) is a social science research data archive that allows researchers, faculty, students, and other research professionals around the world to store and share their work.
Figure 1: The Dataverse Home Page
The application is designed as a network of individual research and storage sites known as dataverses. An individual or organization can create a dataverse on the Dataverse Network simply by clicking a link on the Dataverse Network home page. Once it is created, the owner can administer the dataverse without installing a separate copy of the software. Each dataverse can be customized with its own look and feel, users, access restrictions, and so forth. Access to studies and files is controlled entirely by the owner or administrator of the content.
The Dataverse Network features an industry-leading citation standard and data analysis capabilities, along with XML metadata that describes each study in terms of its producers’ credentials, research abstract, and other criteria. It conforms to the interoperability standards used by other organizations that collect, distribute, and use social science research data, which means that studies and their files can be linked across dataverses. Users can also harvest studies and data on a schedule from other organizations that make them available.
These interoperability standards were established by the Data Documentation Initiative (DDI), an international effort to create a standard for technical documentation describing social science data. The DDI is funded by the National Science Foundation, which also funds part of the Dataverse Network project.
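The article does not say which protocol scheduled harvesting uses. Purely as an illustration, the sketch below assumes an OAI-PMH style harvest, in which a scheduled job asks a remote repository only for records that changed since its last run. The class name `HarvestRequest`, the method `listRecordsUrl`, the base URL, and the `ddi` metadata prefix are hypothetical; only the `verb=ListRecords`, `metadataPrefix`, and `from` arguments come from the OAI-PMH specification.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

// Hypothetical helper: assumes the remote archive exposes an OAI-PMH endpoint.
public class HarvestRequest {

    // Builds an OAI-PMH ListRecords request URL. When lastHarvest is non-null,
    // the "from" argument restricts the response to records changed on or after
    // that date, so a scheduled job fetches only new or updated studies.
    public static String listRecordsUrl(String baseUrl, String metadataPrefix, LocalDate lastHarvest) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords")
                .append("&metadataPrefix=")
                .append(URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8));
        if (lastHarvest != null) {
            // LocalDate.toString() yields YYYY-MM-DD, the day-granularity form OAI-PMH accepts
            url.append("&from=").append(lastHarvest);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // "https://example.org/oai" and "ddi" are placeholder values for illustration
        System.out.println(listRecordsUrl("https://example.org/oai", "ddi", LocalDate.of(2009, 1, 15)));
    }
}
```

A real harvester would also follow `resumptionToken` values across result pages and parse the returned XML; a `java.util.concurrent.ScheduledExecutorService` is one way to issue the request on a fixed schedule.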
The Dataverse Network is an open source software project freely available on SourceForge. In addition to the Dataverse Network hosted at Harvard, it is installed at the University of North Carolina (UNC), and it will soon be put to use by ICPSR, a well-known social science data archive.
The initial development team for the Dataverse Network included Mercedes Crosas, the original project lead, who is now the Manager of Software Development at the IQSS; Bob Treacy, Senior Software Architect and Engineer; and Wendy Bossons, Senior User Interface Developer. The team later grew to ten members, including Ellen Kraffmiller, Project Lead, and Gustavo Durand, Technical Lead.
Although a predecessor of the project was JSP-based and used standard JSF components, the Dataverse Network was an EJB/JSF project from the beginning. The team started out with Sun’s (now retired) Project Woodstock components but moved to ICEfaces 1.7.2 SP1 to address deadlocks found during testing, in which threads were not being released. ICEfaces provided a rich set of Ajax-enabled components that resolved those issues.
In preparation for using ICEfaces, the team moved to Facelets in a later release. They found Facelets very useful, not only for its simple set of templating tags but also for the ui:repeat and ui:fragment tags and the jsfc attribute, which let them tie plain HTML elements to the behavior of specific ICEfaces components. This mattered because in some cases they wanted to use semantic markup while still taking advantage of the richness of the Facelets and JSF environment.
The team currently plans to upgrade from Postgres 8.2 to 8.3 to improve general performance and address memory leaks.
Initially, the team developed a few custom components to present tabular data that they wanted to flow across multiple columns, top to bottom and then left to right; at the time, no existing component did that. They still use some custom components to produce GUI elements such as tooltips and other context-sensitive help.
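The layout those early components produced (filling columns top to bottom, then moving left to right) comes down to a small index calculation. This is a hypothetical illustration, not the Dataverse Network code: `ColumnFlow` and `flowTopToBottom` are invented names, and the method only works out which item lands in which row, so that a row-by-row renderer such as an HTML table can display a column-major flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the column-major layout logic a custom component might use.
public class ColumnFlow {

    // Places items top to bottom, then left to right, into the given number of
    // columns, and returns them row by row, the order an HTML table renders cells.
    public static <T> List<List<T>> flowTopToBottom(List<T> items, int columns) {
        if (columns <= 0) {
            throw new IllegalArgumentException("columns must be positive");
        }
        int n = items.size();
        int rows = (n + columns - 1) / columns; // ceiling division: rows needed per column
        List<List<T>> table = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            List<T> row = new ArrayList<>();
            for (int c = 0; c < columns; c++) {
                int index = c * rows + r; // item at row r of column c in column-major order
                if (index < n) {
                    row.add(items.get(index));
                }
            }
            table.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> studies = List.of("A", "B", "C", "D", "E", "F", "G");
        for (List<String> row : flowTopToBottom(studies, 3)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

With seven items in three columns, `main` prints `A D G`, `B E`, and `C F` on separate lines. Because the row count is rounded up, some combinations leave trailing columns short or even empty (five items in four columns fill only three), so a renderer would emit empty cells for the missing positions.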
During development, the team relied on several JSF books, the Pro EJB book, and various websites, including the GlassFish Community, Sun's Java Forums, the ICEfaces Forums and Component Showcase, the NetBeans User Forums, the Facelets User Forums, Nabble, and DevEdge.
The Dataverse Network project was released about three years ago, and the team is thrilled with its adoption in terms of both contributors and visitors. The site had over 13,000 visitors in 2008, and that figure covers only five months of statistics.
The Dataverse Network has received rave reviews:
"This looks terrific. Thank you very much. I'll try to put your two links up in my . . . home page as soon as possible, and will encourage my students and colleagues to use them..."
"The site looks great, and I am impressed by how many people have already downloaded the datasets, etc.!"
Information about the citation standard can be found on the wiki at http://thedata.org/citation/standard, and information about the DDI is available at http://www.ddialliance.org/.