Book Reviews

Curating Research Data. Volume One: Practical Strategies for Your Digital Repository. Lisa R. Johnston, ed., for the Association of Research Libraries. Chicago, Ill.: American Library Association, 2017. 294p. $65.00 (ISBN: 978-0-8389-8858-9).

Curating Research Data. Volume Two: A Handbook of Current Practice. Lisa R. Johnston, ed., for the Association of Research Libraries. Chicago, Ill.: American Library Association, 2017. 338p. $65.00 (ISBN: 978-0-8389-8862-6).

The last decade has seen a marked increase in the creation, analysis, and reuse of data by scholars across a wide range of disciplines. In response to this trend, the management and curation of data have become increasingly important for libraries in the digital and data age. The increased focus on and demand for these library services have generated comprehensive and instructive works, like the ones reviewed here. These volumes, edited and organized by Lisa R. Johnston, are important reads for both seasoned and novice practitioners of data and digital curation. Johnston, an Associate Librarian at the University of Minnesota, serves as the library’s Research Data Management/Curation Lead and the Co-Director of the University Digital Conservancy.


Academic librarians, data scientists, and information technologists wrote the chapters and sections in the volumes. The volumes meld best practices and new ideas for digital and data management and curation. The works also offer readers insights, perspectives, and case studies from new voices as well as from recognizable experts in digital and data management and curation (Briney, Chen, Imker, Peer, and Yakel, to name a few). The volumes contain biographies of the contributors, and chapters and steps end with notes and a comprehensive bibliography for additional reading; these thoroughly documented resources add great informational value to the text. Overall, the writing is even across the sections, and most chapters are clear and concise.

Volume One: Practical Strategies for Your Digital Repository is a guidebook of strategies and best practices for digital repositories. The work explores a range of topics that address key challenges for repositories, such as meeting funders' data policies, data reuse, and outreach services. The volume is divided into three parts and has twelve chapters. The segments are well organized overall; novices will find this a key resource, and seasoned practitioners will find new takeaways and information they can apply.

Part I: Setting the Stage for Data Curation has five chapters. The second and fourth chapters are key reading for anyone involved in a digital repository. They explore data sharing and how policies from various stakeholders may affect preservation, as well as the complexities of access to and reuse of data. This section is necessary reading for all data curators. There is also a chapter dedicated to data repository challenges unique to Canadian libraries.

Part II: Data Curation Services in Action is composed of four chapters. These chapters offer important insights into service and financial models of repositories and outreach. Chapter 8, by Karl Nilsen, which addresses the financial models, is not only well written but is also a great primer for those new to digital repositories and the financial models associated with maintaining and building these services. The outreach section offers promotional examples and a sample survey of outreach practices.

Part III: Preparing Data for the Future offers readers three chapters on the ethical and appropriate reuse of data found in digital repositories. Although the lifecycle of data has been written about extensively, this section provides important chapters that supplement existing works on data reuse and preservation. Especially useful are a list of repositories that may be appropriate for preserving certain types of data and a discussion of the whys and hows of “data rescue,” that is, providing a repository for important scientific data (historical in nature) that is at risk of being lost, either because of its format or because it has no repository. The last chapter is devoted entirely to data rescue, a topic that receives limited attention; readers new to the topic will come away with key insights into why preserving certain types of at-risk data (climate data being one) is important for the scientific community and citizens.

The next work, Volume Two: A Handbook of Current Practice, is organized into steps for curators to follow for the ingestion, curation, and preservation of data in digital repositories. The eight steps in the volume focus on topics such as intake; assessment of the data and of who gets access to it; understanding the layout and structure of data sets; ingestion; creating and applying metadata; preservation approaches and techniques; and preparing data for reuse. By segmenting the complexities and challenges of ingesting data into steps, the work offers readers a comprehensive approach. While some of the steps are brief (metadata and copyright being examples), this is in large part because the topics are complex and entire books or articles are dedicated to them. Each step provides an overview, important advice, insight into curation subtleties, step summaries, and case studies, which provide useful examples. The comprehensive descriptions and explanations illuminate what to do, what not to do, and why. The writing at points in the text is technical, but not overwhelmingly so.

The first step, Receive the Data, is an overview of the key processes and best practices for promoting data repository services. It covers the recruitment of data, tips for assessing data intake needs (including legal requirements), and determining the life of the data. The section on legal requirements would have benefited from a deeper dive into the topic, because understanding legal exposure and the responsibilities of hosting data is a fundamental best practice for any repository. The discussion of the exchange of data between two repositories, with its considerations and challenges, is an excellent example of the complexities of data sharing; the case study provided by Marz is a great illustration.

The second step, Appraisal and Selection Techniques that Mitigate Risks Inherent to Data, does a good job of illuminating the importance of understanding the needs and responsibilities of the repository for taking in certain types of sensitive data. The case studies are good examples of the restrictions on some data sets in regard to the sensitivity and deanonymization of data. The section also reinforces the importance of carefully considering the types and scope of data that are added and supported by the repository.

In the third step, Processing and Treatment Action for Data, the authors outline practices and technologies needed to understand and reuse the data that is being ingested into the repository. The authors offer comprehensive inventory methods and the software needed by the curator to work with the data. The software suggestions are comprehensive but do not include Tableau or other software that could be used for data files from relational databases. A key portion of the section highlights the need to document computational environments, and how to do so, which is key for reuse and long-term preservation. The case studies are brief; however, they are good starting points, and the content invites the reader to research and build familiarity and skills using various software packages, as well as ingestion techniques.

The fourth step, Ingest and Store Data in Your Repository, is less about the steps for ingesting data into one’s own repository and more about national and international repositories that can ingest small and large data sets. Additionally, the step explores the differences between, and the respective benefits of, repositories hosted on machines not connected to the cloud and those hosted in the cloud. The chapter contains a wealth of resources for novice and experienced curators.

The fifth section, on Descriptive Metadata, captures all the essentials on the importance of metadata for indexing and discovery, a crucial aspect of any repository. While the section is brief, it does go beyond Dublin Core and covers disciplinary metadata. This chapter, taken with Erik Mitchell’s Metadata Standards and Web Services in Libraries, Archives, and Museums: An Active Learning Resource and Steven Miller’s Metadata for Digital Collections: A How-to-Do-It Manual (How To Do It Manuals for Librarians), is a good introduction to the importance, techniques, and best practices surrounding metadata.

The sixth step, Access, reviews and builds on steps 2 and 5. The section breaks down access into two parts. The first part explores the topic of access in a data repository (who has access and what type: view, download, and so forth) and the benefits of creating a “terms of use” for the repository. Expanding on terms of use, it briefly covers copyright concepts like Creative Commons and Open Data. The second part explores the benefits of using additional metadata like Digital Object Identifiers (DOIs) and DataCite metadata schemas, which help promote discovery of data and correct citation of data and data-related projects.

The seventh section, Preservation of Data for the Long Term, explores the challenges of long-term preservation of data, including the obsolescence of technologies. The section provides a framework for preservation based upon format type. It also explores the technical solutions for transferring or porting data into formats better suited for preservation. One best practice highlighted is obtaining robust documentation on the steps used in any analysis of the data as well as the computational systems used.

The final step, Reuse, focuses on how to demonstrate the value and impact of data from the repository that has been reused. The step focuses on using different metrics and analytical reporting features to track how often data is downloaded and cited in other works. The section also offers some good strategies for promotion of data, such as peer review. The case studies are perfect for the section; they help to make a strong closing to an important work.

Curating Research Data. Volume One: Practical Strategies for Your Digital Repository and Volume Two: A Handbook of Current Practice are important works to be read by anyone interested in or supporting data curation repositories, services, and solutions in an academic library. The volumes are highly recommended for new data curators, as well as those who have not worked with a variety of data sets. They are also valuable for any curator to use to assess gaps and strengths in an existing repository’s best practices and processes. These works complement titles such as The Data Librarian’s Handbook by Robin Rice and John Southall and Digital Curation by Gillian Oliver and Ross Harvey. These volumes will be well received by those who are new to data librarianship as well as by seasoned data practitioners.

Kara Kugelmeyer, Colby College

Copyright Kara Kugelmeyer

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
