16 Collection Development and Management of Research Data
Dandi Wang
Introduction
In 2011, Carpenter, Graybill, Offord, and Piorun projected what the academic library’s role in the scholarly community would be in 2025. The researchers believed that one of the primary roles of librarians includes “managing information for projects of all sizes, including bibliographic management, data creation and preservation, usage rights, and assisting with the distribution of finished works and raw data by promoting open access and local and national data repositories” (p. 66). With the development of high-performance computers, researchers can now collect, access, search, and analyze data more easily than ever before, and librarians have become more involved in research data management and preservation. The Association of College and Research Libraries (ACRL) also analyzed the trending topics in the academic library field between 2021 and 2022; the results indicate that data has become one of the top trends that librarians and researchers consistently discuss (Association of College and Research Libraries, 2022). The emergence of big data has also increased the number of datasets available to researchers (Khan & Du, 2018). To foster the reuse of digital research data, many funders and peer-reviewed journals have started requesting that researchers provide research datasets with published articles. Though many research papers have discussed the influence of research data management on librarians, few researchers have examined the impact of research data management on academic library collections, let alone the challenges in managing research data collections.
This chapter explores the importance of including research data in library collections and the relationship between research data management and collection management, discusses the challenges in research data management from the perspective of collection management, and identifies potential solutions.
Background and Current Content
The Shift in Collection Management
ACRL defines collections as materials “sufficient in quality, depth, diversity, format, and currency to support the research and teaching missions of the institution” (Association of College and Research Libraries, 2018, p. 9). Writing in Collection Management, Levine-Clark (2019) noted that the evolution of library collections includes a role for libraries in helping their universities generate and disseminate content. This content can include works by faculty or students, as well as materials that have been digitized from archival sources. As open access and institutional repositories have matured, academic libraries function not only as consumers of information but also as creators and publishers (Genoni, 2004; Gwynn et al., 2019).
Dempsey (2017) defined this changing direction as the “inside-out library.” In past decades, libraries depended on publishers to gain access to licensed material to serve their patrons. Such a collection model is considered “outside-in.” The inside-out library model, on the other hand, supports the creation, curation, and dissemination of institutional creations, including open educational resources, research data, digital scholarship, and other learning materials in the digital environment. As a result, supporting digital scholarship and research data management has become a growing interest and an important focus for academic libraries.
The Importance of Including Research Data in University Library Collections
Although librarians have started to see the importance of including data sets in the collection (Boté, 2019; Dempsey, 2017; Saponaro & Evans, 2019), no research seems to consider research data as part of collection management. Research data are “used as primary sources to support technical or scientific inquiry, research, scholarship, or creative practice, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results” (Government of Canada, 2021). Although research data share many similarities with the data sets that Boté (2019) and Saponaro and Evans (2019) described, there is a significant difference between the two. The data sets discussed in those works are existing data sets, some of them obtained through vendor subscriptions, which means that most already have well-developed metadata descriptions. Research data, on the other hand, are self-deposited datasets that rely on collaboration between researchers and librarians to provide metadata (Mannheimer et al., 2021).
Including research data in the library collection can benefit schools and researchers significantly. Firstly, it enhances the educational experience by providing students access to rich and diverse datasets, enabling them to engage in hands-on learning and develop critical data analysis skills. By incorporating research data into the library collection, schools can offer a broader range of resources for students and faculty to explore, fostering a more comprehensive and interdisciplinary approach to research and learning.
Many funding agencies and publishers have started to require researchers to publish their research datasets openly along with the paper (Government of Canada, 2021; Scientific Data, n.d.; Elsevier, n.d.). Developing an institutional research data collection helps researchers meet these requirements and also promotes collaboration and knowledge sharing among researchers working on related topics, as they can access and analyze existing datasets, thereby building upon previous work and accelerating the research process. By making research datasets openly accessible, researchers can also discover new insights and generate innovative research questions by exploring and reusing datasets that might have been collected for different purposes initially. Additionally, the inclusion of research data in the library collection supports the principles of transparency and reproducibility in research, enabling others to verify and replicate findings, which ultimately strengthens the credibility and impact of scholarly work.
Moreover, the library’s management of research data ensures its long-term preservation and accessibility. By curating and preserving research datasets, libraries preserve knowledge and enable future researchers to build upon previous research. This preservation effort also mitigates the risk of data loss or deterioration over time, ensuring that valuable research data remains available for future generations. Overall, integrating research data into the library strengthens the academic community by fostering knowledge sharing, innovation, and the advancement of research.
Challenges in Research Data Collection Management
In the realm of library collections, research data collection management introduces its own challenges, necessitating focused strategies separate from those employed in traditional collection management. The most common challenges mentioned in the collection management literature are insufficient budget and staff, equity of access, and intellectual freedom (Gregory, 2019; Horava, 2010; Saponaro & Evans, 2019). Filson (2017) found that academic libraries often faced funding issues in collection management activities. Empirical research conducted by Hamad et al. (2021) substantiated the assertion that inadequate library budgets are the foremost challenge academic libraries encounter when managing research data collections. The study highlights the urgent need for improved financial support to facilitate effective collection management practices.
However, the primary challenge in the collection development and management of research data lies in effectively preparing research data for long-term usability. This involves addressing challenges in three categories. The first is the lack of institutional research data management policies and strategies, which can hinder the establishment of robust governance frameworks and standardized practices for data management. The second is the complexity of research data, including issues related to data integration, quality assurance, and the need for specialized tools and expertise. The third is insufficient support for research data management, such as limited resources, training opportunities, and collaboration platforms, which can impede effective data stewardship.
By recognizing and addressing these challenges, librarians and institutions can enhance their ability to manage research data collections effectively and facilitate the accessibility and usability of valuable research data for current and future academic endeavors.
Lacking Institutional Research Data Management Strategy
An institutional research data management (RDM) strategy serves as a crucial policy for collection management, offering guidance and support to facilitate research activities, especially in the data collection and preservation stages. Canada’s three major federal funding agencies, the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC), together known as the Tri-Agency, released their RDM policy in 2021 and stated the benefits of implementing an institutional RDM strategy. For instance, it enables the research community to better understand an institution’s RDM capacity, challenges, and needs, fostering collaboration among different institutions and advancing RDM practices across Canada (Government of Canada, 2021). An institutional RDM strategy provides guidance and support for the secure preservation, curation, and accessibility of research data through repositories. Establishing storage and preservation standards assists disciplinary communities within institutions in maintaining consistent practices. Furthermore, it supports researchers in adhering to ethical, legal, and commercial obligations related to data management.
However, although the Tri-Agency’s research data management policy required Canadian institutions to establish an institutional RDM strategy by March 2023 (Government of Canada, 2021), many institutions still face challenges in developing an official strategic plan to support RDM. Cox et al. (2019) surveyed libraries worldwide to examine the development of library research data services. Their findings revealed that in 2018 no Canadian institutions had formulated a policy; only 29% planned to do so in the following year, while 29% had no plans to develop an RDM policy. Small universities and colleges still struggle to form their own strategies due to a lack of knowledge of RDM and a limited number of librarians working on it (Concordia University of Edmonton, n.d.).
The need for institutions to establish an institutional strategy has become pressing and time-sensitive.
The Complexity of Research Data
Research data presents a complex and diverse landscape encompassing various formats, sources, and disciplinary perspectives. Ohaji et al. (2019) highlight the complexity of research data stemming from non-standard data types and varying formats across different fields. Humanities researchers, for example, may rely on interview data, whereas medical researchers may analyze blood samples. Different data types and formats also lead to differences in data size. This diversity presents a unique challenge compared to traditional collection management, which deals with more consistent formats such as books, newspapers, DVDs, and journals. Consequently, storing, describing, and accessing research data collections require tailored solutions that account for the specific characteristics of the data.
Furthermore, the sensitivity of the data must be carefully considered when developing research data collections. While researchers must adhere to ethical regulations established by research ethics boards to protect participant privacy and obtain informed consent, technological advancements have introduced new complexities (Government of Canada, 2019). The ease of accessing, storing, and analyzing large volumes of data increases the potential for re-identification of individuals based on their unique characteristics.
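The re-identification risk described above can be made concrete by counting how many records share each combination of indirectly identifying attributes; a combination held by only one or two participants is a warning sign. The following is a minimal sketch in Python, offered as an illustration rather than a method drawn from the sources cited in this chapter. The field names, the sample records, and the threshold are all hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group (0 if there are no records)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_combinations(records, quasi_identifiers, threshold):
    """Return the value combinations shared by fewer than `threshold` records."""
    sizes = group_sizes(records, quasi_identifiers)
    return sorted(combo for combo, count in sizes.items() if count < threshold)


# Hypothetical survey records; none of these fields or values come from a real dataset.
sample_records = [
    {"age_group": "30-39", "postal_prefix": "T5B", "occupation": "nurse"},
    {"age_group": "30-39", "postal_prefix": "T5B", "occupation": "nurse"},
    {"age_group": "40-49", "postal_prefix": "T6G", "occupation": "teacher"},
]
fields = ["age_group", "postal_prefix", "occupation"]

k = smallest_group_size(sample_records, fields)
flagged = risky_combinations(sample_records, fields, threshold=2)
```

In this hypothetical sample, one participant is the only person with their combination of age group, postal prefix, and occupation, so that record would need generalizing or restricting before deposit. A check of this kind is only a first screen and does not replace review by a research ethics board.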
Insufficient Technology Support for Depositing Research Data
Just as bookshelves hold print collections, libraries need data repositories to preserve research data collections. As part of the research funding policy, the Tri-Agency requires researchers who plan to receive grants to deposit research data, metadata, and code that support research conclusions into a digital repository once the institutional RDM strategy phase has finished (Government of Canada, 2021).
In a study conducted by Xu et al. (2022), which involved seven first-time users of a data repository, participants reported overall satisfaction with the usability of the repository. However, they encountered challenges that required assistance navigating the repository structure, understanding terminology specific to the repository, and effectively creating metadata. Specifically, participants expressed difficulties in comprehending technology-related jargon, hindering their ability to create accurate and meaningful metadata. These findings underscore the crucial need for guidance in selecting suitable repositories and providing comprehensive support to researchers throughout the metadata creation process to ensure the optimal discoverability and reuse of research datasets.
Librarians, with their professional expertise, can play a significant role in assisting researchers in metadata creation. By leveraging their knowledge of metadata standards and best practices, librarians can provide valuable guidance to researchers in creating high-quality metadata that adheres to established conventions (Gwynn et al., 2019). However, challenges and ambiguities in metadata creation have been identified in the context of database repository setup, as highlighted by Mannheimer et al. (2021).
Given the diverse nature of research data, developing standardized metadata descriptions can be time-consuming and complex. However, the investment in standardized metadata is crucial for enhancing the discoverability and reusability of research datasets. Librarians, in collaboration with researchers and other stakeholders, must explore practical solutions to streamline and support the metadata creation process, ensuring the quality and usability of research datasets in an increasingly data-driven scholarly landscape.
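As one illustration of how the metadata creation process might be streamlined, a deposit workflow could check a draft record for missing descriptive fields before a librarian reviews it. The sketch below is in Python and rests on assumptions: the list of required fields is a hypothetical local profile, not the schema of any particular repository or metadata standard, and the draft record is invented.

```python
# Hypothetical local profile of required descriptive fields (an assumption,
# not taken from any specific repository or standard).
REQUIRED_FIELDS = ("title", "creator", "description", "subject", "date", "license")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent, None, or blank in a draft record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented draft record submitted by a researcher.
draft = {
    "title": "Interview transcripts, 2022",
    "creator": "Example, A.",
    "description": "   ",  # whitespace only, so treated as blank
    "date": "2022",
}

gaps = missing_fields(draft)
```

Here `gaps` lists the description, subject, and license fields, which gives the librarian and the researcher a concrete starting point for their conversation. A check like this only confirms that fields are present; judging whether a description is accurate and meaningful still depends on the collaboration described above.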
Insufficient Trained Support Staff for Research Data Services
In a survey analyzing institutions’ research data services and workforce development, Tenopir et al. (2015) found that over half of the participants reported reassigning existing library staff to provide research data services. This shift resulted in subject specialists transitioning from traditional responsibilities, such as evaluating and selecting collections, to new roles in liaison work, learning commons, digital repositories, and data services (Day & Novak, 2019; Kranich et al., 2020; Ohaji et al., 2019). In addition to these changes, Ohaji et al. (2019) reported that nearly 70% of research institutions preferred subject librarians or liaison librarians to offer research data management support. However, librarians in these roles often lack sufficient time and knowledge of data literacy, curation, and management.
Furthermore, while librarians often provide reference assistance in searching library databases and using resources effectively, there is a lack of specific guidance on accessing, utilizing, and citing open-access datasets. The research literature rarely addresses this important aspect of librarians’ extended role in academic resource collections. By addressing these challenges, librarians can enhance their support in accessing, using, and citing such datasets, thereby enriching the extended collection of academic resources.
Responses
Although academic libraries face numerous challenges in developing and managing research data, the proposed solutions discussed below provide avenues for addressing these challenges and enhancing the effectiveness of research data collection within libraries.
To effectively address research data collection and management challenges, libraries must collaborate with various institutional stakeholders (Boté, 2019). By engaging with stakeholders, such as faculty members, institutional research offices, and IT departments, libraries can establish a comprehensive institutional research data management strategy that aligns with the institution’s research and learning objectives. This strategy will reflect the institutional value placed on research and provide clear guidelines on crucial aspects like data curation, long-term preservation, and library support services (Boté, 2019). For instance, sub-strategy plans like a digital preservation strategy can be developed to assist researchers in adhering to research data deposit guidelines, ensuring the accessibility and longevity of valuable research datasets.
Collaborating with various institutional stakeholders offers significant advantages for libraries in creating an institutional research data management strategy. By working closely with faculty members involved in research projects, libraries gain valuable insights into the production of research output, particularly research data. This understanding becomes instrumental in developing and managing accurate metadata descriptions for research datasets (Day & Novak, 2019; Ohaji et al., 2019). Furthermore, close collaboration with the institutional research office is crucial for libraries to stay informed about the ever-changing funding and publishing requirements in research data management, empowering librarians to provide researchers with up-to-date policies and valuable suggestions that align with the evolving landscape (Tenopir et al., 2015).
The institutional information technology department is another stakeholder with which libraries should collaborate closely (Cox & Pinfield, 2014; Tenopir et al., 2015). IT supports and maintains the institutional repository so that research datasets remain accessible and can potentially be reused. Libraries can draw on information from the IT department to provide guidelines about depositing sensitive research data (Boté, 2019). Such collaborations foster a comprehensive strategy that ensures the library’s research data management support is in tune with the needs and expectations of the academic community.
Nationwide collaboration is another asset for the development of an institutional RDM strategy. For instance, ARMIN (Alberta Research-Data Management Information Network) supports smaller Canadian universities and colleges in developing their institutional strategies for RDM by providing workshops, seminars, and discussions (Concordia University of Edmonton, n.d.). Moreover, the Digital Research Alliance of Canada facilitates knowledge exchange between researchers, librarians, and universities. It contributes to developing national or sector-specific policies and guidelines that align with Canadian institutions’ unique needs and priorities.
With the rapid technological advancements, supporting researchers’ evolving data demands has become an ongoing challenge. In order to effectively meet these demands, librarians must actively engage with emerging technologies and stay abreast of the latest trends. By keeping pace with this dynamic landscape, librarians can better understand and address researchers’ evolving needs. Researchers have identified several core competencies essential for librarians to enhance their support for the research community. Semeler and Pinto (2020) have identified four crucial skill sets for supporting research data. Firstly, interpersonal and behavioral characteristics, including strong oral and written communication skills, enable librarians to comprehend researchers’ data requests and develop case studies to facilitate effective research data management strategies. Ohaji et al. (2019) also emphasize the significance of interpersonal and communication skills in delivering exceptional customer service to researchers. Another vital skill set for librarians is contextual knowledge about the institutional environment. Librarians should be well-versed in funding policies supporting scientific research and understand ethical procedures, disciplinary research methods, scientific communication, intellectual property, access methods, and copyrights (Semeler & Pinto, 2020). This comprehensive knowledge equips librarians with the necessary foundation to navigate the complex landscape of research data management.
Two additional critical skill sets and areas of knowledge pertain to data and technology. According to Ohaji et al. (2019), a comprehensive understanding of data is essential for librarians. This includes knowledge of the research data life cycle, data literacy, business analysis, and metadata. Semeler and Pinto (2020) provide a more detailed description, emphasizing the importance of librarians understanding different data types, metadata, and the significance of questions related to unique identifiers and the preservation of digital data. Furthermore, librarians must possess a solid foundation in technology and its relevant tools. Ohaji et al. (2019) argue that librarians should have basic knowledge of technology, especially in areas such as big data, programming languages, database design, and natural language processing tools. Acquiring these skills enables librarians to engage effectively with the ever-changing technological landscape, as highlighted by Semeler and Pinto (2020). In addition to these four skill sets, Ohaji et al. (2019) also stress the importance of librarians’ familiarity with research practices. This includes understanding the research cycle and e-research, enabling librarians to align their support with the specific needs of the research community. By developing expertise in data management, technology, and research practices, librarians can play a pivotal role in supporting researchers and ensuring the effective management and utilization of research data.
Two broad training options for librarians are personal and organizational (Ohaji et al., 2019). Librarians can greatly benefit from collaborating with experienced professionals, such as fellow librarians or other experts, to gain insights into research data management practices. Engaging in meaningful conversations with researchers about their data usage, management techniques, and the tools they employ for data collection and analysis can significantly enhance librarians’ skill sets and knowledge. These personal opportunities serve as invaluable avenues for librarians to develop their expertise. Moreover, organizations and institutions are crucial in providing librarians with training opportunities; they should offer formal training programs specifically designed to address research data management needs within universities. Additionally, professional development opportunities focused on research data management should be made available to librarians. These can include workshops, seminars, and courses that equip librarians with the necessary skills and knowledge to effectively support researchers in managing their data. Institutions should also consider sponsoring librarians to attend conferences and workshops dedicated to research data management, allowing them to stay updated on the latest advancements and best practices in the field.
In conclusion, the solutions presented not only empower librarians to build comprehensive research data collections but also ensure the utilization of research data for future academic endeavors. By implementing these solutions, academic libraries can play a vital role in supporting research and fostering data-driven scholarship.
Conclusion
The evolving research and learning landscape necessitates changes in academic library collections and management. When addressing research data management (RDM), libraries should view it not merely as a new service but as an extension of collection management in the digital age to preserve the core value of collection management. Establishing a research data collection presents numerous challenges, but collaborating with all stakeholders in collection management allows libraries to develop a strategy aligned with the institution’s vision, mission, and policies, enhancing the library’s ability to serve the research community effectively. Simultaneously, as information professionals, librarians must proactively acquire data literacy skills to actively engage in the research life cycle, comprehend the needs of researchers, and provide optimal research support, thereby satisfying the requirements of library patrons.
Sources for Further Reading
Government of Canada. (2021, March 15). Tri-agency research data management policy. Government of Canada. https://science.gc.ca/site/science/en/interagency-research-funding/policies-and-guidelines/research-data-management/tri-agency-research-data-management-policy
The Tri-Agency RDM policy is the theoretical foundation for the development of institutional research data management strategy. Data librarians and research data management librarians could use this as guidance to 1) provide correct and updated research data management information to researchers; 2) collaborate with other stakeholders around the institution to provide better dataset collection management services.
Cooper, A., Steeleworthy, M., Paquette-Bigras, È., Clary, E., MacPherson, E., Gillis, L., Wilson, L., & Brodeur, J. (2021). Dataverse Curation Guide. Zenodo. https://zenodo.org/record/5579820#.Y2DdP3bMJD8
The Dataverse Curation Guide gives step-by-step instructions for researchers and data scientists on how to deposit datasets into Dataverse. In Canada, Dataverse works with many institutions to provide institutional repository services. The guide also explains the differences between a dataset and a dataset collection in Dataverse.
Digital Research Alliance of Canada. (n.d.). Our Services. https://alliancecan.ca/en/our-services
Previously called Portage, the Digital Research Alliance of Canada is the leading organization providing support and consultations to Canadian researchers on topics related to advanced research computing (ARC), research data management (RDM), and research software (RS). The Digital Research Alliance of Canada also offers researchers a digital data management template tool with best-practice guidelines.
References
Association of College and Research Libraries. (2018). Standards for libraries in higher education. ACRL Board of Directors, Association of College and Research Libraries. https://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/slhe.pdf
Association of College and Research Libraries. (2022). Top trends in academic libraries: A review of the trends and issues. College & Research Libraries News, 83(6), 243.
Boté, J. J. (2019). Dataset management as a special collection. Collection Management, 44(2-4), 259-276.
Carpenter, M., Graybill, J., Offord Jr, J., & Piorun, M. (2011). Envisioning the library’s role in scholarly communication in the year 2025. portal: Libraries and the Academy, 11(2), 659-681.
Cooper, A., Steeleworthy, M., Paquette-Bigras, È., Clary, E., MacPherson, E., Gillis, L., Wilson, L., & Brodeur, J. (2021). Dataverse Curation Guide. Zenodo. https://zenodo.org/record/5579820#.Y2DdP3bMJD8
Cox, A. M., & Pinfield, S. (2014). Research data management and libraries: Current activities and future priorities. Journal of Librarianship and Information Science, 46(4), 299-316.
Cox, A. M., Kennan, M. A., Lyon, L., Pinfield, S., & Sbaffi, L. (2019). Maturing research data services and the transformation of academic libraries. Journal of Documentation, 75(6), 1432-1462.
Concordia University of Edmonton (n.d.). ARMIN, Alberta research-data management information network. https://concordia.ab.ca/research/research-at-concordia/armin/
Day, A., & Novak, J. (2019). The subject specialist is dead. Long live the subject specialist!. Collection Management, 44(2-4), 117-130.
Digital Curation Centre. (2017). How-to guides & checklists. https://www.dcc.ac.uk/resources/how-guides
Digital Research Alliance of Canada. (n.d.). Our Services. https://alliancecan.ca/en/our-services
Dempsey, L. (2017). Library collections in the life of the user: two directions. LIBER Quarterly: The Journal of the Association of European Research Libraries, 26(4), 338-359.
Elsevier. (n.d.). Database linking. Elsevier. Retrieved November 29, 2022, from https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking
Genoni, P. (2004). Content in institutional repositories: a collection management issue. Library Management, 25(6-7), 300-306.
Government of Canada. (2022, June 1). Research data management. Government of Canada. https://science.gc.ca/site/science/en/interagency-research-funding/policies-and-guidelines/research-data-management
Government of Canada. (2021, March 15). Tri-agency research data management policy. Government of Canada. https://science.gc.ca/site/science/en/interagency-research-funding/policies-and-guidelines/research-data-management/tri-agency-research-data-management-policy
Gregory, V. L. (2019). Collection development and management for 21st century library collections: an introduction. American Library Association.
Gwynn, D., Henry, T., & Craft, A. R. (2019). Collection creation as collection management: Libraries as publishers and implications for collection development. Collection Management, 44(2-4), 206-220.
Hamad, F., Al-Fadel, M., & Al-Soub, A. (2021). Awareness of research data management services at academic libraries in Jordan: Roles, responsibilities and challenges. New Review of Academic Librarianship, 27(1), 76-96.
Horava, T. (2010). Challenges and possibilities for collection management in a digital age. Library Resources & Technical Services, 54(3), 142-152.
Khan, H. R., & Du, Y. (2018). What is a data librarian? A content analysis of job advertisements for data librarians in the United States academic libraries [Paper presentation]. IFLA 2018, Kuala Lumpur, Malaysia.
Kranich, N., Lotts, M., Nielsen, J., & Ward, J. H. (2020). Moving from collecting to connecting: Articulating, assessing, and communicating the work of liaison librarians. portal: Libraries and the Academy, 20(2), 285-304.
Levine-Clark, M. (2019). Imagining the future academic library collection. Collection Management, 44(2-4), 87-94.
Mannheimer, S., Clark, J. A., Hagerman, K., Schultz, J., & Espeland, J. (2021). Dataset search: A lightweight, community-built tool to support research data discovery. Journal of eScience Librarianship, 10(1):3. https://doi.org/10.7191/jeslib.2021.1189
Ohaji, I. K., Chawner, B., & Yoong, P. (2019). The role of a data librarian in academic and research libraries. Information Research, 24(4).
Government of Canada. (2019, September 23). TCPS 2 (2018) – Chapter 5: Privacy and confidentiality (2018, modified September 23, 2019). Panel on Research Ethics, Government of Canada. https://ethics.gc.ca/eng/tcps2-eptc2_2018_chapter5-chapitre5.html
Saponaro, M. Z., & Evans, G. E. (2019). Collection management basics. ABC-CLIO.
Scientific Data. (n.d.). Data repository guidance. Nature.com. Retrieved November 29, 2022, from https://www.nature.com/sdata/policies/repositories
Semeler, A. R., & Pinto, A. L. (2020). Data librarianship as a field study. Transinformação, 32. https://doi.org/10.1590/2318-0889202032e200034
Tenopir, C., Hughes, D., Allard, S., Frame, M., Birch, B., Baird, L., … & Lundeen, A. (2015). Research data services in academic libraries: Data intensive roles for the future?. Journal of eScience Librarianship, 4(2).
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., … & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1-9.
Xu, Z., Watts, J., Bankston, S., & Sare, L. (2022). Depositing data: A usability study of the Texas Data Repository. Journal of eScience Librarianship, 11(1). https://doi.org/10.7191/jeslib.2022.1233