Wallace Lungu's Blog for Learning Purposes: Data Collection and Repositories

Data collection is a foundational stage in the data lifecycle; however, its significance extends beyond the mere gathering of information. It determines the credibility, reliability, and long-term value of research outputs. In Library and Information Science, data collection must be understood as a deliberate and methodologically rigorous process aimed at producing valid and reusable data. Without such rigor, the integrity of research is compromised, regardless of the sophistication of subsequent analysis.

Various methods of data collection, including surveys, interviews, observations, and the use of secondary datasets, offer distinct advantages and limitations. Creswell and Creswell (2018) emphasise that the selection of these methods should be guided by research objectives. Despite this, methodological choices are often influenced by convenience rather than suitability. Surveys, for example, are frequently adopted because they are efficient, yet they may fail to capture the depth required for complex inquiries. This overreliance on convenience-based methods undermines data quality and calls for a more deliberate approach that prioritises validity and contextual relevance.

The importance of data collection becomes more evident when linked to data repositories. Once collected, data must be systematically organised and stored to ensure accessibility and preservation. Data repositories function as structured digital infrastructures that support the storage, management, and dissemination of datasets. Borgman (2015) highlights their importance in the digital research environment; however, their effectiveness depends not only on storage but also on proper data management practices.

Metadata is central to this process, as it provides the context necessary for interpreting and reusing datasets. The UK Data Service (2020) notes that metadata enhances discoverability and usability. Nevertheless, inadequate metadata practices remain a common challenge, often making datasets difficult to interpret. This reflects a broader weakness in research data management, where documentation is insufficiently prioritised.
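To make the point about documentation concrete, the following is a minimal sketch of a metadata record and a completeness check that could be run before a dataset is deposited. The field names are assumptions for illustration only, loosely modelled on common descriptive elements such as title, creator, and licence; they are not the schema of the UK Data Service or of any particular repository.

```python
from dataclasses import dataclass, field, asdict
from typing import List

# Hypothetical set of descriptive fields treated as mandatory in this sketch.
# Real repositories define their own required elements.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "date_collected",
    "method",
    "description",
    "licence",
)


@dataclass
class DatasetMetadata:
    """Illustrative metadata record for a single dataset."""
    title: str = ""
    creator: str = ""
    date_collected: str = ""  # e.g. an ISO 8601 date such as "2024-03-01"
    method: str = ""          # e.g. "survey", "interview", "observation"
    description: str = ""
    licence: str = ""
    keywords: List[str] = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of required fields that are empty or whitespace."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not str(values[name]).strip()]


# An incomplete record: the collection date, description, and licence are absent.
record = DatasetMetadata(
    title="Library user survey (illustrative)",
    creator="Example Researcher",
    method="survey",
)
print(missing_fields(record))
```

A check of this kind does not judge the quality of the documentation, only whether it is present, but it shows how a gap in metadata can be caught at the point of deposit rather than discovered later by someone trying to reuse the data.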

In addition, repositories support open science by promoting transparency and reproducibility. Tenopir et al. (2011) observe that although researchers recognise the benefits of data sharing, actual practices remain inconsistent. This gap is often linked to concerns about intellectual property, data sensitivity, and misuse. As a result, the effectiveness of repositories is shaped not only by technology but also by institutional policies and researcher attitudes.

In conclusion, data collection and repositories are interdependent processes that shape the quality and impact of research. Rigorous data collection ensures reliability, while effective repository management guarantees preservation and access. Strengthening both components is essential for advancing transparent, credible, and reusable research in the digital era.

References

Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.

Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed methods approaches. SAGE Publications.

Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., & Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), e21101.

UK Data Service. (2020). Managing and sharing data: Best practice for researchers. UK Data Service.