Thursday, October 24 • 9:15am - 2:30pm
Workshop: Natural Language Processing (NLP) and Machine Learning for Digital Curation

This interactive workshop covers the use of open-source natural language processing (NLP) and machine learning (ML) tools to process and provide access to born-digital materials. It will focus on applying topic modeling and named entity recognition to characterize and explore the contents of removable storage media (e.g., floppy disks, optical media), functionality developed through the BitCurator Access and BitCurator NLP projects. We will also explore open-source software (OSS) tools and methods that libraries, archives and museums (LAMs) can use to identify email in born-digital collections, review email sources for sensitive or restricted materials, and perform appraisal and triage tasks to identify and annotate records. This part of the session draws specifically on products of the Review, Appraisal and Triage of Mail (RATOM) project, which uses machine learning to separate records from non-records and natural language processing methods to identify entities of interest within those records. In addition to gaining hands-on experience with the tools, participants will learn about the rationale for their development, how they relate to other available software, and how NLP and ML can fit into larger digital curation workflows. We will conclude with a brief discussion of implications for participants in their own institutions.


Sangeeta Desai

Systems Integration Librarian, State Archives of North Carolina

Kam Woods

Research Scientist, University of North Carolina
Research Scientist @ UNC SILS. RATOM Technical Lead. @kamwoods. he/him/his

Cal Lee

Professor, University of North Carolina
Christopher (Cal) Lee is Professor at the School of Information and Library Science at UNC, Chapel Hill. He teaches courses and workshops in archives and records management. He is a Fellow of SAA, and he serves as editor of American Archivist.

Thursday October 24, 2019 9:15am - 2:30pm EDT
Arts Library Classroom, Room 119
