Modules

Directory of modules.

Week 1: Course Overview and Introduction
An overview of, and introduction to, Data Curation II.
[video] [content] [exercise]
Required Reading:
- Course Content
Week 2: Tables, Trees, & Triples
Because data are fundamentally relational, we start the course by thinking about how to decide the ways data should be represented and made accessible to users.
[video] [content] [exercise]
Required Readings:
Suggested Readings:
- Abiteboul, S., Buneman, P., & Suciu, D. (2000). Data on the Web: from relations to semistructured data and XML. Morgan Kaufmann
Week 3: Tidy Data
An overview of tidy data principles as they relate to data curation, extended to some of the underlying principles of organizing, managing, and preparing all kinds of structured data for meaningful use.
[video] [content] [exercise]
Required Readings:
Suggested Readings:
Week 4: Data Integration
Data integration as it operates at the logical level of tables and of the data that feed into user interfaces.
[lecture] [guest lecture] [content] [exercise]
Required Readings:
- Course Content
Highly Recommended Readings:
- Whong, Chris (2020) “Taming the MTA’s Unruly Turnstile Data”
- Wikipedia article on data integration
Optional Readings:
- Halevy, A., Rajaraman, A., & Ordille, J. (2006, September). Data integration: The teenage years. In Proceedings of the 32nd international conference on Very large data bases (pp. 9-16)
- Abiteboul, S., Buneman, P., & Suciu, D. (2000). Data on the Web: from relations to semistructured data and XML. Morgan Kaufmann
Week 5: Data Packaging
How and why our work in developing metadata for data curation is paramount to sustainable access, with an introduction to a few broadly used standards for creating data packages.
[video] [content] [exercise]
Required Readings:
- Course Content
- Bechhofer, S., De Roure, D., Gamble, M., Goble, C., & Buchan, I. (2010). Research objects: Towards exchange and reuse of digital knowledge
- Skim this list of projects and tools for data packaging: Google Sheet or PDF
- Neylon (2017) Packaging data
Pick one of the following to read or review in depth:
Week 6: Repository Architectures
Builds upon the Data Curation I discussion of a data repository as a layered architecture for curation.
[video] [content]
Required Readings:
- Course Content
- Description of digital libraries (cyberinfrastructure) from the National Science Foundation program
- This post from the IQSS staff at Harvard’s Dataverse provides an excellent table comparing existing data repository services. Pay attention to the categories being compared, and how these relate to the affordances of the software
- Fallaw, C., Dunham, E., … (2016). Overly honest data repository development. Code4Lib
Review documentation for just one repository platform listed below (be sure to also look at an example of the platform’s deployment):
- Samvera (Open-source repository for universities and institutional repositories)
  
  About
  
  Technical Stack
  
  Example deployment
- Dataverse (Open-source repository for social science data)
  
  About
  
  Documentation
  
  Example deployments https://data.qdr.syr.edu/ and https://dataverse.tdl.org/
  
  See the QDR CoreTrustSeal documentation for more details on how Dataverse is configured
- Fedora (Open-source repository with semantic capabilities - often used by science repositories)
  
  About
  
  Specifications
  
  Developer Wiki
  
  Example Deployment
  
  ADS certification documentation for further info on how they use Fedora
- CKAN (Open-source data repository - often used for civic data)
  
  About
  
  Documentation
  
  Example deployments https://data.gov.au/ and Data.gov
  
  Some additional info on Data.gov.au’s CKAN
- Clowder (Open-source repository for long-tail data)
  
  Description of project
  
  Description of technical design
  
  Example deployment
Suggested Readings:
Week 7: Data Acquisition, Search, and Discovery
A review of the fundamental challenges that data curators face in making data discoverable.
[video] [content] [exercise]
Required Readings:
Suggested Readings:
Case Study (Optional):
- A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository.
Week 8: Metadata Application Profiles
Introduction to tidy metadata.
[video] [content] [exercise]
Required Readings:
- Course Content
- Application profiles:
  
  Heery, R., & Patel, M. (2000). Application profiles: mixing and matching metadata schemas. Ariadne, (25)
  
  The Singapore Framework for Application Profiles. Note that this is currently under revision by DCMI. You can catch up on their work here (and also see an example of use cases in the wild)
- Some examples of metadata application profiles:
  
  DPLA
  
  Cornell Library
  
  Carnegie Hall Archives
Suggested Readings:
- Hebron, T. K. (2018). Extending and Adapting Metadata Audit Tools for Mountain West Digital Library Members. Code4Lib Journal, (41)
- Curado Malta, M., Bermúdez Sabel, H., Baptista, A. A., & González-Blanco García, E. (2018). Validation of a metadata application profile domain model
- Stein, A., & Dunham, E. (2018). Meaningful Data Sharing: Developing the Illinois Data Bank Metadata Framework. Journal of Library Metadata, 18(2), 59-83
Week 9: Linked Data
Introducing some working definitions and providing an overview of concepts related to linked data and the promise, but ultimate failure, of the semantic web.
[video] [content]
Required Readings:
- Course Content
- Allemang, D., & Hendler, J. (2011). Semantic web for the working ontologist: effective modeling in RDFS and OWL. Second Edition
  
  Read Chapter 1 for an introduction to the Semantic Web’s concepts. If you are interested, Chapter 2 gives a bit more detail on how the Semantic Web works, and Chapter 3 introduces RDF and knowledge modeling.
- Ontology Development 101 (Noy and McGuinness)
  
  Read Sections 1 and 2 (3 and 4 are optional)
  
  Note - this is a classic formulation of what an ontology is and how to create one. The software they reference in building out the example is called Protégé (free: https://protege.stanford.edu/). If you are really keen, you can follow along. (For reference - this short list from Wikipedia is quite helpful.)
- Ontology for Data Science
- Semantic Web for the Legal Domain
Suggested Readings:
- ARL White Paper on Wikidata: Opportunities and Recommendations (2019)
Week 10: Emerging Topics
The future in data curation.
[content]
Required Reading:
- Course Content