Schedule
Event | Date | Description | Course Material
Module | 03/29/2021 Monday | | Required Reading:
-
Due | 03/30/2021 23:59 Tuesday | Sign Up for 15-minute Meet & Greet | Where:
- Canvas Calendar (see Course Overview Lecture)
- (This isn’t a hard deadline - just be aware that sign-ups are happening)
-
Due | 04/04/2021 23:59 Sunday | Introduce Yourself on Canvas | Where:
-
Due | 04/04/2021 23:59 Sunday
Module | 04/05/2021 Monday
Due | 04/11/2021 23:59 Sunday | Week 2 Exercise Due | Where:
-
Due | 04/11/2021 23:59 Sunday
Module | 04/12/2021 Monday | | Required Readings:
- Course Content
- Rawson and Muñoz (2016) Against Cleaning
- Wickham, H. (2014), “Tidy Data,” Journal of Statistical Software, 59, 1–23
Suggested Readings:
- Wickham, H. (2014), “Tidy Data,” Journal of Statistical Software (more code & examples than required reading)
- Tierney, N. J., & Cook, D. H. (2018). Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations
- Leek (2016) Non-tidy data
- Broman, K. W., & Woo, K. H. (2018). Data organization in spreadsheets. The American Statistician, 72(1), 2-10
- Tort, F. (2010). Teaching spreadsheets: Curriculum design principles
- Mack, K., Lee, J., Chang, K., Karahalios, K., & Parameswaran, A. (2018, April). Characterizing scalability issues in spreadsheet software using online forums
- Formatting data tables in spreadsheets: Data Carpentry Lesson
-
Due | 04/18/2021 23:59 Sunday | Week 3 Exercise Due | Where:
-
Due | 04/18/2021 23:59 Sunday
Module | 04/19/2021 Monday | Week 4: Data Integration | Required Readings:
Highly Recommended Readings:
Optional Readings:
- Halevy, A., Rajaraman, A., & Ordille, J. (2006, September). Data integration: The teenage years. In Proceedings of the 32nd international conference on Very large data bases (pp. 9-16)
- Abiteboul, S., Buneman, P., & Suciu, D. (2000). Data on the Web: from relations to semistructured data and XML. Morgan Kaufmann
-
Due | 04/25/2021 23:59 Sunday | Week 4 Exercise Due | Where:
-
Due | 04/25/2021 23:59 Sunday
Module | 04/26/2021 Monday | | Required Readings:
- Course Content
- Bechhofer, S., De Roure, D., Gamble, M., Goble, C., & Buchan, I. (2010). Research objects: Towards exchange and reuse of digital knowledge
- Skim this list of projects and tools for data packaging: Google Sheet or PDF
- Neylon (2017) Packaging data
Pick one of the following to read or review in depth:
-
Due | 05/02/2021 23:59 Sunday
Module | 05/03/2021 Monday | | Required Readings:
- Course Content
- Description of digital libraries (cyberinfrastructure) from the National Science Foundation program
- This post from the IQSS staff at Harvard’s Dataverse provides an excellent table comparing existing data repository services. Pay attention to the categories being compared and how they relate to the affordances of the software
- Fallaw, C., Dunham, E., … (2016). Overly honest data repository development. Code4Lib
Review documentation for just one repository platform listed below (be sure to also look at an example of the platform’s deployment):
- Samvera (Open-source repository for universities and institutional repositories)
- Dataverse (Open-source repository for social science data)
- Fedora (Open-source repository with semantic capabilities - often used by science repositories)
- CKAN (open-source data repository - often used for civic data)
  - About
  - Documentation
  - Example deployments https://data.gov.au/ and Data.gov
  - Some additional info on Data.gov.au’s CKAN
- Clowder (Open-source for long-tail data)
Suggested Readings:
- Amorim, R. C., Castro, J. A., Da Silva, J. R., & Ribeiro, C. (2017). A comparison of research data management platforms: architecture, flexible metadata and interoperability. Universal Access in the Information Society, 16(4), 851-862
- Lnenicka, M. (2015). An in-depth analysis of open data portals as an emerging public e-service. International Journal of Social, Education, Economics and Management Engineering, 9(2), 589-599. (see table 3 in particular for a comparative approach to Open Data portal evaluation)
- Cornell University Library Repository Principles and Strategies Handbook (I highly recommend this if you are looking for some background on how a University Library strategizes around digital infrastructures)
- Blanke, T., & Hedges, M. (2013). Scholarly primitives: Building institutional infrastructure for humanities e-Science. Future Generation Computer Systems, 29(2), 654-661
-
Module | 05/10/2021 Monday | | Required Readings:
- Course Content
- Google Dataset Search: Building a search engine for datasets in an open Web ecosystem.
- Facilitating the discovery of public datasets
- Discovering millions of datasets on the web
Suggested Readings:
- Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories.
- Understanding data search as a socio-technical practice.
- Scientific user requirements for a herbarium data portal.
- Scholar‐built collections: A study of user requirements for (Humanities) research in large‐scale digital libraries.
- Improving the discoverability and web impact of open repositories: techniques and evaluation.
Case Study (Optional):
-
Due | 05/16/2021 23:59 Sunday | Week 7 Exercise Due | Where:
-
Due | 05/16/2021 23:59 Sunday
Module | 05/17/2021 Monday | | Required Readings:
- Course Content
- Application profiles:
  - Heery, R., & Patel, M. (2000). Application profiles: mixing and matching metadata schemas. Ariadne, (25)
  - The Singapore Framework for Application Profiles. Note: this is currently under revision by DCMI. You can catch up on their work here (and also see an example of use cases in the wild)
- Some examples of metadata application profiles:
Suggested Readings:
- Hebron, T. K. (2018). Extending and Adapting Metadata Audit Tools for Mountain West Digital Library Members. Code4Lib Journal, (41)
- Curado Malta, M., Bermúdez Sabel, H., Baptista, A. A., & González-Blanco García, E. (2018). Validation of a metadata application profile domain model
- Stein, A., & Dunham, E. (2018). Meaningful Data Sharing: Developing the Illinois Data Bank Metadata Framework. Journal of Library Metadata, 18(2), 59-83
-
Due | 05/23/2021 23:59 Sunday
Module | 05/24/2021 Monday | | Required Readings:
- Course Content
- Allemang, D., & Hendler, J. (2011). Semantic web for the working ontologist: effective modeling in RDFS and OWL. Second Edition
  - Read Chapter 1 for an introduction to Semantic Web (SW) concepts. If you are interested, Chapter 2 gives a bit more detail on how the SW works, and Chapter 3 introduces RDF and knowledge modeling.
- Ontology Development 101 (Noy and McGuinness)
  - Read Sections 1 and 2 (3 and 4 are optional)
  - Note: this is a classic formulation of what an ontology is and how to create one. The software they reference in building out the example is called Protege (free https://protege.stanford.edu/). If you are really keen, you can follow along. (For reference, this short list from Wikipedia is quite helpful.)
- Ontology for Data Science
- Semantic Web for the Legal Domain
Suggested Readings:
-
Due | 05/30/2021 23:59 Sunday
Module | 05/31/2021 Monday | Week 10: Emerging Topics [content] | Required Reading:
-
Due | 06/03/2021 23:59 Thursday
Due | 06/06/2021 23:59 Sunday