Content from Introduction: The "Works on My Machine" Trap
Last updated on 2026-05-29
Estimated time: 26 minutes
Overview
Questions
- Why does research software often fail to run on other machines?
- What is the “bus factor” problem in research software?
- How do FAIR4RS principles address these challenges?
Objectives
- Recognize common barriers to software reuse and reproducibility
- Understand how FAIR4RS principles apply to research software
- Identify the key components needed to make software citable and discoverable
- See what a “complete” FAIR research software project looks like
The “Works on My Machine” Trap
You send your code to a colleague…
- “It crashes on line 1” 😞
- “I can’t install the dependencies”
- “Which Python version did you use?”
Here’s what they see:
OUTPUT
$ python src/analysis.py
Traceback (most recent call last):
  File "analysis.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
This is where most research software lives. Fragile.
This concrete error message resonates with learners who have experienced dependency issues. Pause here to ask: “Has anyone experienced this problem?” This builds connection and motivation.
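One way to make this failure less mysterious is to have a script report its environment before doing any work. The sketch below is an illustration only and is not part of the demo repository; the function name missing_modules and the dependency list are assumptions.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical dependency list for analysis.py; adjust it to your script.
    required = ["numpy"]
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name in missing_modules(required):
        print(f"Missing dependency: {name}")
```

A report like this answers "Which Python version did you use?" up front, but it does not replace a documented environment file.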
The Bus Factor Problem
The Problem: Code works perfectly on ONE laptop. If it disappears, the science is gone.
The Bus Factor: The number of people who need to be “hit by a bus” before your project becomes unmaintainable.
How common is this?
A 2021 analysis found that over 48% of research articles mention software, but consistent sharing and citation remains the exception. Most of that software is either unavailable, uncredited, or impossible to reproduce.
Why This Matters for Research
Research software that lives only on one person’s machine:
- Cannot be verified or reproduced
- Cannot be built upon by others
- Cannot receive proper academic credit
- Disappears when that person moves on
The Software Citation Problem
Even when software works, it often doesn’t get credited. Most researchers cite code by dropping a URL into a paper:
The wrong way
“We used the analysis script from https://github.com/jt14den/software-demo”
Problems:
- URLs break (link rot)
- No version specified (which run of the code produced which result?)
- No formal credit; the author’s name isn’t even visible
- Cannot appear in citation metrics
This isn’t hypothetical. Forges close.
- Gitorious shut down in 2015, making thousands of projects unreachable overnight
- Google Code shut down in 2016, same result
- A username change, a repo rename, or a deleted account breaks any URL-based citation just as completely
That citation is now dead.
The right way
Dennis, T. (2025). Biodiversity Analysis Toolkit (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.123456
Why this works:
- DOI is permanent; it survives username changes, repo moves, even if GitHub disappears
- Version is explicit, so someone can reproduce the exact run
- Author gets formal credit and citation metrics
The fix is a CITATION.cff file and a DOI, both of which
you’ll create in this lesson.
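To see why structured metadata matters, here is a minimal sketch that assembles the citation above from its parts. The function name format_citation and the dictionary keys are assumptions chosen for illustration; the DOI is the placeholder used in the example above.

```python
def format_citation(meta):
    """Build an APA-like software citation string from a metadata dict."""
    return (
        f"{meta['family']}, {meta['given'][0]}. ({meta['year']}). "
        f"{meta['title']} (v{meta['version']}). {meta['publisher']}. "
        f"https://doi.org/{meta['doi']}"
    )


example = {
    "family": "Dennis",
    "given": "Tim",
    "year": 2025,
    "title": "Biodiversity Analysis Toolkit",
    "version": "0.1.0",
    "publisher": "Zenodo",
    "doi": "10.5281/zenodo.123456",  # placeholder DOI, not a real record
}
print(format_citation(example))
```

Every piece of the string comes from a named field, which is exactly what a bare URL cannot offer.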
The Fix: FAIR4RS Principles
The solution is to apply FAIR Principles for Research Software (FAIR4RS):
F - Findable: Software and metadata are easy to discover → DOI, CITATION.cff, metadata
A - Accessible: Retrievable via standard protocols → Public GitHub, git clone, Zenodo
I - Interoperable: Exchanges data with other tools → Standard formats, documented dependencies
R - Reusable: Can be executed AND modified → LICENSE, README, environment files
What You’re Doing TODAY
In this workshop, you will transform a fragile research script into a FAIR software project by adding:
- ✅ License - So others can legally use your work
- ✅ Environment - So it runs on any machine
- ✅ Citation - So you get academic credit
- ✅ README - So people can find and use it
What “Done Right” Looks Like
The “Before” State (Branch: 01-start)
Check out the initial state of our demo repository with git checkout 01-start.
You’ll see:
- ❌ No LICENSE
- ❌ Vague environment (python = "*")
- ❌ No citation information
- ❌ No DOI
- ❌ Minimal README
The “After” State (Branch: 06-metadata)
Now look at the final state with git checkout 06-metadata.
Notice what’s been added:
- ✅ LICENSE (BSD-3-Clause)
- ✅ pixi.toml (documented environment)
- ✅ CITATION.cff (citation metadata)
- ✅ README.md (complete documentation)
- ✅ CONTRIBUTING.md (contribution guidelines)
- ✅ CODE_OF_CONDUCT.md (community standards)
- ✅ .zenodo.json (Zenodo metadata)
- ✅ DOI badge in README
This is what makes software Findable, Accessible, Interoperable, and Reusable.
Real-World Example: Spack
Want to see a production research software project that follows these principles?
Spack is used by national labs, universities, and supercomputing centers: https://github.com/spack/spack
Check their repository and notice:
- ✅ README with clear description and getting started guide
- ✅ LICENSE (MIT and Apache-2.0)
- ✅ CITATION.cff file
- ✅ Documentation
- ✅ Code of conduct and Contributing guide
Challenge: Reflect on Your Own Code
Think about a script or analysis you’ve written recently:
- Could someone else run it on their machine today? What would they need?
- If you moved to a new institution, could you still access and use it?
- If someone wanted to cite your code in a paper, how would they do it?
Share one challenge you’ve faced with code sharing or reuse.
Common challenges include:
- No documentation of dependencies or versions
- Code only works with specific file paths on your machine
- Missing LICENSE file (so others legally can’t reuse it)
- No citation information available
- Undocumented assumptions about the computing environment
All of these will be addressed in this lesson!
Challenge 2: Who Gets Credit?
You created a Python script used to generate results in a collaborative paper. The paper cites the dataset but not the script. What’s missing, and what would a proper software citation look like?
The script should be citable as a distinct research output. Without citation metadata or a DOI, the software contribution goes unrecognized and nobody can find or reproduce the exact version used.
A proper citation would include: author(s), software title, version,
repository or archive URL, and a persistent identifier (DOI). A
CITATION.cff file in the repository provides all of this
automatically.
What We’ll Build Together
Starting from the minimal 01-start state, you will progressively add each component:
- Episode 1: Sharing the demo repository (understanding 01-start)
- Episode 2: Adding an open-source license (02-license)
- Episode 3: Managing environments with pixi (03-pixi) (optional)
- Episode 4: Creating a CITATION.cff file (04-citation)
- Episode 5: Minting a DOI and creating releases (05-release)
- Episode 6: Improving metadata and discoverability (06-metadata)
- Episode 7: Wrap-up and reflection
Each episode builds on the previous one, following the branch progression in the demo repository.
- Research software often fails due to the “works on my machine” trap
- The bus factor problem means knowledge is lost when people leave
- Software citations using URLs break over time; DOIs are permanent and version-specific
- FAIR4RS principles (Findable, Accessible, Interoperable, Reusable) provide a framework
- Making software FAIR requires: LICENSE, environment files, citation metadata, and documentation
- A CITATION.cff file is the simplest way to make software citable
- The demo repository progresses through branches: 01-start → 06-metadata
Content from Sharing Research Software Effectively
Last updated on 2026-05-29
Estimated time: 15 minutes
Overview
Questions
- Why share research software in a public repository?
- What are the minimum elements that make a repository useful and discoverable?
Objectives
- Describe why public repositories increase visibility and credit for research software.
- Identify the essential components of a well-structured repository.
- Recognize the “before” state of the demo repository used throughout the lesson.
Introduction
Publishing your research software in a public repository helps others find, understand, reuse, and cite your work. This visibility strengthens the transparency of your research process and increases the likelihood that you receive formal credit.
In this lesson, we start with a minimal example repository (branch
01-start). As you progress through the episodes, you will
progressively refine it until it is citable, discoverable, and ready for
reuse.
To demonstrate the “before and after” states of research software, use the provided automation script:
- Locate the script: create_demo_repo.sh is in the root of the lesson repository.
- Run the script: Move it to a non-git directory (e.g., ~/projects/) and run bash create_demo_repo.sh.
- Progression: The script creates branches (01-start through 06-metadata). You can git checkout these branches during the lesson to show incremental progress.
- GitHub Hosting: We recommend pushing this generated repository to your GitHub account before the workshop so learners can follow along online. Commands for pushing are printed at the end of the script.
Use progressive disclosure by showing only the
top-level structure of the demo repository first.
If learners are new to GitHub, you may display two contrasting
examples:
- a sparse, hard-to-understand repo (checkout branch 01-start)
- a clear, well-documented repo (checkout branch 06-metadata)
The Starting Point
Learners will download or clone the “before” state of the example repository:
BASH
software-demo/
├── README.md
├── src/
│   └── analysis.py
└── environment.toml   # pixi environment file
This repository intentionally lacks many elements of good
research software practice.
By the end of the lesson, it will include licensing, citation metadata,
improved discoverability information, and versioning.
Challenge 1: What Makes a Repository Reusable?
Think about the last time you tried to use someone else’s code.
If you have a public GitHub repository, open it now. If not, visit a repository from your field that you’ve used or seen cited.
Check for these elements:
Which of these elements help most? Which would you add to your own work first?
Useful elements commonly include:
- README with context and usage
- a clear file structure
- license information
- installation instructions
- dependencies or environment files
- contributors or authorship information
Missing pieces often include absent documentation, unclear purpose, or no license.
Learn More About Effective READMEs
Want to dive deeper into README best practices?
- Elegant READMEs - practical guide on writing clear, maintainable documentation
- Awesome README - curated examples from real projects
Full references available on the Reference page.
Challenge 2: Inventory the Demo Repository
Open the software-demo repository you downloaded.
Spend 2 minutes exploring, then answer:
- Could you run this code today? What’s missing?
- Would you know who created it or how to credit them?
- Could you legally reuse or modify it?
We’ll address these gaps together over the next episodes.
Learners may observe:
- README is minimal
- No license
- No citation file
- No metadata to support discoverability
- Environment file exists but is not yet introduced
These gaps will be filled across subsequent episodes.
- Public repositories increase findability, reuse potential, and
citation credit.
- A well-structured repository lowers the barrier for others to
understand your work.
- The lesson begins with a minimal “before” repository that will be incrementally improved.
Content from Choosing an Open-Source License
Last updated on 2026-05-29
Estimated time: 30 minutes
Overview
Questions
- Why do you need a license for your code?
- How can an open-source license increase reuse and citation?
- What licenses does the UC system recommend?
Objectives
- Explain why unlicensed software is legally restricted
- Describe the main categories of open-source licenses
- Choose an appropriate license for a UC research project
- Add a license file to a GitHub repository
- Identify UC resources for licensing decisions
Episode Branch: 01-start → 02-license
Why licensing matters
When researchers publish code without a license, most people assume it is “public.” Legally, it is not. Copyright law applies automatically. Without a license, others cannot legally reuse, modify, or redistribute the code.
Clear licensing tells others what they can and cannot do with your code, which is the minimum needed for open, reproducible research. The UC OSPO License Guide covers UC institutional requirements.
Institutional Context: Who Owns Your Software?
At most universities, software created using institutional resources is owned by the institution, not the individual researcher. Before releasing code under an open-source license, check with your Technology Transfer or Intellectual Property office. They will verify ownership, funding requirements, and any third-party restrictions.
If you are at a UC campus: software is typically owned by The Regents of the University of California. Your campus Tech Transfer office can help you select from the UC-approved license list. (UC-specific)
At other institutions: check with your research computing, library, or legal office. Most will have a similar process and a list of preferred licenses.
Understanding license categories
Open-source licenses fall into two broad groups.
Permissive licenses
Examples: BSD, MIT, Apache 2.0
These allow broad reuse with minimal restrictions. Anyone can copy, modify, or redistribute the code. They are common in research because they’re simple and maximize flexibility.
BSD licenses are a common first choice at many research institutions because they:
- originated at UC Berkeley
- are simple to understand
- protect both the institution and authors
- integrate well with most other licenses
- have minimal restrictions
Copyleft licenses
Example: GPL 2.0
These require that derivative works also remain open-source. This protects openness across the lifecycle of a project.
Note on GPL 3.0
The UC system does not recommend GPL 3.0 for university-owned software due to patent provisions that may conflict with UC policies. If you need copyleft protection, consult your campus Tech Transfer office about GPL 2.0 or alternatives.
How to choose a license
Five “low-risk” licenses are suitable for most research projects. Here’s a decision guide:
graph TD
Start[Starting a new UC research software project?] --> Check{Do you have<br/>special requirements?}
Check -->|No special needs| BSD[Use BSD 3-Clause<br/>✓ Common research default<br/>✓ Simple and protective<br/>✓ Widely compatible]
Check -->|Need simpler text| MIT[Use MIT License<br/>✓ Nearly identical to BSD<br/>✓ Shorter, easier to read<br/>✓ Very popular]
Check -->|Industry partnership<br/>or patent concerns| Apache[Use Apache 2.0<br/>✓ Explicit patent protection<br/>✓ Detailed contribution terms<br/>✓ Industry-friendly]
Check -->|Educational focus| ECL[Consider ECL 2.0<br/>✓ Education-specific variant<br/>✓ Based on Apache 2.0]
BSD --> TTO[Verify with campus<br/>Tech Transfer Office]
MIT --> TTO
Apache --> TTO
ECL --> TTO
Check -->|Need copyleft| Copyleft{GPL version?}
Copyleft -->|GPL 2.0| GPL2[May be acceptable<br/>Consult Tech Transfer]
Copyleft -->|GPL 3.0| GPL3[❌ Not recommended by UC<br/>Patent conflicts]
GPL2 --> TTO
GPL3 --> TTO
style BSD fill:#90EE90
style MIT fill:#90EE90
style Apache fill:#90EE90
style ECL fill:#90EE90
style GPL2 fill:#FFFF99
style GPL3 fill:#FFB6C6
style TTO fill:#87CEEB
Quick reference
| Your need | Recommended license | SPDX identifier | Why |
|---|---|---|---|
| Default / most projects | BSD 3-Clause | BSD-3-Clause | Common default at research institutions |
| Simplest possible | MIT | MIT | Minimal text, very popular |
| Industry collaboration | Apache 2.0 | Apache-2.0 | Explicit patent terms |
| Educational focus | ECL 2.0 | ECL-2.0 | Education-specific variant |
The SPDX identifier is the short, machine-readable code used by GitHub, Zenodo, and your CITATION.cff file to communicate your license automatically. When GitHub shows a license in the sidebar, it has matched the text of your LICENSE file to one of these identifiers.
Always consult your institution’s Tech Transfer or IP office before releasing software created with institutional resources.
What about data and documentation?
Software licenses (BSD, MIT, Apache) are written for executable code. If your repository also contains datasets, figures, or documentation, those files need a separate license.
The standard choice for research outputs is Creative Commons Attribution 4.0 (CC BY 4.0), which allows broad reuse with attribution.
A common pattern:
- /src or your code files → BSD-3-Clause or MIT
- /data or /docs → CC-BY-4.0
You can note this split in your README and in
CITATION.cff under the license field, which
accepts a list:
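A minimal sketch of that list form follows. The helper render_license_field is hypothetical and only prints the YAML fragment you would paste into CITATION.cff. One caveat, based on our reading of the CFF schema guide: multiple entries in the license field are treated as alternatives (OR), so the README remains the place to say which license covers which directory.

```python
def render_license_field(spdx_ids):
    """Render a CFF license field as YAML text from SPDX identifiers."""
    if len(spdx_ids) == 1:
        # A single license is written as a plain scalar value.
        return f"license: {spdx_ids[0]}"
    # Several licenses are written as a YAML list, one identifier per item.
    lines = ["license:"] + [f"  - {spdx_id}" for spdx_id in spdx_ids]
    return "\n".join(lines)


print(render_license_field(["BSD-3-Clause", "CC-BY-4.0"]))
```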
Most research repositories don’t need this, but if you’re sharing a dataset alongside code, it’s worth thinking through.
Resources
- ChooseALicense.com – Compare features across all common licenses.
- SPDX License List – Authoritative registry of license identifiers used in CITATION.cff and package metadata.
- UC OSPO License Guide (UC-specific) – UC institutional requirements and templates.
- UC OSS Chart and Companion Guide (UC-specific) – UC-approved “low-risk” license list.
Challenge: Add a BSD License to Your Repository
We will add the BSD 3-Clause license to your demo repository:
- Navigate to your repository on GitHub.
- Click Add file → Create new file.
- Name it LICENSE (or LICENSE.txt).
- Click Choose a license template and select BSD 3-Clause License.
- Update the copyright holder to reflect who owns the software. At UC campuses this is The Regents of the University of California; at other institutions check with your Tech Transfer office. (If this is a personal project, use your own name.)
- Update the year to 2026.
- Commit the file to your main branch.
Verify: Does your repository now display the “BSD-3-Clause” license badge in the sidebar?
GitHub automatically detects the LICENSE file and
displays it in the sidebar. Your file should look like this:
BSD 3-Clause License
Copyright (c) 2026, The Regents of the University of California
All rights reserved.
If the badge doesn’t appear, ensure the file is in the root directory
and named exactly LICENSE or LICENSE.txt.
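If you prefer to check this locally before pushing, the sketch below looks for the two file names mentioned above in a repository root. It is an illustration under stated assumptions: find_license is a hypothetical helper, it checks only those two names, and it says nothing about how GitHub itself detects licenses.

```python
import tempfile
from pathlib import Path

ACCEPTED_NAMES = ("LICENSE", "LICENSE.txt")


def find_license(repo_root):
    """Return the license file's name if it sits in the repository root, else None."""
    root = Path(repo_root)
    for name in ACCEPTED_NAMES:
        if (root / name).is_file():
            return name
    return None


# Demonstration on a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    print(find_license(tmp))  # no license file yet
    (Path(tmp) / "LICENSE").write_text("BSD 3-Clause License\n")
    print(find_license(tmp))  # the file is now found
```

To check your own project, call find_license with the path to your cloned repository.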
Communicating your license
After adding a LICENSE file, reference it in your README so users immediately understand usage terms.
Add this section near the top of your README:
MARKDOWN
## License
This project is licensed under the BSD 3-Clause License - see the [LICENSE](LICENSE) file for details.
Why this matters: Users reading your README on platforms other than GitHub (Zenodo, email, exported PDFs) will see your license terms even without GitHub’s automatic detection.
Exercise: License Scenarios
Which license would you recommend for each UC research scenario?
Scenario 1: A Python package for ecological data analysis. You want maximum adoption across academia and industry.
Scenario 2: A data visualization tool developed with a biotech partner who may commercialize derivatives.
Scenario 3: A simple utility script you’re sharing with collaborators.
Scenario 1: BSD 3-Clause (UC’s default recommendation, maximum flexibility and adoption)
Scenario 2: Apache 2.0 (explicit patent protection important for industry partnerships)
Scenario 3: Either BSD 3-Clause or MIT (both work well for simple sharing; BSD preferred by UC)
In all cases, verify with your campus Tech Transfer office before releasing.
Summary
Licensing is foundational to making research software usable, citable, and shareable. In this episode, you added a BSD license to a repository following UC recommendations.
- Without a license, software is legally restricted and not reusable
- BSD 3-Clause is a common default at research institutions; MIT and Apache 2.0 are strong alternatives
- Permissive licenses (BSD, MIT, Apache 2.0) maximize flexibility and adoption
- Always consult your institution’s Tech Transfer or IP office before releasing institutionally-owned software
- GitHub makes adding standard licenses straightforward
Content from Adding a CITATION.cff File
Last updated on 2026-05-29
Estimated time: 28 minutes
Overview
Questions
- What is a CITATION.cff file and why does it matter?
- How does GitHub use CITATION.cff to generate ready-made citations?
- What minimal metadata should researchers include?
Objectives
- Explain the role of CITATION.cff in software citation.
- Create and customize a CITATION.cff file in a GitHub repository.
- Understand how the file connects to later steps like releases and DOIs.
Episode Branch: 02-license → 04-citation
This episode adds a CITATION.cff file.
Starting point:
After this episode:
Catch-up point: If joining now, run git checkout 02-license
Note: the 04-citation branch also contains pixi
environment files from the optional reproducibility episode. You can
ignore those files for now.
Introduction
A CITATION.cff file is the simplest, most direct way to make your software citable.
It provides structured citation metadata that:
- tells others how to reference your work
- allows GitHub to display a “Cite this repository” button
- supports good scholarly practice and FAIR4RS principles
You can create this file before releases or
DOIs.
If you later add a DOI or version tag, you can update the file at any
time.
Show learners what the citation panel looks like on a GitHub
repository that already has a CITATION.cff file. This gives
them a clear target and reduces cognitive load.
Reassure learners that a tiny file is fine. They can refine it later as their software matures.
What belongs in a CITATION.cff file?
A minimal file includes:
- title of the software
- authors (ORCID recommended if available)
- version (optional at this stage)
- message with basic instructions
As your project grows, you can add:
- release versions
- DOIs from Zenodo or another service
- keywords
- abstract
- repository URLs
Learners do not need to know the entire schema.
The point is to start small and publish useful metadata early.
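As a quick self-check, the sketch below reports which required keys a draft is still missing. It assumes CFF 1.2.0, where cff-version is required alongside message, title, and authors; the helper missing_required and the draft dictionary are illustrative, and this is not a full schema validation.

```python
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def missing_required(metadata):
    """Return the required CFF keys that are absent or empty in a metadata dict."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]


# A draft that forgot the message field.
draft = {
    "cff-version": "1.2.0",
    "title": "Biodiversity Analysis Toolkit",
    "authors": [{"family-names": "Dennis", "given-names": "Tim"}],
}
print(missing_required(draft))  # ['message']
```

For a complete check, a dedicated validator such as the cffinit wizard described below is the safer route.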
Why the CFF format works well
- human-readable
- YAML-based
- validated automatically by GitHub
- supported by tools including Zotero, Zenodo, and reference managers
How to Create a CITATION.cff File
You have two options:
Option 1: Use cffinit (Recommended for beginners)
The CITATION.cff community provides a web-based wizard:
Use cffinit:
- Visit: https://citation-file-format.github.io/cffinit/
- Fill out the form with your software details
- Download the generated CITATION.cff file
- Add it to your repository root
Benefits:
- Interactive form guides you through required fields
- Validates your file automatically
- No need to memorize YAML syntax
- More information on CFF: https://citation-file-format.github.io/
Option 2: Create it manually
If you prefer to write it yourself:
Step 1: Create the file
In the root of your repository, create a new file named CITATION.cff.
Step 2: Add minimal metadata
Here’s a complete example from the slides:
YAML
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Dennis"
    given-names: "Tim"
    orcid: "https://orcid.org/0000-0002-1234-5678"
title: "Biodiversity Analysis Toolkit"
version: 0.1.0
date-released: 2025-01-15
url: "https://github.com/jt14den/software-demo"
GitHub creates a “Cite this repository” button automatically!
If your software does not yet have version tags, you may omit the version field until the next episode, when you create releases.
Linking to a published paper
Many researchers want users to cite a journal article
alongside or instead of the raw software repository.
The preferred-citation field handles this: it tells GitHub,
Zotero, and other tools which reference to show first.
YAML
cff-version: 1.2.0
message: "If you use this software, please cite the paper below."
authors:
  - family-names: "Dennis"
    given-names: "Tim"
    orcid: "https://orcid.org/0000-0002-1234-5678"
title: "Biodiversity Analysis Toolkit"
version: 0.1.0
date-released: 2025-01-15
url: "https://github.com/jt14den/software-demo"
preferred-citation:
  type: article
  title: "Biodiversity Analysis at Scale: Methods and Software"
  authors:
    - family-names: "Dennis"
      given-names: "Tim"
  journal: "Journal of Open Source Software"
  year: 2025
  doi: "10.21105/joss.00000"
Without preferred-citation, GitHub shows the software
repository citation by default. Adding it ensures that anyone clicking
“Cite this repository” gets your paper’s citation instead, which is
usually what you want for impact tracking.
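The selection rule described above can be modelled in a few lines. This is a sketch of the behaviour as the lesson describes it, not GitHub's implementation; citation_to_show and the two dictionaries are hypothetical.

```python
def citation_to_show(cff):
    """Return the preferred citation if one is declared, else the software record."""
    return cff.get("preferred-citation") or cff


software_only = {"title": "Biodiversity Analysis Toolkit", "version": "0.1.0"}
with_paper = dict(
    software_only,
    **{
        "preferred-citation": {
            "type": "article",
            "title": "Biodiversity Analysis at Scale: Methods and Software",
        }
    },
)

print(citation_to_show(software_only)["title"])  # the software itself
print(citation_to_show(with_paper)["title"])     # the paper
```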
No paper yet? Skip it.
Leave out preferred-citation if you don’t have a
published article. You can add it later. The rest of the file works fine
without it.
Step 3: Commit and refresh
After you commit the file, GitHub:
- parses and validates it
- displays a “Cite this repository” panel
- provides copy-ready citations in APA and BibTeX formats
This feature works even without a DOI.
Exercise 1: Identify missing metadata
Look at your repository (or the example repository provided with the lesson).
Reflect:
- What metadata is easy to add today?
- What might require input from collaborators?
- What do you prefer to add later?
Share one observation.
Typical missing pieces include:
- ORCID IDs
- complete contributor list
- description or abstract
- license information
- DOI (added later if desired)
Exercise 2: Add a CITATION.cff file
Steps:
- Create CITATION.cff.
- Add at least: title, author(s), and message.
- Commit and refresh to see GitHub’s citation panel.
- A CITATION.cff file is the foundation of software citation.
- It can be added before releases, DOIs, or version tags.
- GitHub displays machine-readable citations automatically when this file is present.
- Start simple and expand over time as your project develops.
Content from Making Your Software Citable
Last updated on 2026-05-29
Estimated time: 32 minutes
Overview
Questions
- What makes software citable?
- How do releases and DOIs strengthen software citation?
- How do I create a release and, optionally, mint a DOI?
Objectives
- Describe why software citation matters in research.
- Create a versioned release in GitHub.
- Understand when and why to mint a DOI with Zenodo.
Episode Branch: 04-citation → 05-release
Introduction
Software is a research product.
Like articles and datasets, it should be cited so
others can acknowledge your work, find the exact version you used, and
understand how your software contributed to their results.
Your software becomes citable as soon as it includes:
- Structured citation metadata, such as a CITATION.cff file.
- A public location where the code is available.
- A stable version someone can reference.
A DOI is optional but valuable. It strengthens citability by giving each version a persistent identifier.
So far in this lesson, you have:
- shared a public repository
- added a license
- added a CITATION.cff file
In this episode, you will create a GitHub release and learn how DOIs fit into software citation workflows.
Some learners may feel anxious about creating a DOI.
Reassure them that:
- a DOI is not required for citability
- Zenodo is free and widely used
- they can watch the demo and complete the steps later
What Makes Software Citable?
Software is citable when:
- it includes authorship and version information
- the referenced version is stable
- others can access the code
A DOI enhances these qualities, but does not define them.
Why add a DOI?
A DOI is helpful for:
- increasing visibility and discoverability
- long-term persistence
- citing exact versions
- meeting journal and funder expectations
But the core citability comes from your metadata and release process.
Create a Release in GitHub
A release captures a specific version of your software.
It is the snapshot that others can cite.
Steps
- Open your GitHub repository.
- Select Releases → Draft a new release.
- Create a tag such as v0.1.0.
- Add release notes summarizing changes.
- Publish the release.
Your CITATION.cff file does not update itself: set its version and date-released fields to match this tag before you publish the release.
Semantic Versioning (SemVer)
You might wonder why we chose v0.1.0. This follows Semantic Versioning (MAJOR.MINOR.PATCH):
- MAJOR version when you make incompatible API changes (e.g., 1.0.0)
- MINOR version when you add functionality in a backward compatible manner (e.g., 0.1.0 -> 0.2.0)
- PATCH version when you make backward compatible bug fixes (e.g., 0.1.1)
Starting with 0.x.x indicates your software is in
initial development and the API is not yet stable.
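The three rules above can be sketched as a small version bumper. This is a simplified illustration: the function names are assumptions, and pre-release or build suffixes such as 1.0.0-rc.1 are not handled.

```python
def parse_version(tag):
    """Split 'v0.1.0' or '0.1.0' into (major, minor, patch) integers."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump(tag, part):
    """Return the next version string after a major, minor, or patch change."""
    major, minor, patch = parse_version(tag)
    if part == "major":
        return f"{major + 1}.0.0"  # incompatible change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part}")


print(bump("v0.1.0", "minor"))  # 0.2.0
print(bump("0.1.0", "patch"))   # 0.1.1
```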
Challenge 1: What belongs in a release?
Take one minute to reflect:
- What information would help a future you understand what changed in this version?
Useful release notes include:
- what changed
- what was added or removed
- what bugs were fixed
- what might break for users
- anything important about reproducibility
Clear release notes help both people and tools interpret your software’s evolution.
Minting a DOI with Zenodo Sandbox
To practice minting a DOI without polluting the permanent scholarly record, we will use Zenodo Sandbox. It works exactly like the real Zenodo but is for testing.
The Complete 6-Step Workflow
Step 1: Log in to Zenodo with GitHub
- Visit https://sandbox.zenodo.org
- Click “Log in with GitHub”
- Authorize Zenodo to access your repositories
Step 2: Enable your repository (toggle ON)
- Go to Settings → GitHub in Zenodo Sandbox
- Find your repository in the list
- Toggle the switch to ON (green)
- This tells Zenodo to watch for new releases
Step 3: Create GitHub Release (tag v1.0.0)
- Go to your GitHub repository
- Click Releases → Draft a new release
- Create a tag: v1.0.0
- Add release notes describing what’s in this version
- Click Publish release
Step 4: Zenodo auto-archives and mints DOI
- Zenodo automatically detects your new release
- Creates an archived snapshot
- Assigns a permanent DOI
- Wait a few minutes for processing
Step 5: Add DOI badge to your README
- Copy the DOI badge from your Zenodo record
- Add it to the top of your README:
MARKDOWN
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.123456.svg)](https://doi.org/10.5281/zenodo.123456)
Step 6: Update your CFF file with your DOI
- Add the DOI to your CITATION.cff:
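In CFF 1.2.0 the DOI goes in a top-level doi key. The sketch below adds or replaces that line in the file's text; set_doi is a hypothetical helper, the DOI is the placeholder used throughout this lesson, and editing the file by hand works just as well.

```python
def set_doi(cff_text, doi):
    """Return CFF text with a top-level doi line added or replaced."""
    # Only unindented doi lines are dropped, so a DOI nested under
    # preferred-citation is left untouched.
    kept = [line for line in cff_text.splitlines() if not line.startswith("doi:")]
    kept.append(f"doi: {doi}")
    return "\n".join(kept) + "\n"


cff = 'cff-version: 1.2.0\ntitle: "Biodiversity Analysis Toolkit"\n'
print(set_doi(cff, "10.5281/zenodo.123456"))
```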
Result: You now have LICENSE, CITATION.cff, and DOI.
Now You Have Everything
- ✅ F - Findable: Added DOI, CITATION.cff, rich metadata
- ✅ A - Accessible: Public GitHub, archived on Zenodo
- ✅ I - Interoperable: Standard formats (YAML, CFF)
- ✅ R - Reusable: LICENSE (BSD-3), README with setup
Even if GitHub disappears, your DOI still works.
Going further: Software Heritage
Zenodo archives a snapshot of your code at release time. Software Heritage goes further: it continuously crawls GitHub, GitLab, and other forges and archives everything, assigning a SWHID (Software Heritage Identifier) to every file, directory, commit, and release.
A SWHID looks like this:
swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f
It points to an exact, immutable snapshot that survives forge closures (Gitorious shut down in 2015 and Google Code in 2016, each making thousands of repositories unreachable). Your Zenodo DOI is the right identifier for citation; a SWHID is the long-term preservation record.
To find and record your SWHID:
- Go to https://archive.softwareheritage.org/
- Paste your GitHub repository URL into the search box
- If your repo is already archived, copy the SWHID for your release
- If it hasn’t been crawled yet, click Save code now to trigger immediate archival
- Once archived, add the SWHID to your CITATION.cff:
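CFF 1.2.0 records a SWHID as an entry of type swh under the identifiers key. The sketch below checks the identifier's shape and prints the YAML fragment to paste in. The helper name and the description text are assumptions, and only core SWHIDs (without qualifiers such as origin) are handled.

```python
import re

# Core SWHID: scheme, version 1, object type, 40 lowercase hex digits.
SWHID_PATTERN = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}$")


def swhid_identifier_entry(swhid, description="Software Heritage archive record"):
    """Return a YAML identifiers block for CITATION.cff, or raise on a malformed SWHID."""
    if not SWHID_PATTERN.match(swhid):
        raise ValueError(f"not a core SWHID: {swhid}")
    return (
        "identifiers:\n"
        "  - type: swh\n"
        f'    value: "{swhid}"\n'
        f'    description: "{description}"\n'
    )


print(swhid_identifier_entry("swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f"))
```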
This approach is recommended in the 2026 CODE Beyond FAIR roadmap (Di Cosmo et al., Scientific Data).
Critical: Ensure learners use sandbox.zenodo.org. Real DOIs cannot be deleted.
Challenge 2: Why are versioned DOIs important?
Why might it matter that a DOI points to a specific release rather than the whole project?
Versioned DOIs:
- support reproducibility by identifying an exact snapshot
- prevent confusion when the software changes
- allow users to cite precisely the version they used
- support FAIR4RS and publisher guidelines
- Software is citable as soon as it includes a citation file and a
stable version.
- GitHub releases create versioned snapshots for citation.
- DOIs are optional but strengthen discoverability, persistence, and
reproducibility.
- Zenodo can automatically mint a DOI for each GitHub release.
Content from Managing Reproducible Environments with pixi
Last updated on 2026-05-29
Estimated time: 24 minutes
Optional Episode
This episode covers environment management using pixi.
It is optional; you can skip it and move directly to Improving Metadata and
Discoverability.
If you skip this episode, you will still complete all citation steps.
The pixi.toml and pixi.lock files you see in
the demo repo branches were added here; you can ignore them.
Other environment tools: conda,
mamba, pip/venv, and
renv (for R) all serve the same purpose. The concepts here
apply to any environment manager; pixi is used because it handles
Python, R, and other languages with a single tool and generates an
automatic lockfile.
Overview
Questions
- Why do software projects need well defined environments?
- How can pixi help learners run the same code the developer used?
- How does environment management improve the reproducibility and citability of research software?
Objectives
- Explain why environment definition is central to reproducible research.
- Create a minimal pixi.toml file for a project.
- Use pixi to run Python or R code inside a clean, isolated environment.
- Describe how environment files support FAIR software and citation practices.
Episode Branch: 03-pixi
This optional episode explores the environment management layer of the demo repository.
To follow along, check out the 03-pixi branch (git checkout 03-pixi).
This branch sits between 02-license and
04-citation in the demo repo history, but you can explore
it at any point in the lesson.
This episode appears after the release/DOI episode in the lesson
order. The demo repository branch 03-pixi sits earlier in
the repo history (between 02-license and
04-citation), so the branch number and lesson position
don’t match, which is expected.
How to handle the branch:
Have learners check out 03-pixi to follow along. They
are exploring it as a standalone example, not building on it. After this
episode, direct them back to 05-release to continue with
the metadata episode.
BASH
# During this episode
git checkout 03-pixi
# After this episode, return here to continue
git checkout 05-release
If skipping this episode: Learners who went from
02-license straight to 04-citation will
already have pixi files in their repo (they were baked into the branch).
Acknowledge this briefly when they encounter pixi.toml or
pixi.lock in the metadata episode: “Those files are
from the optional environment management episode; you can leave them
as-is.”
Pixi not installed? Learners can follow the concepts without a working pixi installation. The key idea is that a lockfile pins exact dependency versions. That’s the transferable lesson, not the tool itself.
Why Environments Matter
The Problem: Research software often “works on my machine” and nowhere else.
Code rots. Python updates, packages break, and 6 months from now, your script won’t run.
Different operating systems, outdated packages, and mismatched library versions frequently break code.
What Environment Management Captures
Environment management reduces this friction because it captures:
- The exact language versions used
- Required packages
- The dependency set needed to run the software
- Instructions for reproducing the execution environment
The Payoff: We aren’t just shipping code; we’re shipping the computer state needed to run it.
Why pixi?
pixi is a modern, fast environment manager that works
for Python, R, and many other languages. We use it in this lesson
because it is:
- Cross-platform: Works on macOS, Linux, Windows
- Fast: Faster than Conda
- Automatic lockfiles: Creates pixi.lock automatically, guaranteeing everyone runs the exact same versions of every package
- Multi-language: Supports Python, R, and more
FAIR Connection: Standard formats + clear dependencies = Interoperable & Reusable software
Installing pixi
Full installation docs: https://pixi.sh
Common installation for macOS or Linux (check the installation docs for the current command): curl -fsSL https://pixi.sh/install.sh | sh
Windows users can install via MSI installer or
winget.
pixi installs the language runtimes it needs into the project environment.
Learners do not need preinstalled Python, R, compilers, or system
packages.
Creating a New pixi Project
Create a directory and initialize a pixi project (for example, pixi init myproject, then cd myproject).
This creates a pixi.toml file, which documents your
environment.
Keeping Repositories Clean (.gitignore)
When you run pixi init, it automatically creates a
.gitignore file. This file tells Git which files to
ignore.
For pixi, this is critical because pixi creates a hidden
folder, .pixi/, containing thousands of environment files.
You never want to commit this folder to GitHub. It is
large, platform-specific, and can be regenerated by anyone using your
pixi.lock file.
Always check your .gitignore to ensure
generated files (like .DS_Store, __pycache__,
or data outputs) are not accidentally shared.
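One way to make that check routine is a short script. The sketch below is a simplified illustration: it compares entries literally rather than applying Git's full pattern-matching rules, and the example .gitignore contents and the helper name missing_ignore_patterns are hypothetical.

```python
def normalize(pattern: str) -> str:
    """Treat '.pixi', '.pixi/', '/.pixi/' and '.pixi/*' as the same entry."""
    pattern = pattern.strip()
    if pattern.endswith("/*"):
        pattern = pattern[:-2]
    return pattern.strip("/")


def missing_ignore_patterns(gitignore_text: str, expected: list[str]) -> list[str]:
    """Return the expected entries that do not appear in the .gitignore text."""
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and negated ("!") patterns.
        if line and not line.startswith("#") and not line.startswith("!"):
            present.add(normalize(line))
    return [entry for entry in expected if normalize(entry) not in present]


# Hypothetical .gitignore contents, for demonstration only.
example_gitignore = """# pixi environments
.pixi/*
__pycache__/
"""

missing = missing_ignore_patterns(
    example_gitignore, [".pixi/", "__pycache__/", ".DS_Store"]
)
print("Not yet ignored:", missing)
```

To check a real repository, read the file with pathlib.Path(".gitignore").read_text() and pass the result in place of example_gitignore.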
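Add Python by running pixi add python.
Add the NumPy package by running pixi add numpy.
Add R and one package by running pixi add r r-dplyr (these match the r and r-dplyr entries in the pixi.toml shown below).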
Your pixi.toml is now a reproducible record of all
dependencies needed for the software.
What’s Inside pixi.toml?
Here’s what the file looks like (this will be automatically created for you):
TOML
[workspace]
authors = ["Leigh Phan <leighphan@ucla.edu>"]
channels = ["conda-forge"]
name = "myproject"
platforms = ["osx-arm64"]
version = "0.1.0"
[tasks]
[dependencies]
python = ">=3.14.3,<3.15"
numpy = ">=2.4.2,<3"
r = ">=4.5,<4.6"
r-dplyr = ">=1.2.0,<2"
When you run pixi install, it also creates a
pixi.lock file with exact versions locked:
pixi.lock contains:
python = "3.14.3"
numpy = "2.4.2"
r = "4.5.4"
r-dplyr = "1.2.0"
+ 47 other dependencies
This lockfile pins the exact version of every package, so anyone who installs from it on a supported platform gets the same environment.
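The relationship between the two files can be sketched in a few lines of Python: pixi.toml states acceptable ranges, and pixi.lock records one concrete version inside each range. The comparison logic below is a deliberate simplification (numeric dotted versions and a handful of operators only) and is not how pixi or conda resolve versions. The function names are illustrative.

```python
import operator

# Longer operators first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str, width: int = 4) -> tuple[int, ...]:
    """Turn '3.14.3' into (3, 14, 3, 0) so versions compare as tuples."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(pinned: str, spec: str) -> bool:
    """Check a pinned version against a range such as '>=2.4.2,<3'."""
    version = parse_version(pinned)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                if not compare(version, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Ranges from pixi.toml and pinned versions from the lockfile summary above.
ranges = {"python": ">=3.14.3,<3.15", "numpy": ">=2.4.2,<3",
          "r": ">=4.5,<4.6", "r-dplyr": ">=1.2.0,<2"}
pinned = {"python": "3.14.3", "numpy": "2.4.2", "r": "4.5.4", "r-dplyr": "1.2.0"}

for name, spec in ranges.items():
    print(f"{name} {pinned[name]} within {spec}: {satisfies(pinned[name], spec)}")
```

Every pinned version falls inside its declared range. A colleague who installs from the lockfile gets those pinned versions, while someone who resolves from pixi.toml alone at a later date may get newer versions that still fit the ranges.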
Running Code With pixi
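Run Python code with pixi run, for example pixi run python src/analysis.py.
Run R code the same way, for example pixi run Rscript analysis.R (a hypothetical script name).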
Every command is executed inside the environment described
by pixi.toml.
This makes it easier for others to test, cite, extend, and build upon your work.
How Environments Support Citation and Reuse
A reusable research software project contains not only code, but:
- licensing
- authorship and citation metadata (CITATION.cff)
- version information
- a documented environment
Including pixi.toml in your repository or DOI deposit
helps future readers:
- recreate the execution environment
- verify results
- adapt your code for new analyses
- evaluate whether the software is FAIR (Findable, Accessible, Interoperable, Reusable)
When publishing your software, include:
- the pixi.toml file
- instructions such as: run pixi install to build the environment, then use pixi run to execute the code
Challenge: Add a new dependency
Use pixi to add pandas or
r-ggplot2 to your project.
What changed in your pixi.toml file?
- Reproducible environments reduce troubleshooting and support more reusable software.
- pixi provides fast, cross-platform environment management.
- The pixi.toml file acts as documentation that supports citation and FAIR4RS principles.
- Use pixi run to execute Python or R code inside a reproducible environment.
Content from Improving Metadata and Discoverability
Last updated on 2026-05-29
Estimated time: 37 minutes
Overview
Questions
- What metadata makes research software easier to find and reuse?
- How can we improve discoverability across GitHub, Zenodo, and scholarly indexes?
Objectives
- Identify key metadata elements that increase visibility and reuse.
- Enhance discoverability using GitHub features and Zenodo metadata fields.
- Connect metadata across CITATION.cff, GitHub, and your DOI record for consistency.
Episode Branch: 05-release → 06-metadata
Introduction
Clear metadata helps others understand, evaluate, and find
your software.
It also reduces the cognitive effort for future users because essential
information is organized and easy to locate.
In earlier episodes, you created:
- a CITATION.cff file
- a license
- a repository structure
- a Zenodo record with a DOI
This episode brings these together. You will describe your project in consistent ways across platforms so search engines, citation tools, and colleagues can discover it.
Encourage learners to compare well-described repositories with sparse
ones.
Highlight how even small metadata additions increase visibility in
GitHub search, Zenodo indexing, and DataCite services.
What counts as useful metadata?
Good metadata answers predictable questions with minimal effort from the reader:
- What is this software? (short description or abstract)
- Who made it? (authors, ORCIDs)
- How do I cite it? (CITATION.cff + DOI)
- What domain is it for? (keywords)
- What else does it relate to?
  - related article DOI
  - datasets used
  - funding source
  - project website
You may add these in multiple places, but they should remain consistent.
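Consistency is easy to check mechanically once the keywords from each place are collected. The sketch below is a hedged illustration: the keyword lists are hypothetical and typed in by hand, and the helper name keyword_gaps is not part of any GitHub or Zenodo tool.

```python
def keyword_gaps(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """For each source, report keywords used elsewhere but missing from it."""
    # Compare case-insensitively so "Python" and "python" count as the same tag.
    normalized = {
        name: {keyword.strip().lower() for keyword in keywords}
        for name, keywords in sources.items()
    }
    all_keywords = set().union(*normalized.values())
    gaps = {}
    for name, keywords in normalized.items():
        missing = all_keywords - keywords
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical keyword lists copied from each platform.
sources = {
    "CITATION.cff": ["geospatial", "Python"],
    "GitHub topics": ["geospatial", "python", "simulation"],
    "Zenodo": ["geospatial"],
}

gaps = keyword_gaps(sources)
for name, missing in gaps.items():
    print(f"{name} is missing: {', '.join(sorted(missing))}")
```

A source that already carries every keyword is left out of the report, so an empty result means the three places agree.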
GitHub-specific discoverability features
GitHub uses structured metadata to improve search ranking and cross-repository linking.
Add these items as Topics: on the repository’s main page, click the gear icon next to About and enter them in the Topics field:
- discipline tags (e.g., geospatial, text-mining, materials-science)
- methodological tags (simulation, visualization, machine-learning)
- language tags (python, r)
Writing an Effective README
The 30-Second Rule
Your README is your software’s front door.
If users can’t understand what it does, how to install it, or how to use it in 30 seconds → they leave.
README Structure (7 Essential Sections)
The UC OSPO README Guide (UC-specific) recommends this standard structure:
- About: What does this do? (2-3 sentences)
- Features: Key capabilities (Reproducible, Citable, Open source)
- Getting Started: Prerequisites + installation
- Usage: Minimal working example
- Citation: Link to CITATION.cff or DOI
- License: Explicitly state terms (e.g., “BSD-3 - see LICENSE file”)
- Contact: How to get help
Citation
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.123456.svg)](https://doi.org/10.5281/zenodo.123456)
License: BSD-3 - see LICENSE file
Professional. Citable. Usable.
README Best Practices: 5 Quick Tips
- Clear description → Answer “What problem does this solve?”
- Show, don’t tell → Include code examples
- Link metadata → Add DOI badge, link CITATION.cff
- Keep updated → Refresh when features change
- Use a template → UC OSPO Templates (UC-specific) or Awesome README
Every tip maps to FAIR principles.
Don’t Reinvent the Wheel
- Awesome README: curated examples from real open-source projects
- UC OSPO README Template (UC-specific): ready-to-use template
It is also critical to link your metadata:
- Add a badge for your Zenodo DOI
- Link to your CITATION.cff file
A structured README ensures researchers can quickly evaluate and use your software.
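As a quick self-check, the seven-section structure can be tested against a README automatically. This is a minimal sketch that assumes Markdown headings written with leading # characters and section titles that match the list above exactly. It will also pick up # comment lines inside code blocks, so treat the result as a hint rather than a verdict. The example README text is hypothetical.

```python
import re

REQUIRED_SECTIONS = [
    "About", "Features", "Getting Started", "Usage",
    "Citation", "License", "Contact",
]

# Matches Markdown headings such as "## Usage" and captures the title.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", flags=re.MULTILINE)


def missing_sections(readme_text: str) -> list[str]:
    """Return the recommended sections with no matching heading."""
    headings = {match.group(1).strip().lower()
                for match in HEADING.finditer(readme_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Hypothetical README, for demonstration only.
example_readme = """# myproject

## About
Short description of what the software does.

## Usage
Run the analysis script.

## License
BSD-3 - see LICENSE file
"""

missing = missing_sections(example_readme)
print("Sections still to write:", ", ".join(missing))
```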
Beyond the README: Community Health Files
Beyond technical metadata, files that describe how to interact with your project matter for long-term sustainability and for signaling that the project is welcoming.
GitHub looks for these files:
CONTRIBUTING.md → How to contribute
The CONTRIBUTING.md file is the first place new
contributors look to see if a project is open to participation.
Following a contributing guide template (the UC
OSPO Contributing Guide (UC-specific) is one good example)
ensures you cover essential ground:
- Welcome Statement: Explicitly inviting others to join
- Ways to Contribute: Identifying non-code contributions (e.g., documentation, testing, issues)
- Setup Instructions: How to get the project running locally
- Pull Request Lifecycle: What happens after a contribution is submitted
CODE_OF_CONDUCT.md → Behavioral standards
A CODE_OF_CONDUCT.md establishes behavioral expectations
and ensures a safe, inclusive environment for all researchers. The
standard choice is the Contributor Covenant,
widely adopted across open-source projects. (See also: UC
OSPO Code of Conduct Guide (UC-specific))
CHANGELOG.md → Version history
A CHANGELOG.md documents what changed between versions.
This helps users understand:
- What’s new in each release
- What bugs were fixed
- What breaking changes occurred
- How the software evolved over time
Why it matters: Signals your project is professionally managed and welcoming.
Templates Available
- Choose a License: license selection
- Contributor Covenant: code of conduct template
- Keep a Changelog: changelog format guide
- UC OSPO Template Repository (UC-specific): ready-to-use CONTRIBUTING.md, CODE_OF_CONDUCT.md, CHANGELOG.md, README.md
Adding these files to your repository root helps GitHub display a “Community Standards” checklist in your insights, signaling that your project is professionally managed and ready for collaboration.
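Before pushing, you can run a similar check locally. The sketch below only looks in the repository root and only for the exact file names listed, so it will miss variants such as LICENSE.md or files stored in a .github folder. The file list and the function name community_file_report are illustrative, and the demo builds a throwaway directory rather than touching a real repository.

```python
import tempfile
from pathlib import Path

COMMUNITY_FILES = [
    "README.md",
    "LICENSE",
    "CITATION.cff",
    "CONTRIBUTING.md",
    "CODE_OF_CONDUCT.md",
    "CHANGELOG.md",
]


def community_file_report(repo_root: str) -> dict[str, bool]:
    """Map each expected file name to whether it exists in repo_root."""
    root = Path(repo_root)
    return {name: (root / name).is_file() for name in COMMUNITY_FILES}


# Demonstration on a temporary directory with two of the files present.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("# demo\n")
    (Path(tmp) / "LICENSE").write_text("BSD 3-Clause License\n")
    report = community_file_report(tmp)

for name, present in report.items():
    print("found  " if present else "missing", name)
```

To check your own project, call community_file_report(".") from the repository root.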
Institutional Repositories: Dataverse, Dryad, and Zenodo
Many institutions use repositories like Dataverse or Dryad for research data deposits. These are good for datasets but have limited software support; they don’t integrate with GitHub releases or mint version-specific DOIs automatically.
For software, Zenodo is the recommended deposit location because:
- It integrates directly with GitHub to archive each release automatically
- It mints a DOI for every version
- Records flow into DataCite and are indexed by Google Scholar and library catalogs
- It’s free, CERN-operated, and widely recognized by journals and funders
What about your institution’s repository?
If your institution requires or prefers a local IR (Dataverse instance, DSpace, etc.), you can deposit there in addition to Zenodo, not instead of it. Use the Zenodo DOI as the persistent identifier in your CITATION.cff, and note the institutional deposit in your README or Zenodo metadata as a related work.
Some funders (NSF, NIH, Wellcome Trust) have specific deposit requirements. Check your award terms before deciding where the authoritative copy lives.
Zenodo and DOI metadata
When you deposit software on Zenodo, the record flows into:
- DataCite: the DOI registration agency for research data and software; DataCite records are harvested by library catalogs, institutional discovery systems, and tools like OpenAlex and Scholix
- Google Scholar: picks up Zenodo records with structured metadata
- Library catalogs: many discovery layers (EBSCO, Ex Libris Primo, OCLC WorldCat) harvest DataCite metadata, meaning your software can appear in a library search alongside journal articles
- Domain repositories that harvest DOIs
What this means practically: the metadata you put into your Zenodo record is the metadata that librarians and discovery systems see. Thin metadata (no description, no keywords, no author ORCIDs) limits findability even if the DOI is valid.
Add or refine:
- authors + ORCIDs
- keywords (discipline tags, method tags, language tags; use the same ones you add to GitHub Topics)
- related works (link to the paper that used this software, the dataset it analyzes, the grant that funded it)
- funding references
- version notes
- a readable software description (2-3 sentences; think abstract, not README)
Your goal is context. A researcher or a librarian helping a researcher should be able to read the Zenodo record and decide in 30 seconds whether this software is relevant to them.
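The same idea can be expressed as a simple completeness check. In the sketch below, the dictionary keys are illustrative and are not Zenodo's API schema, the 15-word threshold for a usable description is an arbitrary assumption, and the record itself is hypothetical.

```python
def thin_metadata_fields(record: dict) -> list[str]:
    """Return the metadata elements that are missing or too sparse."""
    problems = []
    # Assumed threshold: roughly two short sentences.
    if len(record.get("description", "").split()) < 15:
        problems.append("description")
    if not record.get("keywords"):
        problems.append("keywords")
    authors = record.get("authors", [])
    if not authors or any(not author.get("orcid") for author in authors):
        problems.append("author ORCIDs")
    if not record.get("related_works"):
        problems.append("related works")
    if not record.get("funding"):
        problems.append("funding references")
    return problems


# Hypothetical record with illustrative field names.
record = {
    "description": "Analysis scripts.",
    "keywords": ["geospatial", "python"],
    "authors": [{"name": "Example, Ada", "orcid": ""}],
    "related_works": [],
}

problems = thin_metadata_fields(record)
for problem in problems:
    print("needs attention:", problem)
```

Here only the keywords pass. The two-word description, the empty ORCID, and the missing related works and funding are all flagged.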
Challenge 1: Identify Useful Metadata
List three elements you would add to improve a repository’s discoverability.
Examples:
- ORCID IDs for each author
- Keywords describing the domain and function
- A link to a related article or dataset
- A project description in your README
- Funding acknowledgment
Challenge 2: Improve Your Zenodo Record
After generating a DOI in the earlier episode, expand its metadata:
- Open your Zenodo record.
- Select “Edit.”
- Add:
  - keywords
  - authors and ORCIDs
  - description
  - related publication DOIs
  - funding
- Save and publish the updated record.
- Metadata increases discoverability in GitHub, Zenodo, and scholarly
indexes.
- Use consistent information across CITATION.cff, README, GitHub
topics, and Zenodo.
- Thoughtful metadata supports FAIR principles and helps others reuse your software.
Content from Wrap-Up and Reflection
Last updated on 2026-05-29
Estimated time: 17 minutes
Overview
Questions
- What small steps can make your research software more citable and discoverable?
- How can you apply these practices to your current or future projects?
Objectives
- Reflect on the practical steps taken during the session
- Identify at least one improvement to apply in your own software projects
Your FAIR4RS Checklist
Congratulations! You’ve transformed fragile research software into a FAIR software project.
What We Covered Today:
- ✅ F - Findable: Added DOI, CITATION.cff, rich metadata
- ✅ A - Accessible: Public GitHub, archived on Zenodo
- ✅ I - Interoperable: Standard formats (YAML, CFF), documented dependencies (pixi.toml)
- ✅ R - Reusable: LICENSE (BSD-3), README with setup, environment reproducibility
From Fragile to FAIR
Before (Branch: 01-start):
- ❌ No LICENSE
- ❌ No environment
- ❌ No citation
- ❌ No DOI
After (Branch: 06-metadata):
- ✅ LICENSE (BSD-3)
- ✅ Environment (pixi.toml)
- ✅ CITATION.cff
- ✅ DOI from Zenodo
- ✅ README with documentation
- ✅ Community health files
Your software is now citable, discoverable, and reusable.
Introduction
Over this session, you’ve learned how to make your research software more visible, citable, and impactful. These small, practical steps support scholarly communication, reproducibility, and the FAIR principles.
Use this time to reflect on what you’ve learned and decide on one action you’ll take with your own project.
Give learners a few minutes of quiet reflection, then facilitate a group discussion. Invite volunteers to share a step they plan to take next.
Challenge 1: Choose Your Next Step
Which of the practices from today’s session will you apply to a current or future project?
Answers may vary: making a repo public, adding a license, archiving on Zenodo, writing a README, creating a CITATION file, etc.
Challenge 2: Find a FAIR Win
Think of one thing you can do in under 30 minutes to make your software more FAIR.
Examples:
- Add a LICENSE file
- Write a short README
- Register your ORCID on Zenodo
- Create a GitHub release
Resources to Take With You
Lesson Materials
- Lesson repository: https://github.com/UC-OSPO-Network/research-software-citable-discoverable
Tools
- CITATION.cff Helper: https://citation-file-format.github.io/cff-initializer-javascript/
- Zenodo: https://zenodo.org
- Pixi: https://pixi.sh
- Choose a License: https://choosealicense.com
UC-Specific Resources (for UC campus learners)
- UC OSS Chart: https://security.ucop.edu/files/documents/resources/oss-chart.pdf
- UC OSPO License Guide: https://ucospo.net/oss-resources/template-guides/license-guide/
- UC OSPO Templates: https://github.com/UC-OSPO-Network/templates
- UC Open Source Program Office (OSPO) Network: https://ucospo.net/
General Open Source Resources
- Open Source Initiative (OSI): https://opensource.org/licenses (authoritative license registry)
- FAIR4RS Principles: https://doi.org/10.15497/RDA00068 (RDA/FORCE11/ReSA paper)
- Software Citation Principles: https://doi.org/10.7717/peerj-cs.86 (Smith et al. 2016)
- CODE Beyond FAIR: https://doi.org/10.1038/s41597-026-06705-6 (Di Cosmo et al. 2026; covers Software Heritage, institutional roles, and the library’s part in software metadata)
- Software Heritage Archive: https://www.softwareheritage.org/ (universal source code archive; assigns SWHIDs for long-term preservation)
- You’ve successfully made your software FAIR: Findable, Accessible, Interoperable, and Reusable
- Even small actions can significantly improve your software’s impact
- Making code citable and discoverable benefits both you and the research community
- Start with one change, then build from there
- Use the UC OSPO resources and templates to streamline the process