All in One View

Content from Introduction: The "Works on My Machine" Trap


Last updated on 2026-05-29

Overview

Questions

  • Why does research software often fail to run on other machines?
  • What is the “bus factor” problem in research software?
  • How do FAIR4RS principles address these challenges?

Objectives

  • Recognize common barriers to software reuse and reproducibility
  • Understand how FAIR4RS principles apply to research software
  • Identify the key components needed to make software citable and discoverable
  • See what a “complete” FAIR research software project looks like

The “Works on My Machine” Trap


You send your code to a colleague…

  • “It crashes on line 1” 😞
  • “I can’t install the dependencies”
  • “Which Python version did you use?”

Here’s what they see:

OUTPUT

$ python src/analysis.py

Traceback (most recent call last):
  File "analysis.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

This is where most research software lives. Fragile.
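A script can at least fail more helpfully than the traceback above. The following is a minimal sketch, assuming a script whose only third-party dependency is numpy; the function name `find_missing_modules` is illustrative and is not part of the demo repository.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical dependency list for a script like analysis.py.
    missing = find_missing_modules(["numpy"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install them, or recreate the documented environment, then retry.")
    else:
        print("All required packages are importable.")
```

A check like this only reports the problem. It does not replace a documented environment, which is what the rest of the lesson builds toward.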

The Bus Factor Problem


The Problem: Code works perfectly on ONE laptop. If it disappears, the science is gone.

The Bus Factor: The number of people who need to be “hit by a bus” before your project becomes unmaintainable.

Callout

How common is this?

A 2021 analysis found that over 48% of research articles mention software, but consistent sharing and citation remain the exception. Most of that software is unavailable, uncredited, or impossible to reproduce.

Callout

Why This Matters for Research

Research software that lives only on one person’s machine:

  • Cannot be verified or reproduced
  • Cannot be built upon by others
  • Cannot receive proper academic credit
  • Disappears when that person moves on

The Software Citation Problem


Even when software works, it often doesn’t get credited. Most researchers cite code by dropping a URL into a paper:

The wrong way

“We used the analysis script from https://github.com/jt14den/software-demo”

Problems:

  • URLs break (link rot)
  • No version specified (which run of the code produced which result?)
  • No formal credit; the author’s name isn’t even visible
  • Cannot appear in citation metrics

This isn’t hypothetical. Forges close.

  • Gitorious shut down in 2015, making thousands of projects unreachable overnight
  • Google Code shut down in 2016, same result
  • A username change, a repo rename, or a deleted account breaks any URL-based citation just as completely

That citation is now dead.

The right way

Dennis, T. (2025). Biodiversity Analysis Toolkit (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.123456

Why this works:

  • DOI is permanent; it survives username changes, repo moves, even if GitHub disappears
  • Version is explicit, so someone can reproduce the exact run
  • Author gets formal credit and citation metrics

The fix is a CITATION.cff file and a DOI, both of which you’ll create in this lesson.
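To see how the pieces of "the right way" fit together, here is a small sketch that assembles the citation string from its parts. The function name and its parameters are hypothetical, and the layout simply mirrors the example citation above rather than any official style guide.

```python
def format_software_citation(family_name, given_name, year, title, version, publisher, doi):
    """Build an APA-like software citation string from basic metadata."""
    initial = given_name[0] + "."
    return (
        f"{family_name}, {initial} ({year}). {title} (v{version}). "
        f"{publisher}. https://doi.org/{doi}"
    )


citation = format_software_citation(
    "Dennis", "Tim", 2025, "Biodiversity Analysis Toolkit",
    "0.1.0", "Zenodo", "10.5281/zenodo.123456",
)
print(citation)
```

Every argument corresponds to a field you will record in CITATION.cff or receive from Zenodo later in the lesson.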

The Fix: FAIR4RS Principles


The solution is to apply FAIR Principles for Research Software (FAIR4RS):

F - Findable: Software and metadata are easy to discover → DOI, CITATION.cff, metadata

A - Accessible: Retrievable via standard protocols → Public GitHub, git clone, Zenodo

I - Interoperable: Exchanges data with other tools → Standard formats, documented dependencies

R - Reusable: Can be executed AND modified → LICENSE, README, environment files

Callout

What You’re Doing TODAY

In this workshop, you will transform a fragile research script into a FAIR software project by adding:

  • ✅ License - So others can legally use your work
  • ✅ Environment - So it runs on any machine
  • ✅ Citation - So you get academic credit
  • ✅ README - So people can find and use it

What “Done Right” Looks Like


The “Before” State (Branch: 01-start)

Check out the initial state of our demo repository:

BASH

cd software-demo
git checkout 01-start

You’ll see:

  • ❌ No LICENSE
  • ❌ Vague environment (python = "*")
  • ❌ No citation information
  • ❌ No DOI
  • ❌ Minimal README

The “After” State (Branch: 06-metadata)

Now look at the final state:

BASH

git checkout 06-metadata

Notice what’s been added:

  • LICENSE (BSD-3-Clause)
  • pixi.toml (documented environment)
  • CITATION.cff (citation metadata)
  • README.md (complete documentation)
  • CONTRIBUTING.md (contribution guidelines)
  • CODE_OF_CONDUCT.md (community standards)
  • .zenodo.json (Zenodo metadata)
  • ✅ DOI badge in README

This is what makes software Findable, Accessible, Interoperable, and Reusable.
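The "after" list can double as a checklist. The sketch below, which assumes the file names listed above and uses a hypothetical function name, reports which of those files are absent from a directory.

```python
from pathlib import Path

# File names taken from the "after" list above.
EXPECTED_FILES = [
    "LICENSE",
    "pixi.toml",
    "CITATION.cff",
    "README.md",
    "CONTRIBUTING.md",
    "CODE_OF_CONDUCT.md",
    ".zenodo.json",
]


def missing_fair_files(repo_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the root of a checked-out repository.
    for name in missing_fair_files("."):
        print(f"missing: {name}")
```

Running it on the 01-start branch and again on 06-metadata should make the difference between the two states visible. The check only tests that the files exist, not whether their contents are complete.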

Callout

Real-World Example: Spack

Want to see a production research software project that follows these principles?

Spack is used by national labs, universities, and supercomputing centers: https://github.com/spack/spack

Check their repository and notice:

  • ✅ README with clear description and getting started guide
  • ✅ LICENSE (MIT and Apache-2.0)
  • ✅ CITATION.cff file
  • ✅ Documentation
  • ✅ Code of conduct and Contributing guide
Challenge

Challenge: Reflect on Your Own Code

Think about a script or analysis you’ve written recently:

  1. Could someone else run it on their machine today? What would they need?
  2. If you moved to a new institution, could you still access and use it?
  3. If someone wanted to cite your code in a paper, how would they do it?

Share one challenge you’ve faced with code sharing or reuse.

Common challenges include:

  • No documentation of dependencies or versions
  • Code only works with specific file paths on your machine
  • Missing LICENSE file (so others legally can’t reuse it)
  • No citation information available
  • Undocumented assumptions about the computing environment

All of these will be addressed in this lesson!

Challenge

Challenge 2: Who Gets Credit?

You created a Python script used to generate results in a collaborative paper. The paper cites the dataset but not the script. What’s missing, and what would a proper software citation look like?

The script should be citable as a distinct research output. Without citation metadata or a DOI, the software contribution goes unrecognized and nobody can find or reproduce the exact version used.

A proper citation would include: author(s), software title, version, repository or archive URL, and a persistent identifier (DOI). A CITATION.cff file in the repository provides all of this automatically.

What We’ll Build Together


Starting from the minimal 01-start state, you will progressively add each component:

  • Episode 1: Sharing the demo repository (understanding 01-start)
  • Episode 2: Adding an open-source license (02-license)
  • Episode 3: Managing environments with pixi (03-pixi) (optional)
  • Episode 4: Creating a CITATION.cff file (04-citation)
  • Episode 5: Minting a DOI and creating releases (05-release)
  • Episode 6: Improving metadata and discoverability (06-metadata)
  • Episode 7: Wrap-up and reflection

Each episode builds on the previous one, following the branch progression in the demo repository.

Key Points
  • Research software often fails due to the “works on my machine” trap
  • The bus factor problem means knowledge is lost when people leave
  • Software citations using URLs break over time; DOIs are permanent and version-specific
  • FAIR4RS principles (Findable, Accessible, Interoperable, Reusable) provide a framework
  • Making software FAIR requires: LICENSE, environment files, citation metadata, and documentation
  • A CITATION.cff file is the simplest way to make software citable
  • The demo repository progresses through branches: 01-start → 06-metadata

Content from Sharing Research Software Effectively



Overview

Questions

  • Why share research software in a public repository?
  • What are the minimum elements that make a repository useful and discoverable?

Objectives

  • Describe why public repositories increase visibility and credit for research software.
  • Identify the essential components of a well-structured repository.
  • Recognize the “before” state of the demo repository used throughout the lesson.
Callout

Episode Branch: 01-start

This episode explores the initial state of the demo repository.

To follow along:

BASH

cd software-demo
git checkout 01-start

Catch-up point: If you’re joining this episode partway through, start here.

Introduction


Publishing your research software in a public repository helps others find, understand, reuse, and cite your work. This visibility strengthens the transparency of your research process and increases the likelihood that you receive formal credit.

In this lesson, we start with a minimal example repository (branch 01-start). As you progress through the episodes, you will progressively refine it until it is citable, discoverable, and ready for reuse.

The Starting Point


Learners will download or clone the “before” state of the example repository:

BASH

software-demo/
├── README.md
├── src/
│   └── analysis.py
└── environment.toml   # pixi environment file

This repository intentionally lacks many elements of good research software practice.
By the end of the lesson, it will include licensing, citation metadata, improved discoverability information, and versioning.

Challenge

Challenge 1: What Makes a Repository Reusable?

Think about the last time you tried to use someone else’s code.

If you have a public GitHub repository, open it now. If not, visit a repository from your field that you’ve used or seen cited.

Check for these elements:

Which of these elements help most? Which would you add to your own work first?

Useful elements commonly include:

  • README with context and usage
  • a clear file structure
  • license information
  • installation instructions
  • dependencies or environment files
  • contributors or authorship information

Missing pieces often include absent documentation, unclear purpose, or no license.

Callout

Learn More About Effective READMEs

Want to dive deeper into README best practices?

Full references available on the Reference page.

Challenge

Challenge 2: Inventory the Demo Repository

Open the software-demo repository you downloaded.

Spend 2 minutes exploring, then answer:

  • Could you run this code today? What’s missing?
  • Would you know who created it or how to credit them?
  • Could you legally reuse or modify it?

We’ll address these gaps together over the next episodes.

Learners may observe:

  • README is minimal
  • No license
  • No citation file
  • No metadata to support discoverability
  • Environment file exists but is not yet introduced

These gaps will be filled across subsequent episodes.

Key Points
  • Public repositories increase findability, reuse potential, and citation credit.
  • A well-structured repository lowers the barrier for others to understand your work.
  • The lesson begins with a minimal “before” repository that will be incrementally improved.

Content from Choosing an Open-Source License



Overview

Questions

  • Why do you need a license for your code?
  • How can an open-source license increase reuse and citation?
  • What licenses does the UC system recommend?

Objectives

  • Explain why unlicensed software is legally restricted
  • Describe the main categories of open-source licenses
  • Choose an appropriate license for a UC research project
  • Add a license file to a GitHub repository
  • Identify UC resources for licensing decisions
Callout

Episode Branch: 01-start → 02-license

This episode adds a LICENSE file to the repository.

Starting point:

BASH

git checkout 01-start    # Start from the beginning

After this episode:

BASH

git checkout 02-license  # See the result with LICENSE added

Catch-up point: If joining now, run git checkout 01-start

Why licensing matters


When researchers publish code without a license, most people assume it is “public.” Legally, it is not. Copyright law applies automatically. Without a license, others cannot legally reuse, modify, or redistribute the code.

Clear licensing tells others what they can and cannot do with your code, which is the minimum needed for open, reproducible research. The UC OSPO License Guide covers UC institutional requirements.

Callout

Institutional Context: Who Owns Your Software?

At most universities, software created using institutional resources is owned by the institution, not the individual researcher. Before releasing code under an open-source license, check with your Technology Transfer or Intellectual Property office. They will verify ownership, funding requirements, and any third-party restrictions.

If you are at a UC campus: software is typically owned by The Regents of the University of California. Your campus Tech Transfer office can help you select from the UC-approved license list. (UC-specific)

At other institutions: check with your research computing, library, or legal office. Most will have a similar process and a list of preferred licenses.

Understanding license categories


Open-source licenses fall into two broad groups.

Permissive licenses

Examples: BSD, MIT, Apache 2.0

These allow broad reuse with minimal restrictions. Anyone can copy, modify, or redistribute the code. They are common in research because they’re simple and maximize flexibility.

BSD licenses are a common first choice at many research institutions because they:

  • originated at UC Berkeley
  • are simple to understand
  • protect both the institution and authors
  • integrate well with most other licenses
  • have minimal restrictions

Copyleft licenses

Example: GPL 2.0

These require that derivative works also remain open-source. This protects openness across the lifecycle of a project.

Callout

Note on GPL 3.0

The UC system does not recommend GPL 3.0 for university-owned software due to patent provisions that may conflict with UC policies. If you need copyleft protection, consult your campus Tech Transfer office about GPL 2.0 or alternatives.

How to choose a license


Five “low-risk” licenses are suitable for most research projects. Here’s a decision guide:

graph TD
    Start[Starting a new UC research software project?] --> Check{Do you have<br/>special requirements?}

    Check -->|No special needs| BSD[Use BSD 3-Clause<br/>✓ Common research default<br/>✓ Simple and protective<br/>✓ Widely compatible]

    Check -->|Need simpler text| MIT[Use MIT License<br/>✓ Nearly identical to BSD<br/>✓ Shorter, easier to read<br/>✓ Very popular]

    Check -->|Industry partnership<br/>or patent concerns| Apache[Use Apache 2.0<br/>✓ Explicit patent protection<br/>✓ Detailed contribution terms<br/>✓ Industry-friendly]

    Check -->|Educational focus| ECL[Consider ECL 2.0<br/>✓ Education-specific variant<br/>✓ Based on Apache 2.0]

    BSD --> TTO[Verify with campus<br/>Tech Transfer Office]
    MIT --> TTO
    Apache --> TTO
    ECL --> TTO

    Check -->|Need copyleft| Copyleft{GPL version?}
    Copyleft -->|GPL 2.0| GPL2[May be acceptable<br/>Consult Tech Transfer]
    Copyleft -->|GPL 3.0| GPL3[❌ Not recommended by UC<br/>Patent conflicts]

    GPL2 --> TTO
    GPL3 --> TTO

    style BSD fill:#90EE90
    style MIT fill:#90EE90
    style Apache fill:#90EE90
    style ECL fill:#90EE90
    style GPL2 fill:#FFFF99
    style GPL3 fill:#FFB6C6
    style TTO fill:#87CEEB

Quick reference

Your need               | Recommended license | SPDX identifier | Why
Default / most projects | BSD 3-Clause        | BSD-3-Clause    | Common default at research institutions
Simplest possible       | MIT                 | MIT             | Minimal text, very popular
Industry collaboration  | Apache 2.0          | Apache-2.0      | Explicit patent terms
Educational focus       | ECL 2.0             | ECL-2.0         | Education-specific variant

The SPDX identifier is the short, machine-readable code used by GitHub, Zenodo, and your CITATION.cff file to communicate your license automatically. When GitHub shows a license badge in the sidebar, it is displaying the SPDX identifier of the license it detected in your LICENSE file.
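Because SPDX identifiers are plain strings, the quick reference is easy to encode for scripts that generate metadata. This is only an illustrative sketch: the dictionary keys ("default", "simplest", and so on) are made-up labels for the four rows of the quick reference, not an official vocabulary.

```python
# Quick-reference rows from this episode, keyed by a hypothetical label per need.
SPDX_BY_NEED = {
    "default": "BSD-3-Clause",
    "simplest": "MIT",
    "industry": "Apache-2.0",
    "educational": "ECL-2.0",
}


def recommend_spdx(need="default"):
    """Return the SPDX identifier suggested for a need, falling back to the default."""
    return SPDX_BY_NEED.get(need, SPDX_BY_NEED["default"])


print(recommend_spdx("industry"))  # Apache-2.0
```

The returned string is what you would place in the license field of CITATION.cff.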

Always consult your institution’s Tech Transfer or IP office before releasing software created with institutional resources.

Callout

What about data and documentation?

Software licenses (BSD, MIT, Apache) are written for executable code. If your repository also contains datasets, figures, or documentation, those files need a separate license.

The standard choice for research outputs is Creative Commons Attribution 4.0 (CC BY 4.0), which allows broad reuse with attribution.

A common pattern:

  • /src or your code files → BSD-3-Clause or MIT
  • /data or /docs → CC-BY-4.0

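You can note this split in your README and in CITATION.cff under the license field, which accepts a list. The CFF schema treats multiple entries as alternatives (OR), so state in the README which license covers which files: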

YAML

license:
  - BSD-3-Clause
  - CC-BY-4.0

Most research repositories don’t need this, but if you’re sharing a dataset alongside code, it’s worth thinking through.

Challenge

Challenge: Add a BSD License to Your Repository

We will add the BSD 3-Clause license to your demo repository:

  1. Navigate to your repository on GitHub.
  2. Click Add file → Create new file.
  3. Name it LICENSE (or LICENSE.txt).
  4. Click Choose a license template and select BSD 3-Clause License.
  5. Update the copyright holder to reflect who owns the software. At UC campuses this is The Regents of the University of California; at other institutions check with your Tech Transfer office. (If this is a personal project, use your own name.)
  6. Update the year to 2026.
  7. Commit the file to your main branch.

Verify: Does your repository now display the “BSD-3-Clause” license badge in the sidebar?

GitHub automatically detects the LICENSE file and displays it in the sidebar. Your file should look like this:

BSD 3-Clause License

Copyright (c) 2026, The Regents of the University of California
All rights reserved.

If the badge doesn’t appear, ensure the file is in the root directory and named exactly LICENSE or LICENSE.txt.

Communicating your license


After adding a LICENSE file, reference it in your README so users immediately understand usage terms.

Add this section near the top of your README:

MARKDOWN

## License

This project is licensed under the BSD 3-Clause License - see the [LICENSE](LICENSE) file for details.

Why this matters: Users reading your README on platforms other than GitHub (Zenodo, email, exported PDFs) will see your license terms even without GitHub’s automatic detection.

Challenge

Exercise: License Scenarios

Which license would you recommend for each UC research scenario?

Scenario 1: A Python package for ecological data analysis. You want maximum adoption across academia and industry.

Scenario 2: A data visualization tool developed with a biotech partner who may commercialize derivatives.

Scenario 3: A simple utility script you’re sharing with collaborators.

Scenario 1: BSD 3-Clause (UC’s default recommendation, maximum flexibility and adoption)

Scenario 2: Apache 2.0 (explicit patent protection important for industry partnerships)

Scenario 3: Either BSD 3-Clause or MIT (both work well for simple sharing; BSD preferred by UC)

In all cases, verify with your campus Tech Transfer office before releasing.

Summary


Licensing is foundational to making research software usable, citable, and shareable. In this episode, you added a BSD license to a repository following UC recommendations.

Key Points
  • Without a license, software is legally restricted and not reusable
  • BSD 3-Clause is a common default at research institutions; MIT and Apache 2.0 are strong alternatives
  • Permissive licenses (BSD, MIT, Apache 2.0) maximize flexibility and adoption
  • Always consult your institution’s Tech Transfer or IP office before releasing institutionally-owned software
  • GitHub makes adding standard licenses straightforward

Content from Adding a CITATION.cff File



Overview

Questions

  • What is a CITATION.cff file and why does it matter?
  • How does GitHub use CITATION.cff to generate ready-made citations?
  • What minimal metadata should researchers include?

Objectives

  • Explain the role of CITATION.cff in software citation.
  • Create and customize a CITATION.cff file in a GitHub repository.
  • Understand how the file connects to later steps like releases and DOIs.
Callout

Episode Branch: 02-license → 04-citation

This episode adds a CITATION.cff file.

Starting point:

BASH

git checkout 02-license  # Start with LICENSE added

After this episode:

BASH

git checkout 04-citation # See the result with CITATION.cff added

Catch-up point: If joining now, run git checkout 02-license

Note: the 04-citation branch also contains pixi environment files from the optional reproducibility episode. You can ignore those files for now.

Introduction


A CITATION.cff file is the simplest, most direct way to make your software citable.

It provides structured citation metadata that:

  • tells others how to reference your work
  • allows GitHub to display a “Cite this repository” button
  • supports good scholarly practice and FAIR4RS principles

You can create this file before releases or DOIs.
If you later add a DOI or version tag, you can update the file at any time.

What belongs in a CITATION.cff file?


A minimal file includes:

  • cff-version (the schema version, currently 1.2.0)
  • title of the software
  • authors (ORCID recommended if available)
  • version (optional at this stage)
  • message with basic instructions

As your project grows, you can add:

  • release versions
  • DOIs from Zenodo or another service
  • keywords
  • abstract
  • repository URLs

Learners do not need to know the entire schema.
The point is to start small and publish useful metadata early.
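A quick way to check a draft against the minimum is sketched below. It assumes the file has already been parsed into a Python dictionary (for example by a YAML library), and the function name is hypothetical. It checks only the four keys that the CFF 1.2.0 schema requires, so it is not a substitute for a full validator.

```python
# Keys that the CFF 1.2.0 schema requires in every CITATION.cff file.
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def missing_cff_keys(metadata):
    """Return required CFF keys that are absent or empty in a metadata dict."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]


draft = {
    "cff-version": "1.2.0",
    "title": "Biodiversity Analysis Toolkit",
    "authors": [{"family-names": "Dennis", "given-names": "Tim"}],
}
print(missing_cff_keys(draft))  # ['message']
```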

Callout

Why the CFF format works well

  • human-readable
  • YAML-based
  • validated automatically by GitHub
  • supported by tools including Zotero, Zenodo, and reference managers

How to Create a CITATION.cff File


You have two options:

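Option 1: Use cffinit

The CITATION.cff community provides a web-based wizard, cffinit, that guides you through the fields and generates a valid file for you to download and commit.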

Option 2: Create it manually

If you prefer to write it yourself:

Step 1: Create the file

In the root of your repository:

BASH

touch CITATION.cff

Step 2: Add minimal metadata

Here’s a complete example from the slides:

YAML

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Dennis"
    given-names: "Tim"
    orcid: "https://orcid.org/0000-0002-1234-5678"
title: "Biodiversity Analysis Toolkit"
version: 0.1.0
date-released: 2025-01-15
url: "https://github.com/jt14den/software-demo"

GitHub creates a “Cite this repository” button automatically!

If your software does not yet have version tags, you may omit the version field until Episode 5, when you create releases.

Linking to a published paper


Many researchers want users to cite a journal article alongside or instead of the raw software repository. The preferred-citation field handles this: it tells GitHub, Zotero, and other tools which reference to show first.

YAML

cff-version: 1.2.0
message: "If you use this software, please cite the paper below."
authors:
  - family-names: "Dennis"
    given-names: "Tim"
    orcid: "https://orcid.org/0000-0002-1234-5678"
title: "Biodiversity Analysis Toolkit"
version: 0.1.0
date-released: 2025-01-15
url: "https://github.com/jt14den/software-demo"
preferred-citation:
  type: article
  title: "Biodiversity Analysis at Scale: Methods and Software"
  authors:
    - family-names: "Dennis"
      given-names: "Tim"
  journal: "Journal of Open Source Software"
  year: 2025
  doi: "10.21105/joss.00000"

Without preferred-citation, GitHub shows the software repository citation by default. Adding it ensures that anyone clicking “Cite this repository” gets your paper’s citation instead, which is usually what you want for impact tracking.
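The selection rule described here can be sketched in a few lines. This is an illustration of the behaviour, not the code GitHub or Zotero actually run; the function name is hypothetical and the input is assumed to be a CITATION.cff file already parsed into a dictionary.

```python
def citation_to_show(metadata):
    """Return the preferred-citation entry if present, else the software entry itself."""
    preferred = metadata.get("preferred-citation")
    if preferred:
        return preferred
    # Fall back to the top-level software metadata, minus CFF bookkeeping keys.
    return {k: v for k, v in metadata.items() if k not in ("cff-version", "message")}


software = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite the paper below.",
    "title": "Biodiversity Analysis Toolkit",
    "preferred-citation": {
        "type": "article",
        "title": "Biodiversity Analysis at Scale: Methods and Software",
    },
}
print(citation_to_show(software)["title"])
```

With the preferred-citation block removed, the same call would return the software title instead.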

Callout

No paper yet? Skip it.

Leave out preferred-citation if you don’t have a published article. You can add it later. The rest of the file works fine without it.

Step 3: Commit and refresh

After you commit the file, GitHub:

  • parses and validates it
  • displays a “Cite this repository” panel
  • offers the citation in APA and BibTeX formats, with a link to the CITATION.cff file itself

This feature works even without a DOI.

Challenge

Exercise 1: Identify missing metadata

Look at your repository (or the example repository provided with the lesson).

Reflect:

  • What metadata is easy to add today?
  • What might require input from collaborators?
  • What do you prefer to add later?

Share one observation.

Typical missing pieces include:

  • ORCID IDs
  • complete contributor list
  • description or abstract
  • license information
  • DOI (added later if desired)
Challenge

Exercise 2: Add a CITATION.cff file

Steps:

  1. Create CITATION.cff.
  2. Add at least: cff-version, title, author(s), and message.
  3. Commit and refresh to see GitHub’s citation panel.

A valid minimal example:

YAML

cff-version: 1.2.0
title: ExampleProject
message: "Please cite this software."
authors:
  - given-names: Alex
    family-names: Researcher
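If you would rather generate the minimal file from a script, the sketch below renders the same structure as text using only the standard library. The function name is hypothetical, and the quoting relies on the fact that JSON-style double-quoted strings are also valid YAML.

```python
import json


def render_minimal_cff(title, message, authors):
    """Render a minimal CITATION.cff as text from a title, message, and author list."""
    # json.dumps yields double-quoted strings, which YAML also accepts.
    lines = [
        "cff-version: 1.2.0",
        f"title: {json.dumps(title)}",
        f"message: {json.dumps(message)}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - given-names: {json.dumps(given)}")
        lines.append(f"    family-names: {json.dumps(family)}")
    return "\n".join(lines) + "\n"


text = render_minimal_cff("ExampleProject", "Please cite this software.", [("Alex", "Researcher")])
print(text)
```

Write the returned text to a file named CITATION.cff in the repository root, then commit it as in step 3.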
Key Points
  • A CITATION.cff file is the foundation of software citation.
  • It can be added before releases, DOIs, or version tags.
  • GitHub displays machine-readable citations automatically when this file is present.
  • Start simple and expand over time as your project develops.

Content from Making Your Software Citable



Overview

Questions

  • What makes software citable?
  • How do releases and DOIs strengthen software citation?
  • How do I create a release and, optionally, mint a DOI?

Objectives

  • Describe why software citation matters in research.
  • Create a versioned release in GitHub.
  • Understand when and why to mint a DOI with Zenodo.
Callout

Episode Branch: 04-citation → 05-release

This episode creates a release and tag.

Starting point:

BASH

git checkout 04-citation # Start with LICENSE and CITATION.cff

After this episode:

BASH

git checkout 05-release  # See the result with v0.1.0 tag

Catch-up point: If joining now, run git checkout 04-citation

Introduction


Software is a research product.
Like articles and datasets, it should be cited so others can acknowledge your work, find the exact version you used, and understand how your software contributed to their results.

Your software becomes citable as soon as it includes:

  1. Structured citation metadata, such as a CITATION.cff file.
  2. A public location where the code is available.
  3. A stable version someone can reference.

A DOI is optional but valuable. It strengthens citability by giving each version a persistent identifier.

So far in this lesson, you have:

  • shared a public repository
  • added a license
  • added a CITATION.cff file

In this episode, you will create a GitHub release and learn how DOIs fit into software citation workflows.

What Makes Software Citable?


Software is citable when:

  • it includes authorship and version information
  • the referenced version is stable
  • others can access the code

A DOI enhances these qualities, but does not define them.

Callout

Why add a DOI?

A DOI is helpful for:

  • increasing visibility and discoverability
  • long-term persistence
  • citing exact versions
  • meeting journal and funder expectations

But the core citability comes from your metadata and release process.

Create a Release in GitHub


A release captures a specific version of your software.
It is the snapshot that others can cite.

Steps

  1. Open your GitHub repository.
  2. Select Releases → Draft a new release.
  3. Create a tag such as v0.1.0.
  4. Add release notes summarizing changes.
  5. Publish the release.

The release archives your CITATION.cff file exactly as committed, so update its version and date-released fields to match the tag before you publish.

Callout

Semantic Versioning (SemVer)

You might wonder why we chose v0.1.0. This follows Semantic Versioning (MAJOR.MINOR.PATCH):

  • MAJOR version when you make incompatible API changes (e.g., 1.0.0)
  • MINOR version when you add functionality in a backward compatible manner (e.g., 0.1.0 -> 0.2.0)
  • PATCH version when you make backward compatible bug fixes (e.g., 0.1.1)

Starting with 0.x.x indicates your software is in initial development and the API is not yet stable.
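The three rules translate directly into code. The following sketch, with a hypothetical function name, handles plain MAJOR.MINOR.PATCH strings with an optional leading "v"; it does not attempt pre-release or build suffixes.

```python
def bump_version(version, part):
    """Return version with the given part ('major', 'minor', or 'patch') incremented."""
    major, minor, patch = (int(piece) for piece in version.lstrip("v").split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


print(bump_version("v0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "patch"))   # 0.1.1
```

Note that bumping a higher part resets the lower ones to zero, which is why a major bump from 0.1.0 gives 1.0.0.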

Challenge

Challenge 1: What belongs in a release?

Take one minute to reflect:

  • What information would help a future you understand what changed in this version?

Useful release notes include:

  • what changed
  • what was added or removed
  • what bugs were fixed
  • what might break for users
  • anything important about reproducibility

Clear release notes help both people and tools interpret your software’s evolution.

Minting a DOI with Zenodo Sandbox


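To practice minting a DOI without polluting the permanent scholarly record, we will use Zenodo Sandbox. It works like the real Zenodo, but it is a testing environment: the DOIs it issues are not registered and will not resolve. Repeat the same steps on the real Zenodo when you are ready to publish.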

The Complete 6-Step Workflow

Step 1: Log in to Zenodo with GitHub

Step 2: Enable your repository (toggle ON)

  • Go to Settings → GitHub in Zenodo Sandbox
  • Find your repository in the list
  • Toggle the switch to ON (green)
  • This tells Zenodo to watch for new releases

Step 3: Create GitHub Release (tag v1.0.0)

  • Go to your GitHub repository
  • Click Releases → Draft a new release
  • Create a tag: v1.0.0
  • Add release notes describing what’s in this version
  • Click Publish release

Step 4: Zenodo auto-archives and mints DOI

  • Zenodo automatically detects your new release
  • Creates an archived snapshot
  • Assigns a DOI (a test DOI in the Sandbox; a permanent one on the real Zenodo)
  • Wait a few minutes for processing

Step 5: Add DOI badge to your README

  • Copy the DOI badge from your Zenodo record
  • Add it to the top of your README:

MARKDOWN

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.123456.svg)](https://doi.org/10.5281/zenodo.123456)

Step 6: Update your CFF file with your DOI

  • Add the DOI to your CITATION.cff:

YAML

doi: 10.5281/zenodo.123456

Result: You now have LICENSE, CITATION.cff, and DOI.
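Steps 5 and 6 both reuse the same DOI string, so it can help to generate the badge from it rather than edit URLs by hand. This is a small sketch with a hypothetical function name; the URL pattern is copied from the badge example in Step 5.

```python
def doi_badge_markdown(doi):
    """Return README badge Markdown for a DOI, following the pattern in Step 5."""
    badge_url = f"https://zenodo.org/badge/DOI/{doi}.svg"
    target_url = f"https://doi.org/{doi}"
    return f"[![DOI]({badge_url})]({target_url})"


print(doi_badge_markdown("10.5281/zenodo.123456"))
```

The DOI shown is the placeholder used throughout this lesson; substitute the one from your own Zenodo record.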

Callout

Now You Have Everything

  • ✅ F - Findable: Added DOI, CITATION.cff, rich metadata
  • ✅ A - Accessible: Public GitHub, archived on Zenodo
  • ✅ I - Interoperable: Standard formats (YAML, CFF)
  • ✅ R - Reusable: LICENSE (BSD-3), README with setup

Even if GitHub disappears, your DOI still works.

Callout

Going further: Software Heritage

Zenodo archives a snapshot of your code at release time. Software Heritage goes further: it continuously crawls GitHub, GitLab, and other forges and archives everything, assigning a SWHID (Software Heritage Identifier) to every file, directory, commit, and release.

A SWHID looks like this:

swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f

It points to an exact, immutable snapshot that survives forge closures (Gitorious shut down in 2015; Google Code in 2016, both making thousands of repos unreachable). Your Zenodo DOI is the right identifier for citation; a SWHID is the long-term preservation record.

To find and record your SWHID:

  1. Go to https://archive.softwareheritage.org/
  2. Paste your GitHub repository URL into the search box
  3. If your repo is already archived, copy the SWHID for your release
  4. If it hasn’t been crawled yet, click Save code now to trigger immediate archival
  5. Once archived, add the SWHID to your CITATION.cff:

YAML

identifiers:
  - {type: swh, value: "swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f"}

This approach is recommended in the 2026 CODE Beyond FAIR roadmap (Di Cosmo et al., Scientific Data).
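Before pasting a SWHID into a metadata file, it is worth checking that it is well formed. The sketch below covers only the core identifier (scheme, version 1, object type, 40 hexadecimal characters) and ignores the optional qualifiers that can follow it; the function name is hypothetical.

```python
import re

# Core SWHID shape: scheme, version 1, object type, 40 hex characters.
SWHID_PATTERN = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")

OBJECT_TYPES = {
    "cnt": "file content",
    "dir": "directory",
    "rev": "revision (commit)",
    "rel": "release",
    "snp": "snapshot",
}


def describe_swhid(swhid):
    """Return (object description, hash) for a core SWHID, or None if malformed."""
    match = SWHID_PATTERN.match(swhid)
    if match is None:
        return None
    return OBJECT_TYPES[match.group(1)], match.group(2)


print(describe_swhid("swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f"))
```

For the example identifier above, the object type is "rel", which means it points at a release.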

Challenge

Challenge 2: Why are versioned DOIs important?

Why might it matter that a DOI points to a specific release rather than the whole project?

Versioned DOIs:

  • support reproducibility by identifying an exact snapshot
  • prevent confusion when the software changes
  • allow users to cite precisely the version they used
  • support FAIR4RS and publisher guidelines
Key Points
  • Software is citable as soon as it includes a citation file and a stable version.
  • GitHub releases create versioned snapshots for citation.
  • DOIs are optional but strengthen discoverability, persistence, and reproducibility.
  • Zenodo can automatically mint a DOI for each GitHub release.

Content from Managing Reproducible Environments with pixi


Last updated on 2026-05-29

Callout

Optional Episode

This episode covers environment management using pixi. It is optional; you can skip it and move directly to Improving Metadata and Discoverability.

If you skip this episode, you will still complete all citation steps. The pixi.toml and pixi.lock files you see in the demo repo branches were added here; you can ignore them.

Other environment tools: conda, mamba, pip/venv, and renv (for R) all serve the same purpose. The concepts here apply to any environment manager; pixi is used because it handles Python, R, and other languages with a single tool and generates an automatic lockfile.

Overview

Questions

  • Why do software projects need well-defined environments?
  • How can pixi help learners run the same code the developer used?
  • How does environment management improve the reproducibility and citability of research software?

Objectives

  • Explain why environment definition is central to reproducible research.
  • Create a minimal pixi.toml file for a project.
  • Use pixi to run Python or R code inside a clean, isolated environment.
  • Describe how environment files support FAIR software and citation practices.

Callout

Episode Branch: 03-pixi

This optional episode explores the environment management layer of the demo repository.

To follow along:

BASH

git checkout 03-pixi     # Branch with pixi.toml and lockfile added

This branch sits between 02-license and 04-citation in the demo repo history, but you can explore it at any point in the lesson.

Why Environments Matter


The Problem: Research software often “works on my machine” and nowhere else.

Code rots. Python updates, packages break, and six months from now your script may no longer run.

Different operating systems, outdated packages, and mismatched library versions frequently break code.

❌ The Vague Way

TOML

[dependencies]
python = "*"
numpy = "*"

python = "*" is like saying “I need some food.”

Problems:

  • Works today, breaks tomorrow
  • Different versions on different machines
  • “Works on my machine” syndrome

✅ The Locked Way

TOML

# pixi.lock pins exact versions (simplified here; the real lockfile is YAML):
python = "3.11.4"
numpy = "1.24.3"
# + 47 dependencies

pixi.lock is like saying “I need a pepperoni pizza from Mario’s, baked at 5:00 PM.”

Benefits:

  • ✅ Exact versions locked
  • ✅ Same environment everywhere
  • ✅ Reproducible in 5 years

What Environment Management Captures


Environment management reduces this friction because it captures:

  • The exact language versions used
  • Required packages
  • The dependency set needed to run the software
  • Instructions for reproducing the execution environment

The Payoff: We aren’t just shipping code; we’re shipping the computer state needed to run it.

Why pixi?


pixi is a modern, fast environment manager that works for Python, R, and many other languages. We use it in this lesson because it is:

  • Cross-platform: Works on macOS, Linux, Windows
  • Fast: Typically resolves and installs environments faster than conda
  • Automatic lockfiles: Creates pixi.lock automatically, guaranteeing everyone runs the exact same versions of every package
  • Multi-language: Supports Python, R, and more

FAIR Connection: Standard formats + clear dependencies = Interoperable & Reusable software


Installing pixi


Full installation docs: https://pixi.sh

Common installation for macOS or Linux:

BASH

curl -fsSL https://pixi.sh/install.sh | bash

Windows users can install via MSI installer or winget.

Callout

pixi installs the language runtimes your project asks for.
Learners do not need preinstalled Python, R, compilers, or system packages.

Creating a New pixi Project


Create a directory and initialize a pixi project:

BASH

mkdir myproject
cd myproject
pixi init

This creates a pixi.toml file, which documents your environment.

Keeping Repositories Clean (.gitignore)

When you run pixi init, it automatically creates a .gitignore file. This file tells Git which files to ignore.

For pixi, this is critical because it creates a hidden folder .pixi/ containing thousands of environment files. You never want to commit this folder to GitHub. It is large, platform-specific, and can be regenerated by anyone using your pixi.lock file.

Always check your .gitignore to ensure generated files (like .DS_Store, __pycache__, or data outputs) are not accidentally shared.
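As a quick sanity check, the following Python sketch looks for an entry covering the `.pixi/` folder in the text of a `.gitignore` file. It is a simple textual check, not a full implementation of Git's ignore rules, and `ignores_pixi_dir` is a hypothetical helper name.

```python
def ignores_pixi_dir(gitignore_text):
    """Return True if some .gitignore line appears to cover the .pixi/ folder."""
    for line in gitignore_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        # Accept ".pixi", ".pixi/", "/.pixi" and patterns such as ".pixi/*"
        if entry.strip("/") == ".pixi" or entry.lstrip("/").startswith(".pixi/"):
            return True
    return False


example = "# pixi environments\n.pixi\n*.egg-info\n"
print(ignores_pixi_dir(example))  # True
```

For a real project you would pass in the contents of the file, for example `ignores_pixi_dir(open(".gitignore").read())`.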

Add Python:

BASH

pixi add python

Add the NumPy package:

BASH

pixi add numpy

Add R and one package:

BASH

pixi add r
pixi add r-dplyr

Your pixi.toml is now a reproducible record of all dependencies needed for the software.

What’s Inside pixi.toml?

Here’s what the file looks like (this will be automatically created for you):

TOML

[workspace]
authors = ["Leigh Phan <leighphan@ucla.edu>"]
channels = ["conda-forge"]
name = "myproject"
platforms = ["osx-arm64"]
version = "0.1.0"

[tasks]

[dependencies]
python = ">=3.14.3,<3.15"
numpy = ">=2.4.2,<3"
r = ">=4.5,<4.6"
r-dplyr = ">=1.2.0,<2"

Note that pixi.toml records version ranges (for example, numpy >=2.4.2,<3), not exact versions.

When you run pixi add or pixi install, pixi also creates or updates a pixi.lock file with exact versions locked:

pixi.lock contains:
python = "3.14.3"
numpy = "2.4.2"
r = "4.5.4"
r-dplyr = "1.2.0"
+ 47 other dependencies

This lockfile pins the exact build of every package, so anyone on a supported platform installs the same environment you used.
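The gap between the version ranges in pixi.toml and the exact pins in pixi.lock can be sketched in a few lines of Python. This is a simplified illustration that only understands plain numeric versions and the comparison operators shown; real conda-style version matching is more involved, and the candidate versions below are made up for the example.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">"
COMPARATORS = [
    (">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
    (">", operator.gt), ("<", operator.lt),
]


def as_tuple(version, width=3):
    """Turn "2.4" into (2, 4, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Check a version against a comma-separated range such as ">=2.4.2,<3"."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in COMPARATORS:
            if clause.startswith(symbol):
                if not compare(as_tuple(version), as_tuple(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


candidates = ["2.4.2", "2.5.0", "3.0.0"]  # hypothetical releases
allowed = [v for v in candidates if satisfies(v, ">=2.4.2,<3")]
print(allowed)  # ['2.4.2', '2.5.0']
```

The range admits more than one version, which is why two machines can resolve it differently; the lockfile removes that ambiguity by recording the single version that was actually installed.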


Running Code With pixi


Run Python code:

BASH

pixi run python script.py

Run R code:

BASH

pixi run Rscript analysis.R

Every command is executed inside the environment described by pixi.toml.

This makes it easier for others to test, cite, extend, and build upon your work.
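One way to confirm that a command really ran inside the project environment is to have the script report what it sees. The sketch below is a hypothetical `script.py`, not a file from the demo repository; it uses only the standard library and reports "not installed" for packages it cannot find.

```python
import sys
from importlib import metadata


def describe_environment(packages=("numpy",)):
    """Collect the interpreter version, its install prefix, and package versions."""
    report = {"python": sys.version.split()[0], "prefix": sys.prefix}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

When run with `pixi run python script.py`, the prefix would be expected to point inside the project's `.pixi/` folder, and the versions should match those recorded in the lockfile.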


How Environments Support Citation and Reuse


A reusable research software project contains not only code, but:

  • licensing
  • authorship and citation metadata (CITATION.cff)
  • version information
  • a documented environment

Including pixi.toml and pixi.lock in your repository or DOI deposit helps future readers:

  • recreate the execution environment
  • verify results
  • adapt your code for new analyses
  • evaluate whether the software is FAIR (Findable, Accessible, Interoperable, Reusable)

When publishing your software, include:

  • the pixi.toml and pixi.lock files

  • instructions such as:

    BASH

    pixi run python script.py

Challenge

Challenge: Add a new dependency

Use pixi to add pandas or r-ggplot2 to your project.

What changed in your pixi.toml file?

BASH

pixi add pandas

or:

BASH

pixi add r-ggplot2

You should see the new package listed under [dependencies] in your pixi.toml.

Key Points
  • Reproducible environments reduce troubleshooting and support more reusable software.
  • pixi provides fast, cross-platform environment management.
  • The pixi.toml file acts as documentation that supports citation and FAIR4RS principles.
  • Use pixi run to execute Python or R code inside a reproducible environment.

Content from Improving Metadata and Discoverability


Last updated on 2026-05-29

Overview

Questions

  • What metadata makes research software easier to find and reuse?
  • How can we improve discoverability across GitHub, Zenodo, and scholarly indexes?

Objectives

  • Identify key metadata elements that increase visibility and reuse.
  • Enhance discoverability using GitHub features and Zenodo metadata fields.
  • Connect metadata across CITATION.cff, GitHub, and your DOI record for consistency.

Callout

Episode Branch: 05-release → 06-metadata

This episode completes the repository with full documentation and metadata.

Starting point:

BASH

git checkout 05-release  # Start with release tagged

After this episode:

BASH

git checkout 06-metadata # See the complete FAIR repository

Catch-up point: If joining now, run git checkout 05-release

Introduction


Clear metadata helps others understand, evaluate, and find your software.
It also reduces the cognitive effort for future users because essential information is organized and easy to locate.

In earlier episodes, you created:

  • a CITATION.cff file
  • a license
  • a repository structure
  • a Zenodo record with a DOI

This episode brings these together. You will describe your project in consistent ways across platforms so search engines, citation tools, and colleagues can discover it.

What counts as useful metadata?


Good metadata answers predictable questions with minimal effort from the reader:

  • What is this software? (short description or abstract)
  • Who made it? (authors, ORCIDs)
  • How do I cite it? (CITATION.cff + DOI)
  • What domain is it for? (keywords)
  • What else does it relate to?
    • related article DOI
    • datasets used
    • funding source
    • project website

You may add these in multiple places, but they should remain consistent.
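Consistency is easy to check once the same fields are written down side by side. The sketch below compares hand-entered metadata from several places and reports any field whose values differ. The record contents are invented for illustration, and nothing here reads from GitHub or Zenodo.

```python
def inconsistent_fields(records):
    """Map each field to its per-source values when the sources disagree."""
    fields = set()
    for metadata in records.values():
        fields.update(metadata)
    problems = {}
    for field in sorted(fields):
        values = {source: metadata.get(field) for source, metadata in records.items()}
        if len(set(values.values())) > 1:
            problems[field] = values
    return problems


# Hypothetical values copied by hand from each location
records = {
    "CITATION.cff": {"title": "Biodiversity Analysis Toolkit", "license": "BSD-3-Clause"},
    "README": {"title": "Biodiversity Analysis Toolkit", "license": "BSD-3-Clause"},
    "Zenodo": {"title": "Biodiversity Analysis Toolkit", "license": "MIT"},
}

print(list(inconsistent_fields(records)))  # ['license']
```

A field that is missing from one source shows up as `None` for that source, so gaps are reported alongside outright conflicts.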

GitHub-specific discoverability features


GitHub uses structured metadata to improve search ranking and cross-repository linking.

Add these as Topics by clicking the gear icon next to About on the repository’s main page:

  • discipline tags (e.g., geospatial, text-mining, materials-science)
  • methodological tags (simulation, visualization, machine-learning)
  • language tags (python, r)
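Keywords written for a paper often contain spaces or capital letters, which topics do not allow. The sketch below converts free-text keywords into topic-style tags, assuming the rules as commonly documented by GitHub (lowercase letters, numbers, and hyphens, at most 50 characters); `to_topic` is an illustrative helper.

```python
import re


def to_topic(label):
    """Convert a free-text keyword into a lowercase, hyphenated tag."""
    topic = label.strip().lower()
    topic = re.sub(r"[^a-z0-9]+", "-", topic)  # runs of other characters become one hyphen
    return topic.strip("-")[:50].rstrip("-")


for keyword in ["Machine Learning", "Materials_Science", " Text Mining "]:
    print(to_topic(keyword))
# machine-learning
# materials-science
# text-mining
```

Using the same normalized tags as Zenodo keywords keeps the two records aligned.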

Writing an Effective README


The 30-Second Rule

Your README is your software’s front door.

If users can’t understand what it does, how to install it, or how to use it in 30 seconds → they leave.

README Structure (7 Essential Sections)

The UC OSPO README Guide (UC-specific) recommends this standard structure:

  1. About: What does this do? (2-3 sentences)
  2. Features: Key capabilities (Reproducible, Citable, Open source)
  3. Getting Started: Prerequisites + installation
  4. Usage: Minimal working example
  5. Citation: Link to CITATION.cff or DOI
  6. License: Explicitly state terms (e.g., “BSD-3 - see LICENSE file”)
  7. Contact: How to get help
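
The section list above lends itself to a simple automated check. This sketch reports which of the seven section names do not appear as headings in a README. It treats any line starting with `#` as a heading, so a comment inside a fenced code block would be counted too; it is a rough aid, not a Markdown parser.

```python
EXPECTED_SECTIONS = [
    "About", "Features", "Getting Started", "Usage",
    "Citation", "License", "Contact",
]


def missing_sections(readme_text):
    """List the expected section names with no matching heading line."""
    headings = set()
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Drop the leading # characters and normalize case
            headings.add(stripped.lstrip("#").strip().lower())
    return [name for name in EXPECTED_SECTIONS if name.lower() not in headings]


readme = """# Biodiversity Analysis Toolkit

## Features
## Getting Started
## Citation
## License
"""
print(missing_sections(readme))  # ['About', 'Usage', 'Contact']
```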

Before vs. After Example

❌ Before (Branch: 01-start)

MARKDOWN

# Biodiversity Analysis Toolkit

A script.

No description. No instructions. No citation. Unusable.

✅ After (Branch: 06-metadata)

MARKDOWN

# Biodiversity Analysis Toolkit

Analysis tools for biodiversity research.

## Features
- Reproducible (pixi) • Citable (DOI) • Open source (BSD-3)

## Getting Started
```bash
pixi install
pixi run python src/analysis.py
```

## Citation
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.123456.svg)](https://doi.org/10.5281/zenodo.123456)

License: BSD-3 - see LICENSE file

Professional. Citable. Usable.

README Best Practices: 5 Quick Tips

  1. Clear description → Answer “What problem does this solve?”
  2. Show, don’t tell → Include code examples
  3. Link metadata → Add DOI badge, link CITATION.cff
  4. Keep updated → Refresh when features change
  5. Use a template → UC OSPO Templates (UC-specific) or Awesome README

Every tip maps to FAIR principles.

Callout

Don’t Reinvent the Wheel

It is also critical to link your metadata:

  • Add a badge for your Zenodo DOI
  • Link to your CITATION.cff file

A structured README ensures researchers can quickly evaluate and use your software.

Beyond the README: Community Health Files


Beyond technical metadata, files that describe how to interact with your project matter for long-term sustainability and for signaling that the project is welcoming.

GitHub looks for these files:

CONTRIBUTING.md → How to contribute

The CONTRIBUTING.md file is the first place new contributors look to see if a project is open to participation. Following a contributing guide template (the UC OSPO Contributing Guide (UC-specific) is one good example) ensures you cover essential ground:

  • Welcome Statement: Explicitly inviting others to join
  • Ways to Contribute: Identifying non-code contributions (e.g., documentation, testing, issues)
  • Setup Instructions: How to get the project running locally
  • Pull Request Lifecycle: What happens after a contribution is submitted

CODE_OF_CONDUCT.md → Behavioral standards

A CODE_OF_CONDUCT.md establishes behavioral expectations and ensures a safe, inclusive environment for all researchers. The standard choice is the Contributor Covenant, widely adopted across open-source projects. (See also: UC OSPO Code of Conduct Guide (UC-specific))

CHANGELOG.md → Version history

A CHANGELOG.md documents what changed between versions. This helps users understand:

  • What’s new in each release
  • What bugs were fixed
  • What breaking changes occurred
  • How the software evolved over time

Why it matters: Signals your project is professionally managed and welcoming.

Callout

Templates Available

Adding these files to your repository root helps GitHub display a “Community Standards” checklist in your insights, signaling that your project is professionally managed and ready for collaboration.
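To see which of these files a repository still lacks, a small script can look for them. The sketch below checks the repository root plus the `.github` and `docs` folders, on the assumption that GitHub also recognizes community health files there; `missing_health_files` is a hypothetical helper.

```python
from pathlib import Path

HEALTH_FILES = ["CONTRIBUTING.md", "CODE_OF_CONDUCT.md", "CHANGELOG.md"]
SEARCH_DIRS = [".", ".github", "docs"]  # assumed locations, relative to the repo root


def missing_health_files(repo_root):
    """Return the file names not found in any of the searched folders."""
    root = Path(repo_root)
    missing = []
    for name in HEALTH_FILES:
        if not any((root / folder / name).is_file() for folder in SEARCH_DIRS):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_health_files("."))
```

Run from the top of a cloned repository, it prints the names that still need to be added.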


Institutional Repositories: Dataverse, Dryad, and Zenodo


Many institutions use repositories like Dataverse or Dryad for research data deposits. These are good for datasets but have limited software support; they don’t integrate with GitHub releases or mint version-specific DOIs automatically.

For software, Zenodo is the recommended deposit location because:

  • It integrates directly with GitHub to archive each release automatically
  • It mints a DOI for every version
  • Records flow into DataCite and are indexed by Google Scholar and library catalogs
  • It’s free, CERN-operated, and widely recognized by journals and funders

Callout

What about your institution’s repository?

If your institution requires or prefers a local IR (Dataverse instance, DSpace, etc.), you can deposit there in addition to Zenodo, not instead of it. Use the Zenodo DOI as the persistent identifier in your CITATION.cff, and note the institutional deposit in your README or Zenodo metadata as a related work.

Some funders (NSF, NIH, Wellcome Trust) have specific deposit requirements. Check your award terms before deciding where the authoritative copy lives.

Zenodo and DOI metadata


When you deposit software on Zenodo, the record flows into:

  • DataCite: the DOI registration agency for research data and software; DataCite records are harvested by library catalogs, institutional discovery systems, and tools like OpenAlex and Scholix
  • Google Scholar: picks up Zenodo records with structured metadata
  • Library catalogs: many discovery layers (EBSCO, Ex Libris Primo, OCLC WorldCat) harvest DataCite metadata, meaning your software can appear in a library search alongside journal articles
  • Domain repositories that harvest DOIs

What this means practically: the metadata you put into your Zenodo record is the metadata that librarians and discovery systems see. Thin metadata (no description, no keywords, no author ORCIDs) limits findability even if the DOI is valid.

Add or refine:

  • authors + ORCIDs
  • keywords (discipline tags, method tags, language tags; use the same ones you add to GitHub Topics)
  • related works (link to the paper that used this software, the dataset it analyzes, the grant that funded it)
  • funding references
  • version notes
  • a readable software description (2-3 sentences; think abstract, not README)

Your goal is context. A researcher or a librarian helping a researcher should be able to read the Zenodo record and decide in 30 seconds whether this software is relevant to them.
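A short script can flag thin metadata before you publish. The sketch below works on a plain dictionary with made-up field names (`description`, `keywords`, `authors`, `orcid`); it does not use the Zenodo API or its actual schema, and the author names are placeholders.

```python
def thin_metadata_warnings(record):
    """Return human-readable warnings for missing description, keywords, or ORCIDs."""
    warnings = []
    if not record.get("description", "").strip():
        warnings.append("no description")
    if not record.get("keywords"):
        warnings.append("no keywords")
    authors = record.get("authors", [])
    if not authors:
        warnings.append("no authors")
    else:
        without_orcid = [a["name"] for a in authors if not a.get("orcid")]
        if without_orcid:
            warnings.append("authors without ORCID: " + ", ".join(without_orcid))
    return warnings


# Hypothetical record typed in by hand
record = {
    "description": "Analysis tools for biodiversity research.",
    "keywords": [],
    "authors": [
        {"name": "A. Researcher", "orcid": "0000-0002-1825-0097"},
        {"name": "B. Colleague"},
    ],
}
print(thin_metadata_warnings(record))
# ['no keywords', 'authors without ORCID: B. Colleague']
```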

Challenge

Challenge 1: Identify Useful Metadata

List three elements you would add to improve a repository’s discoverability.

Examples:

  • ORCID IDs for each author
  • Keywords describing the domain and function
  • A link to a related article or dataset
  • A project description in your README
  • Funding acknowledgment

Challenge

Challenge 2: Improve Your Zenodo Record

After generating a DOI in the earlier episode, expand its metadata:

  1. Open your Zenodo record.
  2. Select “Edit.”
  3. Add:
    • keywords
    • authors and ORCIDs
    • description
    • related publication DOIs
    • funding
  4. Save and publish the updated record.

Key Points
  • Metadata increases discoverability in GitHub, Zenodo, and scholarly indexes.
  • Use consistent information across CITATION.cff, README, GitHub topics, and Zenodo.
  • Thoughtful metadata supports FAIR principles and helps others reuse your software.

Content from Wrap-Up and Reflection


Last updated on 2026-05-29

Overview

Questions

  • What small steps can make your research software more citable and discoverable?
  • How can you apply these practices to your current or future projects?

Objectives

  • Reflect on the practical steps taken during the session
  • Identify at least one improvement to apply in your own software projects

Your FAIR4RS Checklist


Congratulations! You’ve transformed fragile research software into a FAIR software project.

What We Covered Today:

✅ F - Findable: Added DOI, CITATION.cff, rich metadata
✅ A - Accessible: Public GitHub, archived on Zenodo
✅ I - Interoperable: Standard formats (YAML, CFF), documented dependencies (pixi.toml)
✅ R - Reusable: LICENSE (BSD-3), README with setup, environment reproducibility

From Fragile to FAIR

Before (Branch: 01-start):

  • ❌ No LICENSE
  • ❌ No environment
  • ❌ No citation
  • ❌ No DOI

After (Branch: 06-metadata):

  • ✅ LICENSE (BSD-3)
  • ✅ Environment (pixi.toml)
  • ✅ CITATION.cff
  • ✅ DOI from Zenodo
  • ✅ README with documentation
  • ✅ Community health files

Your software is now citable, discoverable, and reusable.

Introduction


Over this session, you’ve learned how to make your research software more visible, citable, and impactful. These small, practical steps support scholarly communication, reproducibility, and the FAIR principles.

Use this time to reflect on what you’ve learned and decide on one action you’ll take with your own project.

Challenge

Challenge 1: Choose Your Next Step

Which of the practices from today’s session will you apply to a current or future project?

Answers may vary: making a repo public, adding a license, archiving on Zenodo, writing a README, creating a CITATION file, etc.

Challenge

Challenge 2: Find a FAIR Win

Think of one thing you can do in under 30 minutes to make your software more FAIR.

Examples:

  • Add a LICENSE file
  • Write a short README
  • Register your ORCID on Zenodo
  • Create a GitHub release

Resources to Take With You


UC-Specific Resources (for UC campus learners)

General Open Source Resources

Key Points
  • You’ve successfully made your software FAIR: Findable, Accessible, Interoperable, and Reusable
  • Even small actions can significantly improve your software’s impact
  • Making code citable and discoverable benefits both you and the research community
  • Start with one change, then build from there
  • Use the UC OSPO resources and templates to streamline the process