# How to contribute to TRL?

Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.

It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.

However you choose to contribute, please be mindful and respect our [code of conduct](https://github.com/huggingface/trl/blob/main/CODE_OF_CONDUCT.md).

**This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).**

## Ways to contribute

There are several ways you can contribute to TRL:

* Fix outstanding issues with the existing code.
* Submit issues related to bugs or desired new features.
* Implement trainers for new post-training algorithms.
* Contribute to the examples or the documentation.

If you don't know where to start, there is a special [Good First Issue](https://github.com/huggingface/trl/labels/%F0%9F%91%B6%20good%20first%20issue) listing. It will give you a list of open issues that are beginner-friendly and help you start contributing to open-source. The best way to do that is to open a Pull Request and link it to the issue that you'd like to work on. We try to give priority to opened PRs as we can easily track the progress of the fix, and if the contributor does not have time anymore, someone else can take the PR over.

For something slightly more challenging, you can also take a look at the [Good Second Issue](https://github.com/huggingface/trl/labels/%F0%9F%A7%92%20good%20second%20issue) list. In general though, if you feel like you know what you're doing, go for it and we'll help you get there! 🚀

> All contributions are equally valuable to the community. 🥰

Before you start contributing, make sure you have installed all the dev tools:

```bash
pip install -e .[dev]
```

## Fixing outstanding issues

If you notice an issue with the existing code and have a fix in mind, feel free to [start contributing](#submitting-a-pull-request-pr) and open a Pull Request!

## Submitting a bug-related issue or feature request

Do your best to follow these guidelines when submitting a bug-related issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.

### Did you find a bug?

The TRL library is robust and reliable thanks to users who report the problems they encounter.

Before you report an issue, we would really appreciate it if you could **make sure the bug was not already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code.

Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:

* Your **OS type and version**, **Python**, **PyTorch**, **TRL** and **Transformers** versions.
* A short, self-contained code snippet that allows us to reproduce the bug in less than 30s (see the sketch after this list).
* The *full* traceback if an exception is raised.
* Attach any other additional information, like screenshots, you think may help.
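
For example, a minimal, self-contained reproduction might look like the sketch below. The model, dataset, and trainer shown are illustrative placeholders; substitute whatever actually triggers the bug on your side.

```python
# Hypothetical minimal reproduction sketch: swap in the model, dataset, and
# trainer that trigger the bug for you.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train[:10]")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=SFTConfig(output_dir="repro-output", max_steps=1),
    train_dataset=dataset,
)
trainer.train()  # <- note here what goes wrong at this step
```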

To get the OS and software versions automatically, run the following command:

```bash
trl env
```

### Do you want a new feature?

If there is a new feature you'd like to see in TRL, please open an issue and describe:

1. What is the *motivation* behind this feature? Is it related to a problem or frustration with the library? Is it a feature related to something you need for a project? Is it something you worked on and think it could benefit the community?

   Whatever it is, we'd love to hear about it!

2. Describe your requested feature in as much detail as possible. The more you can tell us about it, the better we'll be able to help you.
3. Provide a *code snippet* that demonstrates the feature's usage (see the sketch after this list).
4. If the feature is related to a paper, please include a link.
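
For point 3, here is a sketch of what such a snippet might look like. We imagine requesting a new `loss_type` value for `DPOConfig`; the value `"my_new_loss"` is invented for illustration, and the standard `DPOTrainer` usage is assumed:

```python
# Hypothetical feature-request snippet: `loss_type="my_new_loss"` is the
# proposed, not-yet-existing behavior being requested.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
config = DPOConfig(output_dir="dpo-output", loss_type="my_new_loss")
trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```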

If your issue is well written, we're already 80% of the way there by the time you create it.

## Do you want to implement a new trainer?

New post-training methods are published frequently and those that satisfy the following criteria are good candidates to be integrated into TRL:

* **Simplicity:** Does the new method achieve similar performance as prior methods, but with less complexity? A good example is Direct Preference Optimization (DPO) [[Rafailov et al, 2023]](https://huggingface.co/papers/2305.18290), which provided a simpler and compelling alternative to RLHF methods.
* **Efficiency:** Does the new method provide a significant improvement in training efficiency? A good example is Odds Ratio Preference Optimization (ORPO) [[Hong et al, 2024]](https://huggingface.co/papers/2403.07691), which utilizes a similar objective as DPO but requires half the GPU VRAM.

Methods that only provide incremental improvements at the expense of added complexity or compute costs are unlikely to be included in TRL.

If you want to implement a trainer for a new post-training method, first open an issue and provide the following information:

* A short description of the method and a link to the paper.
* Link to the implementation if it is open-sourced.
* Link to model weights trained with the method if they are available.

Based on the community and maintainer feedback, the next step will be to implement the trainer and config classes. See the following examples for inspiration:

* Paired preference optimisation: [`dpo_trainer.py`](./trl/trainer/dpo_trainer.py) and [`dpo_config.py`](./trl/trainer/dpo_config.py)
* RL-based optimisation: [`rloo_trainer.py`](./trl/trainer/rloo_trainer.py) and [`rloo_config.py`](./trl/trainer/rloo_config.py)
* Online optimisation: [`online_dpo_trainer.py`](./trl/trainer/online_dpo_trainer.py) and [`online_dpo_config.py`](./trl/trainer/online_dpo_config.py)
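
To give a rough idea of the expected shape, here is a minimal sketch of a trainer/config pair. The names `MyMethodConfig` and `MyMethodTrainer` are purely illustrative, and the sketch assumes the common pattern of building the config on top of `transformers.TrainingArguments`:

```python
# Illustrative skeleton of a new trainer/config pair; not an existing TRL API.
from dataclasses import dataclass, field

from transformers import Trainer, TrainingArguments


@dataclass
class MyMethodConfig(TrainingArguments):
    """Config for the hypothetical `MyMethodTrainer`."""

    beta: float = field(
        default=0.1,
        metadata={"help": "Method-specific hyperparameter; default taken from the paper."},
    )


class MyMethodTrainer(Trainer):
    """Trainer implementing the hypothetical post-training method."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Replace the standard loss with the method's objective here.
        outputs = model(**inputs)
        loss = outputs.loss  # placeholder for the method-specific loss
        return (loss, outputs) if return_outputs else loss
```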

## Do you want to add documentation?

We're always looking for improvements that make the documentation clearer and more accurate. Please let us know about typos, dead links, and any missing, unclear, or inaccurate content. We'll be happy to make the changes or help you contribute if you're interested!

## Submitting a pull request (PR)

Before writing code, we strongly advise you to search through the existing PRs or issues to make sure that nobody is already working on the same thing. If you are unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to be able to contribute to TRL. `git` is not the easiest tool to use but it has the greatest manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro Git](https://git-scm.com/book/en/v2) is a very good reference.

Follow these steps to start contributing:

1. Fork the [repository](https://github.com/huggingface/trl) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote. The following command assumes you have your public SSH key uploaded to GitHub. See the following guide for more [information](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository).

   ```bash
   git clone git@github.com:<your Github handle>/trl.git
   cd trl
   git remote add upstream https://github.com/huggingface/trl.git
   ```

3. Create a new branch to hold your development changes, and do this for every new PR you work on.

   Start by synchronizing your `main` branch with the `upstream/main` branch (more details in the [GitHub Docs](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork)):

   ```bash
   git checkout main
   git fetch upstream
   git merge upstream/main
   ```

   Once your `main` branch is synchronized, create a new branch from it:

   ```bash
   git checkout -b a-descriptive-name-for-my-changes
   ```

   **Do not** work on the `main` branch.

4. Set up a development environment by running the following command in a conda or a virtual environment you've created for working on this library:

   ```bash
   pip install -e .[dev]
   ```

   (If TRL was already installed in the virtual environment, remove it with `pip uninstall trl` before reinstalling it.)

   Alternatively, if you are using [Visual Studio Code](https://code.visualstudio.com/Download), the fastest way to get set up is by using the provided Dev Container. Check [the documentation on how to get started with dev containers](https://code.visualstudio.com/docs/remote/containers).

5. Develop the features on your branch.

    As you work on the features, you should make sure that the test suite passes. Run the tests impacted by your changes like this:

    ```bash
    pytest tests/<TEST_TO_RUN>.py
    ```

    > The following commands leverage the `make` utility.

    You can also run the full suite with the following command.

    ```bash
    make test
    ```

    TRL relies on `ruff` for maintaining consistent code formatting across its source files. Before submitting any PR, you should apply automatic style corrections and run code verification checks.

    We provide a `precommit` target in the `Makefile` that simplifies this process by running all required checks and optimizations on only the files modified by your PR.

    To apply these checks and corrections in one step, use:

    ```bash
    make precommit
    ```

    This command runs the following:

    * Executes `pre-commit` hooks to automatically fix style issues with `ruff` and other tools.
    * Runs additional scripts such as adding copyright information.

    If you prefer to apply the style corrections separately or review them individually, the `pre-commit` hook will handle the formatting for the files in question.

    Once you're happy with your changes, add changed files using `git add` and make a commit with `git commit` to record your changes locally:

    ```bash
    git add modified_file.py
    git commit
    ```

    Please write [good commit messages](https://chris.beams.io/posts/git-commit/).

    It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:

    ```bash
    git fetch upstream
    git rebase upstream/main
    ```

    Push the changes to your account using:

    ```bash
    git push -u origin a-descriptive-name-for-my-changes
    ```

6. Once you are satisfied (**and the checklist below is happy too**), go to the webpage of your fork on GitHub. Click on 'Pull request' to send your changes to the project maintainers for review.

7. It's ok if maintainers ask you for changes. It happens to core contributors too! To ensure everyone can review your changes in the pull request, work on your local branch and push the updates to your fork. They will automatically appear in the pull request.

### Checklist

1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in the pull request description (e.g., `Fixes #123`) so the two are linked and people consulting the issue know you are working on it;
3. To indicate a work in progress please prefix the title with `[WIP]`, or mark the PR as a draft PR. These are useful to avoid duplicated work, and to differentiate it from PRs ready to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.

### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/trl/tree/main/tests).

We use `pytest` to run the tests. From the root of the repository, here's how to run tests with `pytest` for the library:

```bash
python -m pytest -sv ./tests
```

That's how `make test` is implemented (without the `pip install` line)!

You can specify a smaller set of tests to test only the feature
you're working on.

### Default values guidelines

1. **Use defaults when appropriate**:

    Provide default values unless the parameter's value varies significantly by use case. For example, datasets or models should not have defaults, but parameters like `learning_rate` should.

2. **Prioritize proven defaults**:  

    Default values should align with those recommended in the original paper or method. Alternatives require strong evidence of superior performance in most cases.

3. **Ensure safety and predictability**:

    Defaults must be safe, expected, and reliable. Avoid settings that could lead to surprising outcomes, such as excessive memory usage or poor performance in edge cases.

4. **Balance consistency and flexibility**:  

    Aim for consistent defaults across similar functions or methods. However, consistency should not take precedence over points 2 and 3.

5. **Opt-in for new features**:

    Do not enable new features or improvements (e.g., novel loss functions) by default. Users should explicitly opt in to use them.
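
As a rough illustration of these guidelines, consider the hypothetical config below. The class and field names are invented, and the values only stand in for "paper-recommended" and "safe" choices:

```python
# Hypothetical config illustrating the default-value guidelines above.
from dataclasses import dataclass, field


@dataclass
class MyMethodConfig:
    # Guideline 1: no default, because the right model varies too much by use case.
    model_name: str = field(metadata={"help": "Model to fine-tune."})
    # Guideline 2: default aligned with the value recommended in the original paper.
    learning_rate: float = 1e-6
    # Guideline 3: conservative default that avoids surprising memory usage.
    per_device_train_batch_size: int = 8
    # Guideline 5: new, experimental behavior is opt-in.
    use_new_loss: bool = False
```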

### Writing documentation

High-quality documentation is crucial for maintaining a project that is easy to use, understand, and extend. When adding new features, ensure they are thoroughly documented to maintain consistency and clarity throughout the project.

To illustrate what good documentation looks like, here’s an example of a well-documented function:

````python
def replicate_str(string: str, n: int, sep: str = " ") -> str:
    r"""
    Replicate a string `n` times with a separator.

    Args:
        string (`str`):
            String to replicate.
        n (`int`):
            Number of times to replicate the string.
        sep (`str`, *optional*, defaults to `" "`):
            Separator to use between each replication.
    
    Returns:
        `str`: The replicated string.
    
    Examples:
    ```python
    >>> replicate_str("hello", 3)
    'hello hello hello'
    >>> replicate_str("hello", 3, sep=", ")
    'hello, hello, hello'
    ```
    """
    return sep.join([string] * n)
````

* **Line Wrapping:** Applied a consistent line wrap at column 120 to improve readability.
* **Definite Articles:** Removed definite articles where possible to streamline language (e.g., changed "The string to replicate" to "String to replicate").
* **Type Annotations:**
  * Always include type definitions, indicating if a parameter is optional and specifying the default value.

* **String Defaults:**
  * Ensured that default string values are wrapped in double quotes:

    ```txt
    defaults to `"foo"`
    ```

* **Dictionary Typing:**
  * Replaced generic `dict` type hints with more explicit `dict[str, Any]` to clarify expected key-value pairs.
* **Default Value Formatting:**
  * Consistently surrounded default values with backticks for improved formatting:

    ```txt
    defaults to `4`
    ```

* **Sub-sectioning:** When the number of arguments is large, consider breaking them into sub-sections for better readability.

    ```python
    def calculate_statistics(data: list[float], precision: int = 2, include_variance: bool = False) -> dict[str, float]:
        r"""
        Calculates basic statistics for a given dataset.
    
        Args:
            > Data inputs
    
            data (`list[float]`):
                A list of numerical values to analyze.
    
            > Configuration parameters
    
            precision (`int`, *optional*, defaults to `2`):
                Number of decimal places to round the results.
            include_variance (`bool`, *optional*, defaults to `False`):
                Whether to include the variance of the dataset in the results.
    
        Returns:
            `dict[str, float]`:
                A dictionary containing calculated statistics such as mean, median, and optionally variance.
        """
        ...
    ```

### Deprecation and backward compatibility

Our approach to deprecation and backward compatibility is flexible and based on the feature’s usage and impact. Each deprecation is carefully evaluated, aiming to balance innovation with user needs.

When a feature or component is marked for deprecation, its use will emit a warning message. This warning will include:

* **Transition Guidance**: Instructions on how to migrate to the alternative solution or replacement.
* **Removal Version**: The target version when the feature will be removed, providing users with a clear timeframe to transition.

Example:

   ```python
   warnings.warn(
       "[TEST_TO_RUN>]he `[TEST_TO_RUN>]rainer.foo` method is deprecated and will be removed in version 0.14.0. "
       "Please use the `[TEST_TO_RUN>]rainer.bar` class instead.",
       FutureWarning,
       stacklevel=2,
   )
   ```

The deprecation and removal schedule is based on each feature's usage and impact, with examples at two extremes:

* **Experimental or Low-Use Features**: For a feature that is experimental or has limited usage, backward compatibility may not be maintained between releases. Users should therefore anticipate potential breaking changes from one version to the next.

* **Widely-Used Components**: For a feature with high usage, we aim for a more gradual transition period of approximately **5 months**, generally scheduling deprecation around **5 minor releases** after the initial warning.

These examples represent the two ends of a continuum. The specific timeline for each feature will be determined individually, balancing innovation with user stability needs.

### Working with warnings

Warnings play a critical role in guiding users toward resolving potential issues, but they should be used thoughtfully to avoid unnecessary noise. Unlike logging, which provides informational context or operational details, warnings signal conditions that require attention and action. Overusing warnings can dilute their importance, leading users to ignore them entirely.

#### Definitions

* **Correct**: An operation is correct if it is valid, follows the intended approach, and aligns with the current best practices or guidelines within the codebase. This is the recommended or intended way to perform the operation.
* **Supported**: An operation is supported if it is technically valid and works within the current codebase, but it may not be the most efficient, optimal, or recommended way to perform the task. This includes deprecated features or legacy approaches that still work but may be phased out in the future.

#### Choosing the right message

* **Correct → No warning**:
   If the operation is fully valid and expected, no message should be issued. The system is working as intended, so no warning is necessary.

* **Correct but deserves attention → No warning, possibly a log message**:
   When an operation is correct but uncommon or requires special attention, providing an informational message can be helpful. This keeps users informed without implying any issue. If available, use the logger to output this message. Example:

   ```python
   logger.info("[TEST_TO_RUN>]his is an informational message about a rare but correct operation.")
   ```

* **Correct but very likely a mistake → Warning with option to disable**:  
   In rare cases, you may want to issue a warning for a correct operation that’s very likely a mistake. In such cases, you must provide an option to suppress the warning. This can be done with a flag in the function. Example:

   ```python
   def my_function(foo, bar, _warn=True):
       if foo == bar:
           if _warn:
               logger.warning("foo and bar are the same, this is likely a mistake. Ignore this warning by setting `_warn=False`.")
           # Do something
   ```

* **Supported but not correct → Warning**:
   If the operation is technically supported but is deprecated, suboptimal, or could cause future issues (e.g., conflicting arguments), a warning should be raised. This message should be actionable, meaning it must explain how to resolve the issue. Example:

   ```python
   def my_function(foo, bar):
       if foo and bar:
           logger.warning("Both `foo` and `bar` were provided, but only one is allowed. Ignoring `foo`. Please pass only one of these arguments.")
           # Do something
   ```

* **Not supported → Exception**:
   If the operation is invalid or unsupported, raise an exception. This indicates that the operation cannot be performed and requires immediate attention. Example:

   ```python
   def my_function(foo, bar):
       if foo and bar:
           raise ValueError("Both `foo` and `bar` were provided, but only one is allowed. Please pass only one of these arguments.")
   ```

By following this classification, you ensure that warnings, information, and exceptions are used appropriately, providing clear guidance to the user without cluttering the system with unnecessary messages.