<h1 align="center">Supercharge Your LLM Application Evaluations 🚀</h1>
Documentation | Quick start | Join Discord | Blog | Newsletter | Careers
Objective metrics, intelligent test generation, and data-driven insights for LLM apps
Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. Say goodbye to time-consuming, subjective assessments and hello to data-driven, efficient evaluation workflows. Don't have a test dataset ready? We also do production-aligned test set generation.
Install from PyPI:

```bash
pip install ragas
```
Alternatively, from source:

```bash
pip install git+https://github.com/vibrantlabsai/ragas
```
The fastest way to get started is to use the `ragas quickstart` command:

```bash
# List available templates
ragas quickstart

# Create a RAG evaluation project
ragas quickstart rag_eval

# Specify where you want to create it
ragas quickstart rag_eval -o ./my-project
```
Available templates:
- `rag_eval` - Evaluate RAG systems

Coming soon:

- `agent_evals` - Evaluate AI agents
- `benchmark_llm` - Benchmark and compare LLMs
- `prompt_evals` - Evaluate prompt variations
- `workflow_eval` - Evaluate complex workflows

Ragas comes with pre-built metrics for common evaluation tasks. For example, Aspect Critique evaluates any aspect of your output using `DiscreteMetric`:
```python
import asyncio

from openai import AsyncOpenAI

from ragas.llms import llm_factory
from ragas.metrics import DiscreteMetric

# Setup your LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o", client=client)

# Create a custom aspect evaluator
metric = DiscreteMetric(
    name="summary_accuracy",
    allowed_values=["accurate", "inaccurate"],
    prompt="""Evaluate if the summary is accurate and captures key information.

Response: {response}

Answer with only 'accurate' or 'inaccurate'.""",
)

# Score your application's output
async def main():
    score = await metric.ascore(
        llm=llm,
        response="The summary of the text is...",
    )
    print(f"Score: {score.value}")  # 'accurate' or 'inaccurate'
    print(f"Reason: {score.reason}")

if __name__ == "__main__":
    asyncio.run(main())
```
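Constraining the judge to a fixed set of `allowed_values` is a deliberate design choice: every sample maps to one of the listed labels, so scores can be counted, compared, and tracked across runs instead of being parsed out of free-form LLM prose.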
Note: Make sure your `OPENAI_API_KEY` environment variable is set.
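For example, in your shell (the key below is a placeholder, not a real value):

```bash
export OPENAI_API_KEY="sk-..."
```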
Find the complete Quickstart Guide in the documentation.
Over the past two years, we have seen and helped improve many AI applications using evals. If you want help improving and scaling up your AI application with evals, reach out:

📅 Book a slot or drop us a line: [email protected].
If you want to get more involved with Ragas, check out our Discord server. It's a fun community where we geek out about LLMs, retrieval, production issues, and more.
```
+------------------------------------------------------------------+
| Developers: Those who build with `ragas`.                        |
| (You have `import ragas` somewhere in your project)              |
|                                                                  |
|   +----------------------------------------------------------+   |
|   | Contributors: Those who make `ragas` better.             |   |
|   | (You make PRs to this repo)                              |   |
|   +----------------------------------------------------------+   |
+------------------------------------------------------------------+
```
We welcome contributions from the community! Whether it's bug fixes, feature additions, or documentation improvements, your input is valuable.
At Ragas, we believe in transparency. We collect minimal, anonymized usage data to improve our product and guide our development efforts.
- ✅ No personal or company-identifying information
- ✅ Open-source data collection code
- ✅ Publicly available aggregated data
To opt out, set the `RAGAS_DO_NOT_TRACK` environment variable to `true`.
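For example, in your shell:

```bash
export RAGAS_DO_NOT_TRACK=true
```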
```bibtex
@misc{ragas2024,
  author = {VibrantLabs},
  title = {Ragas: Supercharge Your LLM Application Evaluations},
  year = {2024},
  howpublished = {\url{https://github.com/vibrantlabsai/ragas}},
}
```