# ABC (AI Benchmark Cluster)

[CI pipelines](https://gitlab.com/ai9804501/abc/-/pipelines)
ABC (AI Benchmark Cluster) is an advanced LLM benchmarking platform that evaluates AI models against human educational standards. The system provides comprehensive testing across multiple subjects and educational levels, from elementary school to PhD, using Ollama for model execution.
- **Educational Level Benchmarking**: Compare LLM performance against human educational standards, from elementary school to PhD.
- **Subject Areas**: Test suites organized by subject (see `src/testing/` in the project structure below).
- **Automated Documentation**: Self-generating performance reports and analysis through GitLab CI/CD pipelines.
- **Pass/Fail Grading**: Objective evaluation criteria for each educational level (see the sketch after this list).
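As an illustration of the pass/fail idea, a minimal grading helper might look like the following. This is a hedged sketch only: the threshold values, level names, and function are hypothetical, not ABC's actual evaluation criteria.

```python
# Hypothetical sketch of pass/fail grading per educational level.
# Thresholds and level names are illustrative, not ABC's actual criteria.
PASS_THRESHOLDS = {
    "elementary": 0.90,
    "high_school": 0.75,
    "undergraduate": 0.65,
    "phd": 0.50,
}

def grade(level: str, correct: int, total: int) -> bool:
    """Return True if the model's score meets the level's pass threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total >= PASS_THRESHOLDS[level]

# Example: 42/50 correct at high-school level passes (0.84 >= 0.75).
assert grade("high_school", 42, 50)
```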
```
abc/
├── docs/               # Documentation and benchmark results
│   ├── results/        # Auto-generated benchmark results
│   ├── analysis/       # Performance analysis reports
│   └── comparisons/    # Educational level comparisons
├── src/                # Source code
│   ├── analysis/       # Analysis and metrics
│   ├── benchmarking/   # Core benchmarking system
│   ├── costs/          # Resource usage tracking
│   ├── database/       # Results storage
│   ├── pipeline/       # CI/CD pipeline integration
│   ├── runner/         # Ollama model runners
│   └── testing/        # Test suites by subject
├── tests/              # Test framework
└── templates/          # Report templates
```
Prerequisites:

- pyenv and uv
- glab CLI tool
- kubectl and helm for Kubernetes deployments (legacy; see the deployment note below)

Run the environment check script to verify your setup:

```bash
./scripts/check_dev.sh
```
This script will validate the installation of all required tools and provide installation instructions for any missing components.
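For reference, the core of such a check can be expressed in a few lines of Python. This is a sketch of the idea only, not the contents of `scripts/check_dev.sh`; the tool list simply mirrors the prerequisites above.

```python
# Sketch of an environment check: verify required CLI tools are on PATH.
# Not the actual check_dev.sh script -- illustrative only.
import shutil

REQUIRED_TOOLS = ["pyenv", "uv", "glab", "kubectl", "helm", "docker"]

def check_tools() -> list[str]:
    """Return the names of any required tools missing from PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = check_tools()
    if missing:
        print(f"Missing tools: {', '.join(missing)} -- see installation notes.")
        raise SystemExit(1)
    print("All required tools found.")
```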
The recommended way to run ABC is with Docker Compose, which ensures a consistent environment and dependencies across all platforms.
Clone the repository and start the services:

```bash
git clone https://gitlab.com/ai9804501/abc.git
cd abc
docker-compose up -d
```
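The repository ships its own `docker-compose.yml`, which is authoritative. Purely as a rough sketch of the architecture, a setup like this typically pairs the application with an Ollama service; the service names, image, and port mapping below are assumptions for illustration.

```yaml
# Illustrative sketch only -- the real docker-compose.yml in the repo is authoritative.
services:
  ollama:
    image: ollama/ollama        # official Ollama image; default API port is 11434
    ports:
      - "11434:11434"
  app:
    build: .                    # assumed: build the ABC image from a project Dockerfile
    depends_on:
      - ollama
    environment:
      OLLAMA_HOST: http://ollama:11434   # point the runner at the ollama service
```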
Alternatively, to set up ABC manually, first clone the repository:

```bash
git clone https://gitlab.com/ai9804501/abc.git
cd abc
```

Install Ollama:

```bash
curl https://ollama.ai/install.sh | sh
```

Verify your Python version:

```bash
python3 --version  # Should output Python 3.12.x
```

Install uv:

```bash
pip install uv
```

Create a virtual environment and install the project in editable mode:

```bash
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"
```
With Docker Compose, run the benchmark suite inside the app container:

```bash
docker-compose exec app python -m src.pipeline.cli run-benchmarks
```
For a manual installation, start the Ollama server:

```bash
ollama serve
```

Pull the models you want to benchmark:

```bash
ollama pull llama2  # Add other models as needed
```

Then run the benchmark suite:

```bash
python -m src.pipeline.cli run-benchmarks
```
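Under the hood, the runner talks to a local Ollama server over its HTTP API. As a minimal sketch of a single benchmark query: the `/api/generate` endpoint and payload shape are Ollama's documented API, while the function name and prompt are illustrative, not ABC's actual runner code.

```python
# Minimal sketch: ask one benchmark question via Ollama's /api/generate endpoint.
# Requires a running `ollama serve` with the model already pulled.
import json
import urllib.request

def ask_model(prompt: str, model: str = "llama2",
              host: str = "http://localhost:11434") -> str:
    """Send a single non-streaming prompt to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_model("What is 7 * 8? Answer with just the number."))
```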
Reports are automatically generated in the GitLab CI pipeline and can be found in:

- `docs/results/`
- `pages/benchmarks/`

To contribute, create a feature branch:

```bash
git checkout -b feature/your-feature-name
```
Run the test suite:

```bash
pytest
```
The project uses GitLab CI/CD to test the code, run the benchmarks, and publish the generated reports in stages.
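The repository's `.gitlab-ci.yml` defines the authoritative stages. Purely as an illustration of how such a pipeline might be structured, here is a sketch; the stage names and the report-publishing step are assumptions, not the project's actual configuration.

```yaml
# Illustrative sketch only -- see the repository's .gitlab-ci.yml for the real pipeline.
stages:
  - test
  - benchmark
  - report

test:
  stage: test
  script:
    - pytest

benchmark:
  stage: benchmark
  script:
    - python -m src.pipeline.cli run-benchmarks

report:
  stage: report
  script:
    - echo "Publish generated reports to docs/results/"  # placeholder step
  artifacts:
    paths:
      - docs/results/
```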
Note: Kubernetes deployment configuration has been removed from this project. Please use Docker Compose for deployment as described above.
Required GitLab CI/CD variables:
- `KUBE_CONFIG`: Base64-encoded kubeconfig file
- `CI_REGISTRY_USER`: GitLab registry username
- `CI_REGISTRY_PASSWORD`: GitLab registry password
- `GITLAB_TOKEN`: Token for wiki updates

Install pre-commit hooks to ensure code quality:
```bash
uv pip install pre-commit
pre-commit install
```
This will run linters and formatters before each commit.
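For context, a `.pre-commit-config.yaml` along these lines would wire in a linter and formatter. The repository's actual hook set may differ; the ruff hooks shown here are one common choice, not necessarily the project's.

```yaml
# Illustrative sketch -- the repository's .pre-commit-config.yaml is authoritative.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9          # pin a release; check for the latest tag
    hooks:
      - id: ruff         # lint
      - id: ruff-format  # format
```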
MIT License. See the LICENSE file for details.