Version: 3.0 | Date: 2025-10-03 | Phase: III (Containerized Microservices)
Sieveo is an intelligent knowledge discovery and RAG (Retrieval-Augmented Generation) platform that enables technical professionals to search and analyze code repositories using natural language queries. Phase III transforms the Phase II CLI application into containerized, production-ready microservices with enhanced performance, scalability, and observability.
Before analyzing a request and creating TODOs, review the following instructions to determine if you should forward the request to a sub-agent.
- Use the scrum-master agent for all requirements, planning, task, and progress reporting requests.
- Use the python-dev agent for all development work: creating new features, modifying existing features, and resolving defects/issues with features.
- Use the tester agent for all testing and quality checks, including formatting, typing, unit, integration, end-to-end (e2e), and code coverage checks.
Language/Version: Python 3.13+ with modern type hints and async/await patterns
Primary Dependencies: FastAPI, Haystack framework, Pydantic v2, Structlog, OpenTelemetry
Vector Databases: Modular architecture supporting Qdrant (primary), Elasticsearch (enterprise), PostgreSQL+pgvector (cost-effective), Chroma (legacy)
Storage: Hybrid storage with vector DB for embeddings, PostgreSQL for metadata, Redis for caching
Job Queue: Celery with Redis broker for async indexing operations
Testing: pytest with 70% integration testing focus, Ragas for RAG evaluation
Deployment: Docker Compose (dev) and Kubernetes (prod) with Helm charts
Target Platform: Containerized microservices with auto-scaling (Linux/macOS development, cloud deployment)
Project Type: Three-service microservices architecture (Query, Index, Admin)
- Query Service (Port 8000): Read-optimized search operations with multi-tier caching
- Index Service (Port 8001): Write-optimized document ingestion with distributed job queue
- Admin Service (Port 8002): Control plane for user management, API keys, and system health
- Service Communication: Internal REST APIs with OpenAPI 3.1 specifications
- Service Discovery: Kubernetes DNS or Docker Compose hostnames
- Independent Scaling: Horizontal scaling for Query/Index, vertical for Admin
- Abstract Interface: VectorStoreInterface ABC with strict type hints
- Qdrant Backend: High-performance vector search with gRPC (5-10x faster, recommended for production)
- Elasticsearch Backend: Hybrid BM25+kNN search for enterprise deployments with existing infrastructure
- PostgreSQL+pgvector Backend: Unified database for vectors AND metadata, ACID transactions, lowest operational complexity
- Chroma Backend: Simple deployment for development and Phase II backward compatibility
- Factory Pattern: Configuration-driven backend selection without code changes
- Migration Tools: Data transfer between all backends with 100% integrity verification
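A minimal sketch of how the abstract interface and factory pattern described above could fit together. The concrete class names, registry mechanism, and the in-memory backend are illustrative assumptions, not the actual Phase III implementation:

```python
from abc import ABC, abstractmethod
from typing import Any

from pydantic import BaseModel


class VectorSearchRequest(BaseModel):
    """Backend-agnostic search parameters (field names are illustrative)."""
    query_embedding: list[float]
    top_k: int = 10
    filters: dict[str, Any] | None = None


class VectorSearchResult(BaseModel):
    """Normalized result shape shared by every backend."""
    document_id: str
    score: float
    metadata: dict[str, Any] = {}


class VectorStoreInterface(ABC):
    """Abstract contract each vector DB backend implements."""

    @abstractmethod
    async def search(self, request: VectorSearchRequest) -> list[VectorSearchResult]: ...

    @abstractmethod
    async def health_check(self) -> bool: ...


_BACKENDS: dict[str, type[VectorStoreInterface]] = {}


def register_backend(name: str):
    """Class decorator so backend adapters self-register under a config key."""
    def wrap(cls: type[VectorStoreInterface]) -> type[VectorStoreInterface]:
        _BACKENDS[name] = cls
        return cls
    return wrap


def create_vector_store(name: str, **settings: Any) -> VectorStoreInterface:
    """Factory: configuration picks the backend; call sites only see the interface."""
    return _BACKENDS[name](**settings)


@register_backend("memory")
class InMemoryVectorStore(VectorStoreInterface):
    """Toy backend used here only to keep the sketch runnable."""

    def __init__(self, **_: Any) -> None:
        self._docs: list[VectorSearchResult] = []

    async def search(self, request: VectorSearchRequest) -> list[VectorSearchResult]:
        return self._docs[: request.top_k]

    async def health_check(self) -> bool:
        return True
```

Real adapters for Qdrant, Elasticsearch, pgvector, and Chroma would register the same way, so switching backends is purely a configuration change.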
- L1 Cache: In-memory LRU per Query Service instance (40-50% hit rate, <1ms latency)
- L2 Cache: Redis distributed cluster (20-30% hit rate, <10ms latency)
- Target Performance: 70%+ combined cache hit ratio, 50x latency improvement for cached queries
- Smart Invalidation: Event-driven cache invalidation via Redis pub/sub
- Cache Strategy: Query embeddings (infinite TTL), search results (1 hour TTL)
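A rough illustration of the L1/L2 read path and event-driven invalidation described above. The class name, key handling, and the `cache-invalidation` channel are assumptions:

```python
import json
from collections import OrderedDict
from typing import Any

import redis.asyncio as redis


class TieredSearchCache:
    """L1: per-instance LRU dict; L2: shared Redis with the TTL policy above."""

    def __init__(self, redis_url: str, l1_capacity: int = 1024) -> None:
        self._l1: OrderedDict[str, Any] = OrderedDict()
        self._l1_capacity = l1_capacity
        self._redis = redis.Redis.from_url(redis_url)

    async def get(self, key: str) -> Any | None:
        # L1 hit: sub-millisecond, no network round trip.
        if key in self._l1:
            self._l1.move_to_end(key)
            return self._l1[key]
        # L2 hit: one Redis round trip; promote into L1 for the next caller.
        raw = await self._redis.get(key)
        if raw is not None:
            value = json.loads(raw)
            self._set_l1(key, value)
            return value
        return None

    async def set(self, key: str, value: Any, ttl_seconds: int | None = 3600) -> None:
        # Search results get a 1-hour TTL; pass ttl_seconds=None for query embeddings.
        self._set_l1(key, value)
        await self._redis.set(key, json.dumps(value), ex=ttl_seconds)

    async def invalidate(self, key: str) -> None:
        # Publish so every Query Service instance drops its own L1 copy.
        self._l1.pop(key, None)
        await self._redis.delete(key)
        await self._redis.publish("cache-invalidation", key)

    def _set_l1(self, key: str, value: Any) -> None:
        self._l1[key] = value
        self._l1.move_to_end(key)
        if len(self._l1) > self._l1_capacity:
            self._l1.popitem(last=False)
```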
- Async-First Design: Native async/await for I/O-bound operations
- Pydantic v2 Validation: Runtime validation with compile-time type hints
- Dependency Injection: Service layer access via FastAPI dependencies
- OpenAPI 3.1: Auto-generated API documentation with Swagger UI
- Middleware Stack: Authentication, logging, rate limiting, compression, CORS
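A hedged sketch of the async-first FastAPI pattern with Pydantic v2 validation and dependency injection. The model fields and the `SearchService` wiring are placeholders, not the real service layer:

```python
from typing import Annotated

from fastapi import Depends, FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Query Service", version="3.0.0")


class SearchQuery(BaseModel):
    """Request body validated by Pydantic v2 at runtime (fields are illustrative)."""
    query: str = Field(min_length=1)
    top_k: int = Field(default=10, ge=1, le=100)
    rerank: bool = True


class SearchResponse(BaseModel):
    results: list[dict]
    latency_ms: float
    cache_hit: bool


class SearchService:
    """Placeholder for the real search layer (cache + vector store + reranker)."""

    async def search(self, request: SearchQuery) -> SearchResponse:
        return SearchResponse(results=[], latency_ms=0.0, cache_hit=False)


def get_search_service() -> SearchService:
    # FastAPI dependency; the real service would wire cache, store, and reranker here.
    return SearchService()


@app.post("/search", response_model=SearchResponse)
async def search(
    body: SearchQuery,
    service: Annotated[SearchService, Depends(get_search_service)],
) -> SearchResponse:
    # Async all the way down: the handler awaits I/O-bound work.
    return await service.search(body)
```

FastAPI generates the OpenAPI 3.1 schema and Swagger UI for this route automatically from the Pydantic models.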
- BGE Reranker: Local deployment via sentence-transformers (no API costs)
- Model Selection: bge-reranker-base (dev), bge-reranker-large (prod), bge-reranker-v2-m3 (multilingual)
- Performance: ~50ms local inference vs 200-500ms API calls (4-10x faster)
- Quality: 0.95+ NDCG@10 on MS MARCO benchmark
- Integration: Rerank top-100 candidates → top-10 final results
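A minimal local-reranking sketch using sentence-transformers' CrossEncoder; the helper function and its defaults are illustrative:

```python
from sentence_transformers import CrossEncoder

# bge-reranker-base for dev; swap the model name for -large or -v2-m3 in prod.
reranker = CrossEncoder("BAAI/bge-reranker-base")


def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[tuple[str, float]]:
    """Score the top-100 retrieval candidates locally and keep the best top_k."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```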
- Distributed Tracing: OpenTelemetry with automatic FastAPI instrumentation
- Metrics: Prometheus-compatible metrics with Grafana dashboards
- Structured Logging: JSON logs with correlation IDs and trace context (structlog)
- Health Checks: Liveness, readiness, and startup probes for Kubernetes
- Performance Monitoring: P50/P95/P99 latency tracking, cache hit ratios, throughput metrics
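One possible wiring of tracing and structured logging. A console span exporter is used here only to keep the sketch self-contained; production would export to Jaeger via OTLP:

```python
import structlog
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Tracing: every FastAPI request gets a span automatically.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

app = FastAPI(title="Query Service")
FastAPIInstrumentor.instrument_app(app)

# Structured logging: JSON output with ISO timestamps; correlation/trace IDs would be
# bound per-request (e.g. in middleware) so every log line carries them.
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
log = structlog.get_logger()


@app.get("/healthz")
async def liveness() -> dict[str, str]:
    log.info("liveness_probe", status="ok")
    return {"status": "ok"}
```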
- Docker Compose: Local development with profiles for vector DB selection
- Kubernetes: Production deployment with Helm charts and auto-scaling
- Blue-Green Deployment: Zero-downtime updates with health check coordination
- Multi-Architecture: Support for amd64 and arm64 container images
- Resource Management: CPU/memory requests and limits for all services
- API Key Authentication: Bearer token authentication via Admin Service
- Role-Based Access Control (RBAC): Viewer, contributor, and admin roles
- Scoped Permissions: Fine-grained access control (search:read, index:write, admin:manage)
- Rate Limiting: Per-API-key throttling with configurable limits
- Audit Logging: Immutable audit trail with user attribution and correlation IDs
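An illustrative FastAPI dependency enforcing hashed API keys and scoped permissions. The in-memory key table stands in for the Admin Service's PostgreSQL-backed lookup:

```python
import hashlib

from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI(title="Admin Service")
bearer = HTTPBearer()

# In the real system, hashes, roles, and scopes live in PostgreSQL; plaintext keys
# are never stored. This dict stands in for that lookup.
API_KEYS: dict[str, dict] = {
    hashlib.sha256(b"dev-key-123").hexdigest(): {
        "role": "contributor",
        "scopes": {"search:read", "index:write"},
    }
}


def require_scope(scope: str):
    """Dependency factory: verify the SHA-256 of the bearer token and its scopes."""

    def check(credentials: HTTPAuthorizationCredentials = Security(bearer)) -> dict:
        digest = hashlib.sha256(credentials.credentials.encode()).hexdigest()
        key = API_KEYS.get(digest)
        if key is None:
            raise HTTPException(status_code=401, detail="Invalid API key")
        if scope not in key["scopes"]:
            raise HTTPException(status_code=403, detail=f"Missing scope: {scope}")
        return key

    return check


@app.get("/search", dependencies=[Depends(require_scope("search:read"))])
async def search() -> dict[str, str]:
    return {"status": "authorized"}
```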
- Search Latency (P50): <50ms (cached), <100ms (warm cache), <500ms (cold cache)
- Search Latency (P95): <100ms (target vs 2000ms in Phase II = 20x improvement)
- Cache Hit Ratio: 70%+ combined L1+L2 (40-50% L1, 20-30% L2)
- Indexing Throughput: 1000+ documents/minute (10x faster than Phase II)
- Concurrent Users: 1000+ simultaneous connections (100x scale vs Phase II)
- Service Availability: 99.9% target (max 43 minutes downtime per month)
- Auto-Scaling: Query service scales within 30 seconds of load increase
- Vector database abstraction enables backend selection without code changes
- Unified ingestion interface maintained through Haystack pipeline architecture
- GitHub repositories and local folders as pluggable data sources
- Hybrid search (semantic + keyword) maintains focus on retrieval accuracy
- BGE reranking enhances result quality with minimal latency impact
- Measurable quality metrics through search relevance scoring and NDCG
- All indexing uses Haystack component patterns
- Pipeline configurations externalized and version-controlled
- Maintains consistency with Phase I and Phase II implementations
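A small Haystack 2.x-style pipeline sketch of the component pattern above, assuming the in-memory document store for brevity; the model choice and query text are placeholders, and the real pipelines would plug Qdrant/Elasticsearch/pgvector stores in behind the same components:

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# In-memory store keeps the sketch self-contained.
document_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_component(
    "embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5")
)
pipeline.add_component(
    "retriever", InMemoryEmbeddingRetriever(document_store=document_store)
)
pipeline.connect("embedder.embedding", "retriever.query_embedding")

# "Externalized and version-controlled" configuration: the pipeline serializes to
# YAML, which can be checked into the repo and reloaded at service start-up.
yaml_definition = pipeline.dumps()

results = pipeline.run({"embedder": {"text": "how does incremental indexing work?"}})
```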
- Incremental indexing preserved from Phase II with distributed job queue
- Resumable and fault-tolerant processing with Celery task chains
- Minimal reprocessing through Git-based change detection
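A hedged Celery sketch of the resumable indexing chain described above; task bodies, broker URLs, and return shapes are stand-ins for the real implementation:

```python
from celery import Celery, chain

celery_app = Celery(
    "index_service",
    broker="redis://redis:6379/0",
    backend="redis://redis:6379/1",
)


@celery_app.task(bind=True, max_retries=3)
def detect_changes(self, repository_url: str, last_commit: str | None) -> list[str]:
    """Git-based change detection: return only paths touched since last_commit."""
    # A real implementation would diff commits (e.g. `git diff --name-only`).
    return ["src/app/service.py", "docs/guide.md"]


@celery_app.task(bind=True, max_retries=3)
def process_documents(self, changed_paths: list[str]) -> list[dict]:
    """Chunk and embed only the changed files (minimal reprocessing)."""
    return [{"path": p, "chunks": 12} for p in changed_paths]


@celery_app.task(bind=True, max_retries=3)
def upsert_vectors(self, processed: list[dict]) -> dict:
    """Write embeddings to the vector store and report final counts."""
    return {"documents": len(processed), "status": "COMPLETED"}


def enqueue_incremental_index(repository_url: str, last_commit: str | None):
    """Chain the stages so the job is resumable: each link retries independently."""
    workflow = chain(
        detect_changes.s(repository_url, last_commit),
        process_documents.s(),
        upsert_vectors.s(),
    )
    return workflow.apply_async()
```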
- Enhanced structured logging with correlation IDs across all services
- Distributed tracing with OpenTelemetry for end-to-end visibility
- Performance monitoring with Prometheus metrics and Grafana dashboards
- Knowledge base statistics exposed through Admin Service health endpoints
- Comprehensive Pydantic v2 models with runtime validation across all services
- Type hints mandatory across all Phase III functionality (mypy strict mode)
- OpenAPI 3.1 specifications for all REST APIs with contract testing
- Vector store interface exemplifies plugin architecture with factory pattern
- Multiple backends (Qdrant, Elasticsearch, Chroma) without code changes
- Embedding model interchangeability maintained from Phase II
- Service decomposition enables independent service evolution
- Multi-tier caching reduces vector DB load by 70%
- Memory-conscious processing with configurable batch sizes
- Efficient incremental processing reduces computational waste
- Local reranking models eliminate per-request API costs ($0 vs $2/M requests)
- SearchQuery: Search request with filters, mode, and reranking options
- SearchResult: Individual result with score, metadata, and context
- SearchResponse: Complete response with results, latency, and cache status
- CacheEntry: Cache entry with TTL, access count, and expiration
- IndexingJob: Job tracking with state machine (QUEUED → RUNNING → COMPLETED/FAILED)
- DocumentIngestion: Document to be processed and indexed
- ProcessingStatus: Real-time progress tracking for indexing operations
- JobQueue: Celery queue metadata and worker allocation
- APIKey: Authentication credential with role, scopes, and rate limits
- UserRole: Permission set (viewer, contributor, admin)
- AuditLog: Immutable audit record with user attribution
- SystemHealth: Service health with component status aggregation
- VectorSearchRequest: Backend-agnostic search parameters
- VectorSearchResult: Normalized result across all backends
- VectorDocument: Document with embedding for indexing
- CollectionInfo: Collection metadata and statistics
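To make the IndexingJob state machine listed above concrete, a possible Pydantic v2 shape with explicit transitions; the field names and transition table are assumptions:

```python
from datetime import datetime, timezone
from enum import StrEnum

from pydantic import BaseModel, Field


class JobState(StrEnum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Legal transitions of the IndexingJob state machine.
TRANSITIONS: dict[JobState, set[JobState]] = {
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


class IndexingJob(BaseModel):
    """Illustrative shape of the job-tracking entity."""
    job_id: str
    repository_url: str
    state: JobState = JobState.QUEUED
    documents_processed: int = 0
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

    def transition(self, new_state: JobState) -> "IndexingJob":
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"Illegal transition {self.state} -> {new_state}")
        return self.model_copy(update={"state": new_state})
```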
- Feature branch: `003-review-the-proposal`
- Complete isolation from Phase II codebase during development
- Controlled merge process after comprehensive validation
- Integration Testing (70%): End-to-end service workflows and API contracts
- Contract Testing (20%): Service interface compliance and OpenAPI validation
- Unit Testing (10%): Critical algorithm validation and edge cases
- RAG Testing: Ragas framework for search quality validation
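An example of the integration-heavy test style this implies, assuming a hypothetical `query_service.main` module exposing the FastAPI app; the route, payload, headers, and response fields are illustrative:

```python
from fastapi.testclient import TestClient

# `query_service.main` is a hypothetical module path used only for illustration.
from query_service.main import app

client = TestClient(app)


def test_search_returns_results_and_cache_status() -> None:
    """Integration-style test: exercises the real route, middleware, and validation."""
    response = client.post(
        "/search",
        json={"query": "how is authentication implemented?", "top_k": 5},
        headers={"Authorization": "Bearer dev-key-123"},
    )
    assert response.status_code == 200
    body = response.json()
    assert "results" in body
    assert "cache_hit" in body


def test_search_rejects_empty_query() -> None:
    """Contract-style test: Pydantic validation rejects an invalid payload."""
    response = client.post("/search", json={"query": "", "top_k": 5})
    assert response.status_code == 422
```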
- Vector Store Abstraction: Foundation for all data access with pluggable backends
- Caching Layer: L1 + L2 multi-tier caching required by Query Service
- Query Service: Read-optimized search with FastAPI and async patterns
- Index Service: Write-optimized ingestion with Celery job queue
- Admin Service: Control plane for user/key management and health checks
- Observability Stack: OpenTelemetry tracing, Prometheus metrics, Grafana dashboards
- Deployment Automation: Docker Compose and Kubernetes with Helm charts
- Zero-Downtime Strategy: Blue-green deployment with health check coordination
- API Key Authentication: SHA-256 hashed keys, never store plaintext
- Token Scope Validation: Fine-grained permissions with role-based access
- Rate Limit Enforcement: Per-API-key throttling with Redis tracking
- Audit Trails: Comprehensive logging with correlation ID tracking and immutability
- Input Validation: Pydantic v2 validation for all request payloads
- TLS/HTTPS: Encrypted communication for all external APIs
- Vector Databases: Qdrant (gRPC), Elasticsearch (HTTP), Chroma (HTTP)
- Redis: Caching (L2), job queue (Celery broker), pub/sub (cache invalidation)
- PostgreSQL: Metadata persistence for jobs, users, API keys, audit logs
- Query Service: Search operations with caching and reranking
- Index Service: Document ingestion with job queue and workers
- Admin Service: User management, API keys, health aggregation
- Service Communication: Internal REST APIs with correlation ID propagation
- Prometheus: Metrics scraping from `/metrics` endpoints
- Grafana: Dashboard visualization with pre-built panels
- Jaeger: Distributed trace collection via OpenTelemetry
- Phase II documents remain fully searchable in Phase III vector stores
- Phase II CLI maintained with automatic detection of Phase III services
- Existing configuration migrated to new multi-service architecture
- API contracts extended (not breaking) from Phase II
- Automatic detection and migration of Phase II Chroma storage
- Repository registration preserved with enhanced metadata
- Configuration schema upgrade with validation
- Vector store migration tools for Chroma → Qdrant/Elasticsearch
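A backend-agnostic migration sketch built on the abstract interface idea, with a count-based check hedged as one possible integrity verification; the protocol names and method signatures are assumptions:

```python
from typing import AsyncIterator, Protocol

from pydantic import BaseModel


class VectorDocument(BaseModel):
    """Document plus embedding, matching the entity named in the data model."""
    document_id: str
    content: str
    embedding: list[float]
    metadata: dict[str, str] = {}


class ReadableStore(Protocol):
    """Anything we can stream documents out of (e.g. the Phase II Chroma store)."""
    def iter_documents(self, batch_size: int) -> AsyncIterator[list[VectorDocument]]: ...


class WritableStore(Protocol):
    """Anything we can write into (e.g. Qdrant or Elasticsearch)."""
    async def upsert(self, documents: list[VectorDocument]) -> int: ...
    async def count(self) -> int: ...


async def migrate(source: ReadableStore, target: WritableStore, batch_size: int = 256) -> None:
    """Copy documents in batches, then verify counts as a simple integrity check."""
    copied = 0
    async for batch in source.iter_documents(batch_size):
        copied += await target.upsert(batch)
    if await target.count() != copied:
        raise RuntimeError("Migration integrity check failed: document counts differ")
```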
- P95 search latency <100ms (20x faster than Phase II)
- P50 search latency <50ms (16x faster than Phase II)
- 1000+ concurrent users (100x scale vs Phase II)
- 1000 docs/min indexing (10x faster than Phase II)
- 70%+ cache hit ratio (new capability)
- 99.9% service availability (production SLA)
- Auto-scaling within 30 seconds
- Three-service microservices architecture with independent scaling
- Multi-protocol API support (REST now; gRPC deferred to Phase IV)
- Hybrid search with BGE reranking
- Multi-tier caching with event-driven invalidation
- Role-based access control with API keys
- Distributed tracing and structured logging
- Zero-downtime blue-green deployments
- Modular vector database architecture with 3 backends
- Type safety with Pydantic v2 and mypy strict mode
- OpenAPI 3.1 specifications for all APIs
- Observable operations with correlation tracking
- Resource efficiency with sustainable processing
- Comprehensive error handling with recovery hints
- Contract testing for service interfaces
Location: `/workspace/specs/003-review-the-proposal/data-model.md`
Comprehensive Pydantic v2 models for:
- Query Service (SearchQuery, SearchResult, SearchResponse, CacheEntry)
- Index Service (IndexingJob, DocumentIngestion, ProcessingStatus, JobQueue)
- Admin Service (APIKey, UserRole, AuditLog, SystemHealth)
- Vector Store (VectorSearchRequest, VectorSearchResult, VectorDocument, CollectionInfo)
Location: `/workspace/specs/003-review-the-proposal/contracts/rest-api.yaml`
OpenAPI 3.1 specification defining:
- Query Service endpoints (search, batch search)
- Index Service endpoints (repository indexing, job management)
- Admin Service endpoints (API keys, users, health, audit logs)
- Health check endpoints (liveness, readiness, startup)
- Authentication and rate limiting contracts
- Error response formats with recovery hints
Location: `/workspace/specs/003-review-the-proposal/contracts/vector-store-interface.py`
Abstract interface contract with:
- VectorStoreInterface ABC with async methods
- Search operations (search, hybrid_search, batch_search)
- Document management (upsert, delete, delete_by_filter)
- Collection management (create, delete, get_info, list)
- Health and statistics (health_check, get_statistics)
- Factory pattern for backend selection
- Comprehensive type hints and validation
Location: `/workspace/specs/003-review-the-proposal/contracts/service-contracts.md`
Service interaction patterns:
- Inter-service communication protocols (REST, correlation IDs)
- Data flow diagrams for Query, Index, and Admin services
- Cache invalidation contracts (event-driven pub/sub)
- Error handling and circuit breaker patterns
- Observability contracts (tracing, metrics, logging)
- Security contracts (authentication, authorization, rate limiting)
- SLA targets (availability, latency, throughput)
Location: `/workspace/specs/003-review-the-proposal/quickstart.md`
Step-by-step deployment guide:
- Prerequisites and system requirements
- Docker Compose quick start (<10 minutes)
- Test scenarios (health checks, indexing, search, batch operations)
- Vector database selection (Qdrant, Elasticsearch, Chroma)
- Monitoring and metrics (Grafana, Prometheus)
- Troubleshooting common issues
- Production deployment (Kubernetes)
This context provides Claude Code with comprehensive understanding of Sieveo Phase III containerized microservices architecture, capabilities, and implementation requirements for effective development assistance.