This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Root Directory: `/Users/gadappa/GenAI/genAI/proof_reading_multiagent_system`
This is the project root directory where all operations should be performed. Key characteristics:
- `.virtenv` virtual environment directory
- `proof_reading_multiagent_system/` source code directory
- `source .virtenv/bin/activate` from this directory
- Target: 8GB laptop development environment (production uses Google Cloud Run Functions)
```bash
# Add to ~/.zshrc for permanent configuration
export NODE_OPTIONS="--max-old-space-size=3072"  # 3GB heap limit for 8GB laptop
```
- NODE_OPTIONS is set to a 3GB heap limit

This is a Proof Reading Multiagent System built with CrewAI to automate document review and severity assessment for medical/scientific documents. The system uses two specialized agents to review document issues from Excel files and reassess their severity levels using configurable rulesets.
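For orientation, the sketch below shows one plausible Pydantic shape for an issue record as it moves between the two agents. The class name echoes the `DocumentIssue` objects mentioned later in this file, but the field names are assumptions for illustration, not the actual schema in the `models/` package.

```python
# Hypothetical shape for one issue record; the real schema lives in
# src/proof_reading_multiagent_system/models/ and may differ.
from typing import Literal, Optional

from pydantic import BaseModel


class DocumentIssue(BaseModel):
    """One review issue read from the input XLSX file (field names assumed)."""

    issue_id: str
    description: str
    severity: Literal["High", "Medium", "Low"]
    reassessed_severity: Optional[Literal["High", "Medium", "Low"]] = None
    reassessment_reason: Optional[str] = None
```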
Key files:

- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/main.py` - Contains execution functions (run, train, replay, test)
- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/crew.py` - Uses @CrewBase decorator to define agents and tasks
- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/config/agents.yaml` - YAML-based agent definitions
- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/config/tasks.yaml` - YAML-based task definitions
- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/tools/` - Custom CrewAI tools extending BaseTool
- `proof_reading_multiagent_system/src/proof_reading_multiagent_system/utils/` - Comprehensive logging with Google Cloud integration

Directory structure:

```
proof_reading_multiagent_system/
├── logs/                                # Root-level log files directory
│   ├── archive/                         # Archived log files by date
│   ├── proof_reading_system.log         # Main system log file
│   └── session_*.log                    # Session-specific log files
├── proof_reading_multiagent_system/     # Main project directory
│   ├── data/                            # Data files
│   │   ├── benchmarks/                  # Performance benchmark data
│   │   ├── output/                      # Generated output files
│   │   └── samples/                     # Sample input files for testing
│   ├── docs/                            # Documentation
│   │   ├── api/                         # API documentation
│   │   ├── deployment/                  # Deployment guides
│   │   ├── development/                 # Development documentation
│   │   └── user-guides/                 # User guides and tutorials
│   ├── knowledge/                       # CrewAI knowledge base
│   │   └── user_preference.txt          # User context and preferences
│   ├── scripts/                         # Utility scripts
│   │   ├── extract_rules.py             # Rule extraction utilities
│   │   └── performance_benchmark.py     # Performance testing
│   ├── src/                             # Source code
│   │   └── proof_reading_multiagent_system/   # Main package
│   │       ├── config/                  # Agent and task configurations
│   │       ├── models/                  # Data models and schemas
│   │       ├── tools/                   # Custom CrewAI tools
│   │       ├── utils/                   # Utilities and logging infrastructure
│   │       ├── config.yaml              # Main system configuration
│   │       ├── crew.py                  # CrewAI crew definition
│   │       └── main.py                  # Main execution entry point
│   ├── tests/                           # Test suite
│   │   ├── fixtures/                    # Test fixtures and data
│   │   ├── integration/                 # Integration tests
│   │   ├── performance/                 # Performance tests
│   │   └── unit/                        # Unit tests
│   ├── pyproject.toml                   # Project configuration and dependencies
│   └── README.md                        # Project-specific README
├── CLAUDE.md                            # Claude Code instructions (this file)
├── requirements.txt                     # Python dependencies
└── README.md                            # Root-level README
```
CRITICAL: The system is optimized to process only Medium/High severity issues, yielding a 60-80% efficiency gain.
```
XLSX Input → SeverityFilter → [Medium/High Issues]   +   [Low Issues (untouched)]
                                       ↓                            ↓
                          Process Medium/High Only       Keep Low Issues Separate
                                       ↓                            ↓
                 Updated Medium/High Issues ← DataMergeTool ← Untouched Low Issues
                                                    ↓
                                     XLSX Output (Complete Dataset)
```
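A rough pandas sketch of this split-and-merge flow is shown below. The `Severity` column name and the helper function names are assumptions for illustration only; the actual behavior is implemented by the SeverityFilter and DataMergeTool tools.

```python
# Sketch of the severity split/merge flow, assuming a "Severity" column in the
# input workbook; the real tools may use different column names and logic.
import pandas as pd


def split_by_severity(input_path: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (medium_high, low) partitions of the input issues."""
    df = pd.read_excel(input_path)
    mask = df["Severity"].isin(["Medium", "High"])
    return df[mask].copy(), df[~mask].copy()


def merge_results(processed: pd.DataFrame, low: pd.DataFrame, output_path: str) -> None:
    """Recombine processed Medium/High issues with untouched Low issues."""
    pd.concat([processed, low], ignore_index=True).to_excel(output_path, index=False)
```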
MANDATORY: All configuration must be self-contained within config.yaml files. The system MUST NOT depend on users manually setting environment variables.
```python
# Required pattern for environment-independent configuration
import os


def _setup_google_cloud_environment(self) -> None:
    """Set up Google Cloud environment from configuration instead of requiring environment variables."""
    try:
        gemini_config = self.config.get('gemini_config', {})
        credentials_path = gemini_config.get('credentials_path')
        if credentials_path:
            # Handle relative paths relative to config file location
            if not os.path.isabs(credentials_path):
                config_dir = os.path.dirname(__file__)
                credentials_path = os.path.join(config_dir, credentials_path)
            # Set environment variable internally
            os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credentials_path
    except Exception as e:
        # The original snippet ends before its except clause; logging and
        # re-raising keeps it consistent with the error-handling pattern below.
        log_error(f"Google Cloud environment setup failed: {str(e)}", error_type=type(e).__name__)
        raise
```
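For context, here is a minimal, hypothetical sketch of how such a method might be wired into crew initialization, assuming config.yaml is read with `yaml.safe_load`; the class and attribute names are illustrative, and the actual constructor in crew.py may differ.

```python
# Hypothetical wiring for the pattern above: config.yaml is loaded once and
# credentials are resolved internally, so users never have to export
# GOOGLE_APPLICATION_CREDENTIALS themselves. Not the real crew.py code.
import os

import yaml


class ConfigDrivenSetupExample:
    def __init__(self, config_path: str):
        self.config_dir = os.path.dirname(os.path.abspath(config_path))
        with open(config_path, "r") as f:
            self.config = yaml.safe_load(f)
        self._setup_google_cloud_environment()

    def _setup_google_cloud_environment(self) -> None:
        # Condensed version of the pattern above: relative credential paths
        # are resolved against the directory that contains config.yaml.
        credentials_path = self.config.get("gemini_config", {}).get("credentials_path")
        if credentials_path:
            if not os.path.isabs(credentials_path):
                credentials_path = os.path.join(self.config_dir, credentials_path)
            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path
```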
MANDATORY: All file and directory paths in config.yaml must use absolute paths for production stability.
- Example log directory: `/Users/gadappa/GenAI/genAI/proof_reading_multiagent_system/logs/`
- `$HOME` expansion for user directory

Document processing context is configured in config.yaml for consistent processing:
```yaml
# Document processing context (no CLI override)
document_context:
  document_type: "Clinical Study Report"   # Type of document being processed
  domain: "Oncology"                       # Medical domain/specialty
  urgency: "Standard"                      # Processing priority level

# Behavioral settings (optional CLI override)
behavior_config:
  verbose: false                           # Enable detailed console output by default
```
The system uses a config-first approach where most settings are defined in config.yaml:
```bash
# Essential usage - all context from config
python -m proof_reading_multiagent_system.main --input "data/samples/issues.xlsx"

# Optional overrides
python -m proof_reading_multiagent_system.main --input "issues.xlsx" --output "/custom/path/result.xlsx"
python -m proof_reading_multiagent_system.main --input "issues.xlsx" --verbose
python -m proof_reading_multiagent_system.main --input "issues.xlsx" --quiet

# Development commands
run_crew
train
replay
test
```
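As a rough illustration of the config-first precedence (config.yaml supplies context, CLI flags only override behavior), the sketch below uses argparse; the actual option handling and merge logic in main.py may differ.

```python
# Hypothetical sketch of config-first argument handling; the real main.py may
# parse and merge options differently.
import argparse

import yaml


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="proof_reading_multiagent_system")
    parser.add_argument("--input", required=True, help="Path to the input XLSX file")
    parser.add_argument("--output", help="Optional override for the output XLSX path")
    parser.add_argument("--verbose", action="store_true", help="Override behavior_config.verbose")
    parser.add_argument("--quiet", action="store_true", help="Suppress console output")
    return parser.parse_args()


def load_settings(config_path: str, args: argparse.Namespace) -> dict:
    """Merge config.yaml defaults with the limited set of CLI overrides."""
    with open(config_path, "r") as f:
        config = yaml.safe_load(f)
    behavior = config.get("behavior_config", {})
    if args.verbose:
        behavior["verbose"] = True
    if args.quiet:
        behavior["verbose"] = False
    config["behavior_config"] = behavior
    return config
```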
```bash
# 1. Activate virtual environment first
source .virtenv/bin/activate

# 2. Make code changes for a specific feature increment

# 3. Stage related files strategically
git add specific_files_for_feature

# 4. Commit with descriptive message
git commit -m "Implement [specific feature]: brief description"

# 5. Continue development cycle
```
Commit message format: `[Action] [Component/Area]: Brief description of what was accomplished`

Examples:
- `Add ExcelReaderTool: implement XLSX to DocumentIssue conversion`
- `Fix logging integration: resolve correlation ID threading issue`
- `Update config: add Gemini 2.0-flash API settings`

CRITICAL: All new components must use the established logging infrastructure.
Logging utilities are imported from the `utils/` package:

```python
# Correct pattern for new components
from proof_reading_multiagent_system.utils import (
    get_logger,
    log_structured,
    correlation_context,
    performance_monitor,
    audit_logger,
    log_execution_time
)

# Use context managers for performance tracking
with performance_monitor.track_processing_operation("operation_name") as metrics:
    # Processing logic here
    metrics.records_processed = count

# Use audit logging for any severity changes
audit_logger.log_severity_change(issue_id, old, new, reason, agent)

# Use performance decorators for critical functions
@log_execution_time("function_name")
def critical_function():
    pass
```
- `log_agent_decision()` - Agent flagging and reassessment decisions
- `log_excel_operation()` - File I/O operations with metadata
- `log_rule_application()` - Rule engine decision tracking
- `log_api_call()` - Gemini API usage and performance
- `log_performance()` - System performance metrics
- `log_error()` - Structured error logging with context
- `audit_logger.*` - Compliance and change tracking
- `@before_kickoff` and `@after_kickoff` decorators for session logging

crew.py logging integration pattern:

```python
@CrewBase
class YourCrewSystem:
    def __init__(self):
        self.logger = setup_logging()
        self.session_id = generate_correlation_id()

    @before_kickoff
    def setup_session_logging(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        with correlation_context(self.session_id):
            self.logger.info("Starting session", extra={'session_id': self.session_id})
        return inputs

    @after_kickoff
    def finalize_session_logging(self, output: Any) -> Any:
        performance_monitor.log_session_summary()
        return output
```
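The exact signatures of the helpers listed above are defined in the `utils/` package; the calls below are illustrative assumptions about how they might be invoked, not the authoritative API.

```python
# Illustrative calls only; argument names are assumptions - check the utils/
# package for the real signatures.
from proof_reading_multiagent_system.utils import (
    log_agent_decision,
    log_excel_operation,
    log_rule_application,
)

log_agent_decision(
    agent="severity_reassessment_agent",            # hypothetical agent name
    issue_id="ISS-0042",                            # hypothetical issue ID
    decision="reassessed",
    details={"old_severity": "Medium", "new_severity": "High"},
)
log_excel_operation(operation="read", file_path="data/samples/issues.xlsx", rows=120)
log_rule_application(rule_id="R-07", issue_id="ISS-0042", outcome="severity_upgraded")
```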
- `log_error()` with error context and correlation IDs

```python
# Required error handling pattern
try:
    # Operation here
    pass
except Exception as e:
    log_error(
        f"Operation failed: {str(e)}",
        error_type=type(e).__name__,
        operation="operation_name",
        correlation_id=get_correlation_id(),
        additional_context={"key": "value"}
    )
    raise  # Re-raise after logging
```
- `@log_execution_time` decorators for performance-critical functions
- `@agent` decorator returning `Agent` objects
- `@task` decorator returning `Task` objects
- `Process.sequential` execution by default
- Tools extend `BaseTool` and implement `_run` method with proper Pydantic schema (see the sketch after the docstring template below)

Module docstring template:

```python
"""
Comprehensive module description following PEP 257.

This module provides [brief description]. Key components include:

- ClassName: Brief description of main classes
- function_name(): Brief description of key functions

Example:
    Basic usage example::

        from proof_reading_multiagent_system.module_name import ClassName

        instance = ClassName()
        result = instance.method()

Note:
    Any important notes about module usage, dependencies, or limitations.
"""
```
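The sketch below ties the CrewAI conventions listed above together in one place. The tool, agent, and task names are placeholders that assume matching entries in agents.yaml and tasks.yaml; it illustrates the pattern, not the project's actual crew.py.

```python
# Minimal sketch of the patterns listed above; component names are
# placeholders, not the project's actual agents, tasks, or tools.
from typing import Type

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class ExcelReadInput(BaseModel):
    """Pydantic schema describing the tool's arguments."""

    file_path: str = Field(..., description="Absolute path to the XLSX file")


class ExcelReaderToolSketch(BaseTool):
    name: str = "excel_reader"
    description: str = "Read document issues from an XLSX file"
    args_schema: Type[BaseModel] = ExcelReadInput

    def _run(self, file_path: str) -> str:
        # A real implementation would return serialized DocumentIssue records.
        return f"read issues from {file_path}"


@CrewBase
class ProofReadingCrewSketch:
    @agent
    def reviewer(self) -> Agent:
        # Assumes a "reviewer" entry in config/agents.yaml.
        return Agent(config=self.agents_config["reviewer"], tools=[ExcelReaderToolSketch()])

    @task
    def review_task(self) -> Task:
        # Assumes a "review_task" entry in config/tasks.yaml.
        return Task(config=self.tasks_config["review_task"])

    @crew
    def crew(self) -> Crew:
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)
```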
```python
def process_excel_file(
    file_path: str,
    severity_filter: Optional[List[str]] = None
) -> Dict[str, Any]:
    """
    Process Excel file and extract DocumentIssue objects with filtering.

    Args:
        file_path: Absolute path to the Excel file to process.
        severity_filter: Optional list of severity levels to include.

    Returns:
        Dictionary containing processed results with structure::

            {
                "issues": List[DocumentIssue],
                "metadata": {"total_issues": int, "efficiency_gain": float}
            }

    Raises:
        FileNotFoundError: If the specified Excel file does not exist.
        ValueError: If the file format is invalid or missing required columns.

    Example:
        Basic usage with severity filtering::

            result = process_excel_file(
                "/path/to/issues.xlsx",
                severity_filter=["High", "Medium"]
            )
    """
```
```python
from typing import Dict, List, Optional, Union, Any, Tuple, Literal  # Literal is required by SeverityLevel below
from pathlib import Path
from datetime import datetime

# DocumentIssue is defined in the package's models/ directory

# Type aliases for complex types
IssueDict = Dict[str, Union[str, int, datetime]]
ProcessingResult = Tuple[List[DocumentIssue], Dict[str, Any]]
SeverityLevel = Literal["High", "Medium", "Low"]
```
- `crewai[tools]>=0.165.1,<1.0.0` - Core multiagent framework
- `google-cloud-aiplatform>=1.36.0` - VertexAI integration for Gemini 2.0-flash
- `google-cloud-logging>=3.8.0` - Google Cloud Logging for monitoring and audit trails
- `pandas>=2.0.0` - Excel data processing
- `openpyxl>=3.1.0` - Excel file read/write operations
- `pydantic>=2.0.0` - Data model validation
- `pyyaml>=6.0` - Configuration file processing
- `/Users/gadappa/GenAI/genAI/crewai-docs/docs/en` - Local copy of official CrewAI documentation for reference

Use `performance_monitor.track_api_call()` for all Gemini API calls:

```python
# Required pattern for all Gemini API calls
with performance_monitor.track_api_call("gemini") as metrics:
    # Make API call here
    result = llm.generate(prompt)
    metrics.tokens_used = result.token_count
    metrics.cost_estimate = calculate_cost(result.token_count)
```
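The `calculate_cost()` helper referenced above is not shown in this file; a minimal sketch is below. It takes the per-1K-token rate as an explicit argument because actual Gemini pricing is not specified here, whereas the real helper presumably reads its rate from configuration.

```python
# Hypothetical helper matching the call above; the rate must be supplied by
# the caller because real Gemini 2.0-flash pricing is not defined in this file.
def calculate_cost(token_count: int, usd_per_1k_tokens: float) -> float:
    """Estimate API cost in USD from a token count and a per-1K-token rate."""
    return (token_count / 1000.0) * usd_per_1k_tokens
```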
- `knowledge/user_preference.txt` contains user context (AI Engineer, San Francisco, interested in AI Agents)

Note: For detailed deployment procedures, security frameworks, and extensive code examples, refer to:
- `CLAUDE_DETAILED_BACKUP.md` - Full original documentation
- `DEPLOYMENT_GUIDE.md` - Production deployment procedures
- `DOCUMENTATION_PATTERNS.md` - Detailed documentation examples