<h1 align="center">
<a href="https://prompts.chat">
    
Sign in to like and favorite skills
A GenAI-powered tool that scans contracts and regulatory filings for missing clauses and suggests remediation using Anthropic's Claude API.
The Automated Document Compliance Auditor is a Flask-based web application that helps organizations ensure their documents comply with various regulations such as GDPR and HIPAA. It analyzes documents to identify missing clauses and provides AI-powered suggestions for remediation using Anthropic's Claude API.

The main landing page showing the application overview and navigation options
Browse uploaded documents with filtering and sorting options
Upload new documents for compliance checking
View document content with compliance issues highlighted
View detailed compliance issues and get AI-powered suggestions
┌─────────────────────────────────────────────────────────────────┐ │ Client Browser │ └───────────────────────────────┬─────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ Flask Web Server │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │ │ Routes │───▶│ Services │───▶│ Document Parser │ │ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │ │ Templates │ │ Rule Engine │ │ PDF Export Service │ │ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │ │ │ └───────────────────────────┼─────────────────────────────────────┘ │ ┌────────────────┼────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌───────────────────┐ │ MongoDB │ │ Anthropic API │ │ Cache System │ │ (Document DB) │ │ (Claude LLM) │ │ (Flask-Caching) │ └─────────────────┘ └─────────────────┘ └───────────────────┘
git clone https://github.com/sylvester-francis/Automated-Document-Compliance-Auditor.git cd Automated-Document-Compliance-Auditor
python -m venv venv # On macOS/Linux source venv/bin/activate # On Windows # venv\Scripts\activate
pip install -r requirements.txt
Make sure MongoDB is running on your system. You can install it following the official MongoDB installation guide.
mkdir -p instance touch instance/.env
Edit the
.env file and add the following configuration:
SECRET_KEY=your-secret-key MONGO_URI=mongodb://localhost:27017/compliance_auditor ANTHROPIC_API_KEY=your-anthropic-api-key USE_MOCK_LLM=False # Set to True to use mock LLM service instead of Claude API API_KEY=your-api-key # For accessing the API endpoints MAX_CONTENT_LENGTH=10485760 # Maximum file size (10MB) ALLOWED_EXTENSIONS=pdf,docx,txt # Allowed file extensions
Note: You'll need to obtain an Anthropic API key from Anthropic's website. If you don't have one, you can set
to use the mock LLM service for testing.USE_MOCK_LLM=True
python app.py
Open your browser and navigate to http://localhost:5006
The application can also be deployed using Docker for easier setup and consistent environments.
git clone https://github.com/sylvester-francis/Automated-Document-Compliance-Auditor.git cd Automated-Document-Compliance-Auditor
export ANTHROPIC_API_KEY=your_anthropic_api_key # Alternatively, to use the mock LLM service (no API key required) export USE_MOCK_LLM=True
docker-compose up -d
Open your browser and navigate to http://localhost:5006
docker build -t document-compliance-auditor .
docker run -p 5006:5006 \ -e MONGO_URI=your_mongo_uri \ -e ANTHROPIC_API_KEY=your_api_key \ -e SECRET_KEY=your_secret_key \ document-compliance-auditor
Note: When using Docker without Compose, you'll need to set up MongoDB separately and provide the correct connection URI.
Upload Documents
Browse Documents
View Document Details
Run Compliance Check
Review Compliance Issues
Export Compliance Report
All functionality is also available through the API. See the API Documentation section for details.
Automated-Document-Compliance-Auditor/ ├── app/ # Flask application │ ├── __init__.py # App initialization │ ├── config.py # Configuration settings │ ├── extensions.py # Flask extensions │ ├── models/ # Data models │ │ ├── __init__.py │ │ ├── compliance.py # Compliance models │ │ └── document.py # Document models │ ├── routes/ # View functions │ │ ├── __init__.py │ │ ├── api.py # API endpoints │ │ ├── compliance.py # Compliance checking routes │ │ ├── documents.py # Document management routes │ │ └── main.py # Main routes │ ├── services/ # Business logic │ │ ├── __init__.py │ │ ├── bulk_processor.py # Batch document processing │ │ ├── document_classifier.py # Document type classification │ │ ├── document_service.py # Document handling │ │ ├── extraction_service.py # Text extraction │ │ ├── llm_service.py # LLM integration with mock support │ │ ├── pdf_exporter.py # PDF export generation │ │ ├── rule_engine.py # Compliance rules │ │ └── seed_service.py # Data seeding │ ├── static/ # Static assets │ │ ├── css/ # Stylesheets │ │ ├── js/ # JavaScript files │ │ └── img/ # Images │ ├── templates/ # Jinja2 templates │ │ ├── base.html # Base template │ │ ├── index.html # Homepage │ │ ├── about.html # About page │ │ ├── compliance/ # Compliance templates │ │ │ ├── debug.html # Debug page │ │ │ ├── results.html # Results page │ │ │ ├── results_partial.html # HTMX partial for results │ │ │ └── suggestions_partial.html # HTMX partial for suggestions │ │ ├── components/ # Reusable UI components │ │ │ └── pagination.html # Pagination component │ │ ├── documents/ # Document templates │ │ │ ├── bulk_upload.html # Bulk upload form │ │ │ ├── list.html # Document list │ │ │ ├── list_partial.html # HTMX partial for document list │ │ │ ├── upload.html # Upload form │ │ │ └── view.html # Document viewer │ │ └── reports/ # Report templates │ │ ├── compliance_pdf.html # Compliance report template │ │ └── document_pdf.html # Document report template │ └── utils/ # Utility functions │ ├── __init__.py │ ├── background_tasks.py # Background task processing │ ├── cache.py # Caching utilities │ ├── document_extractor.py # Document extraction utilities │ ├── error_handler.py # Centralized error handling │ ├── form_validation.py # Input validation │ ├── pagination.py # Pagination utilities │ ├── pdf_export.py # PDF export utilities │ ├── pdf_utils.py # PDF utility functions │ ├── rate_limiter.py # API rate limiting │ ├── security.py # Security utilities │ └── text_processing.py # Text processing utilities ├── instance/ # Instance-specific files │ ├── uploads/ # Uploaded documents │ └── temp/ # Temporary files ├── screenshots/ # Application screenshots ├── static/ # Global static files │ └── images/ # Image assets │ └── screenshots/ # Screenshot images for documentation ├── testdocuments/ # Test document files ├── tests/ # Test suite │ ├── __init__.py │ ├── conftest.py # Test configuration │ ├── test_api.py # API tests │ ├── test_document_service.py # Document service tests │ ├── test_extraction_service.py # Extraction service tests │ ├── test_routes.py # Route tests │ ├── test_rule_engine.py # Rule engine tests │ └── test_utils.py # Utility tests ├── app.py # Application entry point ├── app.log # Application logs ├── Dockerfile # Docker configuration ├── docker-compose.yml # Docker Compose configuration ├── requirements.txt # Python dependencies └── README.md # Project documentation
This project demonstrates:
The system extracts text from various document formats (PDF, DOCX, TXT) and splits it into paragraphs for analysis. It uses PyPDF2 for PDF extraction and python-docx for DOCX files, with specialized utilities in the utils module.
The rules engine (rule_engine.py) checks documents against predefined compliance rules using:
When a compliance issue is detected, the system generates remediation suggestions using Anthropic's Claude API (llm_service.py), providing context-appropriate clause examples that would satisfy compliance requirements. A fallback mock service is integrated directly into the LLM service and can be enabled by setting USE_MOCK_LLM=True in your environment variables or .env file.
The interface provides:
Error Handling:
AppError classUser Experience:
Performance Optimization:
Security Enhancements:
Feature Additions:
Code Quality:
This project uses ruff and flake8 for code quality checks. To run these checks locally:
# Navigate to your project directory cd /Users/sylvester/Desktop/Automated-Document-Compliance-Auditor # Activate virtual environment source venv/bin/activate # Run ruff on the entire codebase ruff check . # To automatically fix some issues ruff check --fix .
# Run flake8 on the entire codebase flake8 .
This project includes a GitHub Actions workflow for continuous integration and deployment. The workflow is defined in
.github/workflows/ci-cd.yml and includes the following stages:
The CI/CD pipeline uses GitHub Container Registry (GHCR) to store Docker images, which is free for public repositories. The pipeline automatically handles authentication using GitHub Actions' built-in secrets.
If you're using the deployment step, you'll need to set up the following GitHub secrets:
DEPLOY_USER: SSH username for deployment (if using SSH deployment)DEPLOY_HOST: SSH host for deployment (if using SSH deployment)The application provides a RESTful API for programmatic access to all features. API endpoints are secured with API key authentication and rate limiting.
All API requests require an API key to be included in the request headers:
X-API-Key: your-api-key
To generate and configure an API key for the application:
python -c "import secrets; print(secrets.token_hex(32))"
.env file in the instance directory:# Create the instance directory if it doesn't exist mkdir -p instance # Add the API key to your .env file echo "API_KEY=your_generated_key_here" >> instance/.env
For security best practices:
GET /api/documents - List all documents with pagination and filteringGET /api/documents/{document_id} - Get a specific document by IDGET /api/documents/{document_id}/compliance - Get compliance information for a documentPOST /api/documents/{document_id}/check - Check compliance for a documentGET /api/documents/{document_id}/export/pdf - Export a document as PDFGET /api/documents/{document_id}/compliance/export/pdf - Export compliance report as PDFGET /api/rules - List all compliance rulesGET /api/stats - Get application statistics