Nano Banana Pro
Agent skill for nano-banana-pro
This is a Python application that parses PNC bank statement PDFs and converts them to CSV format for import into Google Sheets or Excel. The project uses reliable text-based parsing with JSON-based categorization. Enhanced coordinate-based parsing has been deprecated due to reliability issues.
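The CSV step can be illustrated with Python's standard csv module. This is a minimal sketch, not the project's code. The column set (date, description, amount, category) and the function names write_csv and split_by_month are assumptions.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical column set; the project's real CSV schema may differ.
FIELDS = ["date", "description", "amount", "category"]


def write_csv(rows: Iterable[dict], path: Path) -> None:
    """Write transaction dicts to a CSV file that Google Sheets and Excel can import."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings; the utf-8-sig BOM
    # helps Excel recognize the file as UTF-8.
    with path.open("w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def split_by_month(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group rows by the 'YYYY-MM' of their datetime.date, e.g. for per-month files."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row["date"].strftime("%Y-%m")].append(row)
    return dict(groups)
```

Dates are stored as datetime.date objects here, so DictWriter writes them in ISO form (2023-01-05), which both spreadsheet tools parse as dates.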
source venv/bin/activate
python tests/test_basic.py
# Basic parsing with categorization (recommended)
python parse_statements.py --file statement.pdf --output output.csv

# Process directory with monthly files
python parse_statements.py --directory statements/ --output all.csv --monthly

# YEAR PROCESSING MODE - Process complete year (auto-discovers directories)
python parse_statements.py --year 2023 --output output/2023.csv

# Year mode with cross-year boundary completion (includes January 2024 for December 2023 transactions)
python parse_statements.py --year 2023 --include-next-month --output output/2023_complete.csv

# Year mode with custom base path
python parse_statements.py --year 2023 --base-path /path/to/statements --output 2023.csv

# With summary report
python parse_statements.py --directory statements/ --output all.csv --summary report.txt
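For orientation, the flags used above could be declared with argparse roughly as follows. This is a hypothetical reconstruction, not the actual parse_statements.py: only the flag names come from the examples, while the PNC_Documents default for --base-path, the one-input-source rule, and the --include-next-month check are assumptions.

```python
import argparse
from typing import List, Optional


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI matching the usage examples; the real script may differ."""
    parser = argparse.ArgumentParser(description="Convert PNC statement PDFs to CSV.")
    # Assumption: exactly one input source per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="Single statement PDF")
    source.add_argument("--directory", help="Directory of statement PDFs")
    source.add_argument("--year", type=int, help="Process a complete calendar year")

    parser.add_argument("--output", required=True, help="Destination CSV path")
    parser.add_argument("--monthly", action="store_true",
                        help="Also write one CSV per month")
    parser.add_argument("--include-next-month", action="store_true",
                        help="With --year, also read the following January")
    parser.add_argument("--base-path", default="PNC_Documents",
                        help="Root folder containing per-year subdirectories")
    parser.add_argument("--summary", help="Optional summary report path")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_arg_parser()
    args = parser.parse_args(argv)
    # Cross-year completion only makes sense in year mode; error() exits with status 2.
    if args.include_next_month and args.year is None:
        parser.error("--include-next-month requires --year")
    return args
```

argparse converts dashed flags to underscored attributes, so the values arrive as args.include_next_month and args.base_path.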
src/ - Core parsing logic with JSON categorization
  parsers/ - Modular parser system (NEW August 2025)
    base_parser.py - Abstract interface for all bank parsers
    pnc_statement_parser.py - Main PNC parser class
    pnc_patterns.py - All PNC-specific regex patterns
    section_extractor.py - Statement section handling
    transaction_parser.py - Core transaction parsing logic
    categorization.py - JSON-based auto-categorization engine
    text_utils.py - Text cleaning and merchant extraction utilities
experiments/ - Deprecated enhanced parsing features
tests/ - Test suite
docs/ - Documentation and analysis
examples/ - Usage examples
data/ - Input PDFs (gitignored)
output/ - Output CSVs (gitignored)

The parser now supports comprehensive year processing that automatically discovers and processes files across multiple directories:
Key Features:
--year 2023 automatically finds files in PNC_Documents/2023/ and optionally PNC_Documents/2024/
--include-next-month includes the January 2024 statement to complete December 2023 transactions
Outputs: 2023_complete.csv, monthly breakdowns, and summary reports

Directory Structure Expected:
PNC_Documents/
├── 2023/
│   ├── Spend_x2157_Statement_01_January_2023.pdf
│   ├── Spend_x2157_Statement_02_February_2023.pdf
│   └── ... (all 2023 monthly statements)
└── 2024/
    ├── Spend_x2157_Statement_01_January_2024.pdf  # Contains Dec 2023 transactions
    └── ...
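Year-mode discovery over this layout can be sketched with pathlib. The function name is hypothetical, and it assumes the next January statement is identified by the _01_January_<year> filename suffix shown in the tree; the real parser may match files differently.

```python
from pathlib import Path
from typing import List


def discover_statements(base_path: str, year: int,
                        include_next_month: bool = False) -> List[Path]:
    """Hypothetical sketch of --year discovery for the layout shown above."""
    base = Path(base_path)
    files: List[Path] = []
    year_dir = base / str(year)
    if year_dir.is_dir():
        # Assuming a shared filename prefix, the zero-padded 01_..12_ month
        # numbers make lexical order match month order.
        files.extend(sorted(year_dir.glob("*.pdf")))
    next_dir = base / str(year + 1)
    if include_next_month and next_dir.is_dir():
        # Only the following January is needed to complete December's transactions.
        files.extend(sorted(next_dir.glob(f"*_01_January_{year + 1}.pdf")))
    return files
```

Missing year directories simply contribute no files, so a partial archive still yields whatever statements exist.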
Usage Examples:
# Complete 2023 with automatic monthly breakdown
python parse_statements.py --year 2023 --output output/2023.csv

# Include the next year's January statement for complete cross-year data
python parse_statements.py --year 2023 --include-next-month --output output/2023.csv
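Because statement lines carry only MM/DD dates, cross-year completion has to attach a year to each row before filtering. One plausible rule, sketched here as an assumption rather than the parser's actual logic: on the statement for month M of year Y, any transaction dated in a month later than M belongs to year Y - 1. The function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def resolve_date(mm_dd: str, statement_year: int, statement_month: int) -> date:
    """Attach a year to an MM/DD date printed on a given month's statement.

    Assumed rule: a month later than the statement month must come from the
    previous year (12/28 on the January 2024 statement -> 2023-12-28).
    """
    month, day = (int(part) for part in mm_dd.split("/"))
    year = statement_year - 1 if month > statement_month else statement_year
    return date(year, month, day)


def rows_in_year(rows: List[Tuple[str, str]], statement_year: int,
                 statement_month: int, target_year: int) -> List[Tuple[date, str]]:
    """Keep (date, description) rows that fall in target_year, e.g. Dec 2023 rows of Jan 2024."""
    kept: List[Tuple[date, str]] = []
    for mm_dd, description in rows:
        when = resolve_date(mm_dd, statement_year, statement_month)
        if when.year == target_year:
            kept.append((when, description))
    return kept
```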
Modular Parser System (src/parsers/)
BaseStatementParser - Abstract interface for future bank support
PNCStatementParser - Inherits from the base with PNC-specific logic
PNCPatterns - Centralizes all regex patterns

Core Components (src/parsers/)
SectionExtractor - Handles deposits, withdrawals, online banking sections
TransactionParser - Core parsing logic with multi-line description support
TransactionCategorizer - JSON-based auto-categorization (reusable across banks)
TextCleaner & MerchantExtractor - Text processing utilities

Deprecated Features (experiments/)
Always test after making changes:
python tests/test_basic.py                # Core functionality
python tests/test_extraneous_filtering.py # Text filtering
r'^(\d{1,2}/\d{1,2})\s+' # MM/DD at start of line
r'(\.?\d{1,3}(?:,\d{3})*\.?\d{0,2})' # Handles .14, 6,250.00, etc.
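The two patterns can be exercised directly to see what they accept. The helpers leading_date and parse_amount are hypothetical; only the regexes come from the project. Amounts are converted to Decimal rather than float to avoid binary rounding on money values.

```python
import re
from decimal import Decimal
from typing import Optional

# The two patterns above, copied verbatim.
DATE_PATTERN = re.compile(r'^(\d{1,2}/\d{1,2})\s+')                 # MM/DD at start of line
AMOUNT_PATTERN = re.compile(r'(\.?\d{1,3}(?:,\d{3})*\.?\d{0,2})')  # .14, 6,250.00, etc.


def leading_date(line: str) -> Optional[str]:
    """Return the MM/DD prefix of a line, or None if the line doesn't start with one."""
    m = DATE_PATTERN.match(line)
    return m.group(1) if m else None


def parse_amount(token: str) -> Optional[Decimal]:
    """Convert one amount token to Decimal, or None if the whole token isn't an amount.

    fullmatch is used because an unanchored search would also pick up the
    digits inside dates such as "10/15".
    """
    m = AMOUNT_PATTERN.fullmatch(token)
    return Decimal(m.group(1).replace(",", "")) if m else None
```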
Now centralized in the TextCleaner class (src/parsers/text_utils.py):
This repository is PUBLIC on GitHub. Never include Personally Identifiable Information (PII) in:
Gitignore Protection:
The PNC_Documents/ folder is gitignored

Examples of PII to Avoid:
x2157)
summary used for both Path and StatementSummary)
docs/PNC_Statement_Structure_Analysis.md
src/parsers/ for parser logic, src/ for other components
tests/

import re

# Module paths assumed from the src/parsers/ file layout.
from src.parsers.base_parser import BaseStatementParser
from src.parsers.categorization import TransactionCategorizer

# 1. Create new pattern class
class BankOfAmericaPatterns:
    def __init__(self):
        self.DATE_PATTERN = re.compile(r'...')  # BOA-specific patterns

# 2. Create parser inheriting from base
class BOAStatementParser(BaseStatementParser):
    def __init__(self):
        self.patterns = BankOfAmericaPatterns()
        self.categorizer = TransactionCategorizer()  # Reuse
        # Initialize other components...
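The base class those parsers extend is not spelled out here. As a minimal sketch, assuming a template-method design, an abstract interface in the spirit of BaseStatementParser could look like the following; the method names extract_text, parse_transactions, and parse and the Transaction row shape are hypothetical, and the real src/parsers/base_parser.py may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical row shape, e.g. {"date": "01/02", "amount": "5.00", "description": "..."}
Transaction = Dict[str, str]


class BaseStatementParser(ABC):
    """Sketch of an abstract bank-parser interface; method names are assumptions."""

    @abstractmethod
    def extract_text(self, pdf_path: str) -> str:
        """Return the statement's raw text."""

    @abstractmethod
    def parse_transactions(self, text: str) -> List[Transaction]:
        """Turn statement text into transaction rows using bank-specific patterns."""

    def parse(self, pdf_path: str) -> List[Transaction]:
        # Template method: the pipeline is shared, the bank-specific steps come from subclasses.
        return self.parse_transactions(self.extract_text(pdf_path))
```

Under this shape a new bank implements only the two abstract methods, and Python refuses to instantiate the base class itself (TypeError), which catches incomplete subclasses early.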
from src.parsers import PNCStatementParser

parser = PNCStatementParser()
# Access patterns:
parser.patterns.DATE_PATTERN
# Access text cleaner:
parser.transaction_parser.text_cleaner
# Access categorizer:
parser.categorizer
--year 2023 processes all 12 months + cross-year data automatically
2023_complete.csv, 2023_complete_monthly/, etc.
src/categories.json
--years 2022,2023,2024)

The parser uses src/categories.json for transaction categorization:
{
  "categories": {
    "Medical": {
      "patterns": ["Cleveland Clinic", "MetroHealth", "Mhs\\*Metrohealth"]
    }
  }
}
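A minimal sketch of how a file in this shape could drive categorization. It assumes each pattern is a regular expression (the escaped Mhs\\*Metrohealth entry suggests so), matches case-insensitively, and returns the first matching category in file order. The function names and the "Uncategorized" fallback are hypothetical, not TransactionCategorizer's actual API.

```python
import json
import re
from typing import List, Tuple

Rule = Tuple[str, re.Pattern]


def load_rules(config_text: str) -> List[Rule]:
    """Compile each category's pattern list into one case-insensitive regex."""
    config = json.loads(config_text)
    rules: List[Rule] = []
    for name, spec in config["categories"].items():
        # Wrap each pattern in a non-capturing group so the alternation stays well-formed.
        combined = "|".join(f"(?:{p})" for p in spec["patterns"])
        rules.append((name, re.compile(combined, re.IGNORECASE)))
    return rules


def categorize(description: str, rules: List[Rule],
               default: str = "Uncategorized") -> str:
    """Return the first category (in file order) whose regex occurs in the description."""
    for name, regex in rules:
        if regex.search(description):
            return name
    return default
```

Because the JSON string "Mhs\\*Metrohealth" decodes to the regex Mhs\*Metrohealth, the asterisk is matched literally, as in a bank descriptor like MHS*METROHEALTH.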
Users can submit PRs to expand categories for the community!
Note to Claude: This is a financial data processing application. Always prioritize accuracy and data integrity. Test thoroughly before deploying changes. The modular parser with JSON categorization provides the most reliable results and is now designed for easy extension to other banks.
Key Benefits Achieved:
Import Changes:
Old: from src.pnc_parser import PNCStatementParser
New: from src.parsers import PNCStatementParser

Access Patterns:
parser.patterns.DATE_PATTERN
parser.transaction_parser.text_cleaner
parser.categorizer