This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Noren is an edge-native PII (Personally Identifiable Information) redaction and tokenization library built on Web Standards (WHATWG Streams, WebCrypto, fetch). It provides lightweight core functionality with pluggable country-specific modules for detecting and masking sensitive data.
pnpm i
pnpm build
pnpm test
# Example for noren-core
cd packages/noren-core
pnpm test
pnpm typecheck
pnpm lint
pnpm format # Auto-format
pnpm format:check # Check formatting without changing files
pnpm check
# Basic PII redaction example
node examples/basic-redact.mjs
# Tokenization example
node examples/tokenize.mjs
# Stream processing example
node examples/stream-redact.mjs examples/basic-sample.txt
# Security plugin example
node examples/security-demo.mjs
# Custom dictionary example
node examples/dictionary-demo.mjs
# Web server example (requires hono installation)
node examples/hono-server.mjs
# Development tools examples (requires noren-devtools)
node examples/benchmark-demo.mjs
node examples/evaluation-demo.mjs
# Create a changeset for your changes
pnpm changeset
# Check current changeset status
pnpm changeset:status
# Version packages (for maintainers)
pnpm changeset:version
# Publish packages (automated via GitHub Actions)
pnpm changeset:publish
- Development: Make changes to packages in feature branch (from main)
- Changeset Creation: Run
pnpm changeset
to document changes
- Pull Request: Create PR to main branch with changeset files
- PR Canary Release: Manual canary release for specific PR testing (optional)
- Stable Release: PR merge to main triggers automatic stable release
- PR Canary releases: Manual workflow_dispatch with PR number specification (
@canary
tag)
# Install PR canary version for testing
npm install @himorishige/noren-core@canary
PR Canary releases allow testing specific PR changes before merging, enabling thorough validation of new features and fixes.
The project uses pnpm workspaces with six main packages:
-
@himorishige/noren-core (
packages/noren-core/
)
- Core detection logic for global PII types (email, IP, credit card)
- Registry system for managing detectors and maskers
- Streaming capabilities using WHATWG Streams
- HMAC-based tokenization using WebCrypto API
- Confidence scoring system for detection accuracy
-
@himorishige/noren-devtools (
packages/noren-devtools/
)
- Lightweight development and testing tools
- Performance benchmarking and accuracy evaluation
- Memory monitoring and metrics collection
- Statistical analysis and reporting
-
@himorishige/noren-plugin-jp (
packages/noren-plugin-jp/
)
- Japan-specific detectors: phone numbers, postal codes, MyNumber
- Japanese context hints and masking patterns
-
@himorishige/noren-plugin-us (
packages/noren-plugin-us/
)
- US-specific detectors: phone numbers, ZIP codes, SSN
- US context hints and masking patterns
-
@himorishige/noren-plugin-security (
packages/noren-plugin-security/
)
- Security-focused detectors: JWT tokens, API keys, HTTP headers, cookies
- Technical credential detection and masking
- Cookie allowlist functionality for selective masking
-
@himorishige/noren-dict-reloader (
packages/noren-dict-reloader/
)
- ETag-based policy and dictionary hot-reloading
- Fetch API integration for dynamic updates
- Web Standards Only: No Node.js-specific APIs beyond standard globals (Node 20.10+ required)
- Stream-First: Built around WHATWG
ReadableStream
/TransformStream
for efficient processing
- Plugin Architecture: Core provides base functionality, country plugins add regional specifics
- Type Safety: Written in TypeScript with strict typing throughout
Each package has its own test suite in the
test/
directory. Tests are written in TypeScript and compiled to
dist-test/
before running with Node's built-in test runner.
- Biome for linting and formatting
- Line width: 100 characters
- Indent: 2 spaces
- Quote style: single quotes
- Semicolons: as needed (ASI-friendly)
The codebase has been optimized for better performance:
- Pre-compiled Regex Patterns: All regular expressions are now compiled at module load time instead of runtime
- Context Hints Set Optimization: Context hints are managed using
Set
for O(1) lookup performance
- Detector Pre-sorting: Detectors are sorted once during registration instead of on every detection
- Type Safety Improvements: Reduced unsafe type assertions with null-safe helper functions
- Security Enhancements: HMAC keys now require minimum 32-character length
- Bundle Size Optimization: 65% smaller bundle size (360KB → 124KB)
- Codebase Simplification: 77% code reduction (8,153 → 1,894 lines)
- Large text processing: ~1.5ms for 100 PII elements
- Repeated detection: 1000 iterations in ~7ms (0.007ms per call)
- Context hint processing: 500 iterations with 20+ hints in ~4.5ms
Updated to support all current Japanese mobile numbers:
- 060-xxxx-xxxx (mobile)
- 070-xxxx-xxxx (mobile/PHS)
- 080-xxxx-xxxx (mobile)
- 090-xxxx-xxxx (mobile)
Enhanced IPv6 detection using two-phase parser approach:
- Phase 1: Coarse pattern extraction for candidate identification
- Phase 2: Strict parsing with
parseIPv6()
function for validation
- Compressed Notation: Proper handling of
::
compression and edge cases
- Address Classification: Automatic detection of private, loopback, link-local addresses
- Performance: ~0.1ms per candidate with reduced regex complexity
- @himorishige/noren-devtools: Streamlined development and testing tools (70% size reduction)
- Performance Benchmarking: Memory monitoring and throughput analysis
- Accuracy Evaluation: Precision/recall metrics with ground truth datasets
- Metrics Collection: Performance and accuracy tracking
- Statistical Analysis: Common statistical functions for data analysis
- Confidence Scoring System: Rule-based confidence calculation (0.0-1.0)
- Detection Sensitivity Levels: Preset thresholds (strict/balanced/relaxed)
- Simplified API: Streamlined configuration options
- Token Format Upgrade: Base64URL format for better security
- Enhanced HMAC Security: Increased minimum key length to 32 characters
- Environment-aware Configuration: Automatic exclusion of test patterns in development/test environments
- AllowDenyManager Class: Fine-grained control over detection patterns
- Default Test Pattern Exclusion: Automatic handling of example.com, localhost, private IPs
- Runtime Pattern Management: Add/remove patterns dynamically via API
- Custom Lists: Support for project-specific allowlist/denylist patterns
- Two-phase Parser Approach: Coarse extraction followed by strict parsing
- Boundary Detection: Improved accuracy with proper IPv6 boundary checking
- Memory Optimization: Reduced memory growth in repeated IPv6 operations
- Classification Support: Detect private, documentation, link-local, and unique local addresses
The
examples/
directory contains comprehensive samples demonstrating various use cases:
- basic-redact.mjs: Basic PII masking with core and regional plugins
- tokenize.mjs: HMAC-based tokenization for data anonymization
- detect-dump.mjs: Detailed PII detection analysis for debugging
- stream-redact.mjs: Efficient streaming processing for large files
- security-demo.mjs: HTTP header and API token redaction
- dictionary-demo.mjs: Custom dictionaries with hot-reloading
- hono-server.mjs: Web server integration example
- The project is in beta status with stable v0.4.0 release
- Focus on maintaining Web Standards compatibility
- Prioritize stream processing capabilities for edge deployment
- Keep core lightweight, push complexity to plugins and devtools when possible
- All optimizations maintain backward compatibility
- Advanced features moved to @himorishige/noren-devtools package
- Security plugin enables protection of technical credentials (JWT, API keys, cookies)
- Custom dictionary support allows for company-specific PII patterns
- ETag-based hot-reloading enables runtime policy updates
- Confidence scoring provides fine-grained detection control
- Development tools enable efficient performance testing and evaluation