# chatgpt-parser
Convert ChatGPT conversation exports to RAG-optimized markdown files.
## Installation

### Option 1: Conda

```bash
# Create the environment
conda env create -f environment.yml

# Activate the environment
conda activate chatgpt-parser

# Install the package
pip install -e .
```
### Option 2: venv + pip

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```
## Usage

You can run the parser in two ways:
### Option 1: Run as a Python module

Recommended if you just want to use the parser quickly, without installing it:
```bash
# Basic usage
python -m src.cli conversations.json output/

# With options
python -m src.cli conversations.json output/ --verbose
python -m src.cli conversations.json output/ --overwrite
python -m src.cli conversations.json output/ --quiet

# Help
python -m src.cli --help
```
### Option 2: Installed `chatgpt-parser` command

If you want a dedicated `chatgpt-parser` command:
```bash
# Install the package first
pip install -e .

# Now you can use the command directly
chatgpt-parser conversations.json output/

# With options
chatgpt-parser conversations.json output/ --verbose
chatgpt-parser conversations.json output/ --overwrite

# Help
chatgpt-parser --help
```
**Note:** The `chatgpt-parser` command is only available after running `pip install -e .`
## Output Structure

```
output/
├── 2025_01/
│   ├── 2025_01_15_Conversation-Title-path-1.md
│   ├── 2025_01_15_Conversation-Title-path-2.md
│   └── 2025_01_18_Another-Conversation.md
├── 2025_02/
│   └── ...
├── assets/
│   ├── 2025_01/
│   │   ├── image1.png
│   │   └── image2.png
│   └── 2025_02/
│       └── ...
└── summary.json
```
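If you want to sanity-check a run programmatically, here is a minimal sketch that walks the layout above. The exact schema of `summary.json` is not documented here, so this only peeks at its top-level keys:

```python
import json
from pathlib import Path

output = Path("output")

# Count generated markdown files per monthly folder (skipping assets/)
for month_dir in sorted(p for p in output.iterdir() if p.is_dir() and p.name != "assets"):
    md_files = list(month_dir.glob("*.md"))
    print(f"{month_dir.name}: {len(md_files)} conversations")

# Peek at the run summary (assumed only to be a JSON object)
summary = json.loads((output / "summary.json").read_text(encoding="utf-8"))
print("summary keys:", sorted(summary))
```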
Each conversation is exported as markdown with YAML frontmatter:
```markdown
---
conversation_id: "uuid-abc"
title: "Conversation Title"
created: "2025-01-15T14:30:22Z"
updated: "2025-01-16T10:15:33Z"
model: "gpt-4o"
branch: 1
total_branches: 2
message_count: 42
has_images: true
has_code_execution: true
has_web_search: false
has_reasoning: false
---

# Conversation Title

## User
*2025-01-15 14:30:22*

Message content...

## Assistant
*2025-01-15 14:32:15* | Model: gpt-4o

Response content...
```
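The frontmatter makes it easy to filter conversations before indexing them. A minimal sketch using PyYAML, assuming files follow the exact layout above; the `has_code_execution` filter is just an illustration:

```python
from pathlib import Path

import yaml  # pip install pyyaml

def read_frontmatter(path: Path) -> dict:
    """Parse the YAML frontmatter block at the top of an exported file."""
    text = path.read_text(encoding="utf-8")
    # Files start with "---\n<yaml>\n---" per the layout above; maxsplit=2
    # keeps any "---" inside the body intact
    _, fm, _body = text.split("---", 2)
    return yaml.safe_load(fm)

# Example: collect only conversations that executed code
for md in Path("output").glob("**/*.md"):
    meta = read_frontmatter(md)
    if meta.get("has_code_execution"):
        print(md, meta["title"])
```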
## Using the Output with RAG Frameworks

### LangChain

```python
from langchain.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load markdown files
loader = DirectoryLoader(
    "output/",
    glob="**/*.md",
    loader_cls=UnstructuredMarkdownLoader,
)
documents = loader.load()

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

# Query
results = vectorstore.similarity_search("your query here")
```
### LlamaIndex

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load documents
documents = SimpleDirectoryReader("output/", recursive=True).load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("your query here")
```
## CLI Reference

```text
# Direct Python execution (no installation required)
python -m src.cli INPUT_FILE OUTPUT_DIR [OPTIONS]

# OR after installing (pip install -e .)
chatgpt-parser INPUT_FILE OUTPUT_DIR [OPTIONS]

Arguments:
  INPUT_FILE    Path to conversations.json export file
  OUTPUT_DIR    Directory for markdown output

Options:
  -v, --verbose   Enable verbose (DEBUG) logging
  -q, --quiet     Suppress informational output (errors only)
  --overwrite     Overwrite existing markdown files
  --help          Show help message and exit
  --version       Show version and exit
```
## Troubleshooting

**The export won't parse.** The conversations.json file is corrupted or malformed. Check the file encoding (it should be UTF-8) and ensure the file contains valid JSON.
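To confirm the export parses at all, you can run it through Python's standard `json` module (the one-liner `python -m json.tool conversations.json` does the same from a shell):

```python
import json

# Raises a JSONDecodeError (with line and column) if the export is malformed
with open("conversations.json", encoding="utf-8") as f:
    json.load(f)
print("conversations.json parsed cleanly")
```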
**No markdown files were produced.** The JSON file may be empty, or all conversations may have failed validation. Run with `--verbose` to see detailed error messages.
**Filename or path too long.** Conversation titles are automatically truncated to 140 characters, so this should be rare. If you still see a path-length error, it may be a platform-specific issue with deep directory nesting.
**Permission denied when writing output.** Check that you have write permissions in the output directory. On Windows, avoid writing to system directories.
## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run a specific test file
pytest tests/contract/test_cli_interface.py
```
### Project Structure

```
src/
├── models/       # Data models (Conversation, MessageNode, Content)
├── parsers/      # JSON parsing, tree traversal, content extraction
├── writers/      # Markdown generation, asset copying, summary
├── utils/        # Utilities (logging, filename sanitization)
└── cli.py        # CLI entry point

tests/
├── fixtures/     # Sample conversation data
├── contract/     # CLI and output format tests
├── integration/  # Full pipeline tests
└── unit/         # Component unit tests
```
## License

MIT