Markdown Converter
Agent skill for markdown-converter
- **Tech & tooling**: Python 3.12+, pytest, ruff, mypy. CI defined in `.github/workflows/ci.yml`.
- **CI & local checks**: CI runs from `.github/workflows/ci.yml`. Locally, run `pip install -r requirements-dev.txt && pytest -q && ruff check . && mypy .` before pushing.
- **Commit messages**: Use Conventional Commit prefixes (`feat:`, `fix:`, `refactor:`, `docs:`, `test:`). Include `!` for breaking changes.
- **Issues**: Use the templates in `.github/ISSUE_TEMPLATE`. Fill in Context, Proposed solution, Acceptance criteria, and Test plan.
- **Project layout**:
  - `docs/` – project documentation (user guides, API reference, tutorials, etc.).
  - `data/` – data files or sample datasets (if needed for examples or tests).
  - `notebooks/` – Jupyter notebooks for exploration or tutorials (kept out of production code).
  - `src/` – Python source code in a package (use a "src" layout to avoid import issues), e.g. `src/<your_package>/__init__.py` plus modules.
  - `tests/` – test suite, mirroring the structure of the `src/` package (e.g. `tests/<module>/test_module.py`).
  - Root files: `pyproject.toml` (build system and project metadata), `environment.yml` (Conda environment spec for reproducibility), `README.md`, `LICENSE`, `CHANGELOG.md`, and CI configuration (`.github/workflows/` if using GitHub Actions).
- **Dependencies**: Declare them in `pyproject.toml` (PEP 621). Use widely adopted libraries (e.g., pandas, NumPy, scikit-learn, matplotlib, seaborn, networkx, pm4py, openpyxl) and avoid unnecessary or unmaintained packages. Pin minimum versions if needed for compatibility, but allow flexibility for patch updates.
- **Environment**: Provide an `environment.yml` file to recreate the exact Conda environment for the project. Include Python 3.12 and all core libraries so collaborators (human or AI) can set up a matching environment with `conda env create -f environment.yml`.
- **Dev tooling**: List development dependencies (in `pyproject.toml` or as `environment.yml` dev dependencies), including linters/formatters (Ruff), the type checker (Mypy), and testing tools.

Use descriptive, standardized naming conventions for all identifiers. Follow PEP 8 naming styles as summarized below:
| Element | Convention | Example |
|---|---|---|
| Package/Module | short, all-lowercase (avoid underscores unless improving readability) | datareader (module), mypkg (package) |
| Class | CapWords (PascalCase) | DataReader |
| Function/Method | snake_case (lowercase, words separated by underscores) | count_words() |
| Variable/Attribute | snake_case | word_count, input_file |
| Constant | UPPER_CASE (all caps with underscores) | MAX_RETRIES |
| Internal/Private | Single leading underscore for internal use | _read_tokens() |
Choose meaningful and concise names: for example, prefer `count_words()` over `cw()`. Avoid overly long names; strike a balance between descriptiveness and brevity. Ensure that names are memorable and manageable for users – a user should be able to type an import or usage quickly (e.g. `from datareader import DataReader`). Check PyPI and GitHub to avoid name collisions before naming a new package.
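A short sketch applying these conventions; the module and names are illustrative, not part of the project:

```python
# datareader.py  (module name: short, all-lowercase)
MAX_RETRIES = 3                                   # constant: UPPER_CASE


class DataReader:                                 # class: CapWords
    """Read words from a text file."""

    def __init__(self, input_file: str) -> None:
        self.input_file = input_file              # attribute: snake_case

    def count_words(self) -> int:                 # method: snake_case
        return len(self._read_tokens())

    def _read_tokens(self) -> list[str]:          # internal helper: leading underscore
        with open(self.input_file, encoding="utf-8") as f:
            return f.read().split()
```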
Handle errors explicitly and log rather than print:

- **Catch specific exceptions**: avoid a bare `except:`, which catches everything. For example, catch a `FileNotFoundError` or a custom exception, not `Exception` wholesale. This prevents hiding unexpected bugs. If an exception is truly unexpected, let it bubble up rather than catching it and continuing in an unknown state.
- **Validate inputs and raise informative errors**: for invalid arguments, raise a `ValueError` with a message explaining the valid range or format. This aids both users and developers (including AI assistants) in understanding what went wrong.
- **Define custom exceptions for your domain** (e.g., a `SimulationError` for domain-specific issues). This makes exception handling more expressive for users of your package. A brief sketch follows the logging notes below.
- **Never fail silently**: avoid empty `except` blocks or `pass` statements in exception handling. If you catch an exception, handle it or at least log it. Failing silently makes bugs hard to find. In test code, consider using pytest's facilities to expect exceptions rather than catching them.
- **Release resources reliably**: use `try/finally` or context managers (`with` statements) to ensure that resources (files, network connections, etc.) are properly closed or released even if errors occur. This prevents resource leaks and other unintended side effects. For example, use `with open('file.txt') as f:` instead of manual open/close, and for acquiring locks or connections, rely on context managers when available.
- **Log, don't print**: use the `logging` module for recording runtime information, instead of printing to stdout. Configure a top-level logger in each module and use appropriate log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) to categorize messages. For example, use INFO for high-level progress messages, DEBUG for diagnostic details, and ERROR for exceptions or critical failures. Avoid excessive logging in tight loops (to not degrade performance), but ensure important events and errors are logged. Set a default logging configuration (e.g., via `logging.basicConfig`) so that logs are output to console or file as needed. Example logger setup and usage:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

logger.info("Simulation started")         # informational message
logger.debug("Parameters: %r", params)    # debug-level detail
logger.warning("Data size is large")      # warning about a potential issue
```
In library code, the logger should inherit settings from the application using the library; do not call `basicConfig` in library modules (do it in the application entry point). This allows users to configure logging as they wish. Ensure that sensitive information (passwords, secrets) is never logged.
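To make the error-handling bullets above concrete, here is a minimal sketch; `SimulationError` is the hypothetical custom exception named in the guidance, and the config-loading function is purely illustrative:

```python
import logging

logger = logging.getLogger(__name__)


class SimulationError(Exception):
    """Raised when a run cannot proceed (domain-specific failure)."""


def load_config(path: str) -> dict[str, str]:
    """Read key=value pairs from a config file, failing loudly on bad input."""
    try:
        with open(path, encoding="utf-8") as f:   # context manager closes the file even on error
            lines = f.read().splitlines()
    except FileNotFoundError:                     # catch the specific exception, not Exception
        logger.error("Config file not found: %s", path)
        raise                                     # re-raise rather than continuing in an unknown state

    config: dict[str, str] = {}
    for line in lines:
        if "=" not in line:
            raise ValueError(f"Invalid config line {line!r}: expected 'key=value'")
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    if not config:
        raise SimulationError(f"Config file {path!r} contains no settings")
    return config
```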
Use the `warnings` module to alert users to conditions that aren't exceptions but merit attention (e.g., deprecated features or suspicious usage). Issue warnings of specific categories (like `DeprecationWarning`, `UserWarning`) so users or testers can filter them. For example, to warn about a deprecated parameter, you might do: `warnings.warn("param X is deprecated, use Y", DeprecationWarning)`. In library code, consider marking deprecations with warnings and documenting timelines for removal.

In tests, treat warnings as errors so they are not ignored – run pytest with `-W error` or in code:

```python
import warnings

warnings.filterwarnings("error")  # Treat all warnings as errors during tests
```
This will cause tests to fail if any warning is triggered, forcing the team to address it (either by fixing the cause or explicitly filtering expected warnings). You can be selective (e.g., only turn specific categories into errors) using pytest or warnings filters. During normal usage, warnings can be left as warnings, but in continuous integration they should be zero.
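A small pytest-based sketch of both sides of this guidance – issuing a deprecation warning and asserting it in a test; the function and parameter names are illustrative:

```python
import warnings

import pytest


def process(data: list[int], old_param: bool | None = None) -> int:
    """Sum the data; `old_param` is kept only for backwards compatibility."""
    if old_param is not None:
        warnings.warn("old_param is deprecated, use new_param",
                      DeprecationWarning, stacklevel=2)
    return sum(data)


def test_deprecation_warning_is_raised() -> None:
    # pytest.warns makes the expected warning explicit instead of silencing it
    with pytest.warns(DeprecationWarning):
        process([1, 2, 3], old_param=True)
```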
Write automated tests with pytest and keep them organized:

- **Test naming**: name test files `test_*.py` (or classes in those files) and put them under the `tests/` directory. Use descriptive names for test functions (e.g., `test_word_counting_correctness()`) to indicate what they verify. Avoid generic names like `test_functionality()`; instead, be specific about the behavior being tested.
- **Test layout**: organize the `tests/` directory to mirror the source structure, possibly with subdirectories for unit vs. integration tests if that helps organization. Example: for `src/mypkg/module.py`, create `tests/module/test_module.py` with functions like `test_some_behavior()` covering that module. This one-to-one structure makes it easy to locate tests for each part of the code.
- **Coverage**: measure coverage (e.g., `pytest --cov=mypkg` to report coverage on the package). Treat coverage as a helpful metric: focus on covering critical logic, edge cases, and error conditions. Strive to include at least one test for every public function or class method. If coverage drops below the target, add tests to cover the gaps.
- **Parametrize**: prefer parametrized tests (`@pytest.mark.parametrize`) over writing many similar tests with different inputs (see the sketch below).
- **Don't overlook skips and failures**: run `pytest -ra` to show a summary of skipped/xfailed tests so you don't overlook tests being skipped. A failing test should be treated as a top priority to fix (either the code or the test).
- **Iterate efficiently**: use pytest command-line options such as `-x` to stop on the first failure when debugging, or `--ff` to run previously failed tests first and iterate quickly. In CI, run the full suite with coverage.
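A minimal parametrized test sketch; the helper function is defined inline so the example is self-contained and is not part of the project:

```python
import pytest


def normalize_word(word: str) -> str:
    """Lowercase a word and strip surrounding punctuation (illustrative helper)."""
    return word.lower().strip(".,!?;:")


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello", "hello"),
        ("world!", "world"),
        ("...Mixed,", "mixed"),
        ("", ""),               # edge case: empty string stays empty
    ],
)
def test_normalize_word(raw: str, expected: str) -> None:
    assert normalize_word(raw) == expected
```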
Good documentation is as important as the code itself. Provide multiple levels of documentation so that users (and developers) can easily understand and use the project. Documentation should cover what the code does, how to use it, and why certain decisions were made.

| Documentation | Typical location | Description |
|---|---|---|
| README | Root | Provides high-level information about the package, e.g., what it does, how to install it, and how to use it. |
| License | Root | Explains who owns the copyright to your package source and how it can be used and shared. |
| Contributing guidelines | Root | Explains how to contribute to the project. |
| Code of conduct | Root | Defines standards for how to appropriately engage with and contribute to the project and its community. |
| Changelog | Root | A chronologically ordered list of notable changes to the package over time, usually organized by version. |
| Docstrings | .py files | Text appearing as the first statement in a function, method, class, or module in Python that describes what the code does and how to use it. Accessible to users via the `help()` command. |
| Typehints | .py files | Type annotations of functions and methods that aid in development and documentation. |
| Examples | docs/ or notebooks/ | Step-by-step, tutorial-like examples showing how the package works in more detail. |
| API reference | docs/ | An organized list of the user-facing functionality of your package (i.e., functions, classes, etc.) along with a short description of what they do and how to use them. Typically created automatically from your package's docstrings using the Sphinx or MkDocs (mkdocstrings) tool. |
README: Include a high-level README.md at the root that gives an overview of the project – its purpose, main features, how to install it, and a quickstart example of usage. This is users' first exposure to the project and should be kept up to date, especially after any major changes.
License: Include a LICENSE file at the root of the repository to specify the licensing terms for the project. Choose a license that aligns with your project's goals (e.g., MIT or Apache 2.0 for permissive use). Note that the GNU General Public License v3 (GPL-3) is less permissive than those licenses: any changes made to your software must be recorded, and the complete source code of the original software and of modifications must be made available under the same GPL-3 license.
Contributing Guidelines: If the project is open to contributions, include a
CONTRIBUTING.md file that outlines how to contribute (coding style, testing, submitting PRs). This helps set expectations for external contributors (including AI-generated contributions).
Code of Conduct: If the project is public, include a
CODE_OF_CONDUCT.md to set expectations for community behavior.
Changelog: Maintain a CHANGELOG.md (or use a tool to generate one from commit history) that records all notable changes for each release. Follow Keep a Changelog conventions: categorize changes into Added, Changed, Fixed, Removed, etc., under each version heading. This helps users (and developers) see what's new or different in each version. Prioritise the use of Conventional Commits and semantic-release so this can be automated, but ensure the generated content is clear.
Docstrings:
Write a docstring for every public module, class, and function. Docstrings are shown by `help()` and in generated docs, so they should stand on their own. Use NumPy-style sections (Parameters, Returns, Examples), for example:

```python
from collections import Counter


def count_words(input_file: str) -> Counter:
    """Count words in a text file.

    Words are made lowercase and punctuation is removed before counting.

    Parameters
    ----------
    input_file : str
        Path to the text file to read.

    Returns
    -------
    collections.Counter
        A Counter mapping words to their frequency in the text.

    Examples
    --------
    >>> count_words("example.txt")
    Counter({'the': 10, 'and': 5, ...})
    """
    # function implementation...
```
Typehints: Ensure that function signatures include type hints (which will appear in the docs as well) – modern Python allows writing types inline (e.g., `def func(x: int) -> str:`), and this greatly improves clarity in documentation. Use mypy for type checking: mypy is a static type checker for Python that can catch type errors based on your annotations.
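A small sketch of the kind of annotation mypy can check; the function and the deliberately wrong call are illustrative:

```python
from collections import Counter


def top_words(counts: Counter[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common (word, count) pairs."""
    return counts.most_common(n)


counts = Counter(["the", "the", "and"])
top_words(counts, n=2)       # OK: matches the annotated signature
# top_words(counts, n="2")   # mypy would flag this: "n" expects int, got str
```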
Inline Comments & Section Tags:
- Comment the why, not the what: a comment explaining why a particular approach is used is valuable; a comment that merely restates the code (e.g., `i = 0  # set i to zero`) is not useful.
- Optionally use section tags (e.g., `# --- Data Loading ---` as a separator) to improve readability in long modules; this can help organize code into logical blocks.
- When leaving `TODO` or `FIXME` in comments, use a consistent format (all caps, often with a colon, e.g., `# TODO: handle edge cases for X`). Many editors/IDEs will recognize these tags, and they serve as actionable notes for future improvements, for example `# TODO: improve the algorithmic complexity here`. Ensure that such tags are addressed in a timely manner or at least tracked. A short sketch follows below.
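A brief sketch of these comment conventions in context; the code itself is illustrative:

```python
# --- Data Loading ---------------------------------------------------------


def load_scores(path: str) -> list[float]:
    # Read line by line because score files can be large (explains *why*, not *what*).
    scores: list[float] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines rather than failing on float("")
            scores.append(float(line))
    # TODO: handle malformed lines instead of letting float() raise
    return scores
```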
User Guide & Examples: In the `docs/` directory, provide narrative documentation (if applicable) – e.g., usage guides, tutorials, or "how-to" examples demonstrating common use cases of the project. For a data science project, you might include a "getting started" tutorial notebook (placed in `notebooks/`). This helps new users (or new team members) understand the workflow.
Reference (API) Docs: Set up MkDocs with the mkdocstrings plugin for simple projects, or Sphinx for complex projects requiring advanced features. If using Sphinx, enable the autodoc or autoapi extension and the Napoleon extension to parse NumPy-style docstrings. Sphinx provides more advanced features, multiple output formats, and extensive customization options, but requires more setup and maintenance.

Documentation in CI: Treat documentation as part of the build. Use CI to ensure that docs can be built without errors or warnings. For MkDocs, this means running `mkdocs build --strict` to fail on warnings. For Sphinx, use `sphinx-build -W` to treat warnings as errors. Possibly deploy docs automatically (e.g., to GitHub Pages or Read the Docs). Keeping documentation live and updated ensures users always see docs matching the latest release.
Use Conventional Commit prefixes such as `feat: ...`, `fix: ...`, `docs: ...`, etc., in commit messages to indicate the nature of changes. For example:
- `feat(parser): add support for new file format` – a new feature (triggers a minor version bump).
- `fix(api): handle null values in response` – a bug fix (triggers a patch bump).
- `perf: improve algorithm speed by 20%` – a performance improvement (treated like a fix in SemVer, assuming no API change).
- `docs: update usage examples in README` – documentation changes (typically do not affect version).
- `build: update CI pipeline for tests` – changes to build process or tools.
- `BREAKING CHANGE:` in the commit body – indicates an incompatible API change and forces a major version bump.
Using this consistent commit language allows automation tools to determine how to bump the version and generate changelog entries. It also makes commit history more readable. Even if not every contributor strictly follows it, aim to enforce this in PR reviews (and it can be auto-formatted by git hooks or a commit-linting tool).

To cut a release, bump the version in `pyproject.toml` (or wherever it's defined), update `CHANGELOG.md` with the new version and changes, then commit with a message like `build: release vX.Y.Z` and create a Git tag for the version (e.g., `git tag -a vX.Y.Z -m "Release vX.Y.Z"`). Then build the distribution:

```
$ python -m build   # builds both sdist (.tar.gz) and wheel (.whl) into the dist/ folder
```
The build reads the `pyproject.toml` config for build instructions (PEP 517/518). Make sure the `build` package is installed (`pip install build`) along with a tool like `wheel`. The resulting artifacts are placed in `dist/`; inspect them (you can even test-install locally with `pip install dist/yourpkg-X.Y.Z-py3-none-any.whl` to verify). Then upload to PyPI:

```
$ twine upload dist/*
```
First run `twine upload --repository testpypi dist/*` to publish to TestPyPI, install from there to smoke test, then upload to PyPI. Automate as much as possible to avoid mistakes (you might include a makefile or script for release). Alternatively, a tool such as Python Semantic Release (PSR) can bump the version in `pyproject.toml`, generate or update the changelog, commit those changes, tag the release, and even publish the new package to PyPI, all in one go.

Tag each release in Git with its version number (e.g., `v2.0.0`). This makes it easy for anyone (or any tool) to check out the code for a specific release, and it is also required for tools like PSR to detect the last release. Push tags to the remote repo. In GitHub, consider creating Releases, which tie to tags and automatically generate tarballs, etc. If using semantic-release, it will handle tagging; otherwise remember to do it manually.

Keep a `main` (or `master`) branch where the latest development happens, and perhaps stable branches for major versions if you need to backport fixes. Decide on a branching strategy (Git Flow, GitHub Flow, etc.) appropriate for your team. For most projects, a simple approach is: feature branches -> pull requests -> merge into main -> tag releases on main. When a new version is tagged, that triggers the release process. Document this process for contributors.

Leverage CI/CD to maintain code quality and automate the release pipeline. This not only saves time but also ensures consistency.
Continuous Integration (CI): Set up continuous integration (e.g., GitHub Actions) to run on each push and pull request. The CI workflow should at minimum:
- Set up the project environment (e.g., `conda env create -f environment.yml`).
- Run the test suite with coverage, e.g. `pytest --cov=yourpkg`, and perhaps fail if coverage drops below 90%. Configure CI to fail on any test failure or unhandled warning (use `-W error` as discussed) to keep quality high.
- Define the workflow in a config file (e.g., `.github/workflows/ci.yml` for GitHub Actions). Use a matrix if you want to test multiple Python versions or platforms, but given this project requires 3.12+, you might just use one unless library compatibility needs testing. Use conda on CI if your dependencies include non-Python packages (since conda can handle those). A sample GitHub Actions CI might trigger on pull requests to the main branch and include the steps above.

Continuous Deployment (CD): Automate the release deployment so that publishing a new version is less error-prone:
- Add a CD job that runs on pushes to `main` (and maybe only if the version in `pyproject.toml` was bumped or a commit message indicates a release) to execute `semantic-release version` (which will bump the version and tag) and `semantic-release publish` (which will upload to PyPI). This job will need the PyPI token configured as a secret (exposed as an env var like `PYPI_TOKEN`). It should also have Git credentials to push the version-bump commit and tag back to the repo. The py-pkgs-cookiecutter provides an example: a CI/CD workflow with a separate CI job (tests/docs) and a CD job that triggers after CI on a main-branch push, running PSR to handle the release.
- Semantic-release can also generate the changelog (updating `CHANGELOG.md` based on commits). If you prefer a manual release process, you might skip automation, but then you should document the manual steps clearly to avoid mistakes.

Badges: It's a good practice to add badges to your README for build status, coverage, docs, PyPI version, etc. They provide at-a-glance status. For instance, a GitHub Actions badge for the CI status, a Codecov badge for coverage percentage, a PyPI version badge to show the latest release, and a license badge. While not required, they reinforce the presence of CI/CD and quality checks for anyone viewing the project.
By setting up robust CI/CD, you create a feedback loop where code style violations, failing tests, or insufficient coverage are caught early, and releases become a non-event (simply merging code with proper messages triggers a release). This allows both human developers and AI assistants to contribute with confidence, knowing that the automated checks will enforce the guidelines and that deployment is handled consistently.
Write code that is efficient, but only after ensuring correctness and clarity. Premature or unnecessary optimization can obfuscate code without real benefit. Use these guidelines to optimize wisely:
- **Profile before optimizing**: use `%timeit` in IPython for micro-benchmarks and `cProfile` or `line_profiler` for larger sections. Often you may be surprised which part of the code is slow. Let data guide optimizations – this ensures you focus on the true hot spots and can quantify improvement.
- **Delegate to optimized code**: prefer built-in functions (e.g., `sum()`, `any()`, `max()`) and standard library modules (like `itertools`, `functools`), which are optimized in C. In short, delegate work to optimized libraries whenever possible (linear algebra to NumPy/SciPy, data manipulations to pandas, etc.).
- **Cache repeated work**: avoid recomputing expensive results (e.g., with `functools.lru_cache` or by storing intermediate results; see the sketch below). However, always weigh whether the added complexity is justified by a real performance need. Document any non-obvious optimizations so that contributors understand the motivation.
- **Parallelize only when needed**: use `concurrent.futures` or `multiprocessing` for CPU-bound tasks, or `asyncio` for IO-bound tasks. But be mindful of the complexities (e.g., the GIL for CPU-bound tasks – NumPy releases the GIL in many cases, though). For large data, consider chunking or streaming to avoid high memory usage. Again, only implement these if a clear performance bottleneck is identified.
- **Watch memory**: choose data types wisely (`float32` vs `float64` where appropriate, categoricals for strings in pandas, etc., to reduce footprint). Free or dereference large objects if they are no longer needed (though Python's GC will handle most, sometimes an explicit `del` can help with large data in long-running processes).
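A minimal sketch of the caching and micro-benchmarking advice above; the function is illustrative and timings will vary:

```python
import timeit
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Naive recursive Fibonacci; lru_cache memoizes repeated subproblems."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# Measure before assuming anything is fast or slow.
print(timeit.timeit(lambda: fib(30), number=1000))
```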
Before merging code (or releasing), use this checklist to ensure all guidelines have been met. This helps maintain quality over time:

- **Error handling**: does new code avoid silent failures (no bare `pass` on error)? Are specific exceptions caught instead of blanket exceptions? If new errors could occur, are they handled or explicitly propagated? Check that no bare `except` or overly broad catches were introduced.
- **Tests**: do all tests pass (run `pytest`)? Is test coverage still >= 90% (check the coverage report)? If any test was changed, verify that it was due to intended changes in behavior.
- **Warnings**: run the tests with `-W error` to see if any warning pops up – if so, address the cause or explicitly filter it if it's harmless and unavoidable. No new deprecation warnings or resource warnings should be present.
- **Dependencies**: any new dependencies are declared in `pyproject.toml` and `environment.yml`.
- **Changelog**: update `CHANGELOG.md` with a summary of changes under "Unreleased" or the new version section. If using automation, ensure the commit messages will allow the changelog to be generated correctly.
- **Documentation**: run `make html` or equivalent to build the docs locally – verify there are no warnings or errors in doc generation, and that new content is included and well-formatted.
- **Packaging**: confirm a fresh install (`pip install .` or similar) works from scratch. This catches any missing files or packaging issues before release.

Only after this checklist is satisfied should code be merged and/or released. By rigorously following these steps, you maintain a high-quality codebase. This discipline also trains AI coding assistants to align with these practices – for instance, an AI suggesting code will tend to produce docstrings and tests if it "sees" that as the norm in the repository.
Each of the above points contributes to a project that is maintainable, reliable, and easy to use. By enforcing these guidelines, you ensure that both human contributors and AI assistants (like GitHub Copilot) produce code that is consistent with the project’s standards, resulting in a smoother collaboration and a healthier codebase overall.