The project differentiates between 3 levels of contributors: contributors, collaborators and maintainers.
> [!IMPORTANT]
> This project does not accept pull requests that are fully or predominantly AI-generated. AI tools may be used in an assistive capacity only.
>
> Detailed information about permitted and restricted uses of AI can be found in the [AGENTS.md](AGENTS.md) file.

Code that is initially generated by AI and subsequently edited is still considered AI-generated. AI assistance is acceptable only when the majority of the code is written by a human contributor, with AI used exclusively for corrections or for expanding verbose modifications that the contributor has already conceived (e.g., generating repeated lines with minor variations). If AI is used to generate any portion of the code, contributors must follow the requirements laid out in [AGENTS.md](AGENTS.md).
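To make the "repeated lines with minor variations" case concrete, here is a hypothetical sketch (the function and the type values are illustrative, not from the codebase) of the kind of mechanical repetition where AI expansion of an already-conceived pattern is acceptable:

```cpp
// a contributor who has already decided on this mapping may use AI
// to expand the repetitive case labels with minor variations
const char * example_type_name(int type) {
    switch (type) {
        case 0:  return "f32";
        case 1:  return "f16";
        case 2:  return "q4_0";
        case 3:  return "q4_1";
        default: return "unknown";
    }
}
```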
# Pull requests

Before submitting your PR:
- Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
- If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
- If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`

After submitting your PR:

- If your PR becomes stale, rebase it on top of the latest `master` to get the maintainers' attention
- Squashed commit titles should follow the format `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)`. Optionally pick a `<module>` from here: https://github.com/ggml-org/llama.cpp/wiki/Modules
- Maintainers reserve the right to decline review or close pull requests for any reason, particularly when a PR does not follow the guidelines in this document (for example, the AI usage policy above)
# Coding guidelines

- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures
- Avoid fancy-looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple (see the sketch below)
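As a rough illustration of the spirit of this rule (a hypothetical snippet, not taken from the codebase):

```cpp
#include <cstdint>

// prefer a plain loop with explicit indexing ...
float sum_f32(const float * data, int64_t n) {
    float sum = 0.0f;
    for (int64_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

// ... over iterator- and algorithm-heavy equivalents
// such as std::accumulate or ranges pipelines
```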
- Vertical alignment makes things more readable and easier to batch edit
- Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
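A small sketch of these style points together (hypothetical names, assuming nothing about the actual codebase):

```cpp
// brackets on the same line, 4 spaces for indentation,
// pointer/reference declaration style: "void * ptr", "int & a"
static void count_nonzero(const float * data, int n, int & count) {
    count = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] != 0.0f) {
            count++;
        }
    }
}

// vertical alignment makes related lines easier to scan and batch edit
static const int n_ctx_default   = 4096;
static const int n_batch_default = 2048;
static const int n_seq_default   = 1;
```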
- Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets
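For example, public declarations might look like this (a hypothetical sketch in the style of the `llama.h` API; `llama_example` and its functions are not actual symbols):

```cpp
#include <stdint.h>
#include <stddef.h>

// sized types for counts and indices, size_t for byte sizes
int32_t llama_example_n_items   (const struct llama_example * ex);
size_t  llama_example_state_size(const struct llama_example * ex);
```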
- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
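A minimal contrast of the two forms (following the document's own OK / not OK convention):

```cpp
// OK
struct foo {
    int32_t bar;
};

// not OK
typedef struct foo {
    int32_t bar;
} foo;
```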
    - In C++ code, omit the optional `struct` and `enum` keywords whenever they are not necessary

```cpp
// OK
llama_context * ctx;
const llama_rope_type rope_type;

// not OK
struct llama_context * ctx;
const enum llama_rope_type rope_type;
```
(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)
- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt, use `clang-format` (from clang-tools v15+) to format the added code
- For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
- Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
- Matrix multiplication is unconventional: `C = ggml_mul_mat(ctx, A, B)` means $C^T = A B^T \Leftrightarrow C = B A^T$
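In terms of tensor shapes, the convention works out as follows (a sketch assuming the standard `ggml` 2D tensor constructor and an existing `ggml_context * ctx`; `k`, `m`, `n` are placeholder sizes):

```cpp
const int64_t k = 64, m = 4, n = 3;

ggml_tensor * A = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, k, m); // k columns, m rows
ggml_tensor * B = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, k, n); // k columns, n rows

// dimension 0 (columns) of both operands must match;
// the result has m columns and n rows, i.e. C = B A^T
ggml_tensor * C = ggml_mul_mat(ctx, A, B);
```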

# Naming guidelines

- Use `snake_case` for function, variable and type names
- Naming usually optimizes for longest common prefix (see https://github.com/ggml-org/ggml/pull/302#discussion_r1243240963)
```cpp
// not OK
int small_number;
int big_number;

// OK
int number_small;
int number_big;
```
- Enum values are always in upper case and prefixed with the enum name

```cpp
enum llama_vocab_type {
    LLAMA_VOCAB_TYPE_NONE = 0,
    LLAMA_VOCAB_TYPE_SPM  = 1,
    LLAMA_VOCAB_TYPE_BPE  = 2,
    LLAMA_VOCAB_TYPE_WPM  = 3,
    LLAMA_VOCAB_TYPE_UGM  = 4,
    LLAMA_VOCAB_TYPE_RWKV = 5,
};
```
- The general naming pattern is `<class>_<method>`, with `<method>` being `<action>_<noun>`

```cpp
llama_model_init();           // class: "llama_model",         method: "init"
llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
llama_sampler_get_seed();     // class: "llama_sampler",       method: "get_seed"
llama_set_embeddings();       // class: "llama_context",       method: "set_embeddings"
llama_n_threads();            // class: "llama_context",       method: "n_threads"
llama_adapter_lora_free();    // class: "llama_adapter_lora",  method: "free"
```
    - The `get` `<action>` can be omitted
    - The `<noun>` can be omitted if not necessary
    - The `_context` suffix of the `<class>` is optional. Use it to disambiguate symbols when needed
    - Use `init`/`free` for the constructor/destructor `<action>`
- Use the `_t` suffix when a type is supposed to be opaque to the user - it's not relevant to them if it is a struct or anything else
```cpp
typedef struct llama_context * llama_context_t;

enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);
```
(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline)
- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension (e.g. `llama-model.h`, `llama-model.cpp`)
- Python filenames are all lowercase with underscores (e.g. `convert_hf_to_gguf.py`)
- (TODO: abbreviations usage)
# Preprocessor directives

- (TODO: add guidelines with examples and apply them to the codebase)

```cpp
#ifdef FOO
#endif // FOO
```
# Code maintenance

- Existing code should have designated collaborators and/or maintainers specified in the CODEOWNERS file, responsible for reviewing changes to that code and keeping it maintained
- When adding or modifying a large piece of code, consider adding yourself to the CODEOWNERS file
- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interact directly with the `ggml` interfaces.
  (NOTE: for legacy reasons, existing code is not required to follow this guideline)
# Resources

The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:

https://github.com/ggml-org/llama.cpp/projects