Matheel Documentation

Matheel is a Python package and CLI for source-code similarity. It combines semantic embeddings, lexical similarity, chunking, preprocessing, and code evaluation metrics in one workflow.

Matheel is organized around a simple flow:

  1. preprocess or normalize the code
  2. optionally chunk it
  3. encode it or score it lexically
  4. add code-aware metrics when needed
  5. compare one pair or rank a whole archive
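The flow above can be sketched with the Python standard library alone. This is a minimal illustration, not Matheel's API: every function name here (`preprocess`, `chunk`, `lexical_score`, `token_overlap`, `compare_pair`, `rank_archive`) is hypothetical, `difflib` stands in for the lexical scorer, and a Jaccard overlap of identifier tokens stands in for a code-aware metric. The embedding path is left out because it depends on the model you choose.

```python
from __future__ import annotations

import difflib
import re


def preprocess(code: str) -> str:
    """Step 1: drop '#' comments and blank lines, collapse whitespace.

    Simplification: a '#' inside a string literal is also treated as a comment.
    """
    cleaned = []
    for line in code.splitlines():
        line = re.sub(r"#.*", "", line).strip()
        if line:
            cleaned.append(re.sub(r"\s+", " ", line))
    return "\n".join(cleaned)


def chunk(code: str, size: int = 2) -> list[str]:
    """Step 2 (optional): split code into fixed-size groups of lines."""
    lines = code.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def lexical_score(a: str, b: str) -> float:
    """Step 3: character-level similarity in [0, 1] via difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def chunked_score(a: str, b: str, size: int = 2) -> float:
    """Average, over chunks of `a`, of the best-matching chunk in `b`."""
    chunks_a, chunks_b = chunk(a, size), chunk(b, size)
    if not chunks_a or not chunks_b:
        return 1.0 if chunks_a == chunks_b else 0.0
    best = [max(lexical_score(ca, cb) for cb in chunks_b) for ca in chunks_a]
    return sum(best) / len(best)


def token_overlap(a: str, b: str) -> float:
    """Step 4: Jaccard overlap of identifier-like tokens (stand-in metric)."""
    tokens_a = set(re.findall(r"[A-Za-z_]\w*", a))
    tokens_b = set(re.findall(r"[A-Za-z_]\w*", b))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def compare_pair(a: str, b: str, weight: float = 0.5,
                 use_chunks: bool = False) -> float:
    """Step 5a: hybrid score for one pair of code snippets."""
    a, b = preprocess(a), preprocess(b)
    lexical = chunked_score(a, b) if use_chunks else lexical_score(a, b)
    return weight * lexical + (1 - weight) * token_overlap(a, b)


def rank_archive(query: str, archive: dict[str, str]) -> list[tuple[str, float]]:
    """Step 5b: score every archive entry against the query, best first."""
    scored = [(name, compare_pair(query, code)) for name, code in archive.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_archive(query, {"a.py": code_a, "b.py": code_b})` returns `(name, score)` pairs sorted from most to least similar. The `weight` parameter controls how much the lexical score counts relative to the token-overlap metric, which is one simple way to express hybrid scoring.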

Start with the usage guide for installation, optional extras, quick checks, demos, and example links.

Guides

Demos and Examples

Suggested Reading Order

  1. Start with Usage.
  2. Read Vectors and routing to choose the embedding path.
  3. Add Chunking and Preprocessing if you need code-aware shaping before scoring.
  4. Add Lexical metrics and baselines, together with Code metrics, if you want hybrid scoring.
  5. Use Datasets and evaluation for labeled pair and retrieval datasets.
  6. Use Visualization and Leaderboard for inspectable benchmark artifacts.
  7. Run the reproducible benchmark demo for a small auditable workflow.
  8. Use Custom algorithms for project-specific scorers.
  9. Use Comparison suite for repeatable multi-run experiments.
  10. Use the contribution guides for algorithms and datasets before opening benchmark-facing PRs.