Matheel Documentation
Matheel is a Python package and CLI for source-code similarity. It combines semantic embeddings, lexical similarity, chunking, preprocessing, and code evaluation metrics in one workflow.
Matheel is organized around a simple flow:
- preprocess or normalize the code
- optionally chunk it
- encode it or score it lexically
- add code-aware metrics when needed
- compare one pair or rank a whole archive
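The flow above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not Matheel's API. Every function name here (`preprocess`, `chunk`, `lexical_score`, `compare_pair`, `rank_archive`) is hypothetical. The normalization only strips `#` comments and whitespace. The standard library's `difflib.SequenceMatcher` ratio stands in for both the embedding and lexical scorers, and code-aware metrics are left out.

```python
import itertools
import re
from difflib import SequenceMatcher


def preprocess(code: str) -> list[str]:
    """Hypothetical normalization: drop '#' comments and blank lines, collapse whitespace."""
    lines = []
    for line in code.splitlines():
        line = re.sub(r"#.*", "", line)  # naive: also strips '#' inside string literals
        line = " ".join(line.split())    # collapse runs of whitespace
        if line:
            lines.append(line)
    return lines


def chunk(lines: list[str], size: int = 20) -> list[str]:
    """Split normalized lines into windows of `size` lines (one empty chunk if no lines)."""
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)] or [""]


def lexical_score(a: str, b: str) -> float:
    """Similarity in [0, 1] via difflib's ratio (a stand-in for an embedding or lexical scorer)."""
    return SequenceMatcher(None, a, b).ratio()


def _directional(chunks_a: list[str], chunks_b: list[str]) -> float:
    """For each chunk of A, take its best-matching chunk in B, then average."""
    best = [max(lexical_score(ca, cb) for cb in chunks_b) for ca in chunks_a]
    return sum(best) / len(best)


def compare_pair(code_a: str, code_b: str, chunk_size: int = 20) -> float:
    """Score one pair; averaging both directions makes the score symmetric."""
    chunks_a = chunk(preprocess(code_a), chunk_size)
    chunks_b = chunk(preprocess(code_b), chunk_size)
    return 0.5 * (_directional(chunks_a, chunks_b) + _directional(chunks_b, chunks_a))


def rank_archive(files: dict[str, str], chunk_size: int = 20) -> list[tuple[str, str, float]]:
    """Score every unordered pair of files and sort from most to least similar."""
    scored = [
        (name_a, name_b, compare_pair(files[name_a], files[name_b], chunk_size))
        for name_a, name_b in itertools.combinations(sorted(files), 2)
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    archive = {
        "a.py": "def add(x, y):\n    return x + y  # sum\n",
        "b.py": "def add(a, b):\n    return a + b\n",
        "c.py": "print('hello world')\n",
    }
    for left, right, score in rank_archive(archive):
        print(f"{left} vs {right}: {score:.2f}")
```

Taking the best match per chunk is only one simple way to combine chunk scores. In Matheel, the actual preprocessing, chunking, and scoring options are described in the guides listed below.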
Start with the Usage guide, which covers installation, optional extras, quick checks, demos, and links to examples.
Guides
- Usage
- Preprocessing
- Chunking
- Vectors and routing
- Lexical metrics and baselines
- Code metrics
- Scoring and calibration
- Datasets and evaluation
- Visualization
- Leaderboard
- Reproducible benchmark demo
- Custom algorithms
- Comparison suite
- Contributing algorithms
- Contributing datasets
- Development
Demos and Examples
- Hugging Face Space demo
- Core workflows Colab notebook
- Dataset workflows Colab notebook
- Custom algorithms Colab notebook
- Gradio Colab notebook
- Visualization and leaderboard Colab notebook
- Examples folder
Suggested Reading Order
- Start with Usage.
- Read Vectors and routing to choose the embedding path.
- Add Preprocessing and Chunking if you need to normalize or split code before scoring.
- Add the Lexical metrics and baselines guide and the Code metrics guide if you want hybrid scoring.
- Use Datasets and evaluation for labeled pair and retrieval datasets.
- Use Visualization and Leaderboard for inspectable benchmark artifacts.
- Run the reproducible benchmark demo for a small auditable workflow.
- Use Custom algorithms for project-specific scorers.
- Use Comparison suite for repeatable multi-run experiments.
- Read the contribution guides for algorithms and datasets before opening pull requests that affect benchmarks.