Code Metrics
Matheel includes built-in code-aware metrics that can be blended into the final score or used on their own. Token-based code metrics and parser-backed code metrics use different tokenization/parsing paths. See Tokenization and preprocessing limits for the full comparison.
Parameters
- code_metric
- code_metric_weight
- code_language
- codebleu_component_weights
- crystalbleu_max_order
- crystalbleu_trivial_ngram_count
- ruby_max_order
- ruby_epsilon
- ruby_mode
- ruby_tokenizer
- ruby_denominator
- ruby_graph_timeout_seconds
- ruby_graph_use_edge_cost
- ruby_graph_include_leaf_edges
- ruby_tree_max_nodes
- ruby_tree_max_depth
- ruby_tree_max_children
- tsed_delete_cost
- tsed_insert_cost
- tsed_rename_cost
- tsed_max_nodes
- tsed_max_depth
- tsed_max_children
- codebertscore_model
- codebertscore_num_layers
- codebertscore_batch_size
- codebertscore_max_length
- codebertscore_device
- codebertscore_lang
- codebertscore_idf
- codebertscore_rescale_with_baseline
- codebertscore_use_fast_tokenizer
- codebertscore_nthreads
- codebertscore_verbose
Supported Metrics
- none
- codebleu
- codebleu_ngram
- codebleu_weighted_ngram
- codebleu_syntax
- codebleu_dataflow
- crystalbleu
- ruby
- tsed
- codebertscore
Language Scope
Native CodeBLEU with real syntax/dataflow is currently scoped to:
java, python, c, cpp, go, javascript, typescript, kotlin, scala, swift, solidity, dart, php, ruby, rust, csharp, lua, julia, r, objc
RUBY and TSED follow that same 20-language structural scope. CodeBERTScore is language-agnostic at runtime, but use that same 20-language scope for consistent code-level comparisons. Additional languages from the tree-sitter runtime remain plausible follow-on work, but Matheel should only claim them after keyword coverage, alias handling, and regression tests are added.
Metric Details
CodeBLEU-style Metrics
Matheel ships a native CodeBLEU implementation, so the syntax and dataflow components use real tree and data-flow graph (DFG) extraction without requiring the pip codebleu package at runtime. Parser resolution is routed through tree_sitter_language_pack, which avoids installing a separate tree_sitter_<lang> wheel for each supported language.
The pip codebleu package is still useful for comparison or validation work, and Matheel includes selected regression examples that check exact native-vs-pip agreement on overlapping official languages. That comparison coverage is intentionally narrow: treat it as validation on representative examples, not a blanket claim of score-for-score parity on every input.
- codebleu: Full weighted blend of the CodeBLEU components.
- codebleu_ngram: Surface n-gram overlap.
- codebleu_weighted_ngram: Keyword-weighted n-gram overlap.
- codebleu_syntax: Syntax-oriented component.
- codebleu_dataflow: Dataflow-oriented component.
codebleu_component_weights is a comma-separated string:
ngram,weighted_ngram,syntax,dataflow
Default:
0.25,0.25,0.25,0.25
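The weight string can be read as one weight per component, in the order given above. The following is a minimal sketch of that reading, not Matheel's internal code: the helper names parse_component_weights and blend_components are hypothetical, the component scores are made-up numbers, and the plain weighted sum is an assumption based on the usual CodeBLEU formulation.

```python
ORDER = ("ngram", "weighted_ngram", "syntax", "dataflow")


def parse_component_weights(spec):
    """Parse 'ngram,weighted_ngram,syntax,dataflow' weights from a string.

    Hypothetical helper for illustration; not part of the Matheel API.
    """
    parts = [float(p) for p in spec.split(",")]
    if len(parts) != len(ORDER):
        raise ValueError(f"expected {len(ORDER)} weights, got {len(parts)}")
    if any(w < 0 for w in parts):
        raise ValueError("weights must be non-negative")
    return tuple(parts)


def blend_components(scores, spec="0.25,0.25,0.25,0.25"):
    """Weighted sum of the four component scores (assumed blending rule)."""
    weights = parse_component_weights(spec)
    return sum(w * scores[name] for w, name in zip(weights, ORDER))


# Made-up component scores, only to show the effect of the weights.
components = {"ngram": 0.40, "weighted_ngram": 0.50, "syntax": 0.80, "dataflow": 0.60}
equal = blend_components(components)
structural = blend_components(components, "0.1,0.1,0.4,0.4")
print(equal, structural)
```

Shifting weight toward syntax and dataflow raises the blended value here because the made-up structural scores are higher than the surface ones.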
CrystalBLEU
CrystalBLEU discounts frequent “trivial” n-grams.
- crystalbleu_max_order: Maximum n-gram order.
- crystalbleu_trivial_ngram_count: Number of high-frequency n-grams to ignore.
For very small toy examples, set crystalbleu_trivial_ngram_count lower than the default, or even to 0. Otherwise the trivial set can cover most of the n-grams in a tiny input, and the score may collapse toward 0.0.
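The collapse on tiny inputs is easier to see with a simplified sketch. The code below is not Matheel's CrystalBLEU implementation: it pools all n-gram orders into one clipped precision instead of computing full BLEU, and the function names are hypothetical. It only illustrates how removing the k most frequent n-grams can leave nothing to compare.

```python
from collections import Counter


def ngrams(tokens, max_order):
    """Yield every n-gram (as a tuple) for n = 1..max_order."""
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def trivial_ngrams(corpus, max_order, k):
    """Return the k most frequent n-grams across the corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(ngrams(tokens, max_order))
    return {gram for gram, _ in counts.most_common(k)}


def filtered_overlap(candidate, reference, max_order, trivial):
    """Clipped n-gram precision after discarding trivial n-grams."""
    cand = Counter(g for g in ngrams(candidate, max_order) if g not in trivial)
    ref = Counter(g for g in ngrams(reference, max_order) if g not in trivial)
    total = sum(cand.values())
    if total == 0:
        return 0.0  # every candidate n-gram was treated as trivial
    matched = sum(min(count, ref[g]) for g, count in cand.items())
    return matched / total


a = "def add ( a , b ) : return a + b".split()
b = "def sum_two ( x , y ) : return x + y".split()
corpus = [a, b]

# k = 0 keeps every n-gram; a large k swallows the whole toy corpus.
keep_all = filtered_overlap(a, b, 2, trivial_ngrams(corpus, 2, 0))
drop_all = filtered_overlap(a, b, 2, trivial_ngrams(corpus, 2, 500))
print(keep_all, drop_all)
```

With only two short snippets in the corpus, any k larger than the number of distinct n-grams marks everything as trivial, which is why a small or zero count suits toy examples.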
RUBY
RUBY uses a staged similarity strategy:
- graph similarity (when optional graph dependencies are available)
- tree similarity
- string similarity as a deterministic fallback
ruby_mode controls this behavior (auto, graph, tree, string, ngram).
auto keeps the staged fallback path. Explicit graph and tree modes are strict and raise if that structural mode cannot produce a score.
- ruby_max_order: Maximum n-gram order (used when ruby_mode=ngram).
- ruby_epsilon: Small smoothing value for n-gram mode edge cases.
- ruby_tokenizer: Tokenizer used by string mode (tranx or regex).
- ruby_denominator: String-mode normalization denominator (max or mean).
- ruby_graph_timeout_seconds: Per-step timeout for graph-edit search.
- ruby_graph_use_edge_cost: Include edge insertion/deletion costs in graph mode.
- ruby_graph_include_leaf_edges: Add sequential leaf edges in graph mode.
- ruby_tree_max_nodes, ruby_tree_max_depth, ruby_tree_max_children: Parse-budget controls for tree/graph modes.
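The difference between auto and the strict modes can be sketched as a small dispatcher. This is an illustrative outline under stated assumptions, not Matheel's code: the three scorer functions are hypothetical placeholders (the graph and tree ones simply report failure), difflib stands in for the real string metric, and ngram mode is left out.

```python
from difflib import SequenceMatcher


def graph_score(a, b):
    """Placeholder: behaves as if the optional graph dependencies are missing."""
    return None


def tree_score(a, b):
    """Placeholder: behaves as if no syntax tree could be built."""
    return None


def string_score(a, b):
    """Deterministic fallback; difflib is a stand-in for the real string metric."""
    return SequenceMatcher(None, a, b).ratio()


STAGES = {"graph": graph_score, "tree": tree_score, "string": string_score}


def staged_similarity(a, b, mode="auto"):
    """Return (stage_used, score); explicit modes are strict."""
    if mode == "auto":
        # Try the most structural stage first, then fall back.
        for name in ("graph", "tree", "string"):
            score = STAGES[name](a, b)
            if score is not None:
                return name, score
        raise RuntimeError("no stage produced a score")
    if mode not in STAGES:
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")
    score = STAGES[mode](a, b)
    if score is None:
        raise RuntimeError(f"mode {mode!r} is strict and could not produce a score")
    return mode, score


stage, score = staged_similarity("return a + b", "return x + y")
print(stage, round(score, 3))
```

In this sketch auto quietly ends up on the string stage, while asking for graph or tree explicitly raises instead of degrading, mirroring the strictness described above.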
TSED
TSED compares syntax trees using tree edit distance.
- tsed_delete_cost
- tsed_insert_cost
- tsed_rename_cost
- tsed_max_nodes
- tsed_max_depth
- tsed_max_children
TSED requires optional dependencies (apted and a tree-sitter runtime package).
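A tree edit distance becomes a similarity only after normalization. The sketch below assumes the normalization from the published TSED definition, max(1 - distance / max_nodes, 0); whether Matheel normalizes in exactly this way is an assumption. Python's ast module stands in for a tree-sitter parse, and the edit distance is passed in rather than computed, since the actual search is delegated to apted.

```python
import ast


def count_nodes(source):
    """Count AST nodes; ast is a stand-in for a tree-sitter parse here."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def tsed_from_distance(edit_distance, nodes_a, nodes_b):
    """Normalize a tree edit distance into a similarity in [0, 1].

    Assumed formula: max(1 - distance / max(nodes_a, nodes_b), 0).
    """
    largest = max(nodes_a, nodes_b)
    if largest == 0:
        return 1.0  # two empty trees are treated as identical
    return max(1.0 - edit_distance / largest, 0.0)


nodes_a = count_nodes("def add(a, b): return a + b")
nodes_b = count_nodes("def sum_two(x, y): return x + y")

# Hypothetical distances, only to show how the normalization behaves.
print(tsed_from_distance(0, nodes_a, nodes_b))
print(tsed_from_distance(2, 10, 8))
print(tsed_from_distance(30, 10, 8))
```

Under this reading, raising tsed_rename_cost or the other operation costs increases the distance for the same pair of trees and therefore lowers the score, with the clamp keeping it from going negative.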
CodeBERTScore
CodeBERTScore uses transformer token alignment to score similarity.
- codebertscore_model
- codebertscore_num_layers
- codebertscore_batch_size
- codebertscore_max_length
- codebertscore_device
- codebertscore_lang
- codebertscore_idf
- codebertscore_rescale_with_baseline
- codebertscore_use_fast_tokenizer
- codebertscore_nthreads
- codebertscore_verbose
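Token alignment in BERTScore-style metrics is a greedy match on cosine similarity between token embeddings. The sketch below shows that matching step on hand-written two-dimensional vectors. It is a simplified illustration: no transformer model is loaded, IDF weighting and baseline rescaling are omitted, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def greedy_alignment_f1(candidate, reference):
    """F1 over greedy best-match cosine similarities in both directions."""
    if not candidate or not reference:
        return 0.0
    # Precision: each candidate token picks its best reference token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token picks its best candidate token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy "embeddings"; a real run would take these from the chosen model.
candidate_vectors = [[1.0, 0.0], [0.0, 1.0]]
reference_vectors = [[1.0, 0.0], [1.0, 1.0]]

f1 = greedy_alignment_f1(candidate_vectors, reference_vectors)
print(round(f1, 4))
```

Because every token is matched to its closest counterpart rather than to an exact copy, renamed identifiers with similar embeddings can still contribute to the score.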
Blending Into Final Score
To use code metrics as part of the final score:
- choose a code_metric
- set code_metric_weight
- include code_metric in feature_weights, or let Matheel add it automatically when only code_metric_weight is provided
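The steps above can be pictured as a weighted combination of per-feature scores. The sketch below is a guess at the shape of that logic, not Matheel's implementation: blend_scores is a hypothetical helper, the feature scores are made up, and both the normalized weighted average and the way code_metric is added automatically are assumptions.

```python
def blend_scores(scores, feature_weights=None, code_metric_weight=None):
    """Normalized weighted average of feature scores (assumed blending rule)."""
    weights = dict(feature_weights or {})
    # Assumed behaviour: add code_metric when only its weight was supplied.
    if code_metric_weight is not None and "code_metric" not in weights:
        weights["code_metric"] = code_metric_weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    return sum(weights[name] * scores[name] for name in weights) / total


# Made-up per-feature scores for illustration.
feature_scores = {"semantic": 0.9, "code_metric": 0.5}

explicit = blend_scores(feature_scores, {"semantic": 0.8, "code_metric": 0.2})
implicit = blend_scores(feature_scores, {"semantic": 0.8}, code_metric_weight=0.2)
metric_only = blend_scores(feature_scores, {"code_metric": 1.0})
print(explicit, implicit, metric_only)
```

In this sketch, listing code_metric explicitly and supplying only code_metric_weight give the same result, and a lone code_metric weight returns the code metric unchanged.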
Python Example
from matheel.similarity import calculate_similarity

# Blend CodeBLEU into the final score alongside the semantic feature.
score = calculate_similarity(
    "def add(a, b): return a + b",
    "def sum_two(x, y): return x + y",
    code_metric="codebleu",
    code_metric_weight=0.2,
    code_language="python",
    feature_weights={"semantic": 0.8, "code_metric": 0.2},
)
print(score)

# Use CodeBERTScore on its own by giving it the full weight.
codebertscore_only = calculate_similarity(
    "def add(a, b): return a + b",
    "def sum_two(x, y): return x + y",
    code_metric="codebertscore",
    code_metric_weight=1.0,
    codebertscore_model="microsoft/codebert-base",
    feature_weights={"code_metric": 1.0},
)
print(codebertscore_only)