API Reference
Basic Matching
fuzzybunny.levenshtein(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.partial_ratio(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.jaccard(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.token_sort(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.token_set(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.qratio(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
fuzzybunny.wratio(s1, s2)
Source code in src/fuzzybunny/_fuzzybunny.pyi
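The scorers above are listed without descriptions. As a rough illustration of what an edit-distance scorer computes, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `levenshtein_distance` and `levenshtein_similarity` are hypothetical, and the normalization `1 - distance / max(len)` is an assumption. That assumption is at least consistent with the 0.5555... score shown for 'apple pie' in the rank example.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))
print(levenshtein_similarity("apple", "apple pie"))
```

The sketch keeps only two rows of the distance matrix, so memory use is proportional to the shorter string.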
Ranking
fuzzybunny.rank(query, candidates, scorer='levenshtein', mode='full', process=True, threshold=0.0, top_n=-1, weights=None)
Ranks a list of candidates based on their similarity to a query string.
This is the primary function for finding the best matches in a collection. It supports multiple scoring algorithms, threshold filtering, and integrated string normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The string to search for. | required |
| candidates | CandidatesType | A collection of strings to search through. Can be a list, pandas.Series, or numpy.ndarray. | required |
| scorer | Union[str, Callable[[str, str], float]] | The similarity algorithm to use: a scorer name or a callable that takes two strings and returns a float. | 'levenshtein' |
| mode | str | Matching mode, for example 'full' or 'partial'. | 'full' |
| process | bool | If True, applies normalization (lowercasing, punctuation removal) before matching. | True |
| threshold | float | Minimum score (0.0 to 1.0) for a candidate to be included in the results. | 0.0 |
| top_n | int | Maximum number of results to return. Use -1 for all matches. | -1 |
| weights | Dict[str, float] | Dictionary of weights for the … | None |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, float]] | A list of (matched_string, similarity_score) tuples, sorted by score in descending order. |
Examples:
>>> import fuzzybunny
>>> fuzzybunny.rank("apple", ["apple pie", "banana", "apricot"])
[('apple pie', 0.5555555555555556), ('apricot', 0.42857142857142855)]
>>> # Partial matching
>>> fuzzybunny.rank("apple", ["apple pie"], mode="partial")
[('apple pie', 1.0)]
Source code in src/fuzzybunny/__init__.py
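The pipeline described for rank (normalize, score, filter by threshold, sort, truncate) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `rank_sketch`, `normalize`, and `stand_in_scorer` are hypothetical names, the scorer is `difflib.SequenceMatcher` rather than any fuzzybunny algorithm, and the threshold is assumed to be inclusive.

```python
import string
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple

_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase and strip ASCII punctuation (an assumed reading of process=True)."""
    return text.lower().translate(_PUNCTUATION_TABLE).strip()


def stand_in_scorer(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not one of the library's scorers."""
    return SequenceMatcher(None, a, b).ratio()


def rank_sketch(
    query: str,
    candidates: Iterable[str],
    scorer: Callable[[str, str], float] = stand_in_scorer,
    process: bool = True,
    threshold: float = 0.0,
    top_n: int = -1,
) -> List[Tuple[str, float]]:
    clean = normalize if process else (lambda s: s)
    cleaned_query = clean(query)
    # Score against the cleaned text but report the original candidate string.
    scored = [(c, scorer(cleaned_query, clean(c))) for c in candidates]
    # Assumption: a score equal to the threshold is kept.
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if top_n < 0 else kept[:top_n]


print(rank_sketch("Apple!", ["apple", "banana", "APPLE pie"], threshold=0.5))
```

Passing a callable as `scorer` mirrors the `Union[str, Callable[[str, str], float]]` type in the parameter table; how the real function resolves scorer names is not shown here.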
fuzzybunny.batch_match(queries, candidates, scorer='levenshtein', mode='full', process=True, threshold=0.0, top_n=-1, weights=None)
Efficiently matches multiple queries against a collection of candidates.
Uses multi-threading (OpenMP) and an internal cache of normalized strings to provide high-performance batch processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | QueriesType | A collection of strings to match. | required |
| candidates | CandidatesType | A collection of target strings to search through. | required |
| scorer | Union[str, Callable[[str, str], float]] | See … | 'levenshtein' |
| mode | str | See … | 'full' |
| process | bool | See … | True |
| threshold | float | See … | 0.0 |
| top_n | int | Maximum number of results per query. | -1 |
| weights | Dict[str, float] | See … | None |
Returns:
| Type | Description |
|---|---|
| List[List[Tuple[str, float]]] | A list of result lists, where each inner list corresponds to a query. |
Note
For large datasets, this function is significantly faster than calling rank in a loop, thanks to parallelization and reduced per-call overhead.
Source code in src/fuzzybunny/__init__.py
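The caching idea and the shape of the return value can be illustrated with a short pure-Python sketch. It makes no attempt to reproduce the OpenMP parallelism and is not the library's implementation: `batch_match_sketch` is a hypothetical name, lowercasing stands in for the full normalization, and `difflib.SequenceMatcher` stands in for the scorer.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def batch_match_sketch(
    queries: Sequence[str],
    candidates: Sequence[str],
    threshold: float = 0.0,
    top_n: int = -1,
) -> List[List[Tuple[str, float]]]:
    # Normalize every candidate exactly once, instead of once per query.
    prepared = [(original, original.lower()) for original in candidates]
    results = []
    for query in queries:
        normalized_query = query.lower()
        scored = [
            (original, SequenceMatcher(None, normalized_query, normalized).ratio())
            for original, normalized in prepared
        ]
        kept = sorted(
            (pair for pair in scored if pair[1] >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
        # One inner list per query, in the same order as the queries.
        results.append(kept if top_n < 0 else kept[:top_n])
    return results


print(batch_match_sketch(["apple", "banana"], ["Apple", "Banana", "cherry"], top_n=1))
```

Each inner list lines up with the query at the same index, which matches the `List[List[Tuple[str, float]]]` return type documented above.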
Utilities
fuzzybunny.benchmark.benchmark(query, candidates, scorers=None, n_runs=5)
Benchmark different scorers on a given query and set of candidates. Returns a dictionary with timing results.
Source code in src/fuzzybunny/benchmark.py
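A timing harness of this kind can be sketched with the standard `timeit` module. The reference above only says the function returns a dictionary with timing results, so the structure below (scorer name mapped to the best wall-clock time in seconds) is an assumption, as are the hypothetical names `benchmark_sketch` and `ratio_scorer` and the choice to pass scorers as a name-to-callable mapping.

```python
import timeit
from difflib import SequenceMatcher
from typing import Callable, Dict, Sequence


def ratio_scorer(a: str, b: str) -> float:
    """Stand-in scorer for the sketch; not one of the library's scorers."""
    return SequenceMatcher(None, a, b).ratio()


def benchmark_sketch(
    query: str,
    candidates: Sequence[str],
    scorers: Dict[str, Callable[[str, str], float]],
    n_runs: int = 5,
) -> Dict[str, float]:
    timings = {}
    for name, scorer in scorers.items():
        def run(scorer=scorer):
            # Score the query against every candidate once.
            for candidate in candidates:
                scorer(query, candidate)

        # Keep the fastest of n_runs runs to reduce noise from other processes.
        timings[name] = min(timeit.repeat(run, number=1, repeat=n_runs))
    return timings


print(benchmark_sketch("apple", ["apple pie", "banana", "apricot"], {"difflib": ratio_scorer}))
```

Reporting the minimum rather than the mean follows the usual `timeit` advice, since slower runs mostly reflect interference from the rest of the system.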
fuzzybunny.benchmark.benchmark_batch(queries, candidates, scorer='levenshtein', n_runs=3)
Benchmark batch_match performance.