Basic Usage
FuzzyBunny provides individual similarity scorers and a rank function for fuzzy string matching.
Individual Scorers
The library offers several algorithms to compare strings:
import fuzzybunny
# Levenshtein (the result is a normalized similarity, not the raw edit distance of 3)
score = fuzzybunny.levenshtein("kitten", "sitting")
# 0.5714...
# Token Sort Ratio
# Good for strings with the same words but in different orders
score = fuzzybunny.token_sort("apple banana", "banana apple")
# 1.0
# Jaccard Similarity
# Good for comparing sets of tokens
score = fuzzybunny.jaccard("apple banana cherry", "banana apple")
# 0.666...
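The scores above can be reproduced by hand. The following is a minimal pure-Python sketch, not FuzzyBunny's actual implementation: the function names are hypothetical, and it assumes the Levenshtein score is normalized as 1 - distance / max(len), that tokens are split on whitespace, and that token sort compares the sorted, rejoined strings.

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - edit_distance / max(len(a), len(b)), a value in [0.0, 1.0]."""
    if not a and not b:
        return 1.0
    # Row-by-row dynamic programming; `previous` holds distances for a[:i-1].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def token_sort_similarity(a: str, b: str) -> float:
    """Sort whitespace-separated tokens, then compare the rejoined strings."""
    return levenshtein_similarity(" ".join(sorted(a.split())),
                                  " ".join(sorted(b.split())))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


print(round(levenshtein_similarity("kitten", "sitting"), 4))   # 0.5714
print(token_sort_similarity("apple banana", "banana apple"))   # 1.0
print(round(jaccard_similarity("apple banana cherry", "banana apple"), 4))  # 0.6667
```

Under these assumptions, "kitten" and "sitting" differ by 3 edits over a maximum length of 7, giving 1 - 3/7. The Jaccard example shares 2 of 3 distinct tokens.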
Ranking Candidates
To find the best matches from a list of strings, use the rank function:
candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
# Find top 2 matches for "apple"
results = fuzzybunny.rank("apple", candidates, top_n=2)
# [('apple pie', 0.55), ('apple turnover', 0.35)]
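Conceptually, ranking means scoring every candidate against the query, sorting by score, and keeping the top entries. The sketch below illustrates that idea in plain Python. `rank_candidates` and its `scorer` parameter are hypothetical names, and the default scorer is assumed to be a normalized Levenshtein similarity, which may differ from what fuzzybunny.rank uses internally.

```python
from typing import Callable, List, Optional, Tuple


def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - edit_distance / max(len(a), len(b)) (assumed normalization)."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def rank_candidates(
    query: str,
    candidates: List[str],
    top_n: Optional[int] = None,
    scorer: Callable[[str, str], float] = levenshtein_similarity,
) -> List[Tuple[str, float]]:
    """Score each candidate against the query and return the best first."""
    scored = [(candidate, scorer(query, candidate)) for candidate in candidates]
    # Stable sort: candidates with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
top = rank_candidates("apple", candidates, top_n=2)
print([(name, round(score, 4)) for name, score in top])
# [('apple pie', 0.5556), ('apple turnover', 0.3571)]
```

With this assumed scorer, the two results come out as 1 - 4/9 and 1 - 9/14, close to the approximate values quoted above.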
Partial Matching
To check whether a query appears as a substring of a candidate, use mode="partial":
# Standard rank (full match)
res_full = fuzzybunny.rank("apple", ["apple pie"], mode="full")
# Score will be ~0.55
# Partial rank (substring match)
res_partial = fuzzybunny.rank("apple", ["apple pie"], mode="partial")
# Score will be 1.0 because "apple" occurs verbatim in "apple pie"
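One common way to implement partial matching is to slide a window the length of the shorter string across the longer one and keep the best window score. The sketch below shows that technique under stated assumptions. `partial_similarity` is a hypothetical name, the scorer is an assumed normalized Levenshtein similarity, and FuzzyBunny's own partial mode may work differently.

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - edit_distance / max(len(a), len(b)) (assumed normalization)."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def partial_similarity(query: str, candidate: str) -> float:
    """Best similarity between the shorter string and any equal-length
    window of the longer string."""
    shorter, longer = sorted((query, candidate), key=len)
    if not shorter:
        # No window to slide; fall back to the plain comparison.
        return levenshtein_similarity(shorter, longer)
    width = len(shorter)
    return max(
        levenshtein_similarity(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(round(levenshtein_similarity("apple", "apple pie"), 4))  # 0.5556
print(partial_similarity("apple", "apple pie"))                # 1.0
```

The first window of "apple pie" is exactly "apple", so the windowed score reaches 1.0 even though the full-string score stays near 0.55.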
Normalization
By default, FuzzyBunny normalizes strings by lowercasing and removing punctuation. You can disable this by passing process=False: