Advanced Scoring and Performance
FuzzyBunny provides several advanced tools for performance and custom matching needs.
WRatio (Weighted Similarity Ratio)
WRatio is the recommended general-purpose matcher. It combines several algorithms using heuristics to provide a more "intuitive" similarity score.
import fuzzybunny
# Matches well even with different word orders and lengths
score = fuzzybunny.wratio("fuzzy bunny", "bunny fuzzy!!!")
# 1.0 (Token sort/set will match and WRatio will pick the best)
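To make the "pick the best of several algorithms" idea concrete, here is a minimal sketch built only on the standard library's difflib. It is not FuzzyBunny's implementation: the names simple_ratio, normalize, token_sort_ratio, and toy_wratio are hypothetical, and the real WRatio heuristics (including any weighting) may differ.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0.0, 1.0] using difflib."""
    return SequenceMatcher(None, a, b).ratio()


def normalize(s: str) -> str:
    """Lowercase and replace non-alphanumeric characters with spaces."""
    return "".join(ch.lower() if ch.isalnum() else " " for ch in s)


def token_sort_ratio(a: str, b: str) -> float:
    """Compare the strings after sorting their whitespace-separated tokens."""
    sorted_a = " ".join(sorted(normalize(a).split()))
    sorted_b = " ".join(sorted(normalize(b).split()))
    return simple_ratio(sorted_a, sorted_b)


def toy_wratio(a: str, b: str) -> float:
    """Hypothetical stand-in for WRatio: return the best of several scores."""
    return max(simple_ratio(a, b), token_sort_ratio(a, b))


plain = simple_ratio("fuzzy bunny", "bunny fuzzy!!!")
combined = toy_wratio("fuzzy bunny", "bunny fuzzy!!!")
print(f"plain ratio: {plain:.2f}, combined: {combined:.2f}")
```

The plain character-level ratio is penalized by the swapped word order and trailing punctuation, while the token-sorted comparison is not, so taking the maximum yields the more intuitive score.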
High-Performance Batch Matching
When comparing many queries against a common candidate set, batch_match is the most efficient choice.
It provides two major optimizations over calling rank in a loop:
1. Multi-threading (OpenMP): Automatically distributes work across all CPU cores.
2. Normalization Caching: Normalizes the candidate set only once per batch.
import fuzzybunny
queries = ["apple", "banana", "cherry"]
candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
# Parallel matching
results = fuzzybunny.batch_match(queries, candidates, top_n=2)
# Results is a list where each element matches the corresponding query
for i, res in enumerate(results):
    print(f"Results for {queries[i]}: {res}")
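The normalization-caching idea can be illustrated with a small pure-Python sketch. It assumes a hypothetical toy_batch_match helper and uses difflib for scoring. It runs sequentially and does not reproduce FuzzyBunny's C++ or OpenMP internals. It only shows why normalizing the candidates once per batch saves work.

```python
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(s.lower().split())


def toy_batch_match(queries, candidates, top_n=2):
    """Hypothetical sketch: score every query against a shared candidate set."""
    # Normalize the candidate set once, outside the per-query loop.
    normalized = [normalize(c) for c in candidates]
    results = []
    for query in queries:
        q = normalize(query)
        scored = [
            (candidate, SequenceMatcher(None, q, norm).ratio())
            for candidate, norm in zip(candidates, normalized)
        ]
        # Keep the top_n highest-scoring candidates for this query.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append(scored[:top_n])
    return results


queries = ["apple", "banana", "cherry"]
candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
for query, matches in zip(queries, toy_batch_match(queries, candidates)):
    print(f"Results for {query}: {matches}")
```

Calling a single-query function in a loop would repeat the candidate normalization for every query, whereas here it happens once for the whole batch.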
Custom Python Scorers
You can pass a custom Python function as the scorer argument.
Performance note: Custom Python scorers are significantly slower than C++ scorers because they must acquire the Python Global Interpreter Lock (GIL) for every comparison.
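import fuzzybunny
def my_custom_scorer(s1, s2):
    # Your custom logic here; return a score between 0.0 and 1.0.
    # This example checks whether both strings start with the same character.
    if not s1 or not s2:
        return 0.0  # Guard: indexing an empty string would raise IndexError
    return 1.0 if s1[0] == s2[0] else 0.0
results = fuzzybunny.rank("apple", ["apricot", "banana"], scorer=my_custom_scorer)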
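As a slightly more realistic illustration, the sketch below defines a token-overlap (Jaccard) scorer. The function name token_jaccard_scorer is hypothetical, and the sketch assumes only what the example above shows: a scorer takes two strings and returns a float between 0.0 and 1.0.

```python
def token_jaccard_scorer(s1: str, s2: str) -> float:
    """Jaccard similarity of the two strings' lowercase word sets."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())
    if not tokens1 or not tokens2:
        # No words to compare on at least one side: treat as no match.
        return 0.0
    return len(tokens1 & tokens2) / len(tokens1 | tokens2)


# Standalone check of the scorer, independent of any matching library.
print(token_jaccard_scorer("apple pie", "pie apple"))
print(token_jaccard_scorer("apple pie", "apple tart"))
```

Under that assumption it could be passed as scorer=token_jaccard_scorer in the same way as my_custom_scorer, with the same per-comparison GIL cost.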
Integration with Pandas and NumPy
FuzzyBunny integrates directly with common data science tools: