Jaro-Winkler Similarity Calculator

String 1

String 2

Options: Case-insensitive, Trim whitespace, Winkler scaling factor p, Max prefix length ℓ, Boost threshold

Similarity scores

How Jaro got there

…

Character match map

Jaro counts a character as matching if the same character appears in the other string no more than ⌊max(|s₁|,|s₂|)/2⌋ − 1 positions away, and each character can be matched at most once. Matched characters that appear in the same order in both strings score fully; each matched character that is out of order counts as half a "transposition." Below: prefix-bonus chars belong to the common prefix starting at position 0, matched chars are in-window, transposed chars are matched but out of order, grey is unmatched.

About these metrics

The Jaro formula

For strings s₁, s₂ with m matches, of which t matched characters are out of order (each swapped pair contributes 2 to t, so t/2 is the number of transpositions):

jaro = ⅓ · ( m/|s₁|  +  m/|s₂|  +  (m − t/2)/m )    if m > 0
     = 0                                                if m = 0

The match window is ⌊max(|s₁|,|s₂|)/2⌋ − 1. Inside that window, a character in s₁ can match the same character in s₂ only once.
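
The formula and window rule above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code: the function name jaro, the greedy left-to-right matching, the clamping of the window at 0 for very short strings, and returning 0 for empty input are all assumptions made for the sketch. Here t counts matched characters that are out of order, so t/2 is the number of transpositions.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity; t counts out-of-order matched characters."""
    if not s1 or not s2:
        return 0.0  # assumption: no characters means m = 0, hence 0

    # Match window: floor(max(|s1|, |s2|) / 2) - 1, clamped at 0 (assumption).
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)

    used = [False] * len(s2)  # positions of s2 already claimed by a match
    matched1 = []             # matched characters of s1, in s1 order

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            # Each position of s2 can be matched only once.
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break

    m = len(matched1)
    if m == 0:
        return 0.0

    # Matched characters of s2, in s2 order.
    matched2 = [c for c, u in zip(s2, used) if u]
    # Compare the two matched sequences position by position.
    t = sum(a != b for a, b in zip(matched1, matched2))

    return (m / len(s1) + m / len(s2) + (m - t / 2) / m) / 3
```

For "MARTHA" and "MARHTA" the sketch finds m = 6 and t = 2 (the swapped T and H), giving (1 + 1 + 5/6) / 3 = 17/18 ≈ 0.9444.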

The Winkler boost

Winkler observed that human-entered names get the start of the surname right far more often than the end. The Winkler variant adds a bonus proportional to the length of the common prefix (up to ℓ chars, conventionally 4):

jaro_winkler = jaro + ℓ_common · p · (1 − jaro)

where ℓ_common is the length of the common prefix, capped at ℓ, and p is the scaling factor (p ≤ 1/ℓ, i.e. ≤ 0.25 for ℓ = 4, keeps the score in [0, 1]; the conventional value is p = 0.1). Some implementations apply the boost only if jaro ≥ threshold (often 0.7); toggle the threshold above.
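
As a rough sketch of the boost step, the function below takes an already computed Jaro score and applies the prefix bonus. The name winkler_boost and its parameters (p, max_prefix, threshold) are hypothetical, chosen to mirror the symbols p, ℓ and the optional threshold in the text; passing threshold=None is assumed to mean "always apply the boost".

```python
from typing import Optional


def winkler_boost(
    jaro_score: float,
    s1: str,
    s2: str,
    p: float = 0.1,
    max_prefix: int = 4,
    threshold: Optional[float] = None,
) -> float:
    """Apply the Winkler prefix bonus to an existing Jaro score."""
    # p * max_prefix > 1 could push the result above 1.
    if p * max_prefix > 1:
        raise ValueError("p * max_prefix must be <= 1")

    # Optional gate: leave low scores untouched.
    if threshold is not None and jaro_score < threshold:
        return jaro_score

    # Length of the common prefix, capped at max_prefix characters.
    common = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        common += 1

    return jaro_score + common * p * (1 - jaro_score)
```

With a Jaro score of 17/18 for "MARTHA" and "MARHTA", the common prefix "MAR" has length 3, so the result is 17/18 + 3 · 0.1 · (1/18) ≈ 0.9611.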

When to use Jaro-Winkler vs the alternatives

Metric	Best for	Watch out for
Jaro	Symmetric short-string similarity (≤ ~10 chars).	Ignores order beyond a small window; long, unrelated strings can still score high.
Jaro-Winkler	Person names, surnames, business names. Heavily used in U.S. Census record linkage.	A typo after the shared prefix is penalised less than the same typo within the first ℓ characters.
Levenshtein	Free-text typo correction, autocorrect, DNA. Counts insert/delete/sub operations.	Score grows with length — normalise by max length.
Damerau-Levenshtein	Like Levenshtein but counts a swap as one op — better for human typing.	Slightly slower.
Dice on bigrams	Document-length fuzzy match, code clone detection.	Less sensitive to character order than Jaro.
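
To make the "normalise by max length" advice for Levenshtein concrete, here is a small sketch. The function names levenshtein and levenshtein_similarity are illustrative, and treating two empty strings as similarity 1.0 is an assumption. Dividing the edit distance by the longer length maps it onto [0, 1], which makes it roughly comparable with the Jaro-style scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / max length, so 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest
```

For "MARTHA" and "MARHTA" plain Levenshtein needs two substitutions, so the normalised similarity is 1 − 2/6 ≈ 0.667, noticeably lower than the Jaro score for the same swap.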

Reference scores (case-sensitive, p = 0.1, ℓ = 4)

jaro("MARTHA", "MARHTA")   = 0.9444…  ;  winkler = 0.9611…
jaro("DWAYNE", "DUANE")    = 0.8222…  ;  winkler = 0.8400…
jaro("DIXON",  "DICKSONX") = 0.7666…  ;  winkler = 0.8133…