Jaro-Winkler Similarity Calculator

Compute Matthew Jaro's Jaro similarity (1989) and William Winkler's Jaro-Winkler refinement (1990) — the string-matching metrics designed for short proper nouns, typo correction, and record linkage. The Winkler tweak gives extra credit for sharing a common prefix (up to 4 chars), which is why it dominates name-matching in census and address-cleaning systems.

Similarity scores

How Jaro got there

Character match map

Jaro counts a character as matching if the same character appears in the other string no more than ⌊max(|s₁|,|s₂|)/2⌋ − 1 positions away. Matches that keep the same relative order score fully; each matched character that falls out of order counts as half a "transposition." Below: prefix-bonus chars belong to the common prefix that starts at position 0, matched chars are in-window, transposed chars are out of order, grey is unmatched.

About these metrics

The Jaro formula

For strings s₁, s₂ with m matching characters, of which t are out of order (each out-of-order character counts as half a transposition, hence the t/2 below):

jaro = ⅓ · ( m/|s₁|  +  m/|s₂|  +  (m − t/2)/m )    if m > 0
     = 0                                                if m = 0

The match window is ⌊max(|s₁|,|s₂|)/2⌋ − 1. Inside that window, a character in s₁ can match the same character in s₂ only once.
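The formula and window rule above can be sketched in Python as follows. This is a minimal illustration rather than the calculator's own code: the function name `jaro`, the greedy left-to-right matching, and the choice to return 0 for empty input are assumptions. Here t counts matched characters that are out of order, so t/2 is the transposition count.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity; t counts matched characters that are out of order."""
    if not s1 or not s2:
        return 0.0  # assumption: empty input means m = 0, so the score is 0

    # Match window: floor(max(|s1|, |s2|) / 2) - 1, clamped at 0 for tiny strings.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)

    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            # Each character of s2 can be claimed by at most one character of s1.
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break

    if m == 0:
        return 0.0

    # Line up the matched characters of each string in their original order
    # and count the positions where the two sequences disagree.
    seq1 = [c for c, ok in zip(s1, matched1) if ok]
    seq2 = [c for c, ok in zip(s2, matched2) if ok]
    t = sum(a != b for a, b in zip(seq1, seq2))

    return (m / len(s1) + m / len(s2) + (m - t / 2) / m) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro("DIXON", "DICKSONX"), 4))
```

For MARTHA / MARHTA the window is 2, all six characters match, and T and H are the two out-of-order characters, so t = 2.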

The Winkler boost

Winkler observed that human-entered names get the start of the surname right far more often than the end. The Winkler variant adds a bonus proportional to the length of the common prefix, ℓ_common (capped at ℓ chars, conventionally ℓ = 4):

jaro_winkler = jaro + ℓ_common · p · (1 − jaro)

where p is the scaling factor (p ≤ 1/ℓ = 0.25 keeps the score in [0, 1]; the conventional value is p = 0.1). Some implementations only apply the boost if jaro ≥ threshold (often 0.7); toggle the threshold above.
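As a rough sketch of the boost step on its own, the function below takes an already computed Jaro score and applies the prefix bonus. The name `winkler_boost` and its parameter names are hypothetical, and the default threshold of 0 (always boost) is an assumption; pass 0.7 to mimic implementations that gate the boost.

```python
def winkler_boost(jaro_score: float, s1: str, s2: str,
                  p: float = 0.1, max_prefix: int = 4,
                  threshold: float = 0.0) -> float:
    """Apply the Winkler prefix bonus to a precomputed Jaro score."""
    # Gated variant: leave low scores untouched.
    if jaro_score < threshold:
        return jaro_score

    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1

    return jaro_score + prefix * p * (1 - jaro_score)


# MARTHA / MARHTA share the prefix "MAR", so the bonus is 3 * 0.1 * (1 - jaro).
print(round(winkler_boost(17 / 18, "MARTHA", "MARHTA"), 4))
```

Because the bonus is scaled by (1 − jaro), it closes part of the remaining gap to 1 instead of adding a fixed amount, which is why p · ℓ must not exceed 1.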

When to use Jaro-Winkler vs the alternatives
| Metric | Best for | Watch out for |
|---|---|---|
| Jaro | Symmetric short-string similarity (≤ ~10 chars). | Ignores order beyond a small window; long strings drift toward 1. |
| Jaro-Winkler | Person names, surnames, business names. Heavily used in U.S. Census record linkage. | Same-prefix typos are scored more leniently than same-suffix typos. |
| Levenshtein | Free-text typo correction, autocorrect, DNA. Counts insert/delete/sub operations. | Score grows with length; normalise by max length. |
| Damerau-Levenshtein | Like Levenshtein but counts a swap as one op, which is better for human typing. | Slightly slower. |
| Dice on bigrams | Document-length fuzzy match, code clone detection. | Less sensitive to character order than Jaro. |

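
To make the "normalise by max length" advice for Levenshtein concrete, here is one possible sketch. The function names are hypothetical, and dividing by the longer string's length is only one of several normalisations in use.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance into [0, 1] by dividing by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest


print(levenshtein("MARTHA", "MARHTA"))
print(round(levenshtein_similarity("MARTHA", "MARHTA"), 4))
```

The swapped T and H cost two substitutions here, so the normalised score for MARTHA / MARHTA comes out well below the Jaro-Winkler reference score for the same pair.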
Reference scores (case-sensitive, p = 0.1, ℓ = 4)
jaro("MARTHA", "MARHTA")   = 0.9444…  ;  winkler = 0.9611…
jaro("DWAYNE", "DUANE")    = 0.8222…  ;  winkler = 0.8400…
jaro("DIXON",  "DICKSONX") = 0.7666…  ;  winkler = 0.8133…
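
As a check on the first reference score, the arithmetic for MARTHA / MARHTA can be worked by hand. The derivation takes t as the number of matched characters that are out of order, which is the reading under which the formula reproduces the value listed above; w denotes the match window.

```latex
\begin{align*}
w &= \lfloor \max(6, 6)/2 \rfloor - 1 = 2 \\
m &= 6, \qquad t = 2 \quad \text{(T and H are matched but out of order)} \\
\mathrm{jaro} &= \tfrac{1}{3}\left(\tfrac{6}{6} + \tfrac{6}{6} + \tfrac{6 - 2/2}{6}\right)
              = \tfrac{17}{18} \approx 0.9444 \\
\ell_{\mathrm{common}} &= 3 \quad \text{(the shared prefix MAR)} \\
\mathrm{jaro\_winkler} &= \tfrac{17}{18} + 3 \cdot 0.1 \cdot \left(1 - \tfrac{17}{18}\right)
              = \tfrac{173}{180} \approx 0.9611
\end{align*}
```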