Skip to content

jlegare/LevenshteinToolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LevenshteinToolkit

Build Status Coverage Status

This is a sandbox of Levenshtein distance implementations.

using LevenshteinToolkit

distance_matrix("Saturday", "Sunday")
# ==> 3

distance_row("Saturday", "Sunday")
# ==> 3

The distance_matrix function will compute the Levenshtein distance between two strings by building a matrix of distances between the prefixes. The costs associated with deletion, insertion, and substitution can be tailored using the keyword arguments. The distance_row function is similar, but only keeps two matrix rows, thereby reducing the memory footprint.

The nfa function builds a non-deterministic finite automaton (NFA) that recognizes the set of words that are within a given Levenshtein distance from a word:

nfa_automaton = nfa("food", 1)

The draw function can be used to generate a GraphViz input file:

open(f -> draw(f, nfa_automaton), "nfa.dot", "w")

The resulting file can be rendered using the dot command-line tool:

dot -Tsvg nfa.dot -o nfa.svg

The dfa function builds a deterministic finite automaton (DFA) from an NFA.

dfa_automaton = dfa(nfa_automaton)

As for the case of NFAs, the draw function can be used to generate a GraphViz input file.

The check function can be used to run a DFA against a string:

check(dfa_automaton, "food")
# ==> true
check(dfa_automaton, "xfood")
# ==> true ... we're one edit away.
check(dfa_automaton, "xfoodx")
# ==> false ... we're two edits away.