minimum deletion

🏠

This is a dynamic programming question, very similar to levenshtein distance. Given two strings, e.g frog and dog, determine how many characters need to be deleted for the words to match.

 1from functools import lru_cache
 2
 3
 4@lru_cache(None)
 5def dist(a, b):
 6    if a == "" or b == "":
 7        return len(b) or len(a)
 8    if a[-1] == b[-1]:
 9        return dist(a[:-1], b[:-1])
10    return min(dist(a[:-1], b), dist(a, b[:-1])) + 1
11
12
13print(dist("dog", "frog"))
 1"""
 2DP, with deletions until min distance is found
 3Time: O(M*N) 
 4Space: O(mn)
 5O(1)
 6
 7   '' H E A T
 8''  0 1 2 3 4
 9H   1 0 1 0 0
10I   2 0 0 0 0 
11T   3 0 0 0 
12
13"""
14
15def deletion_distance(a, b):
16  def dp(a, b):
17    if (a, b) in mem:
18      return mem[(a, b)]
19    if a == '' or b == '':
20      mem[(a, b)] = len(a) or len(b)
21      return mem[(a, b)]
22    if a[-1] == b[-1]:
23      mem[(a, b)] = dp(a[:-1], b[:-1])
24      return mem[(a, b)]
25    mem[(a, b)] = min(dp(a, b[:-1]), dp(a[:-1], b)) + 1
26    return mem[(a, b)]
27  mem = {}  
28  return dp(a, b)
29
30
31
32
33"""
34Deletion Distance
35
36The deletion distance of two strings is the minimum number of characters you need to delete in the two strings in order to get the same string. For instance, the deletion distance between "heat" and "hit" is 3:
37By deleting 'e' and 'a' in "heat", and 'i' in "hit", we get the string "ht" in both cases.
38We cannot get the same string from both strings by deleting 2 letters or fewer.
39
40Given the strings str1 and str2, write an efficient function deletionDistance that returns the deletion distance between them. Explain how your function works, and analyze its time and space complexities.
41
42STR1 = HEAT
43STR2 = HIT
44ANS - 3
45
46   '' H E A T
47''  0 1 2 3 4
48H   1 0 1 2 3
49I   2 1 2 3 4
50T   3 2 3 4 3
51
52base case:
53empty string - don't delete - 0
54
55if chars match:
56dp[i][j] = dp[i-1][j-1]
57
58if chars mismatch:
59dp[i][j] = min(dp[i][j-1], dp[i-1][j]) + 1
60"""
61
62
63def deletion_distance(str1, str2):
64    dp = [[0 for _ in range(len(str2)+1)] for _ in range(len(str1)+1)]
65
66    for i in range(len(str1)+1):
67        dp[i][0] = i
68
69    for j in range(len(str2)+1):
70        dp[0][j] = j
71
72    for i in range(1, len(str1)+1):
73        for j in range(1, len(str2)+1):
74            if (str1[i-1] == str2[j-1]):
75                dp[i][j] = dp[i-1][j-1]
76            else:
77                dp[i][j] = min(dp[i][j-1], dp[i-1][j]) + 1
78
79    return dp[len(str1)][len(str2)]

See also:

levenshtein distance