Python – The longest public substring matrix

The longest public substring matrix… here is a solution to the problem.

The longest public substring matrix

I’m new to Python and am struggling to create a matrix representing the longest public substring. I’m looking for results like this: LCS matrix

So far, here’s my code.

def compute_lcs(X, Y):
    m = len(X)
    n = len(Y)
# An (m) times (n) matrix
    matrix = [[0] * (n) for _ in range(m)]
    for i in range(1, m):
        for j in range(1, n):
            if X[i] == Y[j]: 
                if i == 0 or j == 0:
                    matrix[i][j] = 1
            else:
                matrix[i][j] = matrix[i-1][j-1]+1
        else:
            matrix[i][j] = 0
    return matrix

b = compute_lcs('AACTGGCAG','TACGCTGGA')
for y in b:
    print (y)

Current Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 1, 1, 1, 1, 1, 0]
[0, 1, 0, 2, 0, 2, 2, 2, 0]
[0, 1, 2, 1, 3, 0, 3, 3, 0]
[0, 1, 2, 0, 2, 4, 0, 0, 0]
[0, 1, 2, 0, 1, 3, 0, 0, 0]
[0, 1, 0, 3, 0, 2, 4, 1, 0]
[0, 0, 2, 1, 4, 1, 3, 5, 0]
[0, 1, 1, 0, 2, 5, 0, 0, 0]

Expected Output:
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 0]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]

But my result is a matrix that shows incorrect values. When I manually calculate the matrix, the correct output looks like this: Correct output.

Thank you all

Solution

First, for clarity, < a href="https://en.wikipedia.org/wiki/Longest_common_subsequence_problem" rel="noreferrer noopener nofollow" > longest common subsequence problem with longest common substring is not the same problem. It is the latter that you are trying to address; It is better not to confuse the two.

Second, your else branch is not aligned under the appropriate if.
Whenever the string matches X[i] == Y[j], if the index i or j is 0, we set the matrix element to 1, because i-1 or j-1 gives -1 at 0 (unfortunately, this is also the index of the last item in Python) which is not what we want, otherwise we increase the higher index i, j > 1.

Third, your loop should start at 0 instead of 1, because we start with the first character indexed 0 in the string:

def compute_lcs(X, Y):
   m = len(X)
   n = len(Y)
   # An (m) times (n) matrix
   matrix = [[0] * n for _ in range(m)]
   for i in range(0, m):
      for j in range(0, n):
          if X[i] == Y[j]: 
              if i == 0 or j == 0:
                  matrix[i][j] = 1
              else:
                  matrix[i][j] = matrix[i-1][j-1]+1
          else:
              matrix[i][j] = 0
  return matrix

To get the exact matrix shown in the expected output, you should swap the order or transpose matrix of the parameters before printing. Note, however, that these are not required (swap or transpose) and are only for formatting purposes:

b = compute_lcs('TACGCTGGA', 'AACTGGCAG')
for y in b:
    print (y)


[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, <b>1</b>, 0, 0, 0, 2, 0, 0]
[0, 0, 0, <b>2</b>, 0, 0, 0, 0, 0]
[0, 0, 0, 0, <b>3</b>, 1, 0, 0, 1]
[0, 0, 0, 0, 1, <b>4</b>, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]

Related Problems and Solutions