The algorithm for finding palindromic substrings described in my previous blog post has two major problems. First, if I want to find the palindromes in a string of substantial length, it will take a normal computer many years to calculate the result. I didn't mention the second problem explicitly in my previous blog post, but it is pretty obvious. The number of palindromic substrings returned by the algorithm is at least as large as the length of the string, since all single letters are palindromes, and there might be many more. If the string contains a palindrome of length $1,000,000$, then the algorithm will also return the palindrome of length $999,998$ obtained by removing the first and last element of the million letter palindrome, the palindromes of length $999,996$, $999,994$, and so on. This implies that the total number of palindromes occurring in a long string is huge, and I can easily drown in the palindromes returned.

Of the series of palindromes of length $1,000,000$, $999,998$, $999,996$ etc, all palindromes have the same *center*, and all shorter palindromes can be derived from the longest palindrome by removing equally many characters from the front and from the end. In the string "yabadabadoo", the palindromes around the center at the second occurrence of a 'b', are "b", "aba", and "dabad". "dabad" is the *maximal* palindrome around this center, since its extension "adabado" is not a palindrome. Maximality does not imply it is the longest palindromic substring, it only is the maximal palindrome around a particular center in the string. For example, in "yabadabadoo", "abadaba" is a longer palindrome, but with a different center. A center position is either on a letter, as in "dabad" on 'b', or in between two letters, as in "oo", where the center is in between the two o's. The string "yabadabadoo" has $23$ centers: one on each letter (of which there are eleven), one before each letter (another eleven), and one after the last letter of the string. If I assign $0$ to the center before the first letter, $1$ to the center on the first letter, $2$ to the center after the first letter, and so on, the maximal palindrome "dabad" has center $13$, and the maximal palindrome "abadaba" $9$. For a string of length $n$, there are $2n+1$ center positions in the string. Since we can derive all shorter palindromes around a center from the maximal palindrome around the center, it is sufficient to calculate the maximal palindromes around the centers in a string. So instead of calculating all palindromic substrings of a string, I calculate all maximal palindromic substrings of a string.

There are just as many maximal palindromes in a string as there are center positions, so there still are quite a few maximal palindromes. But the number of maximal palindromes might be substantially lower than the number of (not necessarily maximal) palindromic substrings in a string. Take for example the string of length $n$ just containing the character 'a'. The number of maximal palindromes is equal to the number of center positions in the string, $2n+1$. The total number of palindromes in the string is the number of substrings of the string, since all substrings consist of only 'a's, and hence all are palindromes. Since `substrings`

is defined as `concat . map tails . inits`

, I can calculate the total number of substrings if I know how many initial and final substrings appear in a string of length $n$. Function `inits`

returns $n+1$ substrings, of length $0...n$, respectively. Similarly, for a string of length $n$, function `tails`

returns $n+1$ substrings, also of length $0...n$. So the total number of substrings is:
\[ \sum\limits_{i=0}^{n} (i+1) = \frac{n(n+1)}{2} + n+1 = \frac{(n+2)(n+1)}{2} = \frac{n^2}{2}+\frac{3}{2}n+1\]
So in the string "aaaaaaaaaa" of length $10$, there are $21$ maximal palindromes, and $66$ palindromic substrings.

If I know the center and the length of a palindrome, I can recover the palindrome if I have the original string. For example, the maximal palindrome of length $5$ around center $13$ in the string "yabadabadoo" is the string "dabad", and the maximal palindrome of length $7$ around center $9$ is the string "abadaba". So to find maximal palindromes, it suffices to find their center and length.

Given a `string`

, a `center`

within the string, and a `lengthp`

denoting the length of the maximal palindrome in the `string`

around the `center`

, I can use a Haskell program to check if the length is indeed the length of the maximal palindrome around the center in the string. To determine the substring denoted by a center and a length, I first split the string into the initial part of length `div (center-lengthp) 2`

, the part before the maximal palindrome, and the following `lengthp`

letters. Function `div`

divides its first argument by its second argument, throwing away the remainder. Function `splitAt`

takes a positive integer $n$ and a list, and splits the list into two lists. The first list contains the first $n$ letters, and the second list contains the rest of the letters. The `lengthp`

letters after `before`

should form a palindrome. The letters around it, the `last`

letter of the part of the string before it and the `head`

of the string after it, should be different to make it maximal. Functions `last`

and `head`

are predefined functions which return the last or head element of a list, respectively. The palindrome is maximal too if at least one of the strings before or after it is empty.

> maximalPalindrome string center lengthp = > let (before,rest) = splitAt > (div (center-lengthp) 2) > string > (p,after) = splitAt lengthp rest > in length p == lengthp > && odd center == odd lengthp > && palindrome p > && ( null before > || null after > || last before /= head after > )The function

`maximalPalindrome`

uses two sanity checks. It first checks that there indeed exists a substring of length `lengthp`

around `center`

by means of `length p == lengthp`

, where function `length`

is a predefined function that returns the length of a list. The other sanity check checks that the maximal palindrome around an even center has even length, and a maximal palindrome around an odd center has odd length, by means of `odd center == odd lengthp`

, which equals `True`

only if both `center`

and `lengthp`

are odd or both are even. Function `odd`

is a predefined function that determines whether or not a number is odd. The logic or operator `||`

returns `True`

if one of its arguments is `True`

. If its left-hand side argument is `True`

it doesn't evaluate its right-hand argument, which is just as well for the above definition, since evaluating `last before`

if `before`

is empty (`null before`

) would lead to a undefined value.
In the next blog post I will show how to calculate the length of all maximal palindromes in a string.

Thanks for ones marvelous posting! I seriously enjoyed reading

ReplyDeleteit, you could be a great author. I will make sure

to bookmark your blog and will often come back in the future.

I want to encourage you to continue your great posts, have

a nice holiday weekend!