String Algorithms in C: Naive, KMP, Rabin-Karp, Trie & Suffix Array/Tree with Code Examples

1. What is pattern matching in strings?

Pattern matching involves finding all occurrences (or any occurrence) of a pattern string within a text string. It’s widely used in text processing, search engines, and bioinformatics.

2. What is the Naive Pattern Matching algorithm?

The Naive algorithm checks for the pattern at each position in the text by comparing characters one by one, sliding the pattern over the text.

3. Can you give an example of Naive Pattern Matching in C?

Example: Finding all occurrences of pattern "AABA" in text "AABAACAADAABAAABAA".

#include <stdio.h>
#include <string.h>

void naivePatternMatch(char* text, char* pattern) {
    int n = strlen(text), m = strlen(pattern);
    if (m == 0 || n < m) {
        printf("Invalid input\n");
        return;
    }
    for (int i = 0; i <= n - m; i++) {
        int j;
        for (j = 0; j < m; j++) {
            if (text[i + j] != pattern[j]) break;
        }
        if (j == m) printf("Pattern found at index %d\n", i);
    }
}

int main() {
    char text[] = "AABAACAADAABAAABAA";
    char pattern[] = "AABA";
    naivePatternMatch(text, pattern); // Output: Pattern found at index 0, 9, 13
    return 0;
}
      

4. What is the time complexity of Naive Pattern Matching?

Time Complexity: O(n * m) (n = text length, m = pattern length).

Space Complexity: O(1).

5. What are the advantages and disadvantages of Naive Pattern Matching?

6. What is the KMP Algorithm?

The Knuth-Morris-Pratt (KMP) algorithm is an efficient pattern matching algorithm that uses a longest prefix-suffix (LPS) array to skip redundant comparisons, avoiding backtracking in the text.

7. How does KMP work?

8. Can you give an example of KMP in C?

Example: Finding all occurrences of pattern "AABA" in text "AABAACAADAABAAABAA".

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void computeLPS(char* pattern, int m, int* lps) {
    int len = 0, i = 1;
    lps[0] = 0; // First element is always 0
    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) len = lps[len - 1]; // Fall back to previous prefix
            else lps[i] = 0, i++;
        }
    }
}

void kmpPatternMatch(char* text, char* pattern) {
    int n = strlen(text), m = strlen(pattern);
    if (m == 0 || n < m) {
        printf("Invalid input\n");
        return;
    }
    int* lps = (int*)malloc(m * sizeof(int));
    if (!lps) {
        printf("Memory allocation failed\n");
        return;
    }
    computeLPS(pattern, m, lps);
    
    int i = 0, j = 0;
    while (i < n) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        if (j == m) {
            printf("Pattern found at index %d\n", i - j);
            j = lps[j - 1];
        } else if (i < n && pattern[j] != text[i]) {
            if (j != 0) j = lps[j - 1];
            else i++;
        }
    }
    free(lps);
}

int main() {
    char text[] = "AABAACAADAABAAABAA";
    char pattern[] = "AABA";
    kmpPatternMatch(text, pattern); // Output: Pattern found at index 0, 9, 13
    return 0;
}
      

9. What is the time complexity of KMP?

Time Complexity: O(n + m) (n = text length, m = pattern length).

Space Complexity: O(m) (LPS array).

10. What are the advantages and disadvantages of KMP?

11. What is the Rabin-Karp Algorithm?

Rabin-Karp is a pattern matching algorithm that uses hashing to compare the pattern with substrings of the text, reducing character comparisons.

12. How does Rabin-Karp work?

13. Can you give an example of Rabin-Karp in C?

Example: Finding all occurrences of pattern "AABA" in text "AABAACAADAABAAABAA".

#include <stdio.h>
#include <string.h>
#define d 256 // Number of characters in input alphabet
#define q 101 // A prime number for modulo

void rabinKarp(char* text, char* pattern) {
    int n = strlen(text), m = strlen(pattern);
    if (m == 0 || n < m) {
        printf("Invalid input\n");
        return;
    }
    int i, j, p = 0, t = 0, h = 1;
    
    // Calculate h = pow(d, m-1) % q
    for (i = 0; i < m - 1; i++) h = (h * d) % q;
    
    // Calculate hash for pattern and first window
    for (i = 0; i < m; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }
    
    // Slide pattern over text
    for (i = 0; i <= n - m; i++) {
        if (p == t) { // Check characters if hashes match
            for (j = 0; j < m; j++) {
                if (text[i + j] != pattern[j]) break;
            }
            if (j == m) printf("Pattern found at index %d\n", i);
        }
        if (i < n - m) { // Update hash for next window
            t = (d * (t - text[i] * h) + text[i + m]) % q;
            if (t < 0) t += q;
        }
    }
}

int main() {
    char text[] = "AABAACAADAABAAABAA";
    char pattern[] = "AABA";
    rabinKarp(text, pattern); // Output: Pattern found at index 0, 9, 13
    return 0;
}
      

14. What is the time complexity of Rabin-Karp?

Time Complexity:

Space Complexity: O(1).

15. What are the advantages and disadvantages of Rabin-Karp?

16. How do Naive, KMP, and Rabin-Karp compare?

17. What are Trie-based problems?

Trie-based problems involve using a Trie data structure to solve string-related tasks, leveraging its prefix-based structure for efficient operations like searching, autocomplete, or pattern matching.

18. What are common Trie-based problems?

19. Can you give an example of a Trie-based problem (Longest Common Prefix) in C?

Example: Finding the longest common prefix for words {"flower", "flow", "flight"}.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALPHABET_SIZE 26

struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    int count; // Number of words passing through node
    bool isEndOfWord;
};

struct TrieNode* createNode() {
    struct TrieNode* node = (struct TrieNode*)malloc(sizeof(struct TrieNode));
    if (!node) {
        printf("Memory allocation failed\n");
        exit(1);
    }
    node->count = 0;
    node->isEndOfWord = false;
    for (int i = 0; i < ALPHABET_SIZE; i++) node->children[i] = NULL;
    return node;
}

void insert(struct TrieNode* root, char* word) {
    struct TrieNode* curr = root;
    for (int i = 0; word[i]; i++) {
        int index = word[i] - 'a';
        if (!curr->children[index]) curr->children[index] = createNode();
        curr = curr->children[index];
        curr->count++;
    }
    curr->isEndOfWord = true;
}

void freeTrie(struct TrieNode* root) {
    if (!root) return;
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        freeTrie(root->children[i]);
    }
    free(root);
}

void longestCommonPrefix(struct TrieNode* root, char* result, int n) {
    struct TrieNode* curr = root;
    int index = 0;
    while (curr) {
        int singleChild = -1;
        if (curr->count != n) break; // Stop if not all words share this prefix
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            if (curr->children[i]) {
                if (singleChild == -1) singleChild = i;
                else break; // More than one child
            }
        }
        if (singleChild == -1) break;
        result[index++] = 'a' + singleChild;
        curr = curr->children[singleChild];
    }
    result[index] = '\0';
}

int main() {
    char* words[] = {"flower", "flow", "flight"};
    int n = 3;
    struct TrieNode* root = createNode();
    for (int i = 0; i < n; i++) insert(root, words[i]);
    char result[100];
    longestCommonPrefix(root, result, n);
    printf("Longest Common Prefix: %s\n", result); // Output: fl
    freeTrie(root); // Clean up memory
    return 0;
}
      

20. What is the time complexity of Trie-based operations?

21. What are the advantages of Tries for string problems?

22. What is a Suffix Array?

A Suffix Array is an array of integers representing the starting indices of all suffixes of a string, sorted in lexicographical order. It’s used for efficient string matching and substring queries.

23. How is a Suffix Array constructed?

24. What are the applications of Suffix Arrays?

25. What is the time and space complexity of Suffix Arrays?

26. What is a Suffix Tree?

A Suffix Tree is a compressed trie containing all suffixes of a string, where each path from the root to a leaf represents a suffix. Edges store substrings, reducing space usage.

27. How is a Suffix Tree constructed?

28. What are the applications of Suffix Trees?

29. What is the time and space complexity of Suffix Trees?

30. What are the advantages and disadvantages of Suffix Arrays vs. Suffix Trees?

31. Why are Suffix Arrays/Trees considered advanced?

They handle complex string problems efficiently, but their construction (e.g., Ukkonen’s algorithm) and applications (e.g., LCP array) require a deep understanding of string properties and optimization.