String Manipulation

TL;DR

Character frequency counting with hash maps enables anagram detection and character-based operations in O(n) time. String validation checks format against rules systematically. Pattern matching locates substrings efficiently. String transformation rearranges or modifies content. Two pointers work on strings like arrays. Understanding ASCII values, character encoding, and efficient string operations is crucial for string algorithm problems.

Core Concepts

String Parsing and Validation

Parsing breaks strings into components following specific rules. Example: parsing "123-456-7890" into phone number components.

Validation checks if string satisfies constraints:

Valid parentheses: Balanced and properly nested
Valid number: Correct format (digits, decimal point, sign)
Valid email: Username, @, domain, extension

Validation typically uses character-by-character state checking or regular expressions.

Anagram Detection

Two strings are anagrams if they contain same characters with same frequencies, possibly rearranged.

Approach 1 - Sorting: Sort both strings, compare. O(n log n) time.

Approach 2 - Character frequency: Count character frequencies in both strings, compare counts. O(n) time, O(1) space (26 letters max).

Character frequency hash map: Tracks count of each character. Increment for first string, decrement for second. All zero at end = anagrams.

Pattern Matching

Substring search: Locate exact pattern in text.

Naive: Check every position O(nm) where n is text length, m is pattern length
KMP (Knuth-Morris-Pratt): Use failure function to avoid redundant comparisons O(n+m)
Boyer-Moore: Skip characters based on pattern O(n/m) average case

Regular expressions: Flexible pattern matching using regex syntax. More powerful but slower than optimized string algorithms.

String Transformation

Reversal: Reverse entire string or parts. Two pointers from ends, swap inward.

Case conversion: Transform upper/lower case.

Character replacement: Substitute characters matching condition.

String compression: Encode repeated sequences: "aaabba" becomes "a3b2a1".

Character Frequency Operations

Counting characters: Hash map stores count of each character.

Unique characters: Set stores unique characters, size is unique count.

Most frequent character: Track max count while iterating.

Character sorting: Sort string by character frequency or alphabetically.

Key Algorithms and Techniques

Anagram Detection

Hash map approach: count characters in both strings, compare. O(n) time, O(1) space.

Group Anagrams

Hash map where key is sorted characters, value is list of anagrams. O(nk log k) where k is average string length.

Longest Palindrome

Character frequency approach: if character count is even, use all. If odd, use all but one. One odd character can be in middle.

Valid Parentheses

Stack tracks open parentheses. For each close, check matches top of stack. O(n) time, O(n) space.

String Compression

Iterate tracking consecutive characters, build compressed string. O(n) time, O(n) space.

Practical Example

Python
Java
TypeScript

# Anagram detection
def is_anagram(s1, s2):
    if len(s1) != len(s2):
        return False
    char_count = {}
    for char in s1:
        char_count[char] = char_count.get(char, 0) + 1
    for char in s2:
        if char not in char_count:
            return False
        char_count[char] -= 1
        if char_count[char] < 0:
            return False
    return all(count == 0 for count in char_count.values())

# Group anagrams
def group_anagrams(strings):
    anagram_map = {}
    for s in strings:
        key = ''.join(sorted(s))
        if key not in anagram_map:
            anagram_map[key] = []
        anagram_map[key].append(s)
    return list(anagram_map.values())

# Valid parentheses
def is_valid_parentheses(s):
    stack = []
    pairs = {'(': ')', '[': ']', '{': '}'}
    for char in s:
        if char in pairs:
            stack.append(char)
        elif char in pairs.values():
            if not stack or pairs[stack.pop()] != char:
                return False
    return len(stack) == 0

# String compression
def compress_string(s):
    if not s:
        return ""
    compressed = []
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            compressed.append(s[i - 1] + str(count))
            count = 1
    compressed.append(s[-1] + str(count))
    return ''.join(compressed)

# Valid number
def is_valid_number(s):
    s = s.strip()
    if not s or (s[0] not in '+-0123456789'):
        return False
    has_dot = has_e = False
    for i, char in enumerate(s):
        if char in '+-':
            if i != 0:
                return False
        elif char == '.':
            if has_dot or has_e:
                return False
            has_dot = True
        elif char == 'e' or char == 'E':
            if has_e or i == 0 or i == len(s) - 1:
                return False
            has_e = True
        elif not char.isdigit():
            return False
    return True

# Test cases
print(is_anagram("listen", "silent"))  # True
print(group_anagrams(["eat", "tea", "ate", "bat", "tab"]))
print(is_valid_parentheses("()[]{()}"))  # True
print(compress_string("aaabbacc"))  # a3b2a1c2
print(is_valid_number("3.14"))  # True

import java.util.*;

public class StringManipulation {
    // Anagram detection
    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) return false;
        Map<Character, Integer> charCount = new HashMap<>();
        for (char c : s1.toCharArray()) {
            charCount.put(c, charCount.getOrDefault(c, 0) + 1);
        }
        for (char c : s2.toCharArray()) {
            if (!charCount.containsKey(c)) return false;
            charCount.put(c, charCount.get(c) - 1);
            if (charCount.get(c) < 0) return false;
        }
        return charCount.values().stream().allMatch(count -> count == 0);
    }

    // Group anagrams
    static List<List<String>> groupAnagrams(String[] strings) {
        Map<String, List<String>> anagramMap = new HashMap<>();
        for (String s : strings) {
            char[] chars = s.toCharArray();
            Arrays.sort(chars);
            String key = new String(chars);
            anagramMap.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return new ArrayList<>(anagramMap.values());
    }

    // Valid parentheses
    static boolean isValidParentheses(String s) {
        Stack<Character> stack = new Stack<>();
        Map<Character, Character> pairs = new HashMap<>();
        pairs.put('(', ')');
        pairs.put('[', ']');
        pairs.put('{', '}');

        for (char c : s.toCharArray()) {
            if (pairs.containsKey(c)) {
                stack.push(c);
            } else if (stack.isEmpty() || pairs.get(stack.pop()) != c) {
                return false;
            }
        }
        return stack.isEmpty();
    }

    // String compression
    static String compressString(String s) {
        if (s == null || s.isEmpty()) return "";
        StringBuilder compressed = new StringBuilder();
        int count = 1;
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i - 1)) {
                count++;
            } else {
                compressed.append(s.charAt(i - 1)).append(count);
                count = 1;
            }
        }
        compressed.append(s.charAt(s.length() - 1)).append(count);
        return compressed.toString();
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));
        System.out.println(isValidParentheses("()[]{()}"));
        System.out.println(compressString("aaabbacc"));
    }
}

// Anagram detection
function isAnagram(s1: string, s2: string): boolean {
    if (s1.length !== s2.length) return false;
    const charCount = new Map<string, number>();
    for (const char of s1) {
        charCount.set(char, (charCount.get(char) || 0) + 1);
    }
    for (const char of s2) {
        if (!charCount.has(char)) return false;
        charCount.set(char, charCount.get(char)! - 1);
        if (charCount.get(char)! < 0) return false;
    }
    return Array.from(charCount.values()).every(count => count === 0);
}

// Group anagrams
function groupAnagrams(strings: string[]): string[][] {
    const anagramMap = new Map<string, string[]>();
    for (const s of strings) {
        const key = s.split('').sort().join('');
        if (!anagramMap.has(key)) {
            anagramMap.set(key, []);
        }
        anagramMap.get(key)!.push(s);
    }
    return Array.from(anagramMap.values());
}

// Valid parentheses
function isValidParentheses(s: string): boolean {
    const stack: string[] = [];
    const pairs: { [key: string]: string } = {
        '(': ')',
        '[': ']',
        '{': '}'
    };

    for (const char of s) {
        if (char in pairs) {
            stack.push(char);
        } else {
            if (stack.length === 0 || pairs[stack.pop()!] !== char) {
                return false;
            }
        }
    }
    return stack.length === 0;
}

// String compression
function compressString(s: string): string {
    if (!s) return "";
    const compressed: string[] = [];
    let count = 1;
    for (let i = 1; i < s.length; i++) {
        if (s[i] === s[i - 1]) {
            count++;
        } else {
            compressed.push(s[i - 1] + count);
            count = 1;
        }
    }
    compressed.push(s[s.length - 1] + count);
    return compressed.join('');
}

// Test cases
console.log(isAnagram("listen", "silent"));
console.log(groupAnagrams(["eat", "tea", "ate", "bat", "tab"]));
console.log(isValidParentheses("()[]{()}"));
console.log(compressString("aaabbacc"));

Common Pitfalls

warning

Not handling empty strings: Edge case of empty input breaks many algorithms. Always check length before processing.

Incorrect anagram check: Counting characters in only one direction misses problems. Count both strings and compare counts, or verify all counts zero at end.

Parentheses validation errors: Forgetting to check stack not empty before popping causes crashes. Check both that pair types match.

Not normalizing input: Case sensitivity matters. Normalize to lowercase/uppercase before comparison unless case matters for problem.

Index out of bounds: String operations with indices need careful bounds checking. Off-by-one errors are common.

Inefficient repeated concatenation: String concatenation in loop is O(n²) in many languages. Use StringBuilder/array and join instead.

Self-Check

How does character frequency hash map detect anagrams? Why is it O(n) instead of O(n log n)?
Explain the valid parentheses algorithm. Why must you check stack isn't empty before popping?
How would you find longest substring without repeating characters? What data structure tracks current window?

One Takeaway

info

String problems often boil down to character frequency counting and careful index management. Mastering hash maps for character tracking, two pointers for traversal, and stack for balanced delimiters solves most string algorithm problems efficiently. Always consider edge cases like empty strings, special characters, and case sensitivity.

References

Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
McDowell, G. L. (2015). Cracking the Coding Interview (6th ed.). CareerCup.

String Manipulation

TL;DR​

Core Concepts​

String Parsing and Validation​

Anagram Detection​

Pattern Matching​

String Transformation​

Character Frequency Operations​

Key Algorithms and Techniques​

Anagram Detection​

Group Anagrams​

Longest Palindrome​

Valid Parentheses​

String Compression​

Practical Example​

Common Pitfalls​

Self-Check​

One Takeaway​

References​

TL;DR

Core Concepts

String Parsing and Validation

Anagram Detection

Pattern Matching

String Transformation

Character Frequency Operations

Key Algorithms and Techniques

Anagram Detection

Group Anagrams

Longest Palindrome

Valid Parentheses

String Compression

Practical Example

Common Pitfalls

Self-Check

One Takeaway

References