Set Operations
TL;DR
Sets efficiently store unique elements enabling O(1) lookups. Union combines all unique elements from multiple sets in O(m+n) time. Intersection finds common elements by iterating smaller set and checking membership in larger. Difference returns elements in first set but not second. Duplicate detection uses sets to track seen elements—first encounter goes to set, second encounter identifies duplicate. These fundamental operations power numerous algorithmic patterns and data structure problems.
Core Concepts
Union, Intersection, and Difference
Union (A ∪ B) combines all unique elements from both sets. Implementation: create new set, add all elements from A, then all from B (duplicates automatically eliminated). Time O(m+n) where m, n are set sizes. Result contains union of both.
Intersection (A ∩ B) finds elements in both sets. Optimize by iterating smaller set, checking existence in larger. Time O(min(m,n)). Result contains only common elements.
Difference (A - B) contains elements in A but not in B. Iterate A, exclude elements in B. Time O(m). Result asymmetric; A - B differs from B - A.
Symmetric difference (A ⊕ B) contains elements in either A or B but not both. Compute (A - B) ∪ (B - A). Time O(m+n). Mathematically (A ∪ B) - (A ∩ B) yields same result.
Subset and Superset Relationships
Subset (A ⊆ B) means all elements of A exist in B. Check by verifying every A element in B. O(|A|) time for membership checks.
Proper subset (A ⊂ B) is subset but A ≠ B. Check subset and |A| < |B|.
Superset (A ⊇ B) means A contains all B elements. Equivalent to B being subset of A.
Power set generates all 2^n possible subsets of n-element set. Recursive approach: power set of 3 builds from power set of 2 by either including or excluding 3.
Duplicate Detection and Unique Elements
Set-based duplicate detection tracks seen elements. First occurrence: add to set. Second occurrence: identified as duplicate. O(n) time, O(n) space.
XOR-based detection works for array containing one unique element and rest duplicates. XOR all elements; duplicates cancel, leaving unique. O(n) time, O(1) space but only works for specific cases.
Frequency map counts occurrences. Elements with count > 1 are duplicates. Enables counting duplicates and finding first duplicate.
Remove duplicates maintains insertion order using set to track seen, only keeping first occurrence.
Set-Based Filtering
Filter by membership removes elements not in reference set. Iterate collection, keep only elements in set.
Find missing elements when expecting range 1 to n. Create set from input, check range for missing values.
Common elements from multiple collections uses intersection repeatedly or checks membership in all sets.
Distinct elements from collection simply converts to set then back to collection.
Key Algorithms and Techniques
Union Algorithm
Create new set, add all elements from both sets. Duplicates automatically handled.
Intersection with Optimization
Iterate smaller set, check membership in larger. Early termination possible.
Difference Computation
Iterate first set, exclude elements in second set.
Duplicate Detection Loop
Single pass maintaining seen set; second encounter identifies duplicate.
Subset Verification
Check if all elements of potential subset exist in candidate superset.
Practical Example
- Python
- Java
- TypeScript
# Set operations
def set_union(a, b):
"""Union of two sets: A ∪ B."""
return set(a) | set(b)
def set_intersection(a, b):
"""Intersection of two sets: A ∩ B."""
return set(a) & set(b)
def set_difference(a, b):
"""Difference of two sets: A - B."""
return set(a) - set(b)
def symmetric_difference(a, b):
"""Symmetric difference: elements in either but not both."""
return set(a) ^ set(b)
# Subset and superset checks
def is_subset(a, b):
"""Check if A is subset of B."""
return set(a).issubset(set(b))
def is_superset(a, b):
"""Check if A is superset of B."""
return set(a).issuperset(set(b))
# Duplicate detection
def find_duplicates(arr):
"""Find all duplicate elements in array."""
seen = set()
duplicates = set()
for num in arr:
if num in seen:
duplicates.add(num)
seen.add(num)
return list(duplicates)
def first_duplicate(arr):
"""Find first duplicate by appearance order."""
seen = set()
for num in arr:
if num in seen:
return num
seen.add(num)
return None
def remove_duplicates_ordered(arr):
"""Remove duplicates while preserving order."""
seen = set()
result = []
for num in arr:
if num not in seen:
seen.add(num)
result.append(num)
return result
# Find unique element (all others appear twice)
def find_unique_element(arr):
"""Find element appearing once when rest appear twice."""
unique_set = set()
for num in arr:
if num in unique_set:
unique_set.remove(num)
else:
unique_set.add(num)
return unique_set.pop()
# Find missing numbers
def find_missing(arr, n):
"""Find missing numbers in range 1 to n."""
num_set = set(arr)
return [i for i in range(1, n + 1) if i not in num_set]
# Test cases
print(f"Union [1,2,3] and [3,4,5]: {set_union([1,2,3], [3,4,5])}")
print(f"Intersection [1,2,3] and [3,4,5]: {set_intersection([1,2,3], [3,4,5])}")
print(f"Difference [1,2,3] and [3,4,5]: {set_difference([1,2,3], [3,4,5])}")
print(f"Find duplicates [1,2,2,3,3,3]: {find_duplicates([1,2,2,3,3,3])}")
print(f"First duplicate [1,3,2,1]: {first_duplicate([1,3,2,1])}")
print(f"Remove duplicates [1,2,2,3,1]: {remove_duplicates_ordered([1,2,2,3,1])}")
print(f"Find unique [1,1,2,2,3]: {find_unique_element([1,1,2,2,3])}")
print(f"Is [1,2] subset of [1,2,3]? {is_subset([1,2], [1,2,3])}")
import java.util.*;
public class SetOperations {
// Set union
static Set<Integer> setUnion(int[] a, int[] b) {
Set<Integer> result = new HashSet<>(Arrays.asList(
Arrays.stream(a).boxed().toArray(Integer[]::new)));
for (int num : b) {
result.add(num);
}
return result;
}
// Set intersection
static Set<Integer> setIntersection(int[] a, int[] b) {
Set<Integer> setA = new HashSet<>(
Arrays.asList(Arrays.stream(a).boxed().toArray(Integer[]::new)));
Set<Integer> setB = new HashSet<>(
Arrays.asList(Arrays.stream(b).boxed().toArray(Integer[]::new)));
setA.retainAll(setB);
return setA;
}
// Set difference
static Set<Integer> setDifference(int[] a, int[] b) {
Set<Integer> result = new HashSet<>(
Arrays.asList(Arrays.stream(a).boxed().toArray(Integer[]::new)));
Set<Integer> setB = new HashSet<>(
Arrays.asList(Arrays.stream(b).boxed().toArray(Integer[]::new)));
result.removeAll(setB);
return result;
}
// Find duplicates
static Set<Integer> findDuplicates(int[] arr) {
Set<Integer> seen = new HashSet<>();
Set<Integer> duplicates = new HashSet<>();
for (int num : arr) {
if (!seen.add(num)) {
duplicates.add(num);
}
}
return duplicates;
}
// First duplicate
static Integer firstDuplicate(int[] arr) {
Set<Integer> seen = new HashSet<>();
for (int num : arr) {
if (!seen.add(num)) {
return num;
}
}
return null;
}
// Remove duplicates preserving order
static List<Integer> removeDuplicatesOrdered(int[] arr) {
Set<Integer> seen = new HashSet<>();
List<Integer> result = new ArrayList<>();
for (int num : arr) {
if (seen.add(num)) {
result.add(num);
}
}
return result;
}
// Find unique element (all others appear twice)
static int findUniqueElement(int[] arr) {
Set<Integer> uniqueSet = new HashSet<>();
for (int num : arr) {
if (!uniqueSet.add(num)) {
uniqueSet.remove(num);
}
}
return uniqueSet.iterator().next();
}
// Check subset
static boolean isSubset(int[] a, int[] b) {
Set<Integer> setB = new HashSet<>(
Arrays.asList(Arrays.stream(b).boxed().toArray(Integer[]::new)));
for (int num : a) {
if (!setB.contains(num)) {
return false;
}
}
return true;
}
public static void main(String[] args) {
System.out.println("Union: " + setUnion(new int[]{1,2,3}, new int[]{3,4,5}));
System.out.println("Intersection: " + setIntersection(new int[]{1,2,3}, new int[]{3,4,5}));
System.out.println("Duplicates: " + findDuplicates(new int[]{1,2,2,3,3,3}));
System.out.println("First duplicate: " + firstDuplicate(new int[]{1,3,2,1}));
System.out.println("Remove duplicates: " + removeDuplicatesOrdered(new int[]{1,2,2,3,1}));
}
}
// Set operations
function setUnion(a: number[], b: number[]): Set<number> {
return new Set([...a, ...b]);
}
function setIntersection(a: number[], b: number[]): Set<number> {
const setB = new Set(b);
return new Set(a.filter(x => setB.has(x)));
}
function setDifference(a: number[], b: number[]): Set<number> {
const setB = new Set(b);
return new Set(a.filter(x => !setB.has(x)));
}
function symmetricDifference(a: number[], b: number[]): Set<number> {
const setA = new Set(a);
const setB = new Set(b);
const result = new Set<number>();
for (const x of setA) {
if (!setB.has(x)) result.add(x);
}
for (const x of setB) {
if (!setA.has(x)) result.add(x);
}
return result;
}
// Find duplicates
function findDuplicates(arr: number[]): number[] {
const seen = new Set<number>();
const duplicates = new Set<number>();
for (const num of arr) {
if (seen.has(num)) {
duplicates.add(num);
} else {
seen.add(num);
}
}
return [...duplicates];
}
// Remove duplicates preserving order
function removeDuplicatesOrdered(arr: number[]): number[] {
const seen = new Set<number>();
const result: number[] = [];
for (const num of arr) {
if (!seen.has(num)) {
seen.add(num);
result.push(num);
}
}
return result;
}
// Find unique (all others appear twice)
function findUnique(arr: number[]): number {
const unique = new Set<number>();
for (const num of arr) {
if (unique.has(num)) {
unique.delete(num);
} else {
unique.add(num);
}
}
return [...unique][0];
}
// Test
console.log("Union:", setUnion([1, 2, 3], [3, 4, 5]));
console.log("Intersection:", setIntersection([1, 2, 3], [3, 4, 5]));
console.log("Duplicates:", findDuplicates([1, 2, 2, 3, 3, 3]));
console.log("Remove duplicates:", removeDuplicatesOrdered([1, 2, 2, 3, 1]));
console.log("Find unique:", findUnique([1, 1, 2, 2, 3]));
Common Pitfalls
Modifying set during iteration: Removing elements while iterating causes skipped elements. Create new set or use copy.
Intersection optimization ignored: Iterating larger set then checking smaller is slower. Always iterate the smaller set.
Duplicate detection with first occurrence tracking missing: Store first occurrence index to find earliest duplicate, not just identify existence.
Symmetric difference vs regular difference confusion: A ⊕ B ≠ A - B. Symmetric difference includes B's unique elements too.
Order loss in set operations: Sets don't preserve order. For order-preserving operations, track seen with insertion order (Python 3.7+ dicts, Java LinkedHashSet).
Self-Check
- Explain union, intersection, and difference operations. How do complexity and correctness differ?
- How do sets detect duplicates in linear time? What is the space trade-off?
- Why optimize intersection by iterating the smaller set? Calculate complexity for both approaches.
One Takeaway
Sets provide O(1) membership checking enabling efficient union, intersection, and difference operations. Duplicate detection via sets transforms O(n²) naive checks to O(n). These fundamental operations underpin filtering, grouping, and comparison tasks across algorithms.
References
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
- McDowell, G. L. (2015). Cracking the Coding Interview (6th ed.). CareerCup.