Social and Communication: Feeds, Fan-Out, and Content Moderation

Real-time social graphs, efficient feed generation, and scale-safe content moderation

TL;DR

Social networks show feeds (timeline of posts from followed users). Fan-out on write (pre-compute when post created): post flows to all followers' feeds immediately. Fast reads, expensive writes (celebrity's post fans-out to 100M followers). Fan-out on read (compute when requested): slower reads, cheap writes. Real-time messaging uses WebSockets, polling, or push notifications. Content moderation uses rules + ML (hate speech, CSAM, spam). Notifications (push, email, in-app) inform users of activity. Abuse prevention flags spam, coordinated inauthentic behavior.

Learning Objectives

Design feed generation systems
Choose fan-out on write vs read strategies
Implement real-time messaging (WebSockets)
Build content moderation pipelines
Design notification systems
Detect and prevent abuse

Motivating Scenario

You launch a social network. User posts photo, instantly visible to 100,000 followers. Each follower's feed must refresh within 500ms. Two approaches: fan-out on write (push post to all followers' caches immediately) or fan-out on read (fetch followers' posts when user opens feed). Fan-out on write is fast (reads instant) but expensive (100M followers = 100M cache updates). Fan-out on read is cheaper but slow (fetch 1000 posts, sort, rank). Hybrid: use fan-out on write for normal users, fan-out on read for celebrities (too expensive to fan-out).

Core Concepts

Social systems prioritize engagement (speed, relevance) and safety (moderation, abuse prevention):

Feed: Chronological or ranked list of posts from followed users.

Fan-Out on Write (FOW): When user posts, immediately write to all followers' feed caches. Fast reads.

Fan-Out on Read (FOR): When user opens feed, fetch all followed users' posts, rank, return. Fast writes.

Social Graph: Who follows whom. Used to find followers and feed targets.

Real-Time Messaging: Instant communication (chat, DMs) via WebSocket, polling, or push.

Content Moderation: ML + rules to detect hate speech, CSAM, spam, and automatically remove/flag.

Notifications: Inform users of activity (new follower, comment, mention) via push/email/in-app.

Ranking Algorithm: What posts to show first (recency, engagement, user interests).

Social network: Feed generation, real-time messaging, moderation

Key Concepts

Timeline: Chronological feed (most recent first).

Algorithmic Feed: Ranked by engagement, relevance. More engaging (higher conversion).

Follower Graph: Adjacency list or matrix. Used for fan-out.

Lazy Materialization: Don't pre-compute, generate on demand (more flexible).

Abuse Patterns: Spam (repetitive posts), harassment (targeted hate), coordinated inauthentic behavior (bot networks).

Patterns and Pitfalls

Design Review Checklist

Self-Check

Fan-out on write vs read? FOW: fast reads, expensive writes. FOR: cheap writes, slow reads. Choose based on celebrity/normal user ratio.
How to handle celebrity (100M followers)? Can't FOW (100M writes). Use FOR or hybrid (FOW only to online followers).
What's content moderation? Detect violations (hate speech, CSAM) via rules + ML. Automated remove obvious, escalate uncertain to humans.

info

One Takeaway: Social platforms scale on engagement, but must moderate to prevent toxicity. Fast moderation (< 5 minutes) is critical.

Next Steps

Feed Ranking: Recommendation systems, ML models (collaborative filtering, content-based)
Real-Time Messaging: Socket.IO, Firebase, Pusher
Content Moderation: Perspective API, Crisp, AWS Rekognition for CSAM
Analytics: Spark, Kafka for event tracking
Spam Detection: Anomaly detection, network analysis, bot detection

Detailed Implementation Strategies

Fan-Out on Write Implementation

class FanOutOnWriteService:
    """Fan-out when user posts."""
    def __init__(self, social_graph_repo, feed_cache, notification_service):
        self.social_graph = social_graph_repo
        self.cache = feed_cache
        self.notifications = notification_service

    def post_published(self, post: Post):
        """Called when user publishes a post."""
        # Get all followers
        followers = self.social_graph.get_followers(post.user_id)

        # Batch writes to cache for efficiency
        cache_updates = []
        for follower_id in followers:
            cache_key = f"feed:{follower_id}"
            cache_updates.append((cache_key, post))

        # Write all at once (pipeline in Redis)
        self.cache.batch_add_to_list(cache_updates)

        # Send notifications asynchronously
        for follower_id in followers:
            self.notifications.notify_new_post(follower_id, post)

        # Track metrics
        metrics.counter('posts.published.fanout_count', len(followers))

Pros: Feed reads are instant (cache hit). User sees posts immediately.

Cons: Heavy load on celebrity posts (millions of followers). May need to fall back to fan-out on read for celebrities.

Fan-Out on Read Implementation

class FanOutOnReadService:
    """Compute feed when user opens app."""
    def __init__(self, social_graph_repo, post_repo, ranking_service):
        self.social_graph = social_graph_repo
        self.posts = post_repo
        self.ranking = ranking_service

    def get_feed(self, user_id: str, limit: int = 20) -> List[Post]:
        """Compute feed dynamically."""
        # Get following list
        following = self.social_graph.get_following(user_id)

        # Fetch recent posts from all following
        recent_posts = self.posts.get_recent_from_users(
            following, limit=limit * 3  # Get extra for ranking
        )

        # Rank posts (engagement, relevance, recency)
        ranked = self.ranking.rank_posts(recent_posts, user_id)

        # Return top N
        return ranked[:limit]

Pros: Works for celebrities (no expensive fan-out). Flexible ranking.

Cons: Feed computation is slower (200-500ms). Requires efficient database queries.

Hybrid Approach

class HybridFanOutService:
    """Use FOW for normal users, FOR for celebrities."""
    def __init__(self, social_graph, cache, post_repo, ranking):
        self.social_graph = social_graph
        self.cache = cache
        self.posts = post_repo
        self.ranking = ranking
        self.celebrity_threshold = 100000  # If >100k followers, use FOR

    def post_published(self, post: Post):
        """Publish post using optimal strategy."""
        follower_count = self.social_graph.get_follower_count(post.user_id)

        if follower_count > self.celebrity_threshold:
            # Celebrity: don't fan-out, will be discovered in FOR
            self._mark_for_discovery(post)
        else:
            # Normal user: fan-out to followers
            self._fanout_on_write(post)

    def get_feed(self, user_id: str) -> List[Post]:
        """Get feed using cached + discovered posts."""
        # Cached posts from followed non-celebrities
        cached = self.cache.get_feed(user_id)

        # Posts from followed celebrities (computed)
        following = self.social_graph.get_following(user_id)
        celebrity_posts = self.posts.get_recent_from_users(
            [u for u in following if self.is_celebrity(u)],
            limit=5
        )

        # Merge and rank
        all_posts = cached + celebrity_posts
        return self.ranking.rank_posts(all_posts, user_id)[:20]

Real-Time Messaging (Chat, DMs)

WebSocket-based instant messaging:

class RealtimeMessagingService:
    """WebSocket-based instant messaging."""
    def __init__(self, message_store, user_presence):
        self.messages = message_store
        self.presence = user_presence
        self.ws_connections = {}  # user_id -> [WebSocket]

    async def send_message(self, from_user: str, to_user: str, text: str):
        """Send DM with real-time delivery."""
        # Create message
        msg = Message(from_id=from_user, to_id=to_user, text=text, timestamp=now())

        # Persist
        await self.messages.save(msg)

        # If recipient online, push immediately
        if self.presence.is_online(to_user):
            for ws in self.ws_connections.get(to_user, []):
                await ws.send(json.dumps({
                    'type': 'new_message',
                    'from': from_user,
                    'text': text,
                    'timestamp': msg.timestamp
                }))
        else:
            # Offline: will fetch when they log in
            await self.messages.mark_unread(to_user, msg.id)

    async def subscribe_to_dms(self, user_id: str, ws: WebSocket):
        """WebSocket connection for DMs."""
        if user_id not in self.ws_connections:
            self.ws_connections[user_id] = []
        self.ws_connections[user_id].append(ws)

        # Mark user online
        await self.presence.mark_online(user_id)

        try:
            async for message in ws:
                # Process incoming messages
                data = json.loads(message)
                await self.send_message(user_id, data['to_user'], data['text'])
        finally:
            self.ws_connections[user_id].remove(ws)
            if not self.ws_connections[user_id]:
                await self.presence.mark_offline(user_id)

Content Moderation Pipeline

Multi-layered moderation:

class ContentModerationService:
    """Rules + ML + human review."""
    def __init__(self, rules_engine, ml_model, human_review_queue):
        self.rules = rules_engine
        self.ml = ml_model
        self.queue = human_review_queue

    async def moderate_post(self, post: Post) -> ModerationDecision:
        """Moderate post with multi-layer pipeline."""
        # Layer 1: Rule-based (spam, known CSAM hashes, etc.)
        rule_decision = self.rules.evaluate(post.content)
        if rule_decision.action == 'remove':
            return ModerationDecision(action='remove', reason='rule_violation', confidence=1.0)

        # Layer 2: ML model (hate speech, harassment, etc.)
        ml_result = await self.ml.predict(post.content)
        if ml_result.score > 0.9:  # High confidence
            if ml_result.category in ['hate_speech', 'violence']:
                return ModerationDecision(action='remove', reason=ml_result.category, confidence=ml_result.score)

        if ml_result.score > 0.7:  # Medium confidence
            # Human review
            await self.queue.enqueue({
                'post_id': post.id,
                'content': post.content,
                'reason': ml_result.category,
                'confidence': ml_result.score
            })
            # Meanwhile, suppress visibility
            return ModerationDecision(action='suppress', reason='pending_review')

        # Layer 3: Automated checks pass, post approved
        return ModerationDecision(action='approve', reason='passed_all_checks')

Abuse Detection Patterns

Detecting spam, coordinated behavior, bot networks:

class AbuseDetectionService:
    """Detect spam, bots, coordinated inauthentic behavior."""
    def __init__(self, user_repo, post_repo, graph_repo):
        self.users = user_repo
        self.posts = post_repo
        self.graph = graph_repo

    def detect_spam_account(self, user_id: str) -> AbuseScore:
        """Detect if account is a spam bot."""
        score = 0

        # Signal 1: Very high posting rate (e.g., >100 posts/day)
        post_count = self.posts.count_posts_today(user_id)
        if post_count > 100:
            score += 0.3

        # Signal 2: Low engagement ratio (posts many, gets few likes)
        avg_engagement = self.posts.get_avg_engagement(user_id)
        if avg_engagement < 0.01:  # < 1% of followers engage
            score += 0.2

        # Signal 3: New account with high activity
        user = self.users.get(user_id)
        if user.age_days < 7 and post_count > 20:
            score += 0.3

        # Signal 4: Follows/unfollows many users rapidly
        follow_churn = self.graph.get_follow_churn(user_id)
        if follow_churn > 500 / 3600:  # > 500 follows/hour
            score += 0.2

        return AbuseScore(user_id=user_id, spam_score=score)

    def detect_coordinated_behavior(self, user_ids: List[str]) -> bool:
        """Detect if group of accounts act in coordinated manner."""
        # Check if they post same content within short time window
        content_hashes = {}
        for user_id in user_ids:
            posts = self.posts.get_recent(user_id, limit=10)
            for post in posts:
                content_hash = hash(post.content)
                if content_hash in content_hashes:
                    content_hashes[content_hash] += 1
                else:
                    content_hashes[content_hash] = 1

        # If >50% of posts are duplicated, likely coordinated
        if max(content_hashes.values()) > len(user_ids) * 0.5:
            return True

        return False

Feed Ranking Algorithms

What determines feed order (chronological vs algorithmic):

class FeedRankingService:
    """Rank posts by engagement, relevance, recency."""
    def __init__(self, interaction_repo, user_graph):
        self.interactions = interaction_repo
        self.graph = user_graph

    def rank_posts(self, posts: List[Post], viewer_id: str) -> List[Post]:
        """Rank posts using multi-factor score."""
        scores = []

        for post in posts:
            score = self._calculate_score(post, viewer_id)
            scores.append((post, score))

        # Sort by score descending
        return [post for post, score in sorted(scores, key=lambda x: x[1], reverse=True)]

    def _calculate_score(self, post: Post, viewer_id: str) -> float:
        score = 0.0

        # Factor 1: Recency (recent posts scored higher)
        age_hours = (now() - post.created_at).total_seconds() / 3600
        recency = 1.0 / (1.0 + age_hours)  # Decay over time
        score += recency * 0.3

        # Factor 2: Engagement (likes, comments)
        engagement = post.likes + post.comments * 2  # Comments weighted more
        score += min(engagement / 1000, 1.0) * 0.3  # Cap at 1.0

        # Factor 3: Social relevance (from accounts you interact with often)
        follower = self.graph.get_user(post.author_id)
        if self.interactions.is_frequent_interactor(viewer_id, post.author_id):
            score += 0.2
        else:
            score += 0.05

        # Factor 4: Content quality (posts with images/videos rank higher)
        if post.media_type in ['image', 'video']:
            score += 0.15

        return score

Notification System Design

Push, email, in-app notifications:

class NotificationService:
    """Multi-channel notification delivery."""
    def __init__(self, push_service, email_service, db):
        self.push = push_service
        self.email = email_service
        self.db = db

    async def notify_new_follower(self, user_id: str, follower_id: str):
        """Notify user of new follower."""
        # Get user preferences
        prefs = self.db.get_notification_preferences(user_id)

        # Push notification (if enabled, real-time)
        if prefs.push_enabled:
            await self.push.send(user_id, {
                'title': f'{follower_id} started following you',
                'action': 'open_profile',
                'target': follower_id
            })

        # Email digest (batch, send once daily)
        if prefs.email_digest:
            await self.db.add_to_digest(user_id, 'new_follower', {
                'follower_id': follower_id,
                'timestamp': now()
            })

        # In-app notification (always shown)
        await self.db.create_notification(
            user_id=user_id,
            type='new_follower',
            data={'follower_id': follower_id},
            created_at=now()
        )

References

Xing, L., et al. (2011). "Comprehensive Vertex-Based Neighborhood Aggregation." (Social graphs) ↗️
Facebook Engineering: "The Timeline Architecture." ↗️
Twitter Engineering: "Real-Time Messaging." ↗️
"Large-Scale Social Graph Processing" — Research paper on scalable social network algorithms
Amazon SQS/SNS: "Building Real-Time Applications"
Google Cloud Pub/Sub: "Message-Driven Architectures"

Social and Communication: Feeds, Fan-Out, and Content Moderation

TL;DR​

Learning Objectives​

Motivating Scenario​

Core Concepts​

Key Concepts​

Patterns and Pitfalls​

Patterns and Pitfalls

Design Review Checklist​

Self-Check​

Next Steps​

Detailed Implementation Strategies​

Fan-Out on Write Implementation​

Fan-Out on Read Implementation​

Hybrid Approach​

Real-Time Messaging (Chat, DMs)​

Content Moderation Pipeline​

Abuse Detection Patterns​

Feed Ranking Algorithms​

Notification System Design​

References​

TL;DR

Learning Objectives

Motivating Scenario

Core Concepts

Key Concepts

Patterns and Pitfalls

Design Review Checklist

Self-Check

Next Steps

Detailed Implementation Strategies

Fan-Out on Write Implementation

Fan-Out on Read Implementation

Hybrid Approach

Real-Time Messaging (Chat, DMs)

Content Moderation Pipeline

Abuse Detection Patterns

Feed Ranking Algorithms

Notification System Design

References