Reclaiming Attention with an AI-Powered Personal Feed

Architecting with AI-Native Precision

At a Glance

  • Multi-Modal RAG Pipeline: Processing Video, Audio & Text
  • High-Dimensional Vector Search: Semantic Understanding & Embeddings
  • Agentic Data Orchestration: Autonomous Scraping & Processing Pipeline
  • Autonomous Personalization: Self-Optimizing Feedback Loops

The Challenge

Modern algorithms prioritize engagement over value, trapping users in a cycle of infinite scroll and FOMO (fear of missing out). The objective was to build a system that respects the user's time: a "signal-in-the-noise" machine that filters the web's vastness into a respectful, high-value feed.

The Solution

HexOcean built CutFluff.com, an AI-powered curated feed engine that learns from user feedback (“likes”), automatically filters irrelevant content across multiple platforms, and generates concise summaries.

The result is a clean, highly curated feed where every item earns its place.

Key Capabilities:

  • Video-to-Text Intelligence: Automatically transcribes and summarizes YouTube content, turning 30-minute videos into 30-second text insights.
  • AI-generated Summaries: Aggregates and summarizes thousands of articles and comments (Reddit, Hacker News) to deliver concise, high-value content.
  • Autonomous Feedback-Driven Personalization: A "set & forget" engine that evolves with user taste—no manual feed grooming required.
  • Multi-source Aggregation: Pulls content from social media, YouTube, news publishers, and other sources into a single feed.
  • Anti-Doomscrolling Architecture: Designed to break engagement loops by delivering finite, batched updates rather than an infinite feed.
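The finite, batched delivery described in the last bullet can be illustrated with a minimal sketch. The `FeedItem` type, the `score` field, and the default batch size are assumptions made for illustration; the source does not describe how CutFluff ranks or sizes its batches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    title: str
    score: float  # hypothetical relevance score from an upstream ranking stage


def build_batch(candidates, batch_size=10):
    """Return at most `batch_size` items, best first.

    There is deliberately no cursor or "next page" token: once the batch
    is read, the feed is finished until the next scheduled update.
    """
    ranked = sorted(candidates, key=lambda item: item.score, reverse=True)
    return ranked[:batch_size]


items = [FeedItem("a", 0.2), FeedItem("b", 0.9), FeedItem("c", 0.5)]
batch = build_batch(items, batch_size=2)
print([item.title for item in batch])
```

The design point is the absence of pagination: the reader reaches a natural end instead of being offered more content.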

Under the Hood: The Tech Stack

To achieve high-fidelity personalization, we moved beyond basic API wrappers and architected a sophisticated Machine Learning pipeline.

Core AI & Language Models

  • Smart Summarization: LLMs automatically generate concise bullet-point summaries of long-form articles, videos, and threads.
  • Semantic De-Duplication: The system identifies when multiple sources cover the same story (even with different headlines) to prevent feed clutter.
  • HyDE (Hypothetical Document Embeddings): We utilize HyDE to generate synthetic "ideal articles" based on user queries, significantly improving retrieval accuracy for niche interests.
  • Video Intelligence: OpenAI Whisper transcribes the audio track so the platform can analyze a video's content for relevance without requiring user playback.
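Semantic de-duplication of the kind listed above is often done by comparing embeddings rather than headlines. The sketch below is one plausible approach, not a description of CutFluff's implementation: the greedy strategy, the 0.9 cosine threshold, and the tiny three-dimensional vectors are all assumptions chosen to keep the example readable.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(items, threshold=0.9):
    """Greedy de-duplication over (title, embedding) pairs.

    An item is kept only if it is less similar than `threshold` to every
    item already kept, so the first story in each near-duplicate group wins.
    """
    kept = []
    for title, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((title, vec))
    return kept


# Toy embeddings: the first two headlines differ but point the same way.
stories = [
    ("Acme ships new GPU", [0.90, 0.10, 0.00]),
    ("New GPU launched by Acme", [0.88, 0.12, 0.01]),
    ("Local bakery wins award", [0.00, 0.20, 0.95]),
]
unique = deduplicate(stories)
print([title for title, _ in unique])
```

In practice the threshold would need tuning against real data: too low merges distinct stories, too high lets rewrites of the same story through.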

Vector Search & Data Science

  • Utilized text-embedding-3-large (3072 dimensions) for state-of-the-art content similarity matching.
  • Deployed PostgreSQL with the pgvector extension for production-grade, real-time similarity search at scale.
  • Implemented UMAP for dimensionality reduction and Hierarchical Clustering (scikit-learn) to group user interests into coherent, human-readable themes (e.g., "Machine Learning," "Startup News").
  • Applied Jina Reranker v2, a cross-encoder model that re-scores the candidates returned by vector search so that only the most relevant items reach the final feed.
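The combination of vector search and a reranker is commonly arranged as a two-stage pipeline: a cheap similarity search narrows the corpus to a few candidates, then a slower, more accurate scorer reorders them. The sketch below shows that shape in plain Python under stated assumptions. The in-memory `corpus` dictionary stands in for a pgvector table, and `overlap_score` is a hypothetical word-overlap function standing in for a cross-encoder such as Jina Reranker v2; neither reflects CutFluff's actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, k):
    """Stage 1: return the ids of the k embeddings most similar to the query."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(query_vec, corpus[doc_id]), reverse=True)
    return ranked[:k]


def rerank(query, candidate_ids, texts, score_pair, top_n):
    """Stage 2: reorder candidates with a pairwise (query, text) scorer."""
    ranked = sorted(candidate_ids, key=lambda doc_id: score_pair(query, texts[doc_id]), reverse=True)
    return ranked[:top_n]


def overlap_score(query, text):
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {
    "a": "startup funding news",
    "b": "machine learning for startups",
    "c": "gardening tips",
}

candidates = retrieve([1.0, 0.0], corpus, k=2)
final = rerank("machine learning", candidates, texts, overlap_score, top_n=2)
print(candidates, final)
```

With pgvector, the first stage would typically be a SQL query ordered by a distance operator such as `<=>` (cosine distance) with a `LIMIT`, and the second stage would call the reranker model on each query and candidate pair.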

Intelligent Content Extraction

  • Crawl4AI: AI-powered web scraping with content quality filtering to extract clean article text.
  • yt-dlp + FFmpeg: A cost-optimized pipeline for video audio extraction.
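As an illustration of the yt-dlp and FFmpeg step, the sketch below only builds the two command lines and does not run them. It assumes that "cost-optimized" means downloading the audio stream alone and downsampling it to 16 kHz mono before transcription; the source does not spell this out. The function name, the `name` parameter, the `audio` directory, and the placeholder URL are hypothetical.

```python
import shlex


def build_audio_commands(video_url, name, workdir="audio"):
    """Build (download, convert) argument lists for yt-dlp and ffmpeg.

    Assumption: audio-only download plus 16 kHz mono WAV output keeps
    bandwidth and transcription input small.
    """
    downloaded = f"{workdir}/{name}.m4a"
    normalized = f"{workdir}/{name}.wav"
    download_cmd = [
        "yt-dlp",
        "-f", "bestaudio",          # fetch only the best audio stream, no video
        "-x",                       # extract audio
        "--audio-format", "m4a",    # convert the extracted audio to m4a
        "-o", f"{workdir}/{name}.%(ext)s",
        video_url,
    ]
    convert_cmd = [
        "ffmpeg", "-y",
        "-i", downloaded,
        "-ac", "1",                 # mono
        "-ar", "16000",             # 16 kHz sample rate
        normalized,
    ]
    return download_cmd, convert_cmd


download_cmd, convert_cmd = build_audio_commands(
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder URL
    "talk",
)
print(shlex.join(download_cmd))
print(shlex.join(convert_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are installed; the resulting WAV file is then ready for a transcription model.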

The Outcome

CutFluff confirms that a rigorous engineering foundation is the deciding factor in transforming raw model outputs into a reliable, high-performance product. We successfully transformed raw, unstructured noise into a precise, self-optimizing intelligence engine.

This project validates HexOcean’s ability to orchestrate sophisticated, multi-modal AI systems—combining vector search, LLMs, and autonomous agents—into a stable, production-grade asset that drives genuine user engagement.

Ready to build your signal-in-the-noise solution?

Speak to a Human Professional

Get a free consultation within 24 hours