At a Glance
The Challenge
Modern recommendation algorithms prioritize engagement over value, trapping users in a cycle of infinite scroll and FOMO (fear of missing out). The objective was to build a system that respects the user's time: a "signal-in-the-noise" machine that distills the vastness of the web into a focused, high-value feed.
The Solution
HexOcean built CutFluff.com, an AI-powered curated feed engine that learns from user feedback ("likes"), automatically filters out irrelevant content across multiple platforms, and generates concise summaries.
The result is a clean, highly curated feed where every item earns its place.
Key Capabilities:
- Video-to-Text Intelligence: Automatically transcribes and summarizes YouTube content, turning 30-minute videos into 30-second text insights.
- AI-Generated Summaries: Aggregates and summarizes thousands of articles and comment threads (e.g., Reddit, Hacker News) into concise, high-value content.
- Autonomous Feedback-Driven Personalization: A "set & forget" engine that evolves with each user's taste, with no manual feed grooming required.
- Multi-Source Aggregation: Pulls content from sources such as social media, YouTube, and news publishers.
- Anti-Doomscrolling Architecture: Designed to break engagement loops by delivering finite, batched updates rather than an infinite feed.
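As a rough illustration of feedback-driven personalization combined with finite batching, the sketch below keeps a per-user interest vector, nudges it toward the embedding of each liked item with an exponential moving average, and returns a fixed-size digest instead of an endless list. The update rule, the `rate` and `batch_size` values, and every function name here are hypothetical; the case study does not specify how CutFluff actually updates its profiles.

```python
import math

# Hypothetical sketch: an interest profile updated from "likes" plus a
# finite, batched digest. Not CutFluff's actual implementation.


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def update_profile(profile: list[float], liked: list[float], rate: float = 0.2) -> list[float]:
    """Move the profile toward a liked item's embedding (exponential moving average)."""
    mixed = [(1 - rate) * p + rate * e for p, e in zip(profile, liked)]
    return normalize(mixed)


def build_digest(profile: list[float], candidates: dict[str, list[float]], batch_size: int = 3) -> list[str]:
    """Return at most batch_size item ids, best match first: a finite batch, not a feed."""
    ranked = sorted(candidates.items(), key=lambda kv: cosine(profile, kv[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:batch_size]]
```

Because the digest is capped at `batch_size`, the user reaches the end of each update, which is the property the anti-doomscrolling design relies on.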
Under the Hood: The Tech Stack
To deliver accurate personalization, we went beyond basic API wrappers and architected a multi-stage machine learning pipeline.
Core AI & Language Models
- Smart Summarization: LLMs automatically generate concise bullet-point summaries of long-form articles, videos, and threads.
- Semantic De-Duplication: The system identifies when multiple sources cover the same story (even with different headlines) to prevent feed clutter.
- HyDE (Hypothetical Document Embeddings): An LLM writes a synthetic "ideal article" for a user's query, and that article's embedding is used as the search vector, improving retrieval accuracy for niche interests.
- Video Intelligence: OpenAI Whisper transcribes video audio so the platform can analyze content for relevance without the user having to watch it.
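Semantic de-duplication can be approximated by comparing story embeddings rather than headlines. The sketch below is a greedy variant, assuming items arrive in priority order: each item is kept only if its cosine similarity to every already-kept item stays below a threshold. The 0.9 threshold, the story ids, and the 3-dimensional toy vectors are illustrative assumptions; real embeddings would come from a model such as the one listed under Vector Search.

```python
import math

# Hypothetical sketch: greedy semantic de-duplication over story embeddings.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(items: list[tuple[str, list[float]]], threshold: float = 0.9) -> list[str]:
    """Keep the first item of each near-duplicate group.

    items: (item_id, embedding) pairs in priority order, e.g. preferred source
    first. An item is dropped if it is at least `threshold` similar to any kept item.
    """
    kept: list[tuple[str, list[float]]] = []
    for item_id, emb in items:
        if all(cosine(emb, kept_emb) < threshold for _, kept_emb in kept):
            kept.append((item_id, emb))
    return [item_id for item_id, _ in kept]


# Usage: two outlets covering the same story under different headlines.
stories = [
    ("reuters-chip-export", [0.9, 0.1, 0.0]),
    ("verge-chip-export", [0.88, 0.12, 0.01]),
    ("hn-rust-release", [0.0, 0.2, 0.98]),
]
unique_ids = deduplicate(stories)
```

The greedy pass is quadratic in the number of kept items; at larger scale the same check would typically be delegated to a vector index.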
Vector Search & Data Science
- Utilized text-embedding-3-large (3072 dimensions) for state-of-the-art content similarity matching.
- Deployed pgvector, the vector-similarity extension for PostgreSQL, for production-grade, real-time similarity search at scale.
- Implemented UMAP for dimensionality reduction and Hierarchical Clustering (scikit-learn) to group user interests into coherent, human-readable themes (e.g., "Machine Learning," "Startup News").
- Applied Jina Reranker v2, a cross-encoder model that re-scores the candidate items returned by vector search so that only the most relevant ones reach the final feed.
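The vector search and reranker entries above describe a common two-stage pattern: a cheap embedding search narrows the pool, then a more expensive model scores each (query, document) pair jointly. In pgvector the first stage is typically a query of the form `ORDER BY embedding <=> $1 LIMIT k`, where `<=>` is cosine distance. The sketch below mimics both stages in plain Python; `toy_cross_encoder` is a hypothetical word-overlap stand-in for a real cross-encoder such as Jina Reranker v2, and all ids and texts are made up for illustration.

```python
import math
from typing import Callable

# Hypothetical sketch of retrieve-then-rerank; not CutFluff's actual code.


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: list[float], index: dict, k: int) -> list[str]:
    """Stage 1: cheap vector search, returns the k ids closest by cosine."""
    ranked = sorted(index, key=lambda i: cosine(query_emb, index[i]["emb"]), reverse=True)
    return ranked[:k]


def rerank(query: str, ids: list[str], index: dict, score: Callable[[str, str], float], n: int) -> list[str]:
    """Stage 2: score each (query, document) pair jointly and keep the best n."""
    ranked = sorted(ids, key=lambda i: score(query, index[i]["text"]), reverse=True)
    return ranked[:n]


def toy_cross_encoder(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


index = {
    "a": {"emb": [1.0, 0.0], "text": "new rust compiler release"},
    "b": {"emb": [0.9, 0.1], "text": "rust borrow checker deep dive"},
    "c": {"emb": [0.0, 1.0], "text": "celebrity gossip roundup"},
}
candidates = retrieve([1.0, 0.0], index, k=2)
final = rerank("rust borrow checker", candidates, index, toy_cross_encoder, n=2)
```

The reranker only ever sees the short candidate list, which is what keeps an expensive cross-encoder affordable per request.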
Intelligent Content Extraction
- Crawl4AI: AI-powered web scraping with content quality filtering to extract clean article text.
- yt-dlp + FFmpeg: A cost-optimized pipeline for extracting audio from videos ahead of transcription.
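One plausible shape for such a pipeline is sketched below, assuming the goal is to avoid downloading video at all and to hand Whisper a small file. `-f bestaudio` asks yt-dlp for an audio-only stream; FFmpeg then drops any video stream (`-vn`), downmixes to mono (`-ac 1`), and resamples to 16 kHz (`-ar 16000`), the rate Whisper models work at internally. Small files also matter because OpenAI's hosted transcription endpoint caps uploads at 25 MB. The function names, file paths, and the 32k bitrate are hypothetical choices, not details from the case study.

```python
import subprocess

# Hypothetical sketch: command builders for an audio-only download followed by
# a downmix to 16 kHz mono. Paths, names, and bitrate are illustrative.


def ytdlp_audio_cmd(url: str, out_template: str) -> list[str]:
    # -f bestaudio selects an audio-only format, so no video bytes are fetched
    return ["yt-dlp", "-f", "bestaudio", "-o", out_template, url]


def ffmpeg_downmix_cmd(src: str, dst: str, bitrate: str = "32k") -> list[str]:
    # -vn drops video, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", "-b:a", bitrate, dst]


def run_all(cmds: list[list[str]]) -> None:
    """Run each command in order, raising CalledProcessError if one fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Building argument lists (rather than shell strings) avoids quoting problems with URLs and file names. Note that the extension of the downloaded file depends on which audio stream yt-dlp picks, so the FFmpeg `src` must match what was actually written.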
The Outcome
CutFluff shows that rigorous engineering is the deciding factor in turning raw model outputs into a reliable, high-performance product. The result is a precise, self-optimizing intelligence engine that distills raw, unstructured noise into signal.
This project demonstrates HexOcean's ability to orchestrate sophisticated multi-modal AI systems, combining vector search, LLMs, and autonomous agents, into a stable, production-grade asset that earns genuine user engagement.
Ready to build your signal-in-the-noise solution?