Reclaiming Attention with AI-Powered Personal Feed

Architecting with AI-Native Precision

At a Glance

Multi-Modal RAG Pipeline

Processing Video, Audio & Text

High-Dimensional Vector Search

Semantic Understanding & Embeddings

Agentic Data Orchestration

Autonomous Scraping & Processing Pipeline

Autonomous Personalization

Self-Optimizing Feedback Loops

The Challenge

Modern algorithms prioritize engagement over value, trapping users in a cycle of infinite scroll and FOMO (fear of missing out). The objective was to build a system that respects the user's time: a "signal-in-the-noise" machine that filters the web's vastness into a respectful, high-value feed.

The Solution

HexOcean built CutFluff.com, an AI-powered curated feed engine that learns from user feedback (“likes”), automatically filters out irrelevant content across multiple platforms, and generates concise summaries.

The result is a clean, highly curated feed where every item earns its place.

Key Capabilities:

Video-to-Text Intelligence: Automatically transcribes and summarizes YouTube content, turning 30-minute videos into 30-second text insights.
AI-generated Summaries: Aggregates and summarizes thousands of articles and comments (Reddit, Hacker News) to deliver concise, high-value content.
Autonomous Feedback-Driven Personalization: A "set & forget" engine that evolves with user taste—no manual feed grooming required.
Multi-source Aggregation: Pulls content from social media, YouTube, news publishers, and other sources into a single feed.
Anti-Doomscrolling Architecture: Designed to break engagement loops by delivering finite, batched updates rather than an infinite feed.
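
To make the "finite, batched updates" idea concrete, here is a minimal sketch of how a bounded batch could be selected from scored items. The `FeedItem` type, the `score` field, and the `batch_size` and `min_score` defaults are illustrative assumptions, not details taken from the CutFluff implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeedItem:
    title: str
    score: float  # hypothetical relevance score; higher means more relevant


def build_batch(items: List[FeedItem], batch_size: int = 10,
                min_score: float = 0.5) -> List[FeedItem]:
    """Return a finite batch: the best `batch_size` items at or above `min_score`."""
    eligible = [item for item in items if item.score >= min_score]
    eligible.sort(key=lambda item: item.score, reverse=True)
    # The slice is what makes the feed finite: nothing beyond the batch is served.
    return eligible[:batch_size]


items = [FeedItem("A", 0.9), FeedItem("B", 0.2),
         FeedItem("C", 0.7), FeedItem("D", 0.95)]
batch = build_batch(items, batch_size=2)
print([item.title for item in batch])
```

Once the batch is consumed, the user reaches the end of the feed instead of being handed more content, which is the opposite of an infinite scroll.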

Under the Hood: The Tech Stack

To achieve high-fidelity personalization, we moved beyond basic API wrappers and architected a full machine learning pipeline.

Core AI & Language Models

Smart Summarization: LLMs automatically generate concise bullet-point summaries of long-form articles, videos, and threads.
Semantic De-Duplication: The system identifies when multiple sources cover the same story (even with different headlines) to prevent feed clutter.
HyDE (Hypothetical Document Embeddings): We utilize HyDE to generate synthetic "ideal articles" based on user queries, significantly improving retrieval accuracy for niche interests.
Video Intelligence: Integration of OpenAI Whisper allows the platform to transcribe audio and analyze content for relevance without requiring user playback.
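
Two of the techniques above, semantic de-duplication and HyDE, can be sketched in a few lines. This is an illustration under stated assumptions rather than the production code: the greedy strategy, the `0.9` similarity threshold, the prompt wording, and the `generate` and `embed` callables (stand-ins for an LLM call and an embedding call) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedupe(embeddings: List[Vector], threshold: float = 0.9) -> List[int]:
    """Greedy de-duplication: keep an item unless it is too similar to a kept one."""
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def hyde_query_vector(query: str,
                      generate: Callable[[str], str],
                      embed: Callable[[str], Vector]) -> Vector:
    """HyDE: embed a generated hypothetical document instead of the raw query."""
    hypothetical_doc = generate(f"Write a short article that answers: {query}")
    return embed(hypothetical_doc)


# Toy 2-D embeddings: the first two point in nearly the same direction.
stories = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(dedupe(stories))
```

The HyDE helper returns a vector that is then used for similarity search in place of the query embedding. The intuition is that a generated article sits closer to real articles in embedding space than a short query does.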

Vector Search & Data Science

Utilized text-embedding-3-large (3072 dimensions) for state-of-the-art content similarity matching.
Deployed a pgvector (PostgreSQL) database for production-grade, real-time similarity search at scale.
Implemented UMAP for dimensionality reduction and Hierarchical Clustering (scikit-learn) to group user interests into coherent, human-readable themes (e.g., "Machine Learning," "Startup News").
Applied Jina Reranker v2, a cross-encoder model that refines candidate items to ensure the final feed is hyper-relevant.
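
The retrieve-then-rerank pattern described above can be sketched as two stages. Everything here is a simplified stand-in: the in-memory `index` replaces the pgvector table (where the cosine-distance operator `<=>` would do the ordering), and `overlap_score` is a toy word-overlap scorer standing in for a cross-encoder such as Jina Reranker v2. It is not that model's API.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / norm if norm else 0.0)


def retrieve(query_vec: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: cheap vector search returning the k nearest item ids."""
    ranked = sorted(index, key=lambda item_id: cosine_distance(query_vec, index[item_id]))
    return ranked[:k]


def rerank(query: str, candidates: List[str], texts: Dict[str, str],
           score: Callable[[str, str], float], top_n: int) -> List[str]:
    """Stage 2: re-score each (query, text) pair with a more precise scorer."""
    ranked = sorted(candidates, key=lambda item_id: score(query, texts[item_id]),
                    reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer (assumption): number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {"a": "startup funding news", "b": "machine learning tutorial", "c": "cooking tips"}

candidates = retrieve([1.0, 0.0], index, k=2)
final = rerank("machine learning", candidates, texts, overlap_score, top_n=1)
print(candidates, final)
```

The design choice is the usual one for this pattern: the first stage is fast and approximate over the whole corpus, while the second stage is slower but only sees a short candidate list.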

Intelligent Content Extraction

Crawl4ai: AI-powered web scraping with content quality filtering to extract clean article text.
yt-dlp + FFmpeg: A cost-optimized pipeline for video audio extraction.
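
As a rough sketch of the audio extraction step, the helpers below only build the command lines that such a pipeline might run. The specific flag choices (m4a output, 16 kHz mono WAV for transcription), the file names, and the placeholder URL are assumptions for illustration; the source does not state how the pipeline is configured.

```python
from typing import List


def ytdlp_audio_command(url: str, out_template: str = "%(id)s.%(ext)s") -> List[str]:
    """Build a yt-dlp command that downloads a video and keeps only its audio."""
    return [
        "yt-dlp",
        "--extract-audio",          # discard the video stream
        "--audio-format", "m4a",    # assumed target format
        "--output", out_template,   # yt-dlp output filename template
        url,
    ]


def ffmpeg_downsample_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that converts audio to 16 kHz mono."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", src,
        "-vn",           # ignore any video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        dst,
    ]


# Hypothetical placeholder URL and file names.
download = ytdlp_audio_command("https://www.youtube.com/watch?v=VIDEO_ID")
convert = ffmpeg_downsample_command("VIDEO_ID.m4a", "VIDEO_ID.wav")
print(" ".join(download))
print(" ".join(convert))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with both tools installed. Keeping only a small mono audio file is one plausible reading of "cost-optimized", since less data is stored and sent to transcription.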

The Outcome

CutFluff confirms that a rigorous engineering foundation is the deciding factor in transforming raw model outputs into a reliable, high-performance product. We successfully transformed raw, unstructured noise into a precise, self-optimizing intelligence engine.

This project validates HexOcean’s ability to orchestrate sophisticated, multi-modal AI systems—combining vector search, LLMs, and autonomous agents—into a stable, production-grade asset that drives genuine user engagement.

Ready to build your signal-in-the-noise solution?