Technical overview

How WordCut works

WordCut started as one question: how much of video editing can be automated without taking creative control away from the editor?

React, TypeScript, Next.js, FastAPI, Python, Whisper / Groq, GPT-4o, FFmpeg, Remotion, Docker

The pipeline


01

Upload

Drop in a video and say what you want. That one message becomes the starting point for the whole pipeline.

multipart/form-data
02

Intent parsing

This is where messy human language gets turned into a clean list of steps the system can actually execute.

POST /parse-intent
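
As a rough illustration of what "a clean list of steps" could look like, here is a minimal sketch that validates a model's JSON reply before anything is executed. The action names, the `EditStep` type, and the `{"steps": [...]}` shape are all hypothetical; the actual schema behind /parse-intent is not described here.

```python
import json
from dataclasses import dataclass, field

# Hypothetical action vocabulary, chosen only for illustration.
ALLOWED_ACTIONS = {"transcribe", "find_segments", "cut", "reframe", "add_subtitles", "export"}


@dataclass
class EditStep:
    action: str
    params: dict = field(default_factory=dict)


def parse_plan(raw_json: str) -> list[EditStep]:
    """Turn a model's JSON reply into validated steps, rejecting unknown actions."""
    data = json.loads(raw_json)
    steps = []
    for item in data.get("steps", []):
        action = item.get("action")
        if action not in ALLOWED_ACTIONS:
            # Fail loudly instead of passing a made-up operation downstream.
            raise ValueError(f"unsupported action: {action!r}")
        steps.append(EditStep(action=action, params=item.get("params", {})))
    return steps
```

Rejecting unknown actions at this boundary keeps a malformed model reply from ever reaching the video tools.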
03

Transcription

Once every word has a timestamp, the video becomes searchable, cuttable, and editable through language.

Whisper large-v3-turbo
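
To make "searchable through language" concrete, the sketch below looks up a spoken phrase in a list of word-level timestamps and returns the time range it covers. The `Word` type and the `find_phrase` helper are hypothetical, and the input shape is an assumption rather than the exact format the transcription step returns.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def normalize(token: str) -> str:
    """Lowercase and strip surrounding punctuation so 'Back,' matches 'back'."""
    return token.lower().strip(".,!?;:\"'")


def find_phrase(words: list[Word], phrase: str) -> list[tuple[float, float]]:
    """Return (start, end) for every place the phrase is spoken."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w.text) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append((words[i].start, words[i + len(target) - 1].end))
    return hits
```

Each returned range can be handed directly to a cut or highlight operation.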
04

Semantic segmentation

This was one of the most interesting parts to build. The system finds moments based on meaning, not just silence or scene changes.

POST /process
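
The idea of ranking moments by meaning can be sketched with a deliberately simple stand-in. The example below scores transcript segments against a query using bag-of-words cosine similarity; this is a toy substitute for whatever model actually does the matching, and the `(start, end, text)` segment shape is an assumption.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: a toy stand-in for a real embedding or LLM judgment."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_segments(segments, query, top_k=1):
    """segments: list of (start, end, text). Returns the top_k best-matching segments."""
    q = bow(query)
    ranked = sorted(segments, key=lambda s: cosine(bow(s[2]), q), reverse=True)
    return ranked[:top_k]
```

Swapping `bow` and `cosine` for an embedding model or a model prompt would leave the ranking step itself unchanged.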
05

Video processing

This is where I realized the hard part was not the AI. FFmpeg has to make the edit real.

FFmpeg + OpenCV
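
For a sense of what "making the edit real" involves, here is a minimal sketch that builds an FFmpeg command for cutting one clip. The `build_cut_command` helper is hypothetical, and the choice to re-encode with libx264 and AAC rather than stream-copy is an assumption, made because re-encoding avoids cuts snapping to keyframes.

```python
def build_cut_command(src: str, dst: str, start: float, end: float) -> list[str]:
    """Build an ffmpeg argument list that extracts [start, end) from src into dst."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    duration = end - start
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-ss", f"{start:.3f}",     # seek the input to the start time
        "-i", src,
        "-t", f"{duration:.3f}",   # keep this many seconds of output
        "-c:v", "libx264",         # re-encode video (assumption, see lead-in)
        "-c:a", "aac",
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero FFmpeg exit raises instead of failing silently.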
06

Render & export

Everything the user built in the editor has to line up exactly with what comes out the other end as an MP4.

POST /export
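
One piece of that alignment can be sketched as a time mapping: once sections are cut, every subtitle and overlay timestamp has to move from source time to output time. The `source_to_output` helper and the list of kept ranges are hypothetical, shown only to illustrate the bookkeeping.

```python
from typing import Optional


def source_to_output(t: float, kept: list[tuple[float, float]]) -> Optional[float]:
    """Map a source timestamp to its position in the exported video.

    kept: sorted, non-overlapping (start, end) source ranges that survive the edit.
    Returns None when t falls inside a removed section.
    """
    offset = 0.0  # total duration of kept ranges that come before t
    for start, end in kept:
        if start <= t < end:
            return offset + (t - start)
        offset += end - start
    return None
```

For example, with source ranges 0 to 2 and 5 to 8 kept, a word spoken at 6 seconds in the source lands at 3 seconds in the export.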

Architecture

Frontend

React 19, Next.js 15, TypeScript, Canvas API

This is where the user feels the product: chat, preview, timeline, subtitles, and controls. The challenge was making AI automation still feel editable and personal, not like a black box.

Backend orchestration

FastAPI, Python 3.11, OpenAI SDK, Groq SDK, Boto3

The backend acts like the conductor. It figures out which step runs next, keeps track of files, calls the AI models, and hands work off to the video tools at the right time.
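
The conductor role can be sketched as a small runner that executes steps in order, passes shared state along, and records where things stopped. The `run_pipeline` function and the step signature are hypothetical; the real orchestration may well be structured differently.

```python
from typing import Callable

# A step takes the shared context and returns the keys it wants to add or update.
Step = Callable[[dict], dict]


def run_pipeline(steps: list[tuple[str, Step]], context: dict) -> dict:
    """Run named steps in order; stop at the first failure and record it."""
    state = {**context, "completed": []}
    for name, step in steps:
        try:
            state.update(step(state))
        except Exception as exc:
            state["failed"] = name
            state["error"] = str(exc)
            break
        state["completed"].append(name)
    return state
```

Recording the failed step by name makes it possible to retry from that point rather than re-running transcription or other expensive stages.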

Video workers

FFmpeg, Remotion, Chromium, OpenCV, libx264

This layer does the heavy lifting. FFmpeg and Remotion turn decisions into actual frames, clips, subtitles, and exports. Most of the production pain lived here.

What was hard

The engineering challenges that aren't obvious from the outside

Video pipeline reliability

I thought the AI would be the hardest part. It wasn't. The hard part was making FFmpeg reliably cut, reframe, concatenate, and burn subtitles across formats, resolutions, codecs, and edge cases I didn't know existed.
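
One example of this kind of edge case: libx264 with the common yuv420p pixel format needs even frame dimensions, so a naive aspect-ratio crop can fail on odd sizes. The sketch below computes a centered crop filter string and rounds down to even numbers. The `vertical_crop_filter` helper is hypothetical, and centering the crop is an assumption; a real reframe might follow a detected subject instead.

```python
def vertical_crop_filter(width: int, height: int, aspect_w: int = 9, aspect_h: int = 16) -> str:
    """Return an FFmpeg crop filter (crop=w:h:x:y) for a centered aspect_w:aspect_h region."""
    # Start from full height and derive the width for the target aspect ratio.
    crop_h = height
    crop_w = height * aspect_w // aspect_h
    if crop_w > width:
        # Source is too narrow: use full width and derive the height instead.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    # Round down to even dimensions, which yuv420p encoding requires.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return f"crop={crop_w}:{crop_h}:{x}:{y}"
```

The returned string is meant to be passed as the value of FFmpeg's `-vf` option, optionally followed by a scale filter.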

Timeline state coherence

The editor is constantly balancing multiple truths at once: what the user sees in the preview, what the timeline stores, what the backend has processed, and what will actually come out in the export.

Preview vs. final render parity

The canvas preview has to feel instant. The export has to be accurate. Matching those two worlds, an HTML5 canvas in the browser and the ASS subtitle renderer FFmpeg uses at export time, was harder than it looks because they handle text layout differently.
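
Layout is not the only place the two can drift. ASS stores times as H:MM:SS.cc with centisecond precision, so timestamps from a preview that works in finer units need consistent rounding. The sketch below formats seconds as ASS times and emits one Dialogue line; the helper names and the "Default" style name are assumptions for illustration.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def dialogue_line(start: float, end: float, text: str, style: str = "Default") -> str:
    """Build one ASS Dialogue event.

    Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text.
    """
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{text}"
```

Applying the same rounding on the preview side is one way to keep a subtitle from appearing a frame earlier or later in the export than it did on the canvas.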

Infrastructure constraints

Rendering video is expensive. Remotion is powerful, but running Chromium inside a small container taught me a lot about memory limits, OOM kills, and what production deployment actually costs.

Why it matters

WordCut is not about replacing editors. It is about removing the repetitive parts so creators can spend more time on taste, pacing, and story. Building it made me realize that the future of creative tools is not just automation. It is giving people faster ways to stay in control.