Word Timestamp to Subtitles

Generate clean subtitles from word-level timestamps with professional segmentation controls, then export polished SRT or VTT instantly.

Subtitle Settings

Max chars / line (ch): professional range is 32-42.
Lines per cue: 1 for social, 2 for long-form.
Reading speed (cps): characters per second.
Min cue duration (ms): shorter cues are extended to this length.
Max cue duration (ms): longer cues are split.
Gap between cues (ms): keeps transitions clean.
Prefer sentence boundaries: break after . ? ! whenever duration allows.
Allow comma-level splits: use comma and semicolon breaks when a cue gets too long.

Accepted JSON sample:
[
  {"text":"Hello","start":0.12,"end":0.44},
  {"text":"everyone,","start":0.44,"end":0.93},
  {"text":"welcome.","start":0.93,"end":1.40}
]

// Also supported:
// {"words":[{"word":"Hello","start":120,"end":440}]}
// {"results":{"channels":[{"alternatives":[{"words":[...]}]}]}}

JSON Validation and Sample

Paste timestamp JSON to validate it before generation. The validator flags missing start/end fields by word index.
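
As a rough illustration of the kind of check described here, the sketch below reports missing start/end fields by word index. The function name `find_missing_timings` and the message format are hypothetical; the tool's actual validator may differ.

```python
from typing import Any


def find_missing_timings(words: list[dict[str, Any]]) -> list[str]:
    """Return one message per missing 'start' or 'end' field, keyed by word index."""
    problems = []
    for index, word in enumerate(words):
        for field in ("start", "end"):
            # Treat an absent key and an explicit null the same way.
            if word.get(field) is None:
                problems.append(f"word {index}: missing '{field}'")
    return problems


sample = [
    {"text": "Hello", "start": 0.12, "end": 0.44},
    {"text": "everyone,", "start": 0.44},  # no "end" value
]
print(find_missing_timings(sample))
```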

ASR JSON Ready

Supports common word arrays from AssemblyAI, Whisper-style output, and nested words objects.

Professional Logic Controls

Tune line length, cue duration, reading speed, punctuation splits, and cue gap to match delivery style.
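
To make two of these controls concrete, here is a minimal sketch of how line length and reading speed could be applied to a cue's text. The helper names and default values are assumptions for illustration, not the tool's internals.

```python
import textwrap


def wrap_cue(text: str, max_chars: int = 42) -> list[str]:
    """Wrap cue text so that no line exceeds max_chars characters."""
    return textwrap.wrap(text, width=max_chars)


def required_ms(text: str, cps: float = 17.0) -> int:
    """Time needed to read the text at the given characters-per-second rate."""
    return round(len(text) / cps * 1000)


lines = wrap_cue("Hello everyone, welcome to the show.", max_chars=20)
fits = len(lines) <= 2  # compare against the "Lines per cue" setting
print(lines, fits, required_ms("Hello everyone, welcome to the show."))
```

A cue whose wrapped text needs more lines than "Lines per cue" allows, or more reading time than its duration provides, is a candidate for splitting.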

Private by Design

Generation runs in-browser. Your uploaded timestamp file is not sent to external conversion APIs.

Built for Professional Subtitle Drafting

Punctuation-Aware Splits

Prioritizes sentence boundaries and optional comma breaks so cues read naturally on screen.
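
One way such punctuation-aware grouping could work is sketched below. It is a simplified, hypothetical version that looks only at text length (the real tool also weighs duration); `group_words` and its defaults are illustrative names.

```python
SENTENCE_END = (".", "?", "!")
CLAUSE_END = (",", ";")


def group_words(words: list[str], max_chars: int = 42, comma_splits: bool = True) -> list[str]:
    """Group words into cue texts, closing a cue at sentence ends and,
    when the cue would grow too long, at the last comma or semicolon."""
    cues: list[str] = []
    current: list[str] = []
    for word in words:
        while current and len(" ".join(current + [word])) > max_chars:
            cut = len(current)  # default: flush everything collected so far
            if comma_splits:
                for i in range(len(current) - 1, -1, -1):
                    if current[i].endswith(CLAUSE_END):
                        cut = i + 1  # keep the comma with the earlier cue
                        break
            cues.append(" ".join(current[:cut]))
            current = current[cut:]
        current.append(word)
        if word.endswith(SENTENCE_END):
            cues.append(" ".join(current))
            current = []
    if current:
        cues.append(" ".join(current))
    return cues


text = "Hello everyone, welcome. This is a longer sentence, with a comma in the middle of it"
for cue in group_words(text.split(), max_chars=30):
    print(cue)
```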

Timing Constraints

Applies minimum and maximum cue duration with configurable cue gaps to prevent collisions and flashes.
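
A minimal sketch of these constraints, assuming cues are (start_ms, end_ms) pairs sorted by start time. The function name and default values are hypothetical, and here an over-long cue is simply capped rather than split.

```python
def apply_timing(cues: list[tuple[int, int]], min_ms: int = 800,
                 max_ms: int = 7000, gap_ms: int = 80) -> list[tuple[int, int]]:
    """Extend short cues, cap long ones, and keep a gap before the next cue."""
    adjusted = []
    for i, (start, end) in enumerate(cues):
        end = max(end, start + min_ms)  # avoid cues that flash by
        end = min(end, start + max_ms)  # avoid cues that linger
        if i + 1 < len(cues):
            # Never run into the next cue; leave gap_ms of clear screen.
            end = min(end, cues[i + 1][0] - gap_ms)
        adjusted.append((start, max(end, start)))
    return adjusted


print(apply_timing([(0, 300), (500, 2000), (2000, 12000)]))
```

Note that the gap takes priority over the minimum duration in this sketch, so a short cue followed closely by another stays short instead of overlapping.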

Two Output Targets

Generate clean SRT or VTT from the same timestamp source without reformatting manually.
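
The two formats differ mainly in the header, the cue numbering, and the millisecond separator (SRT uses a comma, VTT a period). The sketch below shows this for cues given as (start_ms, end_ms, text) tuples; the function names are illustrative only.

```python
def format_timestamp(ms: int, sep: str) -> str:
    """Render milliseconds as HH:MM:SS<sep>mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues: list[tuple[int, int, str]]) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


cues = [(120, 1400, "Hello everyone, welcome.")]
print(to_srt(cues))
print(to_vtt(cues))
```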

Accepted Timestamp Inputs

Upload JSON where each word includes start and end timing. The tool auto-detects whether the values are in seconds or milliseconds.

ARRAY

Direct Word List

[{"text":"Hello","start":0.12,"end":0.44}] (seconds) or [{"word":"Hi","start":120,"end":360}] (milliseconds).

NESTED

Nested ASR Output

Finds nested words arrays inside channels, alternatives, segments, or result objects.
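
A simple way to locate nested word lists is a recursive walk over the parsed JSON. The sketch below is an assumption about how such a search might look, not the tool's code: it collects every list stored under a "words" key and falls back to treating the payload as a direct word list.

```python
def collect_words(node) -> list[dict]:
    """Gather word objects from every 'words' list anywhere in the payload."""
    found: list[dict] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "words" and isinstance(value, list):
                found.extend(w for w in value if isinstance(w, dict))
            else:
                found.extend(collect_words(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_words(item))
    return found


def extract_words(payload) -> list[dict]:
    """Prefer nested 'words' arrays; otherwise accept a direct word list."""
    nested = collect_words(payload)
    if nested:
        return nested
    return payload if isinstance(payload, list) else []


payload = {"results": {"channels": [{"alternatives": [
    {"words": [{"word": "Hello", "start": 0.12, "end": 0.44}]}
]}]}}
print(extract_words(payload))
```

Because this version concatenates every "words" list it finds, a file with several channels or alternatives would need an extra selection step.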

UNITS

Seconds or Milliseconds

Understands numeric seconds, numeric milliseconds, and strings like 00:00:12.340 or 120ms.
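
The sketch below shows one possible way to handle these cases. The string parser covers the two string shapes mentioned above; the unit guess is a hypothetical heuristic (the threshold of 20 is an assumption), based on the observation that a spoken word rarely lasts more than a few seconds.

```python
import re
from statistics import median

# Matches MM:SS(.fff) or HH:MM:SS(.fff)
CLOCK = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{1,2}(?:\.\d+)?)$")


def parse_string_time(value: str) -> float:
    """Parse strings such as '00:00:12.340' or '120ms' into seconds."""
    text = value.strip().lower()
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    match = CLOCK.match(text)
    if match:
        hours, minutes, seconds = match.groups()
        return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
    return float(text)  # plain numeric string, assumed to be seconds


def looks_like_milliseconds(durations: list[float]) -> bool:
    """Guess the unit from typical word length (end minus start)."""
    return bool(durations) and median(durations) > 20


print(parse_string_time("00:00:12.340"), parse_string_time("120ms"))
print(looks_like_milliseconds([320, 490, 470]))
```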

What Is a Word-Level Timestamp?

A word-level timestamp links each token to a start time and an end time. This format is common in Whisper and other modern STT pipelines.

SYNC

Per-Word Timing

Each subtitle cue is built from accurate token timing, not guessed phrase timing.

WHISPER

Whisper and STT Ready

Supports Whisper-style words arrays and nested ASR outputs from common speech-to-text providers.

QC

Validation Before Export

Use the built-in validator to catch missing timing values before generating SRT/VTT files.

JSON Schema Example

Recommended minimum schema for robust subtitle generation:

SCHEMA

Required Fields

Each word needs text, start, and end, for example [{"text":"Hello","start":0.12,"end":0.44}]. The start and end values may be in seconds or milliseconds.

ALT KEYS

Also Accepted

word, start_time, end_time, duration, and nested words arrays.
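
Alternate keys like these can be mapped onto the recommended schema with a small normalization step. In the sketch below, using duration to compute a missing end value is an assumption about how that key is interpreted, and `normalize_word` is a hypothetical name.

```python
def normalize_word(raw: dict) -> dict:
    """Map alternate key names onto {'text', 'start', 'end'}."""
    text = raw.get("text", raw.get("word", ""))
    start = raw.get("start", raw.get("start_time"))
    end = raw.get("end", raw.get("end_time"))
    # Assumption: when only a duration is given, end = start + duration.
    if end is None and start is not None and raw.get("duration") is not None:
        end = start + raw["duration"]
    return {"text": text, "start": start, "end": end}


print(normalize_word({"word": "Hi", "start_time": 120, "duration": 240}))
```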

EXPORT

Output

Generate subtitle files ready for editors, social clips, and long-form video in SRT or VTT.

Frequently Asked Questions

What should my JSON look like?

Each word should have text plus timing fields. Common keys are text or word, with start and end.

Does this work with Whisper and STT word timestamps?

Yes. The tool supports direct word arrays and nested outputs that include per-word timing from Whisper and common STT providers.

Can I generate one-line subtitles only?

Yes. Set “Lines per cue” to 1 and the generator will keep each cue on a single line.

Does it preserve word timings from my file?

Yes. Cue start and end are derived from word timings, then refined with your min/max duration and gap settings.

Which subtitle formats can I export?

You can export the generated subtitles as SRT or VTT.