Standards for accurate voice annotation with timestamps and speaker diarization
The goal of transcription is to produce an accurate, time-aligned text representation of spoken audio with clear speaker identification. Each segment should capture a single speaker's turn with precise start and end timestamps.
Each annotation segment follows this structure:
| Component | Format | Example |
|---|---|---|
| Start Time | HH:MM:SS,mmm | 00:01:23,456 |
| End Time | HH:MM:SS,mmm | 00:01:35,789 |
| Speaker Label | [Name] | [Martha] or [Speaker 1] |
| Text | Verbatim transcription | The spoken words... |
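
The segment structure above lends itself to a mechanical check. The following is a minimal sketch, not a prescribed tool: `parse_timestamp`, `Segment`, and `validate` are hypothetical names, and the rule that the end time must fall after the start time is an assumption that the table implies but does not state.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,mmm with minutes and seconds limited to 00-59
TIMESTAMP_RE = re.compile(r"^(\d{2}):([0-5]\d):([0-5]\d),(\d{3})$")


def parse_timestamp(ts: str) -> int:
    """Convert an HH:MM:SS,mmm timestamp to milliseconds."""
    match = TIMESTAMP_RE.match(ts)
    if match is None:
        raise ValueError(f"not in HH:MM:SS,mmm format: {ts!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


@dataclass
class Segment:
    start: str    # e.g. "00:01:23,456"
    end: str      # e.g. "00:01:35,789"
    speaker: str  # e.g. "[Martha]" or "[Speaker 1]"
    text: str     # verbatim transcription

    def validate(self) -> None:
        """Raise ValueError if the segment breaks the structure above."""
        # Assumption: a segment must have positive duration.
        if parse_timestamp(self.end) <= parse_timestamp(self.start):
            raise ValueError("end time must be after start time")
        if not re.fullmatch(r"\[[^\[\]]+\]", self.speaker):
            raise ValueError(f"speaker label must be bracketed: {self.speaker!r}")
        if not self.text.strip():
            raise ValueError("text must not be empty")
```

For example, `Segment("00:01:23,456", "00:01:35,789", "[Martha]", "The spoken words...").validate()` passes silently, while a timestamp written with a period instead of a comma raises `ValueError`.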
- Use the speaker's name when known (e.g., [Martha]), otherwise use [Speaker 1], [Speaker 2], etc.
- Use bracketed annotations for non-speech sounds: [chuckles], [noise], [sighs]
- Use [inaudible] or [unclear] for unintelligible speech
- Use -- or ... to indicate cut-off speech

Below is a properly formatted example showing timestamps, speaker labels, and transcription conventions:
[Martha] Hedwig, part one. Hi, and welcome to "The Real Weird Sisters." I'm Martha.

[Alice] And I'm Alice.

[Martha] And today, we're here. The day has finally arrived. We are here for our very first character study of our queen, Hedwig.

[Alice] Hold on. Sorry. I'm just gonna pause it 'cause it does feel like my input is pretty quiet on Audacity. Like, when I spoke, it was really low-looking. So let me just see if I can get it higher.

[Martha] Okay.

[Alice] Okay.

[Martha] I can always turn it up if I need to.

[Alice] Okay. [noise] All right.

[Alice] Yes, the queen, Hedwig. We are so excited. I think this is going to be the episode that we've all been waiting for. And there's so much to talk about that we, we figured we, we really shouldn't cram it all into one episode.

[Alice] Definitely not.

[Alice] We haven't done her justice at all.

[Martha] Mm-hmm.

[Alice] ... in the appropriate head space to talk about this amazing person. I mean-

[Martha] And I [chuckles]-

[Alice] ... bird.
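
A transcript written in this convention can be split into speaker turns mechanically, provided the speaker labels are known in advance. The sketch below shows one possible approach. `split_turns` and its `speakers` parameter are illustrative names, not part of these standards, and the approach assumes that any bracket not matching a listed speaker (such as [noise]) is an annotation that belongs to the current turn.

```python
import re


def split_turns(text: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Split labelled transcript text into (speaker, utterance) pairs.

    `speakers` lists the label names without brackets, e.g. ["Martha", "Alice"].
    Brackets that do not match a listed speaker stay inside the utterance.
    """
    if not speakers:
        raise ValueError("at least one speaker name is required")
    # Build a pattern such as \[(Martha|Alice)\]; the capture group makes
    # re.split keep the matched speaker names in the result.
    names = "|".join(re.escape(name) for name in speakers)
    pattern = re.compile(r"\[(" + names + r")\]")
    parts = pattern.split(text)
    # parts[0] is any text before the first label; after that the list
    # alternates speaker, utterance, speaker, utterance, ...
    return [
        (parts[i], parts[i + 1].strip())
        for i in range(1, len(parts) - 1, 2)
    ]


sample = "[Martha] Okay. [Alice] Okay. [noise] All right. [Martha] And I [chuckles]-"
for speaker, utterance in split_turns(sample, ["Martha", "Alice"]):
    print(f"{speaker}: {utterance}")
```

Passing the speaker list explicitly avoids guessing which brackets are names and which are sound annotations.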
| Scenario | Convention | Example |
|---|---|---|
| Laughter | Bracketed annotation | [chuckles], [laughs], [both laugh] |
| Sounds | Bracketed annotation | [sighs], [noise], [clears throat] |
| Interrupted speech | Hyphen at end | And I was thinking- |
| Trailing off | Ellipsis | I don't know... |
| Continuing after interruption | Ellipsis at start | ... and that's why I think |
| Filler words | Include as spoken | Um, uh, like, you know |
| Emphasis | Quotation marks or italics, depending on context | "very" important |
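
Several of the conventions in this table can be detected with simple string checks, which may help when reviewing annotations in bulk. The following is a rough sketch under stated assumptions: `describe_turn` is a hypothetical helper, and it assumes that sound annotations are written in lowercase (as in the examples above) so they can be told apart from speaker labels.

```python
import re

# Assumption: annotations such as [chuckles] or [both laugh] are lowercase,
# while speaker labels such as [Martha] or [Speaker 1] start with a capital.
ANNOTATION_RE = re.compile(r"\[([a-z][a-z ]*)\]")


def describe_turn(text: str) -> dict:
    """Report which transcription conventions appear in one turn of text."""
    stripped = text.strip()
    return {
        # Bracketed laughter and sound annotations, without the brackets
        "annotations": ANNOTATION_RE.findall(stripped),
        # Interrupted speech: hyphen at the end
        "interrupted": stripped.endswith("-"),
        # Trailing off: ellipsis at the end
        "trails_off": stripped.endswith("..."),
        # Continuing after an interruption: ellipsis at the start
        "continues": stripped.startswith("..."),
    }


print(describe_turn("And I [chuckles]-"))
print(describe_turn("... and that's why I think"))
```

The checks only look at plain ASCII punctuation; a transcript that uses the single-character ellipsis would need that case added.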