macOS · v1.1

Aloud

Menu-bar voice input. Tap Fn to start talking, tap again to stop, and the recognized text is injected straight into the focused field. The backend is Volcano/Doubao streaming ASR — it handles mixed Chinese-English and code terms, with an optional LLM pass that only fixes obvious mis-hears, never rewrites.

Download for macOS (.dmg · Apple Silicon)

First time? Read the setup guide →

Tap Fn, talk — text lands in the focused field, live

Fn toggle

Tap Fn to start recording, tap again to stop. 90-second hard cap as a safety net; text commits the moment you stop.

Live text

A capsule overlay shows the text word by word as you speak, alongside a live waveform, so you see the recognition as it happens rather than after you finish. On stop, Aloud injects the text and restores your full clipboard (images, files, and rich text included).

Esc to cancel

Realize mid-recording you misspoke? Hit Esc to cancel outright: nothing gets injected, nothing is left behind, and your clipboard stays untouched.

Mixed lang

Volcano/Doubao streaming ASR 2.0 with automatic Chinese-English code-switching; sharper on technical terms than system dictation.

LLM fixup

An optional Doubao seed-lite pass that fixes only obvious speech mis-recognition — no polishing, no rewriting. Can be turned off.

Local

Credentials are stored on your machine, recording is triggered locally, and recognition requests go straight to Volcano with no third-party relay.

Requirements

macOS 14 Sonoma or later
Apple Silicon (M-series)
A Volcano Engine account: provision Doubao streaming ASR yourself, then enter the AppID and Access Token in the app's settings
Microphone and Accessibility permissions (Accessibility is required to monitor the Fn key and inject text)
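
The first two requirements can be checked from Terminal. The following is a minimal sketch, assuming `sw_vers -productVersion` supplies the macOS version and `uname -m` supplies the architecture. The function name `meets_requirements` is made up for illustration and is not part of Aloud.

```shell
# Hypothetical helper, not shipped with Aloud.
# Succeeds (exit 0) when the macOS major version is 14 or later
# and the architecture is arm64 (Apple Silicon).
meets_requirements() {
    version="$1"            # e.g. "14.5", as printed by: sw_vers -productVersion
    arch="$2"               # e.g. "arm64", as printed by: uname -m
    major="${version%%.*}"  # keep only the part before the first dot
    # A non-numeric version makes the test fail quietly instead of printing an error.
    [ "$major" -ge 14 ] 2>/dev/null && [ "$arch" = "arm64" ]
}
```

On a Mac, `meets_requirements "$(sw_vers -productVersion)" "$(uname -m)" && echo ok` would run the check against the current machine. A shell running under Rosetta reports `x86_64` even on Apple Silicon, so run it from a native Terminal session.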

This build is unsigned and unnotarized. On first launch macOS may say it's "damaged" — that's Gatekeeper blocking an unsigned download, not actual damage. Move Aloud to Applications, then run xattr -dr com.apple.quarantine /Applications/Aloud.app in Terminal and open it normally. It's an early tool I built for myself: no CI, no code signing, no auto-updater.
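
The Gatekeeper workaround in the note above can be wrapped in a small shell function. This is only a sketch: `clear_quarantine` is an illustrative name, and the existence check is an added convenience so that a wrong path fails with a readable message rather than an `xattr` error.

```shell
# Illustrative helper; the xattr command itself is the one from the note above.
clear_quarantine() {
    app="${1:-/Applications/Aloud.app}"
    if [ ! -d "$app" ]; then
        echo "Not found: $app (move Aloud to Applications first)" >&2
        return 1
    fi
    # -d deletes the named attribute; -r recurses through the app bundle.
    xattr -dr com.apple.quarantine "$app"
}
```

Calling `clear_quarantine` with no argument targets `/Applications/Aloud.app`; after it succeeds, open Aloud normally.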
