macOS · v1.0

Aloud

Menu-bar voice input. Tap Fn to start talking, tap again to stop, and the recognized text is injected straight into the focused field. The backend is Volcano/Doubao streaming ASR — it handles mixed Chinese-English and code terms, with an optional LLM pass that only fixes obvious mis-hears, never rewrites.

Aloud capsule overlay typing recognized speech into a Mail compose window, live
Tap Fn, talk — text lands in the focused field, live

Fn toggle

Tap Fn to start recording, tap again to stop. Recording is capped at 90 seconds as a safety net, and the text commits the moment you stop.
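The toggle-plus-cap behavior can be pictured as a tiny state machine. The sketch below is illustrative only and is not Aloud's code: `ToggleRecorder`, `on_fn_tap`, and `on_tick` are hypothetical names, and it assumes that hitting the 90-second cap commits the text the same way a manual stop does.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RECORDING_SECONDS = 90.0  # the hard cap described above


@dataclass
class ToggleRecorder:
    """Hypothetical Fn-toggle state machine (not Aloud's actual implementation)."""

    started_at: Optional[float] = None  # None means idle

    @property
    def recording(self) -> bool:
        return self.started_at is not None

    def on_fn_tap(self, now: float) -> str:
        """Handle one Fn tap and report which action it triggered."""
        if self.recording:
            self.started_at = None
            return "commit"  # second tap: stop and commit the text
        self.started_at = now
        return "start"  # first tap: begin recording

    def on_tick(self, now: float) -> Optional[str]:
        """Call periodically; force-stops once the cap is reached.

        Assumption: the cap commits the text rather than discarding it.
        """
        if self.started_at is not None and now - self.started_at >= MAX_RECORDING_SECONDS:
            self.started_at = None
            return "commit"
        return None
```

Passing timestamps in explicitly keeps the logic independent of any real clock or key-event API, so the same transitions can be driven by whatever event source the app actually uses.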

Live text

A capsule overlay types the text word by word as you speak, so you see the recognition live rather than after you've finished. The overlay also shows a live waveform. On stop, Aloud injects the text and restores your clipboard.
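The "inject, then restore your clipboard" step suggests a save, paste, restore pattern. The sketch below assumes that injection goes through the clipboard and a simulated paste, which the page implies but does not spell out. `Clipboard`, `MemoryClipboard`, and `inject_via_clipboard` are hypothetical names, and the in-memory clipboard stands in for the real system pasteboard so the example runs anywhere.

```python
from typing import Callable, Protocol


class Clipboard(Protocol):
    """Minimal clipboard interface assumed for this sketch."""

    def read(self) -> str: ...
    def write(self, text: str) -> None: ...


class MemoryClipboard:
    """In-memory stand-in for the system pasteboard."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def read(self) -> str:
        return self.text

    def write(self, text: str) -> None:
        self.text = text


def inject_via_clipboard(text: str, clipboard: Clipboard, paste: Callable[[], None]) -> None:
    """Paste `text` into the focused field, then put the old clipboard back."""
    saved = clipboard.read()   # remember what the user had copied
    clipboard.write(text)      # stage the recognized text
    try:
        paste()                # hypothetical hook, e.g. a simulated Cmd+V
    finally:
        clipboard.write(saved)  # restore even if the paste fails
```

The `try`/`finally` is the point of the pattern: the user's previous clipboard contents come back whether or not the paste succeeds.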

Mixed lang

Volcano/Doubao streaming ASR 2.0 with automatic Chinese-English code-switching. Sharper on technical terms than system dictation.

LLM fixup

An optional Doubao seed-lite pass that fixes only obvious speech mis-recognition — no polishing, no rewriting. Can be turned off.
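One way to hold an LLM pass to "fix, don't rewrite" is to accept its output only when it stays close to the raw transcript. This is a sketch of that idea, not a description of how Aloud enforces it: `accept_fixup` and the 0.8 threshold are assumptions, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

MIN_SIMILARITY = 0.8  # assumed threshold; tune against real transcripts


def accept_fixup(raw: str, corrected: str, min_similarity: float = MIN_SIMILARITY) -> str:
    """Return the LLM's correction only if it stays close to the raw transcript.

    A small edit (one mis-heard term) keeps the similarity ratio high.
    A rephrased or polished sentence drops it, and the raw text is kept.
    """
    ratio = SequenceMatcher(None, raw, corrected).ratio()
    return corrected if ratio >= min_similarity else raw
```

For example, correcting "git hub" to "GitHub" inside an otherwise unchanged sentence passes the check, while a politely reworded version of the same sentence is rejected and the raw transcript is used instead.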

Local

Credentials are stored on your machine, triggering happens locally, and recognition requests go straight to Volcano with no third-party relay.

Requirements

  • macOS 14 Sonoma or later
  • Apple Silicon (M-series)
  • A Volcano Engine account: provision Doubao streaming ASR yourself, then enter the AppID and Access Token in the app's settings
  • Microphone and Accessibility permissions (Accessibility is required to monitor the Fn key and inject text)

This build is unsigned and unnotarized. On first launch macOS may say it's "damaged" — that's Gatekeeper blocking an unsigned download, not actual damage. Move Aloud to Applications, then run xattr -dr com.apple.quarantine /Applications/Aloud.app in Terminal and open it normally. It's an early tool I built for myself: no CI, no code signing, no auto-updater.