macOS · v1.0
Aloud
Menu-bar voice input. Tap Fn to start talking, tap again to stop, and the recognized text is injected straight into the focused field. The backend is Volcano/Doubao streaming ASR — it handles mixed Chinese-English and code terms, with an optional LLM pass that only fixes obvious mis-hears, never rewrites.
Tap Fn to start recording, tap again to stop. 90-second hard cap as a safety net; text commits the moment you stop.
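The toggle and its hard cap can be pictured as a two-state machine. The sketch below is illustrative only and is not Aloud's actual code: the function `next_state`, the event names `tap` and `tick`, and the variable `MAX_SECONDS` are all hypothetical names chosen for this example.

```shell
#!/bin/sh
# Hypothetical sketch of the Fn toggle with a hard cap; not Aloud's real code.
MAX_SECONDS=90   # the 90-second safety net described above

# next_state STATE EVENT ELAPSED_SECONDS
# Prints the state that follows.
# EVENT is "tap" (Fn was pressed) or "tick" (a periodic timer fired).
next_state() {
    state=$1
    event=$2
    elapsed=$3
    case "$state:$event" in
        idle:tap)
            echo recording ;;          # first tap starts recording
        recording:tap)
            echo idle ;;               # second tap stops; text would commit here
        recording:tick)
            # The cap forces a stop even if Fn is never tapped again.
            if [ "$elapsed" -ge "$MAX_SECONDS" ]; then
                echo idle
            else
                echo recording
            fi ;;
        *)
            echo "$state" ;;           # anything else leaves the state unchanged
    esac
}
```

In this sketch, any transition from `recording` to `idle` is the point at which the recognized text would be committed, whether it came from a tap or from the cap.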
A capsule overlay types text word-by-word as you speak, so you see the recognition live, not after you've finished. It also shows a live waveform; on stop, it injects the text and then restores your clipboard.
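Restoring the clipboard after injection suggests a paste-based approach: save the clipboard, put the recognized text on it, paste, then put the old contents back. The sketch below shows that pattern with stock macOS command-line tools (`pbpaste`, `pbcopy`, `osascript`). It is an assumption about the general technique, not Aloud's actual implementation, and `inject_text` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch of "inject, then restore the clipboard" on macOS.
# Not Aloud's real code; it only handles plain-text clipboard contents.

# inject_text TEXT
inject_text() {
    text=$1

    # 1. Remember what is on the clipboard right now.
    #    Note: $(...) drops trailing newlines, so the restore is approximate.
    saved=$(pbpaste)

    # 2. Put the recognized text on the clipboard.
    printf '%s' "$text" | pbcopy

    # 3. Send Cmd+V to the focused app. System Events needs
    #    Accessibility permission for this to work.
    osascript -e 'tell application "System Events" to keystroke "v" using command down'

    # 4. Give the target app a moment to read the clipboard, then restore it.
    sleep 1
    printf '%s' "$saved" | pbcopy
}
```

Calling `inject_text "hello world"` with a text field focused should paste the string and leave the previous plain-text clipboard in place. A native app would more likely use the pasteboard and event APIs directly, which can also preserve non-text clipboard contents.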
Volcano/Doubao streaming ASR 2.0, automatic Chinese-English code-switching, sharper on technical terms than system dictation.
An optional Doubao seed-lite pass that fixes only obvious speech mis-recognition — no polishing, no rewriting. Can be turned off.
Credentials are stored on your machine, recording is triggered locally, and recognition goes straight to Volcano with no third-party relay.
Requirements
- macOS 14 Sonoma or later
- Apple Silicon (M-series)
- A Volcano Engine account: provision Doubao streaming ASR yourself, then enter the AppID and Access Token in the app's settings
- Microphone and Accessibility permissions (Accessibility is required to monitor the Fn key and inject text)
This build is unsigned and unnotarized. On first launch macOS may say it's "damaged"; that's Gatekeeper blocking an unsigned download, not actual damage. Move Aloud to Applications, then run this in Terminal and open the app normally:

    xattr -dr com.apple.quarantine /Applications/Aloud.app

It's an early tool I built for myself: no CI, no code signing, no auto-updater.
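To confirm that the quarantine flag is really what is blocking the launch, and that the `xattr` command cleared it, a small check can help. The helpers below are a hypothetical sketch (the names `is_quarantined` and `clear_quarantine` are made up for this example); they rely only on the `-p` and `-dr` options of the macOS `xattr` tool.

```shell
#!/bin/sh
# Hypothetical helpers around the xattr command shown above.
APP="/Applications/Aloud.app"

# is_quarantined PATH
# Prints "yes" if PATH carries the com.apple.quarantine attribute, else "no".
is_quarantined() {
    if xattr -p com.apple.quarantine "$1" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# clear_quarantine PATH
# Removes the attribute recursively, but only when it is present.
clear_quarantine() {
    if [ "$(is_quarantined "$1")" = "yes" ]; then
        xattr -dr com.apple.quarantine "$1"
    fi
}
```

With these defined, `clear_quarantine "$APP" && open "$APP"` removes the flag if it exists and then launches the app; `is_quarantined "$APP"` should print `no` afterwards.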