
Setup

Getting Aloud running

Aloud has no speech recognition of its own; it calls Doubao streaming ASR through your own Volcano Engine account. Setup takes three steps: provision the service and get credentials, grant system permissions, and (optionally) set up a term dictionary. Without step 1, the tool does nothing at all.

Step 1 · Required

Provision Doubao, get AppID / Access Token

  1. Sign in to the Volcano Engine console and search for "豆包语音" (Doubao Voice) or open "智能语音" (Intelligent Speech).
  2. Create an application and provision the 语音识别大模型 (Speech Recognition Large Model) service. It must be the large-model streaming 2.0 service, not the older small-model one: Aloud uses 2.0, and calling the wrong service returns a 403.
  3. On the application detail page, grab two values: AppID and Access Token.
  4. Click the Aloud menu-bar icon → Voice Engine Settings…, enter the two values in the App ID and Access Token fields under the "豆包流式语音识别(必填)" (Doubao Streaming Speech Recognition, required) section, then click Save.
  5. Tap Fn and say something. If text comes out, you're set.
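For anyone scripting their own connectivity check, here is a minimal Python sketch of where the two values end up. It assumes a client connects to Volcano Engine's v3 large-model streaming endpoint with header-based auth; the endpoint URL, the `X-Api-*` header names, and the resource ID are taken from our reading of Volcano Engine's documentation and are assumptions to verify, not a description of Aloud's internals. `build_handshake_headers` is a hypothetical helper.

```python
import uuid

# Assumed v3 endpoint for Doubao large-model streaming ASR; verify in the
# Volcano Engine docs. Aloud's actual endpoint is not stated in this guide.
ENDPOINT = "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel"

# Assumed resource ID for the large-model streaming service (duration billing).
RESOURCE_ID = "volc.bigasr.sauc.duration"


def build_handshake_headers(app_id: str, access_token: str) -> dict[str, str]:
    """Return headers for the WebSocket upgrade request (hypothetical helper).

    Copy-pasted credentials often carry stray whitespace, so trim them first.
    """
    app_id = app_id.strip()
    access_token = access_token.strip()
    if not app_id or not access_token:
        raise ValueError("AppID and Access Token are both required")
    return {
        "X-Api-App-Key": app_id,                # AppID from the application page
        "X-Api-Access-Key": access_token,       # Access Token from the same page
        "X-Api-Resource-Id": RESOURCE_ID,       # names the large-model resource
        "X-Api-Connect-Id": str(uuid.uuid4()),  # per-connection trace ID
    }
```

Pass the returned dict as extra headers to whichever WebSocket client you use when opening `ENDPOINT`.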

Getting a 403 / "not provisioned"

The error usually says "service not provisioned," but the real cause is almost always that you provisioned the small-model service instead of large-model streaming 2.0. Go back to the console, confirm that the provisioned service is "语音识别大模型", wait a few minutes for it to take effect, and retry. Wrong credentials cause an authentication failure, not a 403.
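The triage above, written as a hypothetical helper for a self-made connection test. The 403 branch follows this guide; treating an authentication failure as HTTP 401 is an assumption, since the guide only says "auth failure". A status of 101 is the normal response to a successful WebSocket upgrade.

```python
from http import HTTPStatus


def diagnose_handshake(status: int) -> str:
    """Map a WebSocket handshake status code to the fix this guide suggests."""
    if status == HTTPStatus.SWITCHING_PROTOCOLS:  # 101: upgrade accepted
        return "ok"
    if status == HTTPStatus.FORBIDDEN:  # 403: wrong or not-yet-active service
        return ("service not provisioned: confirm 语音识别大模型 (streaming 2.0) "
                "is enabled, wait a few minutes, then retry")
    if status == HTTPStatus.UNAUTHORIZED:  # 401: assumed auth-failure code
        return "auth failure: re-copy AppID and Access Token, then click Save"
    return f"unexpected status {status}: check the network and the console"
```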

Step 2 · Required

System permissions

Aloud is unsigned, so macOS blocks its first launch, and it needs permission to monitor the Fn key, inject text into other apps, and record from the microphone. Skip any of the three steps below and it won't work.

  • First launch: macOS blocks a double-click. Right-click Aloud.app → Open, then click Open again in the dialog; or go to System Settings → Privacy & Security and click "Open Anyway" near the bottom. On macOS 15 Sequoia and later, only the System Settings route works.
  • Microphone: System Settings → Privacy & Security → Microphone, toggle Aloud on.
  • Accessibility: System Settings → Privacy & Security → Accessibility, toggle Aloud on. Both Fn-key monitoring and text injection into the focused field depend on this; without it, pressing Fn does nothing.

After changing permissions, restart Aloud once for the cleanest result.
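The same checklist as a minimal sketch, handy if you keep setup notes for several Macs. `Permissions` and `missing_setup` are hypothetical and only encode the rules above; they do not query macOS.

```python
from dataclasses import dataclass


@dataclass
class Permissions:
    """Which setup items are done on this Mac (hypothetical model)."""
    opened_once: bool    # Gatekeeper block cleared on first launch
    microphone: bool     # Privacy & Security → Microphone
    accessibility: bool  # Privacy & Security → Accessibility


def missing_setup(p: Permissions) -> list[str]:
    """List the remaining fixes in the order this guide covers them."""
    fixes: list[str] = []
    if not p.opened_once:
        fixes.append("Open Aloud via right-click → Open, or Open Anyway")
    if not p.microphone:
        fixes.append("Enable Aloud under Privacy & Security → Microphone")
    if not p.accessibility:
        # Fn-key monitoring and text injection both depend on this grant.
        fixes.append("Enable Aloud under Privacy & Security → Accessibility")
    if fixes:
        fixes.append("Restart Aloud")  # restart once after any change
    return fixes
```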

Step 3 · Optional

Term dictionary

Technical terms, personal names, and product names are often misrecognized as homophones. The term dictionary passes these to Doubao before recognition, which is more reliable than having an LLM guess after the fact and avoids the extra few seconds of latency.

  • Voice Engine Settings… → the "热词" (hot words) box under "术语词库" (term dictionary): one term per line, e.g. Kubernetes, Pydantic, idempotent, and the names and projects you mention often.
  • The cap is roughly 100 entries; anything beyond that is trimmed. Choose the frequent words that are most often misheard rather than padding the list.
  • The list is stored locally and sent inline to Doubao at recognition time. It is not uploaded to a cloud word table or routed through any third party.
  • The dictionary and LLM correction are two separate layers: the dictionary acts before recognition (more accurate, no added latency), and the LLM acts afterward as a backstop that fixes obvious mis-hearings. Enabling both works best, but you can also run the dictionary alone with the LLM off.
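How the hot-word box might be turned into what Doubao receives, as a minimal sketch. `parse_hotwords` and `hotword_context` are hypothetical. The exact cap of 100, the de-duplication, and the `{"hotwords": [{"word": ...}]}` JSON shape (which, as far as we know, Volcano Engine documents for inline hot words in the request's `corpus.context` field) are all assumptions to check against the official docs.

```python
import json

MAX_TERMS = 100  # the guide says "roughly 100"; the exact cap is assumed


def parse_hotwords(box_text: str, limit: int = MAX_TERMS) -> list[str]:
    """One term per line: trim whitespace, drop blanks and duplicates, cap."""
    terms: list[str] = []
    seen: set[str] = set()
    for line in box_text.splitlines():
        term = line.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms[:limit]  # entries beyond the cap are trimmed


def hotword_context(terms: list[str]) -> str:
    """Serialize terms for inline delivery with a recognition request."""
    # ensure_ascii=False keeps Chinese terms readable instead of \u escapes.
    return json.dumps({"hotwords": [{"word": t} for t in terms]},
                      ensure_ascii=False)
```

For example, `parse_hotwords("Kubernetes\nPydantic\n\nidempotent\nKubernetes\n")` returns `['Kubernetes', 'Pydantic', 'idempotent']`: the blank line and the repeated term are dropped before the list reaches the request.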

Haven't downloaded yet? Go back to the Aloud download page. If something breaks, email [email protected].