Blog

111 articles

Latest articles from MikuTools: tool tutorials, product updates, AI tool practice, and engineering notes.

  1. MiniMax Music Cover: a $0.15 cover that keeps the melody

    MiniMax Music Cover does one thing well: it takes a song you already have and reimagines it in a different style while preserving the melody. Flat $0.15 per generation. No subscription required.

    8 min read
  2. TTS in an accessibility plan: where it helps, where it stops

    Adding text-to-speech to a website or app looks like an accessibility win. Sometimes it is. Sometimes it is a checkbox that distracts from the real accessibility work. Here is the honest map of where TTS helps and where it does not.

    10 min read
  3. Text-to-speech pricing math for long scripts and small teams

    TTS is priced by the character. The pricing looks small until your scripts get long. Here is the cost math for typical projects (podcasts, audiobooks, courses, marketing libraries) and the levers that change the bottom line for a small team.

    10 min read
  4. A five-minute casting worksheet for picking your TTS voice

    A short worksheet that shortlists your TTS voice in five minutes by answering five small questions about the script, the audience, and the medium. Designed for first-time users who do not want to scroll a sixty-voice catalog without a plan.

    7 min read
  5. Word-level timestamps and what to actually build with them

    The word-timestamp option on the text-to-speech tool returns the start and end of every word alongside the audio. That metadata is the difference between a player that just plays audio and a player that does follow-along reading, language learning, jump-to-paragraph navigation, and accessibility narration that earns its place in the page.

    9 min read
  6. Speed and clarity: how fast can you push synthetic narration

    The speed slider on a text-to-speech tool runs from 0.5 to 4.0. Above 1.5 the audio degrades in ways that are not obvious until your listener tells you. Here is the honest band where each speed setting works and where it stops working.

    7 min read
  7. Mandarin text-to-speech without breaking the tones

    Mandarin TTS reads most prose acceptably and stumbles in predictable places. Tone sandhi rules, polyphonic characters, and code-switched English are where modern models still fail. Here is the practical guide for Mandarin scripts that need to actually sound right.

    10 min read
  8. EU AI Act, ACX, and the disclosure rules every TTS user should know in 2026

    Synthetic voice is now regulated. The EU AI Act puts hard disclosure obligations on AI-generated audio starting August 2026, with penalties up to 15 million euros. ACX still bans AI narration for general distribution. Here is what changes, when, and how to comply without panic.

    10 min read
  9. E-learning narration: where TTS holds up and where it doesn't

    Modern text-to-speech reaches statistical parity with human narrators on learning outcomes for some content types. For others, it underperforms in measurable ways. Here is the field guide for L&D teams choosing between synthetic narration, hired narrators, and the in-house host.

    8 min read
  10. Podcast intros and outros without a voice actor on call

    A practical recipe for producing a clean podcast intro, outro, and mid-roll bumpers using synthetic voice. The whole package, ready to drop into your editor, in under thirty minutes per episode template.

    8 min read
  11. Use AI text-to-speech for the audiobook draft, not the final

    Synthetic narration is good enough to narrate a whole book end to end. ACX still does not accept it for distribution. Here is how indie authors should actually use TTS in 2026: as a draft tool, not a finished narrator.

    8 min read