
APPENDIX F: SPEECH PROFILE REFERENCE

A speech profile is a collection of properties that control how speech audio is generated by a text-to-speech engine. There is always one default profile; you can create further profiles and associate them with configurations via the Speech profile configuration option.

Speech profiles determine the operation of immediate narration, previewing of editor content, and creation of audio files.

Formats

These properties specify the output and intermediate source formats to be used.

Generate speech WAV files

Enables or disables WAV (or AIFF on Mac) audio file generation for speech output. These audio files are always generated during processing; if this option is disabled, they are deleted afterwards.

Generate speech MP3 files

Enables or disables MP3 audio file generation for speech output.

Speech MP3 bit rate

The MP3 bit rate in kbps. The default is 128 kbps, but for speech audio you can reduce it to 48 kbps without significant loss of quality, giving smaller files. A value of (auto) uses the value specified in Preferences.
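To give a feel for the saving, the following Python sketch estimates file size from bit rate and duration. It assumes a constant bit rate and ignores headers and metadata, so the figures are approximate; the function name is purely illustrative and is not part of Jutoh.

```python
def mp3_size_mb(duration_seconds: float, bitrate_kbps: int) -> float:
    """Approximate size in megabytes of a constant-bit-rate MP3 file."""
    bits = bitrate_kbps * 1000 * duration_seconds  # kilobits/s -> total bits
    return bits / 8 / 1_000_000                    # bits -> bytes -> megabytes


one_hour = 60 * 60
print(mp3_size_mb(one_hour, 128))  # roughly 57.6 MB per hour of narration
print(mp3_size_mb(one_hour, 48))   # roughly 21.6 MB per hour of narration
```

On these assumptions, an hour of narration at 48 kbps takes a little over a third of the space needed at 128 kbps.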

Speech source format

The format that Jutoh submits to the text-to-speech system. A value of (auto) uses the best method for the chosen speech engine.

Keep source files

Keeps the generated source files after audio file generation.

Inject SSML

Injects SSML speech markup into other formats, currently Epub and text. This is useful for submitting marked-up Epub or text files to ElevenLabs for narration, for example.

Speech archive

These properties control whether and how speech archives are generated. Speech archives allow customers to create speech audio on their own computers.

Generate speech archive

Generates a speech archive (.sparch) file for distributing the speech source files.

Generate portable archive

If checked, creates an archive that can be used to generate speech on any platform supported by Jutoh. If this option is cleared, only the selected speech format will be generated, and the speech archive may be platform-dependent.

Speech properties

These properties control important aspects of speech output such as the text-to-speech engine to be used and the initial voice.

Speech engine

The speech engine to use. A value of (auto) uses the value specified in Preferences.

Speech voice

The voice to use. A value of (auto) uses the value specified in Preferences.

Speech voice variant

The voice variant. A value of (auto) uses the value specified in Preferences.

Speech speed

The speech speed expressed as a percentage. A value of (auto) uses the value specified in Preferences.

Speech volume

The speech volume expressed as a percentage. A value of (auto) uses the value specified in Preferences. This option is ignored for Apple Speech Manager when generating files.

Speech pitch

The speech pitch expressed as a percentage. A value of (auto) uses the value specified in Preferences. This option is ignored for Apple Speech Manager when generating files.
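The (auto) fallback used by the properties above can be pictured with a small Python sketch. The dictionaries, key names and the resolve function are hypothetical illustrations of the rule "use the profile value unless it is (auto)"; they do not reflect how Jutoh stores its settings.

```python
AUTO = "(auto)"


def resolve(profile: dict, preferences: dict, key: str):
    """Return the profile's value, or the Preferences value if it is (auto)."""
    value = profile.get(key, AUTO)
    return preferences[key] if value == AUTO else value


# Hypothetical settings: the profile overrides speed only.
preferences = {"speed": 100, "volume": 100, "pitch": 100}
profile = {"speed": 120, "volume": AUTO, "pitch": AUTO}

print(resolve(profile, preferences, "speed"))   # taken from the profile
print(resolve(profile, preferences, "volume"))  # falls back to Preferences
```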

Options

These properties control various behaviours, such as highlighting text during narration of editor content.

Highlight text

Highlights text as it is being read, where supported by the speech engine. Note that the editor undo history is cleared before and after narration if highlighting is enabled.

Highlight background colour

The background colour used to highlight text as it is read.

Paragraph pause duration

The pause inserted after each paragraph, in milliseconds. This applies to SAPI only, since SAPI does not have a paragraph construct. The default is 500.
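One way such a pause could be expressed is with the SAPI 5 XML silence tag placed after each paragraph. The Python sketch below is an illustration only: the function name is hypothetical, and it is an assumption that the generated SAPI source looks like this.

```python
from xml.sax.saxutils import escape


def add_paragraph_pauses(paragraphs: list[str], pause_ms: int = 500) -> str:
    """Join paragraphs, following each with a SAPI <silence> tag."""
    silence = f'<silence msec="{pause_ms}"/>'
    # Escape &, < and > so paragraph text cannot break the XML markup.
    return "".join(f"{escape(p)}{silence}" for p in paragraphs)


print(add_paragraph_pauses(["First paragraph.", "Second paragraph."]))
```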

Emulation

Specifies which XML tags to emulate by transforming text, to work around weaknesses in speech engines. Specify (all) to perform all relevant emulation, (none) to perform no emulation, or a comma-separated list of keywords. Available keywords are say-as, say-as.characters, say-as.digits, say-as.telephone.
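The accepted values can be summarised with a short Python sketch that turns an Emulation setting into a set of keywords. The function is hypothetical; treating an empty value as no emulation and rejecting unknown keywords are assumptions made for the example, not documented behaviour.

```python
KNOWN_KEYWORDS = (
    "say-as",
    "say-as.characters",
    "say-as.digits",
    "say-as.telephone",
)


def parse_emulation(value: str) -> set[str]:
    """Parse an Emulation setting into the set of keywords to emulate."""
    value = value.strip()
    if value == "(all)":
        return set(KNOWN_KEYWORDS)
    if value in ("(none)", ""):  # assumption: empty means no emulation
        return set()
    keywords = {k.strip() for k in value.split(",") if k.strip()}
    unknown = keywords - set(KNOWN_KEYWORDS)
    if unknown:  # assumption: unknown keywords are reported as errors
        raise ValueError(f"Unknown emulation keywords: {sorted(unknown)}")
    return keywords


print(sorted(parse_emulation("say-as.digits, say-as.telephone")))
```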

Lexicons

These properties relate to how lexicons are used during speech output.

Lexicon tags

Comma-delimited tags to match lexicons that should be included. If no tags are specified, all lexicons match.
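As an illustration of the matching rule, the Python sketch below assumes that a lexicon is included when it shares at least one tag with the profile; whether one shared tag is enough, or all must match, is an assumption here. The function names are hypothetical.

```python
def split_tags(value: str) -> set[str]:
    """Split a comma-delimited tag string into a set of trimmed tags."""
    return {t.strip() for t in value.split(",") if t.strip()}


def lexicon_matches(profile_tags: str, lexicon_tags: str) -> bool:
    """Return True if the lexicon should be included for this profile."""
    wanted = split_tags(profile_tags)
    if not wanted:
        return True  # no tags specified: all lexicons match
    return bool(wanted & split_tags(lexicon_tags))  # assumption: any shared tag


print(lexicon_matches("", "british"))                   # True
print(lexicon_matches("british, names", "names"))       # True
print(lexicon_matches("british", "american, names"))    # False
```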

Lexicon alphabets

Comma-delimited alphabet(s) to use in generated lexicons or inline pronunciations. Use wildcards if needed.
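The following Python sketch shows how a comma-delimited list with wildcards might select alphabets. It assumes shell-style wildcards (* and ?), which may differ from the syntax Jutoh accepts; the function name and the alphabet names are example values only.

```python
from fnmatch import fnmatchcase


def select_alphabets(patterns: str, available: list[str]) -> list[str]:
    """Return the available alphabets matching any comma-delimited pattern."""
    pats = [p.strip() for p in patterns.split(",") if p.strip()]
    return [name for name in available
            if any(fnmatchcase(name, p) for p in pats)]


# Example alphabet names, chosen for illustration.
available = ["ipa", "x-sampa", "x-example"]
print(select_alphabets("ipa", available))       # ['ipa']
print(select_alphabets("x-*", available))       # ['x-sampa', 'x-example']
```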

Inline pronunciations

Replaces words in the speech source files using lexicon entries, which specify phonemes or aliases (‘sounds-like’ pronunciations). This can be done in addition to generating lexicons if necessary. Note that for text-to-speech, Jutoh currently supports only inline pronunciations and does not load generated lexicon files.

PLS lexicons

Saves lexicons in PLS lexicon format when generating SSML or Epub 3.

CereVoice lexicon

Saves lexicons in CereVoice lexicon format when generating SSML.

CereVoice abbreviations

Saves lexicons in CereVoice abbreviations format when generating SSML.

Alias string tables

Comma-delimited string table names to define aliases that will be expanded inline. Use wildcards if needed.

Speech enhancements

These properties control how extra text is inserted into the audio in order to clarify the content.

Bullet list item prefix

Text to insert in front of unordered list items.

Numbered list item prefix

Text to insert in front of numbered list items.

Use image alt text

Inserts the alternative text of images into the speech output.

Use table descriptions

Inserts the descriptions of tables into the speech output.

Table row prefix

Text to insert in front of table rows. If this is specified, the table row number will also be read. This will be suppressed if the table’s Role property is set to ‘presentation’.

Table column prefix

Text to insert in front of table columns. If this is specified, the table column number will also be read. This will be suppressed if the table’s Role property is set to ‘presentation’.
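The effect of the two table prefixes can be sketched in Python as follows. The exact wording and punctuation of the inserted text are assumptions made for this example, and the function is hypothetical; only the rules (prefix plus number, suppressed for the ‘presentation’ role) come from the descriptions above.

```python
def speak_table(rows: list[list[str]], row_prefix: str = "",
                column_prefix: str = "", role: str = "") -> str:
    """Build the text read for a table, adding numbered prefixes if set."""
    presentation = role == "presentation"  # prefixes are suppressed for this role
    parts = []
    for r, row in enumerate(rows, start=1):
        if row_prefix and not presentation:
            parts.append(f"{row_prefix} {r}.")
        for c, cell in enumerate(row, start=1):
            if column_prefix and not presentation:
                parts.append(f"{column_prefix} {c}.")
            parts.append(cell)
    return " ".join(parts)


table = [["Name", "Age"], ["Ann", "34"]]
print(speak_table(table, row_prefix="Row", column_prefix="Column"))
print(speak_table(table, row_prefix="Row", role="presentation"))
```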
