Contents | Start | End | Previous: Appendix E: Speech Markup Reference | Next: Appendix G: Alphabet Description Reference
A speech profile is a collection of properties that control how speech audio is generated by a text-to-speech engine. There is always one default profile; you can create further profiles and associate them with configurations via the Speech profile configuration option.
Speech profiles determine the operation of immediate narration, previewing of editor content, and creation of audio files.
These properties specify the output and intermediate source formats to be used.
Generate speech WAV files
Enables or disables WAV (or AIFF on Mac) audio file generation for speech output. These audio files are always generated, but they are deleted afterwards if this option is disabled.
Generate speech MP3 files
Enables or disables MP3 audio file generation for speech output.
Speech MP3 bit rate
The MP3 bit rate in kbps. The default is 128 kbps, but for speech audio you can reduce it to 48 kbps without significant loss of quality, giving smaller files. A value of (auto) uses the value specified in Preferences.
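To give a feel for the saving, the following minimal sketch estimates file size from bit rate and duration. It assumes constant-bit-rate encoding and ignores any file header or tag overhead; the function name is purely illustrative and is not part of Jutoh.

```python
def mp3_size_mb(duration_minutes, bitrate_kbps):
    """Approximate size in megabytes of a constant-bit-rate MP3 file."""
    # kilobits per second -> total bits for the whole duration
    bits = bitrate_kbps * 1000 * duration_minutes * 60
    # 8 bits per byte, 1,000,000 bytes per megabyte
    return bits / 8 / 1_000_000


for rate in (128, 48):
    print(f"{rate} kbps: {mp3_size_mb(60, rate):.1f} MB per hour")
# prints:
# 128 kbps: 57.6 MB per hour
# 48 kbps: 21.6 MB per hour
```

Under these assumptions, an hour of narration shrinks from roughly 57.6 MB to 21.6 MB when the bit rate is lowered from 128 to 48.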
Speech source format
The format that Jutoh submits to the text-to-speech system. A value of (auto) uses the best method for the chosen speech engine.
Keep source files
Keeps the generated source files after audio file generation.
These properties control if and how speech archives should be generated. Speech archives allow customers to create speech audio on their own computers.
Generate speech archive
Generates a speech archive (.sparch) file for distributing the speech source files.
Generate portable archive
If checked, creates an archive that can be used to generate speech on any platform supported by Jutoh. If this option is cleared, only the selected speech format will be generated, and the speech archive may be platform-dependent.
These properties control important aspects of speech output such as the text-to-speech engine to be used and the initial voice.
Speech engine
The speech engine to use. A value of (auto) uses the value specified in Preferences.
Speech voice
The voice to use. A value of (auto) uses the value specified in Preferences.
Speech voice variant
The voice variant. A value of (auto) uses the value specified in Preferences.
Speech speed
The speech speed expressed as a percentage. A value of (auto) uses the value specified in Preferences.
Speech volume
The speech volume expressed as a percentage. A value of (auto) uses the value specified in Preferences. This option is ignored for Apple Speech Manager when generating files.
Speech pitch
The speech pitch expressed as a percentage. A value of (auto) uses the value specified in Preferences. This option is ignored for Apple Speech Manager when generating files.
These properties control various behaviours, such as highlighting text during narration of editor content.
Highlight text
Highlights text as it is being read, where supported by the speech engine. Note that the editor undo history is cleared before and after narration if highlighting is enabled.
Highlight background colour
The highlight background colour.
Paragraph pause duration
The pause duration after each paragraph, in milliseconds. This applies to SAPI only, since SAPI does not have a paragraph construct. The default is 500.
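As an illustration of what an after-paragraph pause can look like in SAPI source text, the sketch below appends a SAPI 5 XML silence element to each paragraph. This is an assumption about the general technique, not a description of the markup Jutoh actually emits; the function name is hypothetical.

```python
from xml.sax.saxutils import escape


def add_paragraph_pauses(paragraphs, pause_ms=500):
    """Join paragraphs, following each with a SAPI 5 <silence> element."""
    # Assumption: the pause is expressed with the SAPI 5 XML silence tag.
    silence = f'<silence msec="{pause_ms}"/>'
    # Escape &, < and > so paragraph text cannot break the markup.
    return "\n".join(f"{escape(p)}{silence}" for p in paragraphs)


print(add_paragraph_pauses(["First paragraph.", "Second paragraph."]))
```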
Emulation
Specifies which XML tags to emulate by transforming text, to work around weaknesses in speech engines. Specify (all) to perform all relevant emulation, (none) to perform no emulation, or a comma-separated list of keywords. Available keywords are say-as, say-as.characters, say-as.digits, say-as.telephone.
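The accepted values can be summarised with a small parser. This is only a sketch of how such a setting might be interpreted: the function name is hypothetical, and treating an empty value like (none) and rejecting unknown keywords are assumptions rather than documented Jutoh behaviour.

```python
KNOWN_KEYWORDS = (
    "say-as",
    "say-as.characters",
    "say-as.digits",
    "say-as.telephone",
)


def parse_emulation(value):
    """Return the set of emulation keywords selected by an option value."""
    value = value.strip()
    if value == "(all)":
        return set(KNOWN_KEYWORDS)
    if value in ("(none)", ""):  # assumption: empty means no emulation
        return set()
    keywords = {part.strip() for part in value.split(",") if part.strip()}
    unknown = keywords - set(KNOWN_KEYWORDS)
    if unknown:  # assumption: unknown keywords are treated as errors
        raise ValueError(f"unknown emulation keyword(s): {sorted(unknown)}")
    return keywords


print(sorted(parse_emulation("say-as.digits, say-as.telephone")))
# prints: ['say-as.digits', 'say-as.telephone']
```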
These properties relate to how lexicons are used during speech output.
Lexicon tags
Comma-delimited tags to match lexicons that should be included. If no tags are specified, all lexicons match.
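The matching rule can be sketched as follows. The rule that an empty tag list matches every lexicon comes from the description above; matching on any shared tag and ignoring case are assumptions, and the function names are hypothetical.

```python
def parse_tags(value):
    """Split a comma-delimited tag string into a set of lower-case tags."""
    return {tag.strip().lower() for tag in value.split(",") if tag.strip()}


def lexicon_matches(profile_tags, lexicon_tags):
    """Decide whether a lexicon should be included for a speech profile."""
    wanted = parse_tags(profile_tags)
    if not wanted:
        return True  # no tags specified: all lexicons match
    # Assumption: one shared tag is enough for a match.
    return bool(wanted & parse_tags(lexicon_tags))


print(lexicon_matches("", "medical"))                # True
print(lexicon_matches("medical, legal", "legal"))    # True
print(lexicon_matches("medical", "legal"))           # False
```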
Lexicon alphabets
Comma-delimited alphabet(s) to use in generated lexicons or inline pronunciations. Use wildcards if needed.
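A wildcard list of this kind could be resolved against the available alphabets roughly as shown below. The sketch assumes shell-style wildcards (* and ?), which the text above does not confirm, and the alphabet names are illustrative placeholders only.

```python
from fnmatch import fnmatchcase


def select_alphabets(option_value, available):
    """Return the available alphabets that match any pattern in the option."""
    patterns = [p.strip() for p in option_value.split(",") if p.strip()]
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Illustrative alphabet names; see the alphabet reference for real ones.
available = ["ipa", "x-sampa", "x-cmu", "arpabet"]
print(select_alphabets("ipa, x-*", available))
# prints: ['ipa', 'x-sampa', 'x-cmu']
```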
Inline pronunciations
Replaces words in the speech source files with pronunciations from lexicons, using specified phonemes or aliases (‘sounds-like’ pronunciations). This can be done in addition to generating lexicons if necessary. Note that for text-to-speech, Jutoh currently supports only inline pronunciations and does not load generated lexicon files.
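The idea of inline replacement can be illustrated with standard SSML elements: sub for an alias and phoneme for a phonetic spelling. The sketch below is not Jutoh's implementation. The lexicon entries and function name are made up for illustration, and the input is assumed to be plain text without XML special characters.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical lexicon: word -> (kind, value)
LEXICON = {
    "SQL": ("alias", "sequel"),
    "tomato": ("phoneme", "təˈmɑːtəʊ"),
}


def inline_pronunciations(text, lexicon, alphabet="ipa"):
    """Wrap each lexicon word found in text in SSML pronunciation markup."""
    if not lexicon:
        return text

    def replace(match):
        word = match.group(0)
        kind, value = lexicon[word]
        if kind == "alias":
            return f"<sub alias={quoteattr(value)}>{word}</sub>"
        return (
            f"<phoneme alphabet={quoteattr(alphabet)} "
            f"ph={quoteattr(value)}>{word}</phoneme>"
        )

    # Match whole words only, so "tomatoes" is left untouched.
    words = "|".join(re.escape(word) for word in lexicon)
    return re.sub(rf"\b({words})\b", replace, text)


print(inline_pronunciations("SQL and tomato", LEXICON))
```

Embedding the pronunciation next to the word in this way means the speech engine does not need to load a separate lexicon file.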
PLS lexicons
Saves lexicons in PLS lexicon format when generating SSML or Epub 3.
CereVoice lexicon
Saves lexicons in CereVoice lexicon format when generating SSML.
CereVoice abbreviations
Saves lexicons in CereVoice abbreviations format when generating SSML.
These properties control how extra text is inserted into the audio in order to clarify the content.
Bullet list item prefix
Text to insert in front of unordered list items.
Numbered list item prefix
Text to insert in front of numbered list items.
Use image alt text
Inserts image alternative text.
Use table descriptions
Inserts table descriptions.
Table row prefix
Text to insert in front of table rows. If this is specified, the table row number will also be read. This will be suppressed if the table’s Role property is set to ‘presentation’.
Table column prefix
Text to insert in front of table columns. If this is specified, the table column number will also be read. This will be suppressed if the table’s Role property is set to ‘presentation’.
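The effect of the two table prefixes can be pictured with the following sketch, which builds the text that might be narrated for a small table. The exact wording and punctuation Jutoh produces are not specified here, so the output format, the function name, and the parameter names are all assumptions.

```python
def narrate_table(rows, row_prefix="", column_prefix="", role=""):
    """Build narration text for a table given as a list of rows of cells."""
    # Prefixes are suppressed for tables whose role is 'presentation'.
    suppress = role == "presentation"
    parts = []
    for row_number, row in enumerate(rows, start=1):
        if row_prefix and not suppress:
            parts.append(f"{row_prefix} {row_number}.")
        for column_number, cell in enumerate(row, start=1):
            if column_prefix and not suppress:
                parts.append(f"{column_prefix} {column_number}.")
            parts.append(f"{cell}.")
    return " ".join(parts)


table = [["Name", "Age"], ["Ann", "34"]]
print(narrate_table(table, row_prefix="Row", column_prefix="Column"))
# prints: Row 1. Column 1. Name. Column 2. Age. Row 2. Column 1. Ann. Column 2. 34.
print(narrate_table(table, "Row", "Column", role="presentation"))
# prints: Name. Age. Ann. 34.
```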