Version 3.01h mar jun 8 18:32:35 MEST 1999

M     M  BBBB   RRRR    OOO   L         A
MM   MM  B   B  R   R  O   O  L        A A
M M M M  B   B  R   R  O   O  L       A   A
M  M  M  BBBB   RRR    O   O  L      AAAAAAA
M     M  B   B  R  R   O   O  L      A     A
M     M  B   B  R   R  O   O  L      A     A
M     M  BBBBB  R   R   OOO   LLLLL  A     A
9 |
|
10 |
-------------------------------------------------------------- |
11 |
Table of Contents |
12 |
-------------------------------------------------------------- |
13 |
|
1.0 License
2.0 A brief description of MBROLA
3.0 Distribution
4.0 Installation and Tests
5.0 Format of input and output files - Limitations
6.0 Joining the MBROLA project as a user
7.0 Joining the MBROLA project as a database provider
8.0 Acknowledgments
9.0 Contacting the author
23 |
|
24 |
-------------------------------------------------------------- |
25 |
1.0 License |
26 |
-------------------------------------------------------------- |
27 |
|
28 |
This program and object code is being provided to "you", the licensee, |
29 |
by Thierry Dutoit, the "author", under the following license, which |
30 |
applies to any program, object code or other work which contains a |
31 |
notice placed by the copyright holder saying it may be distributed |
32 |
under the terms of this license. The "program", below, refers to any |
33 |
such program, object code or work. |
34 |
|
35 |
By obtaining, using and/or copying this program, you agree that you |
36 |
have read, understood, and will comply with these terms and |
37 |
conditions: |
38 |
|
39 |
Terms and conditions for the distribution of the program |
40 |
-------------------------------------------------------- |
41 |
|
42 |
This program may not be sold or incorporated into any product which is |
43 |
sold without prior permission from the author. |
44 |
|
45 |
When no charge is made, this program may be copied and distributed |
46 |
freely, provided that this notice is copied and distributed with |
47 |
it. Each time you redistribute the program (or any work based on the |
48 |
program), the recipient automatically receives a license from the |
49 |
original licensor to copy or distribute the program subject to these |
50 |
terms and conditions. You may not impose any further restrictions on |
51 |
the recipients' exercise of the rights granted herein. You are not |
52 |
responsible for enforcing compliance by third parties to this License. |
53 |
|
54 |
If you wish to incorporate the program into other free programs whose |
55 |
distribution conditions are different, write to the author to ask for |
56 |
permission. |
57 |
|
58 |
If, as a consequence of a court judgment or allegation of patent |
59 |
infringement or for any other reason (not limited to patent issues), |
60 |
conditions are imposed on you (whether by court order, agreement or |
61 |
otherwise) that contradict the conditions of this license, they do not |
62 |
excuse you from the conditions of this license. If you cannot |
63 |
distribute so as to satisfy simultaneously your obligations under this |
64 |
license and any other pertinent obligations, then as a consequence you |
65 |
may not distribute the program at all. For example, if a patent |
66 |
license would not permit royalty-free redistribution of the program by |
67 |
all those who receive copies directly or indirectly through you, then |
68 |
the only way you could satisfy both it and this license would be to |
69 |
refrain entirely from distribution of the program. |
70 |
|
71 |
Terms and conditions on the use of the program |
72 |
---------------------------------------------- |
73 |
|
74 |
Permission is granted to use this software for non-commercial, |
75 |
non-military purposes, with and only with the voice and language |
76 |
databases made available by the author from the MBROLA project www |
77 |
homepage: |
78 |
|
79 |
http://tcts.fpms.ac.be/synthesis |
80 |
|
81 |
In return, the author asks you to mention the MBROLA reference paper: |
82 |
|
83 |
T. DUTOIT, V. PAGEL, N. PIERRET, F. BATAILLE, O. VAN DER VRECKEN |
84 |
"The MBROLA Project: Towards a Set of High-Quality Speech |
85 |
Synthesizers Free of Use for Non-Commercial Purposes" |
86 |
Proc. ICSLP'96, Philadelphia, vol. 3, pp. 1393-1396. |
87 |
|
88 |
or, for a more general reference to Text-To-Speech synthesis, the |
89 |
book: |
90 |
|
91 |
An Introduction to Text-To-Speech Synthesis, |
92 |
T. DUTOIT, Kluwer Academic Publishers, Dordrecht |
93 |
Hardbound, ISBN 0-7923-4498-7 |
94 |
April 1997, 312 pp. |
95 |
|
96 |
in any scientific publication referring to work for which this program |
97 |
has been used. |
98 |
|
99 |
Disclaimer |
100 |
---------- |
101 |
|
102 |
THIS SOFTWARE CARRIES NO WARRANTY, EXPRESSED OR IMPLIED. THE USER |
103 |
ASSUMES ALL RISKS, KNOWN OR UNKNOWN, DIRECT OR INDIRECT, WHICH INVOLVE |
104 |
THIS SOFTWARE IN ANY WAY. IN PARTICULAR, THE AUTHOR DOES NOT TAKE ANY |
105 |
COMMITMENT IN VIEW OF ANY POSSIBLE THIRD PARTY RIGHTS. |
106 |
|
107 |
-------------------------------------------------------------- |
108 |
2.0 A brief description of MBROLA |
109 |
-------------------------------------------------------------- |
110 |
|
111 |
MBROLA is a speech synthesizer based on the concatenation of |
112 |
diphones. It takes a list of phonemes as input, together with prosodic |
113 |
information (duration of phonemes and a piecewise linear description |
114 |
of pitch), and produces speech samples on 16 bits (linear), at the |
115 |
sampling frequency of the diphone database. |
116 |
|
117 |
It is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does |
118 |
not accept raw text as input. In order to obtain a full TTS system, |
119 |
you need to use this synthesizer in combination with a text processing |
120 |
system that produces phonetic and prosodic commands. |
121 |
|
122 |
We maintain a web page with pointers to such freely available systems: |
123 |
|
124 |
http://tcts.fpms.ac.be/synthesis/mbrtts.html |
125 |
|
This software is the heart of the MBROLA project, the aim of which is
to obtain a set of speech synthesizers for as many languages as
possible, free of use for non-commercial applications.
129 |
|
130 |
The terms of this project can be summarized as follows : |
131 |
|
132 |
After some official agreement between the author of this software and |
133 |
the owner of a diphone database, the database is processed by the |
134 |
author and adapted to the mbrola format, for free. The resulting |
135 |
mbrola diphone database is made available for non-commercial use as |
136 |
part of the MBROLA project. Commercial rights on the mbrola database |
137 |
remain with the database provider, for exclusive use with the mbrola |
138 |
software. |
139 |
|
The ultimate goal of this project is to boost academic research on
speech synthesis, and particularly on prosody generation, known as one
of the biggest challenges taken up by Text-To-Speech synthesizers for
the years to come. If you want to provide a database to the mbrola
project, write first to mbrola@tcts.fpms.ac.be
145 |
|
146 |
More details can be found at the MBROLA project homepage : |
147 |
|
148 |
http://tcts.fpms.ac.be/synthesis |
149 |
|
150 |
The synthesizer uses a synthesis method known itself as MBROLA. |
151 |
|
152 |
-------------------------------------------------------------- |
153 |
3.0 Distribution |
154 |
-------------------------------------------------------------- |
155 |
|
156 |
This distribution of mbrola contains the following files : |
157 |
|
158 |
mbrola.exe or mbrola: An executable file of the synthesizer itself |
159 |
(depends on the computer supposed to run it) |
160 |
readme.txt : This file |
161 |
|
162 |
As such, it requires an MBROLA language/voice database to run |
163 |
properly. American English, Arabic, Brazilian Portuguese, Breton, |
164 |
British English, Croatian, Dutch, Estonian, French, German, Greek, |
165 |
Mexican Spanish, Romanian, Spanish and Swedish voices are made |
166 |
available. Additional languages and voices will be available in the |
167 |
context of the MBROLA project. |
168 |
|
169 |
Please consult the MBROLA project homepage to get the voices: |
170 |
|
171 |
http://tcts.fpms.ac.be/synthesis |
172 |
|
173 |
-------------------------------------------------------------- |
174 |
4.0 Installation and Tests |
175 |
-------------------------------------------------------------- |
176 |
|
177 |
The following computers/OS are currently supported : |
178 |
|
179 |
SUN Sparc 5/S5R4 (Solaris2.4) |
180 |
HPUX9.0 and HPUX10.0 |
181 |
VAX/VMS V6.2 (V5.5-2 won't work) |
182 |
DECALPHA(AXP)/VMS 6.2 |
183 |
AlphaStation 200 4/233 |
184 |
AlphaStation 200 4/166 |
185 |
IBM RS6000 Aix 4.12 |
186 |
PC486/DOS6 (but other PCs/DOSs should do, too) |
187 |
PC486/Windows 3.1 |
188 |
PC486/Windows 95 |
189 |
PC-Pentium/Windows 98 |
190 |
PC-Pentium/Windows NT |
191 |
PC/LINUX 1.2.11 |
192 |
PC/LINUX Redhat6.2 |
193 |
PCPentium120/Solaris2.4 |
194 |
OS/2 |
195 |
BeBox |
196 |
BeOs (PPC,i386) |
197 |
Macintosh |
198 |
Sun Ultra1/SuSE Linux 7.0 |
199 |
|
Please let us know when mbrola works on a machine not listed
here. A special DLL version is distributed for PC/Windows to allow
direct audio output; check the Mbrolatools package on the Mbrola site.
203 |
|
204 |
See the MBROLA Homepage if your computer or OS is not supported yet. |
205 |
|
206 |
Assuming you have copied the right .zip file, create a directory |
207 |
mbrola (although this is not critical), copy the mbrXXX.zip file into |
208 |
it (in which XXX stands for a version number), and unzip the file: |
209 |
|
210 |
unzip mbrXXX.zip (or pkunzip on PC/DOS) |
211 |
|
212 |
You are now ready to synthesize your first words.... |
213 |
|
214 |
First try: mbrola |
215 |
|
216 |
to see the terms and conditions on the use of this software. |
217 |
|
218 |
Then try: mbrola -h |
219 |
|
220 |
to get some help on how to use the software: |
221 |
|
222 |
> USAGE: ./synth [COMMAND LINE OPTIONS] database pho_file+ output_file |
223 |
> |
224 |
>A - instead of pho_file or output_file means stdin or stdout |
225 |
>Extension of output_file ( raw, au, wav, aiff ) tells the wanted audio format |
226 |
> |
227 |
> Options can be any of the following: |
228 |
> -i = display the database information if any |
229 |
> -e = IGNORE fatal errors on unkown diphone |
230 |
> -c CC = set COMMENT char (escape sequence in pho files) |
231 |
> -F FC = set FLUSH command name |
232 |
> -v VR = VOLUME ratio, float ratio applied to ouput samples |
233 |
> -f FR = FREQ ratio, float ratio applied to pitch points |
234 |
> -t TR = TIME ratio, float ratio applied to phone durations |
235 |
> -l VF = VOICE freq, target freq for voice quality |
236 |
> -R RL = Phoneme RENAME list of the form a A b B ... |
237 |
> -C CL = Phoneme CLONE list of the form a A b B ... |
238 |
> |
239 |
> -I IF = Initialization file containing one command per line |
240 |
> CLONE, RENAME, VOICE, TIME, FREQ, VOLUME, FLUSH, COMMENT, |
241 |
> and IGNORE are available |
242 |
|
243 |
Now in order to go further, you need to get a version of an MBROLA |
244 |
language/voice database from the MBROLA project homepage. Let us |
245 |
assume you have copied the FR1 database and referred to |
246 |
the accompanying fr1.txt file for its installation. |
247 |
|
248 |
Then try: mbrola fr1/fr1 fr1/TEST/bonjour.pho bonjour.wav |
249 |
|
250 |
it uses the format: |
251 |
|
252 |
mbrola diphone_database command_file1 command_file2 ... output_file |
253 |
|
254 |
and creates a sound file for the word 'bonjour' ( Hello !). |
255 |
|
Basically the output file is composed of signed integer numbers on 16
bits, corresponding to samples at the sampling frequency of the MBROLA
voice/language database (16 kHz for the diphone database supplied by
the author of MBROLA: Fr1). MBROLA can produce different audio file
formats: .au, .wav, .aiff, .aif, and .raw files depending on the
output_file extension. If the extension is not recognized, the format
is RAW (no header). We recommend .wav for Windows, and .au for Unix
platforms.
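
The extension rule can be pictured with the short Python sketch below.
It is an illustration only, not MBROLA's own code: the function name
audio_format_for is hypothetical, and ignoring the case of the
extension is an assumption.

```python
import os

# Extensions listed in this README; anything else falls back to RAW.
KNOWN_FORMATS = {
    ".au": "AU",
    ".wav": "WAV",
    ".aiff": "AIFF",
    ".aif": "AIFF",
    ".raw": "RAW",
}

def audio_format_for(output_file):
    """Return the audio format implied by the output file extension."""
    # Lower-casing the extension is an assumption of this sketch.
    ext = os.path.splitext(output_file)[1].lower()
    return KNOWN_FORMATS.get(ext, "RAW")

print(audio_format_for("bonjour.wav"))
print(audio_format_for("bonjour.xyz"))
```

The second call shows the fallback: an unrecognized extension is
treated as headerless RAW.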
264 |
|
265 |
To display information about the phoneme set used by the database, |
266 |
type: |
267 |
mbrola -i fr1/fr1 |
268 |
|
269 |
It displays the phonetic alphabet as well as copyright information |
270 |
about the database. |
271 |
|
Option -e makes Mbrola ignore wrong or missing diphone sequences
(replaced by silence), which can be quite useful when debugging your
TTS. It is equivalent to the "IGNORE" directive in the initialization
file (N.B. this replaces the obsolete ;;E=OFF, which is no longer
supported in .pho files).
276 |
|
277 |
Optional parameters let you shorten or lengthen synthetic speech and |
278 |
transpose it by providing optional time and frequency ratios: |
279 |
|
280 |
mbrola -t 1.2 -f 0.8 fr1/fr1 TEST/bonjour.pho bonjour.wav |
281 |
|
282 |
or its equivalent in the initialization file: |
283 |
|
284 |
TIME 1.2 |
285 |
FREQ 0.8 |
286 |
|
287 |
for instance, will result in a RIFF Wav file bonjour.wav 1.2 times |
288 |
longer than the previous one (slower rate), and containing speech in |
289 |
which all fundamental frequency values have been multiplied by 0.8 |
290 |
(sounds lower). |
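
As a rough picture of what the two ratios do, the Python sketch below
scales a list of phones by hand. It is not MBROLA's implementation:
the function apply_ratios and the tuple layout (name, duration in ms,
list of pitch targets) are assumptions made for the example.

```python
def apply_ratios(phones, time_ratio=1.0, freq_ratio=1.0):
    """Scale durations by time_ratio and pitch values by freq_ratio.

    phones: list of (name, duration_ms, [(position_percent, pitch_hz), ...]).
    Pitch positions are percentages of the phone duration, so they do
    not change when the duration is scaled.
    """
    scaled = []
    for name, duration, targets in phones:
        new_targets = [(pos, hz * freq_ratio) for pos, hz in targets]
        scaled.append((name, duration * time_ratio, new_targets))
    return scaled

# Hypothetical three-phone input, scaled like "-t 1.2 -f 0.8".
phones = [("_", 51, [(25, 114)]), ("b", 62, []), ("u", 211, [])]
slow_low = apply_ratios(phones, time_ratio=1.2, freq_ratio=0.8)
print(slow_low)
```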
291 |
|
292 |
You can also set the values of these coefficients directly in a .pho |
293 |
file by adding special escape sequences like : |
294 |
|
295 |
;; F=0.8 |
296 |
;; T=1.2 |
297 |
|
You can change the voice characteristics with the -l parameter. If the
sampling rate of your database is 16000, indicating -l 18000 allows
you to shorten the vocal tract by a ratio of 16/18 (a child's or a
woman's voice, depending on the voice you're working on). With -l
10000, you can lengthen the vocal tract by a ratio of 16/10 (namely
the voice of a Troll). The same command in an initialization file
becomes "VOICE 10000".
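
The two ratios quoted above follow from dividing the database sampling
rate by the -l value. A minimal Python check, with a hypothetical
helper name:

```python
from fractions import Fraction

def vocal_tract_ratio(database_rate_hz, voice_freq_hz):
    """Ratio applied to the vocal tract length for a given -l value."""
    return Fraction(database_rate_hz, voice_freq_hz)

shorter = vocal_tract_ratio(16000, 18000)  # same ratio as 16/18
longer = vocal_tract_ratio(16000, 10000)   # same ratio as 16/10
print(float(shorter), float(longer))
```

A ratio below 1 shortens the vocal tract, a ratio above 1 lengthens it.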
305 |
|
Option "-v" specifies a VolumeRatio which multiplies each output
sample. In the example below, each sample is multiplied by 0.7 (the
loudness goes down). Warning: setting VolumeRatio too high generates
saturation.
310 |
|
311 |
mbrola -v 0.7 fr1/fr1 TEST/bonjour.pho bonjour.wav |
312 |
|
313 |
or add "VOLUME 0.7" in an initialization file |
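
The saturation warning can be illustrated on 16-bit samples with the
Python sketch below. It assumes that out-of-range values are clipped
to the 16-bit limits; this README does not say how MBROLA itself
handles overflow, and scale_samples is a hypothetical helper.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def scale_samples(samples, volume_ratio):
    """Multiply each sample by volume_ratio, clipping to 16 bits."""
    out = []
    for sample in samples:
        value = int(round(sample * volume_ratio))
        # Clipping is an assumption used here to model saturation.
        out.append(max(INT16_MIN, min(INT16_MAX, value)))
    return out

print(scale_samples([1000, -20000], 0.7))  # quieter, no clipping
print(scale_samples([20000, -20000], 2.0))  # both samples saturate
```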
314 |
|
315 |
The -c option lets you specify which symbol will be used as an escape |
316 |
sequence for comments and commands in .pho files. The default value is |
317 |
the semi-colon ';', but you may want to change this if your phonetic |
318 |
alphabet uses this symbol, like in: |
319 |
|
320 |
mbrola -c ! fr1/fr1 TEST/test1.pho test2.pho test.wav |
321 |
|
322 |
equivalent to "COMMENT !" in an initialization file |
323 |
|
324 |
The -F option lets you specify which symbol will be used to Flush the |
325 |
audio output. The default value is #, you may want to change the |
326 |
symbol like in: |
327 |
|
328 |
mbrola -F FLUSH_COMMAND fr1/fr1 test.pho test.wav |
329 |
|
330 |
equivalent to "FLUSH FLUSH_COMMAND" in the initialization file. |
331 |
|
332 |
|
333 |
Using Pipes |
334 |
----------- |
335 |
|
336 |
A - instead of command_file or output_file means stdin or stdout. On |
337 |
multitasking machines, it is easy to run the synthesizer in real time |
338 |
to obtain audio output from the audio device, by using pipes. |
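
When mbrola is driven from another program rather than from a shell,
the same stdin/stdout convention applies. The Python sketch below
builds a command line following the USAGE text above and runs it
through a pipe. The helper names are hypothetical, only flags listed
in the USAGE text are used, and the latin-1 encoding of the .pho text
is an assumption.

```python
import subprocess

def mbrola_argv(database, pho_files, output_file, time_ratio=None,
                freq_ratio=None, volume_ratio=None, ignore_errors=False,
                executable="mbrola"):
    """Build an argument list following the USAGE line of 'mbrola -h'."""
    argv = [executable]
    if ignore_errors:
        argv.append("-e")
    options = (("-t", time_ratio), ("-f", freq_ratio), ("-v", volume_ratio))
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv + [database, *pho_files, output_file]

def synthesize(pho_text, database):
    """Send .pho text on stdin and return the .au bytes read from stdout.

    Needs an installed mbrola binary and voice database.
    """
    argv = mbrola_argv(database, ["-"], "-.au")
    done = subprocess.run(argv, input=pho_text.encode("latin-1"),
                          stdout=subprocess.PIPE, check=True)
    return done.stdout

print(mbrola_argv("fr1/fr1", ["-"], "-.au"))
```

The output name "-.au" is used rather than a .wav name because the
audio goes to a pipe.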
339 |
|
340 |
Renaming and Cloning phonemes |
341 |
----------------------------- |
342 |
|
It may happen that the language processing module connected to MBROLA
doesn't use the same phonemic alphabet as the voice used. The Renaming
and Cloning mechanisms help you to quickly solve such problems
(without adding extra CPU load during synthesis). The only limitation
on phoneme names is that they can't contain blank characters.
348 |
|
349 |
If, for instance, phoneme "a" in the mbrola voice you use is called |
350 |
"my_a" in your alphabet, and phoneme "b" is called "my_b", then the |
351 |
following command solves the problem: |
352 |
|
353 |
mbrola -R "a my_a b my_b" fr1/fr1 test.pho test.wav |
354 |
|
You can give as many renaming pairs as you want. Circular definitions
are not a problem -> "a b b c" will rename original [a] into [b] and
original [b] into [c] independently ([a] won't be renamed to [c]).
358 |
|
359 |
LIMITATION: you can't rename a phoneme into another that already |
360 |
exists. |
361 |
|
The cloning mechanism does exactly the same thing, though the old
phoneme still exists after cloning. This is useful if you have 2
allophones in your alphabet, but the Mbrola voice only provides one.

Imagine, for instance, that you make the distinction between the
voiced [r] and its unvoiced counterpart [r0] and that you are using a
syllabic version [r=]. If as a first approximation using [r] for both
is OK, then you may use an Mbrola voice that only provides one version
of [r] by running:
371 |
|
372 |
mbrola -C "r r0 r r=" fr1/fr1 test.pho test.wav |
373 |
|
374 |
which tells the synthesizer that [r0] and [r=] should be both |
375 |
synthesized as [r]. You can write a long cloning list of phoneme |
376 |
pairs to fit your needs. |
377 |
|
Renaming and cloning cost some CPU time at startup, since the complete
diphone hash table has to be rebuilt, but once the renaming or cloning
has occurred there is absolutely NO RELATED PERFORMANCE DROP. So using
this feature is more efficient than a pre-processor, though
incompatibilities cannot always be solved by a simple phoneme mapping.
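
The semantics of renaming and cloning described in this section can be
modelled on a plain list of phoneme names. The Python sketch below is
such a model only: it does not reproduce MBROLA's diphone hash table,
both function names are hypothetical, and the duplicate check is one
reading of the LIMITATION stated above.

```python
def rename_phonemes(phonemes, pairs):
    """Apply all (old, new) pairs at once, against the ORIGINAL names."""
    mapping = dict(pairs)
    renamed = [mapping.get(name, name) for name in phonemes]
    if len(set(renamed)) != len(renamed):
        raise ValueError("cannot rename a phoneme into one that still exists")
    return renamed

def clone_phonemes(phonemes, pairs):
    """Map each name usable in a .pho file to a database phoneme.

    Cloning adds the new name and keeps the old one.
    """
    aliases = {name: name for name in phonemes}
    for old, new in pairs:
        if new in aliases:
            raise ValueError(f"phoneme {new!r} already exists")
        aliases[new] = aliases[old]
    return aliases

# "a b b c": [a] becomes [b] and [b] becomes [c], independently.
print(rename_phonemes(["a", "b", "x"], [("a", "b"), ("b", "c")]))
# "r r0 r r=": [r0] and [r=] are both synthesized as [r].
print(clone_phonemes(["r"], [("r", "r0"), ("r", "r=")]))
```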
383 |
|
Before renaming anything as #, check paragraph 5.3 (# is the default
flush command).
385 |
|
386 |
When you have long cloning and renaming lists, you can conveniently |
387 |
write them into an initialization file according to the following |
388 |
format: |
389 |
|
390 |
RENAME a my_a |
391 |
RENAME b my_b |
392 |
CLONE r r0 |
393 |
CLONE r r= |
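
Long lists of this kind are easy to generate from a program. A minimal
Python sketch, in which the helper name init_file_text is hypothetical
and the order of the directives is an arbitrary choice:

```python
def init_file_text(renames=(), clones=(), time=None, freq=None, volume=None):
    """Return initialization file text with one command per line."""
    lines = [f"RENAME {old} {new}" for old, new in renames]
    lines += [f"CLONE {old} {new}" for old, new in clones]
    for keyword, value in (("TIME", time), ("FREQ", freq), ("VOLUME", volume)):
        if value is not None:
            lines.append(f"{keyword} {value}")
    return "\n".join(lines) + "\n"

text = init_file_text(renames=[("a", "my_a"), ("b", "my_b")],
                      clones=[("r", "r0"), ("r", "r=")])
print(text, end="")
```

The resulting text can be saved to a file and passed to mbrola with
the -I option.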
394 |
|
395 |
The obsolete ";; RENAME a my_a" can't be used in .pho file anymore, |
396 |
but is correctly parsed in initialization files. |
397 |
|
398 |
Note to Festival and EN1 users: the consequence of the change above is |
399 |
that you must change the previous call format "mbrola en1 en1mrpa ..." |
400 |
into "mbrola -I en1mrpa en1 ...". |
401 |
|
402 |
|
BELOW ARE A NUMBER OF MACHINE-DEPENDENT HINTS FOR MAKING THE BEST USE OF MBROLA
404 |
|
405 |
On MSDOS/Windows or OS/2 |
406 |
------------------------ |
407 |
|
408 |
Type: mbrola fr1/fr1 TEST/bonjour.pho bonjour.wav |
409 |
|
Then you can play the RIFF Wav file with the Windows sound utility. On
OS/2, pipes may be used just as shown below.
412 |
|
413 |
REMARK: MbrolaTools provide an excellent DLL and graphical pho player |
414 |
called Mbroli. We advise you to use them instead of mbrola.exe for |
415 |
Windows. |
416 |
|
417 |
On modern Unix systems such as Solaris or HPUX or Linux |
418 |
------------------------------------------------------- |
419 |
|
420 |
mbrola fr1/fr1 TEST/bonjour.pho -.au | audioplay |
421 |
|
where audioplay is your audio file player (* the name varies with the
platform, e.g. splayer for HPUX *)

If your audio player has problems with Sun .AU files, try with .raw.
Never use the .wav format when you pipe the output (mbrola can't rewind
the file to write the audio size in the header). The Wav format was not
developed for Unix (on the contrary, the Au format lets you specify in
the header "we're on a pipe, read until end of file").
430 |
|
431 |
NOTE FOR LINUX: you can use the GPL rawplay program provided at |
432 |
ftp://tcts.fpms.ac.be/pub/mbrola/pclinux/ |
433 |
|
434 |
On Sun4 or with machines with an old audio interface |
435 |
----------------------------------------------------- |
436 |
|
Those machines are now quite old and only provide a mu-law 8 kHz
output. A hack is:
439 |
|
440 |
mbrola fr1/fr1 input.pho - | sox -t raw -sw -r 16000 - -t raw -Ub -r 8000 - > /dev/audio |
441 |
|
(provided you have the public domain sox utility developed by Ircam).
You should hear 'bonjour' without the need to create intermediate
files. Note, however, that we strongly recommend that you DON'T use
SOX when you can avoid it, since its resampling method (linear
interpolation) will permanently damage the sound.
447 |
|
448 |
Other solution: The UTILITY.ZIP file available from the MBROLA |
449 |
homepage provides RAW2SUN which does this conversion. |
450 |
|
451 |
On VAX or AXP workstations |
452 |
-------------------------- |
453 |
|
454 |
To make it easier for users to find MBROLA, you should add the |
455 |
following command to your system startup procedure: |
456 |
|
457 |
$ DEFINE/SYSTEM/EXEC MBROLA_DIR disk:[dir] |
458 |
|
459 |
where "disk:[dir]" is the name of the directory you created for the |
460 |
MBROLA_DIR files. You could also add the following command to your |
461 |
system login command procedure: |
462 |
|
463 |
$ MBROLA :== $MBROLA_DIR:MBROLA.EXE |
464 |
$ RAW2SUN :== $MBROLA_DIR:RAW2SUN.EXE |
465 |
|
466 |
to use the decsound device: |
467 |
|
468 |
$ MCR DECSOUND - volume 40 -play sound.au |
469 |
|
470 |
See also the MBR_OLA.COM batch file in the UTILITY.ZIP file available |
471 |
from the MBROLA Homepage if you cannot play 16 bits sound files on |
472 |
your machine. |
473 |
|
474 |
-------------------------------------------------------------- |
475 |
5.0 Format of input and output files - Limitations |
476 |
-------------------------------------------------------------- |
477 |
|
478 |
5.1 Phoneme commands |
479 |
-------------------- |
480 |
|
481 |
The input file bonjour.pho in the above example simply contains : |
482 |
|
483 |
; bonjour |
484 |
_ 51 25 114 |
485 |
b 62 |
486 |
o~ 127 48 170.42 |
487 |
Z 110 53.5 116 |
488 |
u 211 |
489 |
R 150 50 91 |
490 |
_ 91 |
491 |
|
492 |
This shows the format of the input data required by MBROLA. Each line |
493 |
contains a phoneme name, a duration (in ms), and a series (possibly |
494 |
none) of pitch targets composed of two float numbers each : the |
495 |
position of the pitch target within the phoneme (in % of its |
496 |
total duration), and the pitch value (in Hz) at this position. |
497 |
|
498 |
In order to increase readability, it is also possible to enclose pitch |
499 |
target in parentheses. Hence, the first line of bonjour.pho could |
500 |
be written : |
501 |
|
502 |
_ 51 (25,114) |
503 |
|
504 |
it tells the synthesizer to produce a silence of 51 ms, and to put a |
505 |
pitch target of 114 Hz at 25% of 51 ms. Pitch targets define a |
506 |
piecewise linear pitch curve. Notice that the intonation curve they |
507 |
define is continuous, since the program automatically drops pitch |
508 |
information when synthesizing unvoiced phones. |
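
To make the line format concrete, here is a minimal Python sketch that
reads phoneme lines in both notations and turns the pitch targets into
absolute points of the piecewise linear curve. It is not MBROLA's
parser: the function names are hypothetical, and treating parentheses
and commas as plain separators is an assumption.

```python
import re

def parse_pho_line(line, comment_char=";"):
    """Return (name, duration_ms, [(percent, hz), ...]), or None."""
    text = line.strip()
    if not text or text.startswith(comment_char):
        return None  # blank line or comment
    parts = text.split(None, 1)
    if len(parts) < 2:
        raise ValueError(f"missing duration in: {line!r}")
    name = parts[0]
    # Only the fields after the phoneme name are cleaned up, since a
    # phoneme name may contain any non-blank character.
    fields = re.sub(r"[(),]", " ", parts[1]).split()
    duration = float(fields[0])
    numbers = [float(field) for field in fields[1:]]
    if len(numbers) % 2:
        raise ValueError(f"incomplete pitch target in: {line!r}")
    return name, duration, list(zip(numbers[0::2], numbers[1::2]))

def pitch_points(phones):
    """Convert per-phone targets into absolute (time_ms, hz) points."""
    points, start = [], 0.0
    for _name, duration, targets in phones:
        points += [(start + duration * pos / 100.0, hz) for pos, hz in targets]
        start += duration
    return points

lines = ["; bonjour", "_ 51 (25,114)", "b 62", "o~ 127 48 170.42"]
phones = [p for p in map(parse_pho_line, lines) if p is not None]
print(pitch_points(phones))
```

The first point falls at 25% of 51 ms, that is 12.75 ms after the
start of the utterance.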
509 |
|
The data on each line is separated by blank characters or tabs.
Comments can optionally be introduced in command files, starting with
a semi-colon ';'. This default can be overridden with the -c option
of the command line.
514 |
|
515 |
Another special escape sequence ';;' allows the user to introduce |
516 |
commands in the middle of .pho files as described below. This escape |
517 |
sequence is also affected by the -c option. |
518 |
|
519 |
5.2 Changing the Freq Ratio or Time Ratio |
520 |
----------------------------------------- |
521 |
|
An escape sequence followed by a command like "T=xx" sets the time
ratio to xx. The fundamental frequency ratio is set in the same way by
replacing T with F, as in:
525 |
|
526 |
;; T = 1.2 |
527 |
;;F=0.8 |
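
Both spellings above (with or without blanks around the equals sign)
can be recognized with one regular expression. The Python sketch below
is an illustration, not MBROLA's parser: parse_ratio_command is a
hypothetical helper, and building the escape sequence by doubling the
comment character is an assumption based on the -c option.

```python
import re

def parse_ratio_command(line, comment_char=";"):
    """Return ('T' or 'F', value) for a ratio command line, else None."""
    prefix = re.escape(comment_char * 2)
    pattern = rf"\s*{prefix}\s*([TF])\s*=\s*([0-9]*\.?[0-9]+)\s*$"
    match = re.match(pattern, line)
    if match is None:
        return None
    return match.group(1), float(match.group(2))

print(parse_ratio_command(";; T = 1.2"))
print(parse_ratio_command(";;F=0.8"))
print(parse_ratio_command("; bonjour"))  # an ordinary comment
```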
528 |
|
529 |
|
530 |
5.3 Flush the output stream |
531 |
--------------------------- |
532 |
|
533 |
Note, finally, that the synthesizer outputs chunks of synthetic speech |
534 |
determined as sections of the piecewise linear pitch curve. Phones |
535 |
inside a section of this curve are synthesized in one go. The last |
536 |
phone of each chunk, however, cannot be properly synthesized while the |
537 |
next phone is not known (since the program uses diphones as base |
538 |
speech units). When using mbrola with pipes, this may be a |
539 |
problem. Imagine, for instance, that mbrola is used to create a |
540 |
pipe-based speaking clock on an HP: |
541 |
|
542 |
speaking_clock | mbrola - -.au | splayer |
543 |
|
544 |
which tells the time, say, every 30 seconds. The last phone of each |
545 |
time announcement will only be synthesized when the next announcement |
546 |
starts. To bypass this problem, mbrola accepts a special command |
547 |
phone, which flushes the synthesis buffer : "#" |
548 |
|
549 |
This default character can be replaced by another symbol thanks to the |
550 |
command: |
551 |
|
552 |
;; FLUSH new_flush_symbol |
553 |
|
Another important issue with piping under UNIX is the ability to
end the audio output prematurely, for example when the user presses
the stop button of your application. Since release 3.01, Mbrola handles
signals.
558 |
|
559 |
If in the previous example the user wants to interrupt the speaking |
560 |
clock message, the application just needs to send the USR1 signal. You |
561 |
can send such a signal from the console with: |
562 |
|
563 |
kill -SIGUSR1 mbrola_process_number |
564 |
|
565 |
Once mbrola catches the signal, it reads its input stream until it |
566 |
gets EOF or a FLUSH command (hence, surrounding sections with flush is |
567 |
a good habit). |
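
An application feeding mbrola through a pipe could apply both ideas as
in the Python sketch below. The helper names are hypothetical; the
sketch assumes the default flush symbol "#" written on a line of its
own, and a Unix system where the USR1 signal exists.

```python
import os
import signal

def frame_announcement(pho_lines, flush_symbol="#"):
    """Surround one announcement with flush commands, one per line."""
    return "\n".join([flush_symbol, *pho_lines, flush_symbol]) + "\n"

def interrupt_mbrola(pid):
    """Send USR1 to a running mbrola process (Unix only)."""
    os.kill(pid, signal.SIGUSR1)

print(frame_announcement(["_ 51 25 114", "b 62"]), end="")
```

With the closing flush, the last phone of the announcement is
synthesized at once; with flushes on both sides, an interruption by
signal stops at the announcement boundary.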
568 |
|
569 |
Limitations of the program |
570 |
-------------------------- |
571 |
|
572 |
Phones can be synthesized with a maximum duration which depends on |
573 |
the fundamental frequency with which they are produced. The higher the |
574 |
frequency, the lower the duration. For a frequency of 133 Hz, the |
575 |
maximum duration is 7.5 sec. For a frequency of 66.5 Hz, it is 15 sec. |
576 |
For a frequency of 266 Hz, it is 3.75 sec. |
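
The three figures above are inversely proportional to the frequency:
in each case duration times frequency equals 997.5. The Python sketch
below uses that product to estimate the limit at other frequencies.
The constant is inferred from these three examples only; it is not a
documented parameter of the program, and max_duration_s is a
hypothetical helper.

```python
# Product of maximum duration (s) and frequency (Hz) in the three
# examples quoted in this README; inferred, not documented.
PERIODS_LIMIT = 997.5

def max_duration_s(f0_hz):
    """Estimated maximum phone duration in seconds at pitch f0_hz."""
    return PERIODS_LIMIT / f0_hz

for f0 in (66.5, 133.0, 266.0):
    print(f0, max_duration_s(f0))
```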
577 |
|
578 |
-------------------------------------------------------------- |
579 |
6.0 Joining the MBROLA project as a user |
580 |
-------------------------------------------------------------- |
581 |
|
582 |
For convenience, we have defined two mailing lists : |
583 |
|
584 |
* mbrola-interest@tcts.fpms.ac.be : a forum for MBROLA questions and |
585 |
issues. It is used by the maintainers of the mbrola project to |
586 |
announce new releases, bug fixes, new voices and languages, and other |
587 |
information of interest to all MBROLA users. Users who want to share |
588 |
.pho files or free applications running on top of mbrola should send |
589 |
mail to mbrola-interest. |
590 |
|
It is in your interest, as a user, to subscribe to the mbrola-interest
mailing list, by sending an e-mail to:
593 |
|
594 |
mbrola-interest-request@tcts.fpms.ac.be |
595 |
|
596 |
with the word 'subscribe' in either the header or the main text. To |
597 |
unsubscribe, just send another mail with 'unsubscribe'. |
598 |
|
599 |
BUGS |
600 |
---- |
601 |
|
602 |
If you detect a bug, or if you find an input for which the quality of |
603 |
the speech provided by mbrola is not as good as usual, first consult |
604 |
the FAQ file from the MBROLA Project homepage, which will be |
605 |
frequently updated. |
606 |
|
607 |
If this is of no help, send a kind mail to mbrola@tcts.fpms.ac.be in |
608 |
which you include the .pho file with which the problem appears and |
609 |
mention your machine architecture. |
610 |
|
611 |
NEW DATABASES |
612 |
------------- |
613 |
|
If you want to participate in the mbrola project by providing a
diphone database (i.e. a set of sample files with one example of each
diphone in your language), refer to the mbrola WWW homepage, or send
an email to: mbrola@tcts.fpms.ac.be.
618 |
|
619 |
APPLICATIONS |
620 |
------------ |
621 |
|
If you have used mbrola to build speaking apps on top of it (like
talking clocks, talking agendas, talking tools for handicapped
persons, etc.), and want to make them available to the community (for
free, of course, and for non-commercial, non-military applications, as
imposed by the mbrola license agreement), just make an announcement to
the mbrola mailing list:
628 |
|
629 |
mbrola-interest@tcts.fpms.ac.be. |
630 |
|
631 |
COMMERCIAL VERSION |
632 |
------------------ |
633 |
|
634 |
If you are interested in the commercial version of mbrola (source code |
635 |
available), send an email to: mbrola@tcts.fpms.ac.be |
636 |
|
637 |
FEEDBACK |
638 |
-------- |
639 |
|
640 |
If you simply find this initiative useful, please drop us a note at |
641 |
mbrola@tcts.fpms.ac.be. We have spent a lot of our time to provide you |
642 |
with this program, and we would like to get some feedback in return. |
643 |
|
644 |
Don't forget, either, to mention the MBROLA reference paper : |
645 |
|
646 |
T. DUTOIT, V. PAGEL, N. PIERRET, F. BATAILLE, O. VAN DER VRECKEN |
647 |
"The MBROLA Project: Towards a Set of High-Quality Speech |
648 |
Synthesizers Free of Use for Non-Commercial Purposes" |
649 |
Proc. ICSLP 96, Philadelphia, vol. 3, pp. 1393-1396 |
650 |
|
651 |
or, for a more general reference to Text-To-Speech synthesis, the |
652 |
book: |
653 |
|
654 |
An Introduction to Text-To-Speech Synthesis, |
655 |
T. DUTOIT, Kluwer Academic Publishers, Dordrecht |
656 |
Hardbound, ISBN 0-7923-4498-7 |
657 |
April 1997, 312 pp. |
658 |
|
659 |
in any scientific publication referring to work for which this program |
660 |
has been used. |
661 |
|
662 |
-------------------------------------------------------------- |
663 |
7.0 Joining the MBROLA project as a database provider |
664 |
-------------------------------------------------------------- |
665 |
|
666 |
One of the biggest interests of the MBROLA project (and definitely its |
667 |
most original aspect) lies in its ability to provide an ever growing |
668 |
set of languages/voices to users. |
669 |
|
To achieve this goal, the MBROLA project has itself been organized so
as to encourage other research labs or companies to share their diphone
databases.
673 |
|
674 |
The terms of this sharing policy can be summarized as follows : |
675 |
|
676 |
1. We shall only use your database to adapt it to the mbrola format, |
677 |
and destroy the copy when this is done. |
678 |
|
679 |
2. The resulting mbrola diphone database will be copyright Faculte |
680 |
Polytechnique de Mons. Non-commercial use of the database in the |
681 |
framework of the MBROLA project will be automatically granted to |
682 |
Internet users. In return, we shall send you a license agreement which |
683 |
will transfer all our commercial rights on the database to you, |
684 |
provided the database is used with and only with the MBROLA program. |
685 |
|
686 |
3. All these details will be fixed by some official agreement before |
687 |
you send us anything. |
688 |
|
689 |
If you want to create a database from scratch |
690 |
--------------------------------------------- |
691 |
|
First, you should be aware that recording a diphone database is not a
trivial operation. If it is not performed carefully, the result can be
disappointing. FR1, for instance, required about one month of work,
even with the help of some efficient laboratory tools for signal
recording and editing. What is more, some phonetic knowledge of the
targeted language is necessary to create the initial corpus.
698 |
|
699 |
So if you just think of designing a new diphone database as a game, |
700 |
forget it. |
701 |
|
702 |
If, on the contrary, you are willing to spend some time to provide the |
703 |
MBROLA community with a new language or voice, or if you already have |
704 |
a diphone database and wish to share it in mbrola format (and receive |
705 |
in return the rights for any commercial exploitation of the mbrola |
706 |
diphone database we will create for you), welcome here. |
707 |
|
708 |
If you want to build a new diphone database, please contact the author |
709 |
first. He will help you as much as he can, by providing phonetic |
710 |
information if available for instance. |
711 |
|
712 |
In all cases, make a first dummy trial : create a small corpus for a |
713 |
few diphones, record them, segment them, equalize them if you can, and |
714 |
send the result directly to the author. He will test your data, tell |
715 |
you how good it is, and what should be done to make it better. |
716 |
|
717 |
If you want to share an existing database |
-----------------------------------------

Contact the author (see section 9.0 below).

--------------------------------------------------------------
8.0 Acknowledgments
--------------------------------------------------------------

I would like to thank Vincent Pagel (Mons / BE) for his intensive
programming, testing, and debugging of this program, and for all sorts
of fruitful discussions. Vincent also wrote MBRDICO, a general-purpose
trainable phonetizer. Thanks also to Nicolas Pierret and Olivier Van
der Vreken for their contribution to the Mbrola coder.

Then let's greet our pioneer database providers:
Alejandro Barbosa (MX1),
Aggelos Bletsas (GR1),
Marian Boldea (RO1),
Gösta Bruce (SW1),
Alistair Conkie (EN1 ES1),
Denis Costa (BR1),
Arthur Dirksen (NL1 NL2),
Thierry Dutoit (FR1),
Céline Egéa (FR2),
Fred Englert (DE1 DE2),
Nikolaj Lazic (CR1),
Mike Macon (US3 MX1),
Einar Meister (EE1),
Yann-Ber Messager (BZ1),
Vincent Pagel (FR3 FR4),
Marcus Philipson (SW1),
George Sergiadis (GR1),
Nawfal Tounsi (AR1),
Raymond Veldhuis (NL3),
Gordon Tischer (ES2),
Johan Wouters (US3)
and the team at University Autonoma of Barcelona (ES1)!

May they be thanked for their work.

Thanks to Sam Przyswa (NEXT Paris/FR), Fred Englert (IBMRS600
Frankfurt/DE), Arnaud Gaudinat (VAX-VMS University of Geneva, CH),
Cyrille Mastchenko (BeOS Montreal/CA), Michael C. Thornburgh (SCO-Unix
USA), Bruno Langlois (Java port Quebec/CA), Christophe M. Vallat (OS2
Domerat/FR), Cristiano Verondini (Mac Bologna/Italy), Gerald Kerma
(Mac G'K2 Vaugrigneuse/FR), David Woodman (SUN4 Berkshire/England),
Gary Thomas (Linux-PPC Grenoble/France), Thomas Fletcher (QNX-OS CA),
Philippe Devallois (Mac DLL), Thomas Agopian (BeOS), Stephen Isard
(Linux PC Redhat6.2), and Matthias Nutt (Ultra 1) for their help in
the compilation of MBROLA on many platforms.

Arnaud Gaudinat (Lausanne/CH), Thierry Gartiser (Nancy/France), Alec
Epting (Summer Institute of Linguistics/USA), Michael M. Cohen
(University of California - Santa Cruz), and Patrick Bouffer (France)
have arranged mirror sites.

David Haubensack has written a French TTS in Perl; Stephen Isard and
Alistair Conkie have provided the Freespeech British English TTS!
Alan Black and Paul Taylor have supported the Mbrola Project in their
great Festival multilingual TTS Project.

Fabrice Malfrere (Mons/BE) has developed an efficient speech
alignment program for Windows (distributed on the mbrola site).

Alain Ruelle (Mons/BE) has developed the MBRPlay DLL and the
Mbroli interactive pho file player for Windows.

Nawfal Tounsi (Mons/BE) has developed the W project, aimed at
helping disabled people talk with the help of Mbrola.

Last but not least, I am also greatly indebted to Francois Bataille
(Mons/BE) for having supported the creation of this internet project.

--------------------------------------------------------------
9.0 Contacting the author
--------------------------------------------------------------

Dr Thierry Dutoit

Faculte Polytechnique de Mons, TCTS Lab,
31, bvd Dolez, B-7000 Mons, Belgium.
tel : /32/65/374133
fax : /32/65/374129
e-mail: mbrola@tcts.fpms.ac.be, for general information,
questions on the installation of software and databases.