Full metadata record
DC Field | Value | Language
dc.contributor.author | CHEN, SH | en_US
dc.contributor.author | CHANG, S | en_US
dc.contributor.author | LEE, SM | en_US
dc.date.accessioned | 2014-12-08T15:04:51Z | -
dc.date.available | 2014-12-08T15:04:51Z | -
dc.date.issued | 1992-07-01 | en_US
dc.identifier.issn | 0001-4966 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/3355 | -
dc.description.abstract | A novel method based on a statistical model for fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidates of F0 contour pattern at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test. | en_US
dc.language.iso | en_US | en_US
dc.title | A STATISTICAL-MODEL BASED FUNDAMENTAL-FREQUENCY SYNTHESIZER FOR MANDARIN SPEECH | en_US
dc.type | Article | en_US
dc.identifier.journal | JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | en_US
dc.citation.volume | 92 | en_US
dc.citation.issue | 1 | en_US
dc.citation.spage | 114 | en_US
dc.citation.epage | 120 | en_US
dc.contributor.department | 電信工程研究所 | zh_TW
dc.contributor.department | 電信研究中心 | zh_TW
dc.contributor.department | Institute of Communications Engineering | en_US
dc.contributor.department | Center for Telecommunications Research | en_US
dc.identifier.wosnumber | WOS:A1992JD13400009 | -
dc.citation.woscount | 10 | -
Appears in Collections: Articles


Files in This Item:

  1. A1992JD13400009.pdf