Perceptual Error Analysis of Human and Synthesized Voices
Data
2017
Tipo
Artigo
Título da Revista
ISSN da Revista
Título de Volume
Resumo
Objective/ Hypothesis. To assess the quality of synthesized voices through listeners' skills in discriminating human and synthesized voices. Study Design. Prospective study. Methods. Eighteen human voices with different types and degrees of deviation (roughness, breathiness, and strain, with three degrees of deviation: mild, moderate, and severe) were selected by three voice specialists. Synthesized samples with the same deviations of human voices were produced by the VoiceSim system. The manipulated parameters were vocal frequency perturbation (roughness), additive noise (breathiness), increasing tension, subglottal pressure, and decreasing vocal folds separation (strain). Two hundred sixty-nine listeners were divided in three groups: voice specialist speech language pathologists (V-SLPs), general clinician SLPs (G-SLPs), and naive listeners (NLs). The SLP listeners also indicated the type and degree of deviation. Results. The listeners misclassified 39.3% of the voices, both synthesized (42.3%) and human (36.4%) samples (P = 0.001). V-SLPs presented the lowest error percentage considering the voice nature (34.6%)
G-SLPs and NLs identified almost half of the synthesized samples as human (46.9%, 45.6%). The male voices were more susceptible for misidentification. The synthesized breathy samples generated a greater perceptual confusion. The samples with severe deviation seemed to be more susceptible for errors. The synthesized female deviations were correctly classified. The male breathiness and strain were identified as roughness. Conclusion. VoiceSim produced stimuli very similar to the voices of patients with dysphonia. V-SLPs had a better ability to classify human and synthesized voices. VoiceSim is better to simulate vocal breathiness and female deviations
the male samples need adjustment.
G-SLPs and NLs identified almost half of the synthesized samples as human (46.9%, 45.6%). The male voices were more susceptible for misidentification. The synthesized breathy samples generated a greater perceptual confusion. The samples with severe deviation seemed to be more susceptible for errors. The synthesized female deviations were correctly classified. The male breathiness and strain were identified as roughness. Conclusion. VoiceSim produced stimuli very similar to the voices of patients with dysphonia. V-SLPs had a better ability to classify human and synthesized voices. VoiceSim is better to simulate vocal breathiness and female deviations
the male samples need adjustment.
Descrição
Citação
Journal Of Voice. New York, v. 31, n. 4, p. -, 2017.