Adapting and controlling DNN-based speech synthesis using input codes HT Luong, S Takaki, GE Henter, J Yamagishi 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 101 | 2017 |
Wasserstein GAN and waveform loss-based acoustic model training for multi-speaker text-to-speech synthesis systems using a WaveNet vocoder Y Zhao, S Takaki, HT Luong, J Yamagishi, D Saito, N Minematsu IEEE access 6, 60478-60488, 2018 | 69 | 2018 |
Nautilus: a versatile voice cloning system HT Luong, J Yamagishi IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 2967-2981, 2020 | 49 | 2020 |
A non-expert Kaldi recipe for Vietnamese speech recognition system HT Luong, HQ Vu Proceedings of the Third International Workshop on Worldwide Language …, 2016 | 36 | 2016 |
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora HT Luong, X Wang, J Yamagishi, N Nishizawa INTERSPEECH, 1303-1307, 2019 | 27 | 2019 |
Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech HT Luong, J Yamagishi 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019 | 24 | 2019 |
Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects HT Luong, X Wang, J Yamagishi, N Nishizawa arXiv preprint arXiv:1808.00665, 2018 | 23 | 2018 |
Multimodal speech synthesis architecture for unsupervised speaker adaptation HT Luong, J Yamagishi arXiv preprint arXiv:1808.06288, 2018 | 11 | 2018 |
Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems HT Luong, J Yamagishi 2018 IEEE Spoken Language Technology Workshop (SLT), 610-617, 2018 | 10 | 2018 |
A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation HT Luong, J Yamagishi arXiv preprint arXiv:1906.07414, 2019 | 9 | 2019 |
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example HT Luong, J Yamagishi arXiv preprint arXiv:2110.04946, 2021 | 8 | 2021 |
Latent linguistic embedding for cross-lingual text-to-speech and voice conversion HT Luong, J Yamagishi arXiv preprint arXiv:2010.03717, 2020 | 3 | 2020 |
A DNN-based text-to-speech synthesis system using speaker, gender, and age codes HT Luong, S Takaki, SJ Kim, J Yamagishi The Journal of the Acoustical Society of America 140 (4_Supplement), 2962-2962, 2016 | 1 | 2016 |
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance HT Luong, J Yamagishi arXiv preprint arXiv:2106.13479, 2021 | | 2021 |
Deep learning based voice cloning framework for a unified system of text-to-speech and voice conversion L Hieu-Thi, HT Luong 総合研究大学院大学 (The Graduate University for Advanced Studies, SOKENDAI), 2020 | | 2020 |
Do prosodic manual annotations matter for Japanese speech synthesis systems with WaveNet vocoder? HT Luong, X Wang, J Yamagishi, N Nishizawa IEICE Technical Report; IEICE Tech. Rep. 117 (517), 215-220, 2018 | | 2018 |
A study on the use of speaker, gender, and age codes in DNN-based text-to-speech synthesis Takaki, J Yamagishi IEICE Technical Report; IEICE Tech. Rep. 116 (279), 37-42, 2016 | | 2016 |
Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme HT Luong, J Yamagishi | | |