Publications
Journal Papers
- Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
[ieee xplore]
- Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari
SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources
IEEE Access, 2023.
[ieee xplore] [github]
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials
IEICE Transactions on Information and Systems, 2021.
[paper] [jstage] [demo]
(Best Paper Award)
- Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
IEEE Signal Processing Letters, 2021.
[ieee xplore] [arxiv] [demo] [poster]
(The 37th Telecom System Technology Award for Students from TAF)
Peer-Reviewed Conference Papers
- Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
Annual Conference of the International Speech Communication Association (Interspeech), 2024.
[arxiv] [github]
- Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[arxiv] [demo] [slide]
- Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, and Hiroshi Saruwatari
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[arxiv]
- Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe
YODAS: Youtube-Oriented Dataset for Audio and Speech
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023.
[ieee xplore]
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Improving Robustness of Spontaneous Speech Synthesis with Linguistic Speech Regularization and Pseudo-Filled-Pause Insertion
ISCA Speech Synthesis Workshop (SSW), 2023.
[openreview]
- Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
International Joint Conference on Artificial Intelligence (IJCAI), 2023.
[ijcai_proceedings] [arxiv] [demo] [github] [slide]
- Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe
SpeechLMScore: Evaluating Speech Generation Using Speech Language Model
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[arxiv] [github]
- Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, and Hiroshi Saruwatari
Duration-Aware Pause Insertion Using Pre-trained Language Model for Multi-speaker Text-to-Speech
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[arxiv] [demo]
- Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, and Hiroshi Saruwatari
Text-to-Speech Synthesis from Dark Data with Evaluation-in-the-Loop Data Selection
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[arxiv]
(IEEE SPS Travel Grant for IEEE ICASSP 2023)
- Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, and Bhuvana Ramabhadran
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. (Oral)
[arxiv] [demo] [slide]
- Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, and Hiroshi Saruwatari
vTTS: Visual-Text to Speech
IEEE Spoken Language Technology Workshop (SLT), 2023.
[arxiv] [corpus] [github]
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022.
[corpus] [demo] [github]
- Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
Annual Conference of the International Speech Communication Association (Interspeech), 2022. (Oral)
[arxiv] [demo] [github] [slide]
(Google East Asia Student Travel Grants)
- Takaaki Saeki, Kentaro Tachibana, and Ryuichi Yamamoto
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning
Annual Conference of the International Speech Communication Association (Interspeech), 2022.
[arxiv] [demo]
- Takaaki Saeki*, Detai Xin*, Wataru Nakata*, Tomoki Koriyama, Shinnosuke Takamichi, and Hiroshi Saruwatari
(*Equal contribution)
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
Annual Conference of the International Speech Communication Association (Interspeech), 2022. (Oral)
[arxiv] [github] [demo]
(Ranked 1st Place in 10/16 Metrics)
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Personalized Filled-Pause Generation with Group-Wise Prediction Models
Language Resources and Evaluation Conference (LREC), 2022.
[paper] [arxiv] [github]
- Naoki Kimura, Zixiong Su, Takaaki Saeki, and Jun Rekimoto
SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition
Language Resources and Evaluation Conference (LREC), 2022.
[paper] [github]
- Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.
[ieee xplore] [arxiv] [demo] [slide]
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU
Annual Conference of the International Speech Communication Association (Interspeech), 2020. (Show & Tell)
[paper] [video]
- Naoki Kimura, Zixiong Su, and Takaaki Saeki
End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
Annual Conference of the International Speech Communication Association (Interspeech), 2020. (Show & Tell)
[paper] [video]
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[ieee xplore] [arxiv] [slide] [video]
Preprints
- Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, and Shinji Watanabe
JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification
arXiv preprint, 2112.09323
[arxiv] [github]
- Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, and Shinji Watanabe
ESPnet2-TTS: Extending the Edge of TTS Research
arXiv preprint, 2110.07840
[arxiv] [demo] [github]