Publications
Journal Paper
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials
IEICE Transactions on Information and Systems, 2021
[paper] [jstage] [demo]
(Best Paper Award from IEICE)
- Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
IEEE Signal Processing Letters, 2021
[ieee xplore] [arxiv] [demo] [poster]
(The 37th Telecom System Technology Award for Students from TAF)
Peer-Reviewed Conference Paper
- Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, and Hiroshi Saruwatari
vTTS: Visual-Text to Speech
IEEE SLT 2022
[arxiv] [corpus] [github]
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
APSIPA ASC 2022
[corpus] [demo] [github]
- Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
INTERSPEECH 2022 (Oral)
[arxiv] [demo] [github]
- Takaaki Saeki, Kentaro Tachibana, and Ryuichi Yamamoto
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning
INTERSPEECH 2022
[arxiv] [demo]
- Takaaki Saeki*, Detai Xin*, Wataru Nakata*, Tomoki Koriyama, Shinnosuke Takamichi, and Hiroshi Saruwatari
(*Equal contribution)
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
INTERSPEECH 2022 (Oral)
[arxiv] [github]
(Ranked 1st Place in 10/16 Metrics)
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Personalized Filled-Pause Generation with Group-Wise Prediction Models
LREC 2022
[paper] [arxiv] [github]
- Naoki Kimura, Zixiong Su, Takaaki Saeki, and Jun Rekimoto
SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition
LREC 2022
[paper] [github]
- Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
ASRU 2021
[ieee xplore] [arxiv] [demo] [slide]
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU
INTERSPEECH 2020 (Show & Tell)
[paper] [video]
- Naoki Kimura, Zixiong Su, and Takaaki Saeki
End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
INTERSPEECH 2020 (Show & Tell)
[paper] [video]
- Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials
ICASSP 2020
[ieee xplore] [arxiv] [slide] [video]
Preprint
- Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, and Bhuvana Ramabhadran
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
arXiv preprint, 2210.15447
[arxiv] [demo]
- Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, and Shinji Watanabe
JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification
arXiv preprint, 2112.09323
[arxiv] [github]
- Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, and Shinji Watanabe
ESPnet2-TTS: Extending the Edge of TTS Research
arXiv preprint, 2110.07840
[arxiv] [demo] [github]
Thesis
- Takaaki Saeki (Supervisor: Prof. Hiroshi Saruwatari)
Real-Time, Full-Band, High-Quality Neural Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation
Master’s Thesis, the University of Tokyo, 2021
[thesis] [slide]