1. Shinji Watanabe - Google Scholar
Carnegie Mellon University - Cited by 31825 - Speech recognition - Speech processing - Speech enhancement - Speech translation
2. Shinji Watanabe - Google Sites
Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his BS, MS, and Ph.D. (Dr. Eng.) degrees from Waseda ...
Shinji Watanabe Associate Professor Carnegie Mellon University shinjiw_at_ieee.org or swatanab_at_andrew.cmu.edu
3. ESPnet2 pretrained model, Shinji Watanabe ... - OpenAIRE - Explore
This model was trained by Shinji Watanabe using the jsut recipe in espnet. Python API: See https://github.com/espnet/espnet_model_zoo. Evaluate in the recipe: git ...
4. Shinji Watanabe | Papers With Code
In this work, we present SynesLM, a unified model which can perform three multimodal language understanding tasks: audio-visual automatic speech recognition (AV ...
Papers by Shinji Watanabe with links to code and results.
5. ESPnet2 ASR pretrained model - Hugging Face
This model was trained by Shinji Watanabe using the librispeech recipe in espnet. Python API: See https://github.com/espnet/espnet_model_zoo
6. OWSM-CTC: An Open Encoder-Only Speech Foundation Model ... - arXiv
Feb 20, 2024 · We propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). It is trained on 180k hours of public ...
There has been an increasing interest in large speech models that can perform multiple tasks in a single model. Such models usually adopt an encoder-decoder or decoder-only architecture due to their popularity and good performance in many domains. However, autoregressive models can be slower during inference compared to non-autoregressive models and also have potential risks of hallucination. Though prior studies observed promising results of non-autoregressive models for certain tasks at small scales, it remains unclear if they can be scaled to speech-to-text generation in diverse languages and tasks. Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). It is trained on 180k hours of public audio data for multilingual automatic speech recognition (ASR), speech translation (ST), and language identification (LID). Compared to encoder-decoder OWSM, our OWSM-CTC achieves competitive results on ASR and up to 24% relative improvement on ST, while it is more robust and 3 to 4 times faster for inference. OWSM-CTC also improves the long-form ASR result with 20x speed-up. We will publicly release our code, pre-trained model, and training logs to promote open science in speech foundation models.
7. Shinji Watanabe - Human Language Technology Center of Excellence
His research is focused on the area of spoken language processing which includes speech enhancement, source separation, microphone array, speaker adaptation.
Shinji Watanabe received his Bachelor’s and Master’s degrees in Theoretical Physics from Waseda University in Tokyo, Japan, and his Ph.D. in Engineering from Waseda University as well. His research is focused on spoken language processing, which includes speech enhancement, source separation, microphone arrays, speaker adaptation, speaker clustering, acoustic and language modeling […]
8. Shinji Watanabe - Electrical and Computer Engineering - College of ...
Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda ...
Courtesy Faculty
9. Shinji Watanabe | IEEE Xplore Author Details
Publication Topics. Speech Recognition, Language Model, Word Error Rate, Speech Processing, Utterances, Self-supervised Learning, Automatic Speech Recognition ...
10. ESPnet2 pretrained model, Shinji Watanabe ... - OpenAIRE - Explore
This model was trained by Shinji Watanabe using the librispeech recipe in espnet. Python API: See https://github.com/espnet/espnet_model_zoo. Evaluate in the rec...
11. Shinji Watanabe | IEEE Xplore Author Details
Publication Topics. Speech Recognition, Language Model, Word Error Rate, Self-supervised Learning, Speech Processing, Utterances, Training Data, Beam Search, Data ...
12. Shinji Watanabe - Semantic Scholar
Preliminary experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve ...
Semantic Scholar profile for Shinji Watanabe, with 2142 highly influential citations and 591 scientific research papers.
13. sw005320 (Shinji Watanabe) - Hugging Face
Shinji Watanabe · AI & ML interests · Organizations · Papers 3 · Spaces 1 · Models 3
User profile of Shinji Watanabe on Hugging Face
14. Shinji Watanabe - CatalyzeX
We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon ...
View Shinji Watanabe's papers and open-source code. See more researchers and engineers like Shinji Watanabe.
15. Shinji Watanabe 0001 - DBLP
Shinji Watanabe 0003 — Osaka Prefecture University, School of Knowledge ... One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization ...
List of computer science publications by Shinji Watanabe