Man-Machine Speech Communication [E-Book] : 17th National Conference, NCMMSC 2022, Hefei, China, December 15-18, 2022, Proceedings / edited by Ling Zhenhua, Gao Jianqing, Yu Kai, Jia Jia.
This book constitutes the refereed proceedings of the 17th National Conference on Man-Machine Speech Communication, NCMMSC 2022, held in China, in December 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. They were organized...
Personal Name(s): Jia, Jia, editor / Jianqing, Gao, editor / Kai, Yu, editor / Zhenhua, Ling, editor
Edition: 1st edition 2023.
Imprint: Singapore : Springer, 2023
Physical Description: XI, 332 pages, 91 illustrations, 86 illustrations in color (online resource)
Note: English
ISBN: 9789819924011
DOI: 10.1007/978-981-99-2401-1
Series Title: Communications in Computer and Information Science ; 1765
Contents:
- MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation
- Baby Cry Recognition Based on Acoustic Segment Model
- A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification
- Multi-Hypergraph Neural Networks for Emotion Recognition in Multi-Party Conversations
- Using Emoji as an Emotion Modality in Text-Based Depression Detection
- Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
- Semantic enhancement framework for robust speech recognition
- Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
- Predictive AutoEncoders are Context-Aware Unsupervised Anomalous Sound Detectors
- A pipelined framework with serialized output training for overlapping speech recognition
- Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification
- Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement
- Multiple Confidence Gates for Joint Training of SE and ASR
- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion
- Pre-training Techniques For Improving Text-to-Speech Synthesis By Automatic Speech Recognition Based Data Enhancement
- A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition
- Interplay between prosody and syntax-semantics: Evidence from the prosodic features of Mandarin tag questions
- Improving Fine-grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis
- Violence Detection through Fusing Visual Information to Auditory Scene
- Mongolian Text-to-Speech Challenge under Low-Resource Scenario for NCMMSC2022
- VC-AUG: Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification
- Transformer-based potential emotional relation mining network for emotion recognition in conversation
- FastFoley: Non-Autoregressive Foley Sound Generation Based On Visual Semantics
- Structured Hierarchical Dialogue Policy with Graph Neural Networks
- Deep Reinforcement Learning for On-line Dialogue State Tracking
- Dual Learning for Dialogue State Tracking
- Automatic Stress Annotation and Prediction For Expressive Mandarin TTS
- MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset