
Abstract: In recent years, AI speech synthesis technology has been widely applied in fields such as audiobooks, game dubbing, song production, and human-computer interaction, profoundly reshaping people’s auditory experiences. Its technical features, such as realism, interactivity, and customization, enable individuals to experience presence in the physical, social, and self dimensions, and play a significant role in optimizing immersive experiences, promoting human-computer interaction, and enhancing sound aesthetics.
However, these same features have also given rise to ethical and social issues such as voice fraud, excessive emotional dependence, and confusion of self-identity. Addressing these challenges urgently requires a comprehensive ethical review of AI voice technology at every stage, including research and development, verification, and application. Efforts should focus on developing an agile governance ecosystem for AI voice technology, building a collaborative framework for multi-stakeholder co-governance and sharing, and establishing a dynamic early-warning mechanism to guide its ethical and beneficial development.
Keywords: AI voice synthesis; Presence; Realism; Interactivity; Customization
