WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016,[1] is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis was still less convincing than actual human speech.[2] WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music.[3]

History

Generating speech from text is an increasingly common task thanks to the popularity of software such as Apple's Siri, Microsoft's Cortana, Amazon Alexa and the Google Assistant.[4]

Most such systems use a variation of a technique that concatenates sound fragments to form recognisable sounds and words.[5] The most common of these is called concatenative TTS.[6] It uses a large library of speech fragments, recorded from a single speaker, which are concatenated to produce complete words and sounds. The result sounds unnatural, with an odd cadence and tone.[7] The reliance on a recorded library also makes it difficult to modify or change the voice.[8]
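A minimal sketch of the concatenative idea, using a hypothetical toy unit library (the unit names and sample values are invented placeholders, not real recordings): each requested unit is looked up in a single-speaker library and its stored samples are appended in order. Real unit-selection systems choose among many candidate recordings for each unit, which this sketch deliberately omits.

```python
from typing import Dict, List

# Hypothetical unit library: each entry stands in for a recorded fragment
# (a list of audio samples) from a single speaker. The values are toy
# placeholders, not real speech.
UNIT_LIBRARY: Dict[str, List[float]] = {
    "h": [0.0, 0.1, 0.2],
    "e": [0.3, 0.2, 0.1, 0.0],
    "l": [-0.1, -0.2],
    "o": [0.0, 0.4, 0.0],
}


def concatenate_units(units: List[str], library: Dict[str, List[float]]) -> List[float]:
    """Join the stored fragment for each unit, in order, into one waveform.

    No smoothing is applied at the boundaries, so any mismatch in pitch or
    amplitude between adjacent fragments remains as a discontinuity.
    """
    waveform: List[float] = []
    for unit in units:
        if unit not in library:
            raise KeyError(f"no recording for unit {unit!r}")
        waveform.extend(library[unit])
    return waveform


if __name__ == "__main__":
    audio = concatenate_units(["h", "e", "l", "l", "o"], UNIT_LIBRARY)
    print(len(audio))  # 3 + 4 + 2 + 2 + 3 = 14 samples
```

Because the output can only be built from stored fragments, giving this toy system a new voice means re-recording and replacing every entry in `UNIT_LIBRARY`, which mirrors the inflexibility described above.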

Another technique, known as parametric TTS,[9] uses mathematical models to recreate sounds that are then assembled into words and sentences. The information required to generate the sounds is stored in the parameters of the model. The characteristics of the output speech are controlled via the inputs to the model, while the speech is typically created using a voice synthesiser known as a vocoder. This approach can also produce unnatural-sounding audio.

Design and ongoing research

Background

Figure: A stack of dilated causal convolutional layers.[10]

WaveNet is a type of feedforward neural network known as a deep convolutional neural network (CNN). In WaveNet, the CNN takes a raw signal as input and synthesises an output one sample at a time. It does so by sampling from a softmax (i.e. categorical) distribution over signal values that are encoded using a μ-law companding transformation and quantized to 256 possible values.[11]
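The output encoding described above can be sketched in plain Python. This is an illustrative sketch, not DeepMind's code: it assumes μ = 255 (giving 256 levels), a particular rounding rule for mapping the companded value onto the integer bins 0 to 255, and caller-supplied logits standing in for the network's output layer. The names `mu_law_encode`, `mu_law_decode`, and `sample_level` are hypothetical.

```python
import math
import random
from typing import List, Sequence

MU = 255              # assumed mu-law parameter; yields 256 levels (0..255)
NUM_LEVELS = MU + 1


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] with mu-law, then quantize it to 0..mu."""
    x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map the companded value in [-1, 1] onto the integer bins 0..mu
    # (round to nearest; this rounding rule is an assumption).
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(q: int, mu: int = MU) -> float:
    """Invert the quantization and then the mu-law compression."""
    compressed = 2.0 * q / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(compressed) - 1.0) / mu, compressed)


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_level(logits: Sequence[float], rng: random.Random) -> int:
    """Draw one quantized level from the categorical (softmax) distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = mu_law_encode(0.5)
    print(q, mu_law_decode(q))  # quantized level and its reconstructed amplitude
    fake_logits = [0.0] * NUM_LEVELS  # placeholder for a network's output
    fake_logits[q] = 5.0
    level = sample_level(fake_logits, random.Random(0))
    print(level, mu_law_decode(level))
```

Companding spends more of the 256 levels on small amplitudes than uniform 8-bit quantization would, so quiet passages lose less detail. Sampling from the softmax, rather than always taking the most likely level, makes each generated sample a draw from a categorical distribution.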

Initial concept and results

According to the original September 2016 DeepMind research paper WaveNet: A Generative Model for Raw Audio,[12] the network was fed real waveforms of speech in English and Mandarin. As these pass through the network, it learns a set of rules to describe how the audio waveform evolves over time. The trained network can then be used to create new speech-like waveforms at 16,000 samples per second. These waveforms include realistic breaths and lip smacks, but do not conform to any language.[13]
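The following sketch illustrates how stacked dilated causal convolutions give each output sample a window of past context, and what that window means in time at 16,000 samples per second. The kernel size of 2 and the dilation schedule (1, 2, 4, ..., 512, repeated three times) are assumptions chosen for illustration, and the layers here are bare linear filters without the nonlinearities and other components of the real network.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # samples per second, the output rate given in the text


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """1-D causal convolution: out[t] depends only on x[t], x[t-d], x[t-2d], ...

    Positions before the start of the signal count as zeros (left padding),
    so the output has the same length as the input and never looks ahead.
    """
    k = len(weights)
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - (k - 1 - i) * dilation  # last weight taps the current sample
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(dilations: Sequence[int], kernel_size: int = 2) -> int:
    """Number of samples (current one included) that can influence one output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule for illustration: dilations double from 1 to 512, three times over.
DILATIONS = [2 ** i for i in range(10)] * 3

if __name__ == "__main__":
    rf = receptive_field(DILATIONS)
    print(rf, 1000 * rf / SAMPLE_RATE)  # 3070 191.875 (samples, milliseconds)
```

Doubling the dilation at each layer lets the context grow exponentially with depth while each layer stays cheap. Because each new sample depends on the ones before it, producing one second of audio this way takes 16,000 sequential steps, which helps explain why the original model was considered computationally expensive.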

WaveNet is able to accurately model different voices, with the accent and tone of the input correlating with the output. For example, if it is trained with German, it produces German speech.[14] The capability also means that if WaveNet is fed other inputs, such as music, its output will be musical. At the time of its release, DeepMind showed that WaveNet could produce waveforms that sound like classical music.[15]

Content (voice) swapping

According to the June 2018 paper Disentangled Sequential Autoencoder,[16] DeepMind has successfully used WaveNet for audio and voice "content swapping": the network can replace the voice on an audio recording with another, pre-existing voice while preserving the text and other features of the original recording. "We also experiment on audio sequence data. Our disentangled representation allows us to convert speaker identities into each other while conditioning on the content of the speech." (p. 5) "For audio, this allows us to convert a male speaker into a female speaker and vice versa [...]." (p. 1) According to the paper, a minimum of several tens of hours (c. 50 hours) of pre-existing speech recordings of both the source and target voices must be fed into WaveNet for the program to learn their individual features before it can convert one voice into the other at satisfactory quality. The authors stress that "[a]n advantage of the model is that it separates dynamical from static features [...]." (p. 8) That is, WaveNet can distinguish between, on the one hand, the spoken text and the mode of delivery (modulation, speed, pitch, mood, etc.), which are preserved during the conversion, and, on the other hand, the basic features of the source and target voices, which are swapped.

The January 2019 follow-up paper Unsupervised speech representation learning using WaveNet autoencoders[17] details a method for improving the automatic recognition of, and discrimination between, dynamical and static features for "content swapping", notably including swapping voices on existing audio recordings, in order to make it more reliable. Another follow-up paper, Sample Efficient Adaptive Text-to-Speech,[18] dated September 2018 (latest revision January 2019), states that DeepMind has reduced the minimum amount of real-life recordings required to model an existing voice via WaveNet to "merely a few minutes of audio data" while maintaining high-quality results.

WaveNet's ability to clone voices has raised ethical concerns about mimicking the voices of living and dead people. According to a 2016 BBC article, companies working on similar voice-cloning technologies (such as Adobe Voco) intend to insert watermarks inaudible to humans to prevent counterfeiting. They also maintain that voice cloning good enough for, for instance, entertainment-industry purposes would be far less complex, and would use different methods, than cloning capable of fooling forensic methods and electronic ID devices, so that natural voices and voices cloned for entertainment could still be easily told apart by technological analysis.[19]

Applications

At the time of its release, DeepMind said that WaveNet required too much computational processing power to be used in real-world applications.[20] In October 2017, Google announced a 1,000-fold performance improvement along with better voice quality. WaveNet was then used to generate Google Assistant voices for US English and Japanese across all Google platforms.[21] In November 2017, DeepMind researchers released a research paper detailing a proposed method, called "Probability Density Distillation", for "generating high-fidelity speech samples at more than 20 times faster than real-time".[22] At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices, made possible by WaveNet, were available; WaveNet greatly reduced the number of audio recordings required to create a voice model by modeling the raw audio of the voice actor samples.[23]

References

  1. van den Oord, Aaron; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].
  2. Kahn, Jeremy. "Google's DeepMind Achieves Speech-Generation Breakthrough". Bloomberg.com.
  3. Meyer, David. "Google's DeepMind Claims Massive Progress in Synthesized Speech". Fortune.
  4. Kahn, Jeremy. "Google's DeepMind Achieves Speech-Generation Breakthrough". Bloomberg.com.
  5. Condliffe, Jamie. "When this computer talks, you may actually want to listen". MIT Technology Review.
  6. Hunt, A. J.; Black, A. W. (May 1996). "Unit selection in a concatenative speech synthesis system using a large speech database". 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings. Vol. 1. pp. 373–376. CiteSeerX 10.1.1.218.1335. doi:10.1109/ICASSP.1996.541110. ISBN 978-0-7803-3192-1. S2CID 14621185.
  7. Coldewey, Devin. "Google's WaveNet uses neural nets to generate eerily convincing speech and music". TechCrunch.
  8. van den Oord, Aäron; Dieleman, Sander; Zen, Heiga. "WaveNet: A Generative Model for Raw Audio". DeepMind.
  9. Zen, Heiga; Tokuda, Keiichi; Black, Alan W. (2009). "Statistical parametric speech synthesis". Speech Communication. 51 (11): 1039–1064. CiteSeerX 10.1.1.154.9874. doi:10.1016/j.specom.2009.04.004. S2CID 3232238.
  10. van den Oord, Aäron. "High-fidelity speech synthesis with WaveNet". DeepMind.
  11. van den Oord, Aaron; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].
  12. van den Oord, Aaron; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].
  13. Gershgorn, Dave. "Are you sure you're talking to a human? Robots are starting to sounding eerily lifelike". Quartz.
  14. Coldewey, Devin. "Google's WaveNet uses neural nets to generate eerily convincing speech and music". TechCrunch.
  15. van den Oord, Aäron; Dieleman, Sander; Zen, Heiga. "WaveNet: A Generative Model for Raw Audio". DeepMind.
  16. Li, Yingzhen; Mandt, Stephan (2018). "Disentangled Sequential Autoencoder". arXiv:1803.02991 [cs.LG].
  17. Chorowski, Jan; Weiss, Ron J.; Bengio, Samy; van den Oord, Aaron (2019). "Unsupervised Speech Representation Learning Using WaveNet Autoencoders". IEEE/ACM Transactions on Audio, Speech, and Language Processing. 27 (12): 2041–2053. arXiv:1901.08810. doi:10.1109/TASLP.2019.2938863.
  18. Chen, Yutian; Assael, Yannis; Shillingford, Brendan; Budden, David; Reed, Scott; Zen, Heiga; Wang, Quan; Cobo, Luis C.; Trask, Andrew; Laurie, Ben; Gulcehre, Caglar; van den Oord, Aäron; Vinyals, Oriol; de Freitas, Nando (2018). "Sample Efficient Adaptive Text-to-Speech". arXiv:1809.10460 [cs.LG].
  19. "Adobe Voco 'Photoshop-for-voice' causes concern". BBC News. 7 November 2016.
  20. "Adobe Voco 'Photoshop-for-voice' causes concern". BBC News.
  21. "WaveNet launches in the Google Assistant".
  22. van den Oord, Aaron; Li, Yazhe; Babuschkin, Igor; Simonyan, Karen; Vinyals, Oriol; Kavukcuoglu, Koray; van den Driessche, George; Lockhart, Edward; Cobo, Luis C.; Stimberg, Florian; Casagrande, Norman; Grewe, Dominik; Noury, Seb; Dieleman, Sander; Elsen, Erich; Kalchbrenner, Nal; Zen, Heiga; Graves, Alex; King, Helen; Walters, Tom; Belov, Dan; Hassabis, Demis (2017). "Parallel WaveNet: Fast High-Fidelity Speech Synthesis". arXiv:1711.10433 [cs.LG].
  23. Martin, Taylor (May 9, 2018). "Try the all-new Google Assistant voices right now". CNET. Retrieved May 10, 2018.