
Google's new speech breakthrough simulates the human voice more realistically


Google's DeepMind has revealed a new speech synthesis system that will be used to help computer voices, like Siri and Cortana, sound more human.

Named WaveNet, the model works with raw audio waveforms to make our robotic assistants sound, er, less robotic.

WaveNet doesn't control what the computer is saying; instead it uses AI to make it sound more like a person, adding breathing noises, emotion and varied emphasis to sentences.

Generating speech with computers is called text-to-speech (TTS), and until now it has worked by piecing together short pre-recorded syllables and sound fragments to form words.
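To make that concatenative approach concrete, here is a minimal Python sketch of piecing pre-recorded fragments together. The unit names, waveforms and crossfade length are illustrative stand-ins, not data from any real TTS system.

```python
import numpy as np

# Hypothetical database of pre-recorded units: each entry maps a phoneme-like
# unit to a short waveform (synthetic placeholders here, at 16 kHz).
SAMPLE_RATE = 16000
unit_database = {
    "HH": np.random.randn(800) * 0.1,   # stand-in for a recorded "h" sound
    "EH": np.sin(2 * np.pi * 220 * np.arange(1600) / SAMPLE_RATE),
    "L":  np.sin(2 * np.pi * 180 * np.arange(1200) / SAMPLE_RATE),
    "OW": np.sin(2 * np.pi * 260 * np.arange(2000) / SAMPLE_RATE),
}

def concatenative_tts(units, crossfade=160):
    """Piece together pre-recorded fragments with a short crossfade.

    Because the fragments are fixed recordings, there is no easy way to
    change their intonation or emphasis afterwards -- the limitation the
    article goes on to describe.
    """
    out = np.array(unit_database[units[0]])
    for name in units[1:]:
        nxt = unit_database[name]
        fade = np.linspace(1.0, 0.0, crossfade)
        out[-crossfade:] = out[-crossfade:] * fade + nxt[:crossfade] * fade[::-1]
        out = np.concatenate([out, nxt[crossfade:]])
    return out

audio = concatenative_tts(["HH", "EH", "L", "OW"])  # a crude "hello"
print(audio.shape)
```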


As the words are taken from a database of speech fragments, it's very difficult to modify the voice, so adding things like intonation and emphasis is almost impossible.

This is why robotic voices often sound monotonous and decidedly different from human voices.

WaveNet, however, overcomes this problem by using its neural network models to build an audio signal from the ground up, one sample at a time.
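As a rough illustration of what generating audio "one sample at a time" means, the toy loop below predicts each new sample from the samples produced so far. The linear predictor is a placeholder, not DeepMind's actual architecture (WaveNet uses a deep stack of dilated causal convolutions); only the autoregressive idea is being shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a fixed linear predictor over the last N samples, plus a small
# random component standing in for sampling from a learned distribution.
RECEPTIVE_FIELD = 64
weights = rng.normal(0, 1.0 / RECEPTIVE_FIELD, RECEPTIVE_FIELD)

def generate(num_samples):
    audio = np.zeros(RECEPTIVE_FIELD)           # silent "seed" context
    for _ in range(num_samples):
        context = audio[-RECEPTIVE_FIELD:]      # condition on what exists so far
        prediction = context @ weights          # next-sample estimate
        sample = np.tanh(prediction + rng.normal(0, 0.1))
        audio = np.append(audio, sample)        # the new sample becomes context
    return audio[RECEPTIVE_FIELD:]

waveform = generate(16000)                      # one second of audio at 16 kHz
print(waveform.min(), waveform.max())
```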

During training, the DeepMind team gave WaveNet real waveforms recorded from human speakers to learn from.

Using a type of AI called a neural network, the program then learns from these, in much the same way a human brain does.
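A sketch of what that training signal can look like in practice: recorded waveforms are turned into (context, next-sample) pairs that a network learns to predict. The mu-law quantization to 256 levels follows the published WaveNet paper; the rest is deliberately simplified, and the sine wave below merely stands in for real human speech.

```python
import numpy as np

def mu_law_encode(x, channels=256):
    """Compress samples in [-1, 1] into 256 discrete levels (mu-law companding)."""
    mu = channels - 1
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((compressed + 1) / 2 * mu).astype(np.int64)  # integers 0..255

def make_training_pairs(waveform, receptive_field=64):
    """Turn one recorded waveform into (context, next-sample) training pairs."""
    encoded = mu_law_encode(waveform)
    inputs, targets = [], []
    for t in range(receptive_field, len(encoded)):
        inputs.append(encoded[t - receptive_field:t])  # what the net conditions on
        targets.append(encoded[t])                     # the sample it must predict
    return np.stack(inputs), np.array(targets)

# A synthetic one-second "recording" stands in for real human speech here.
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
X, y = make_training_pairs(speech)
print(X.shape, y.shape)   # (15936, 64) (15936,)
```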

The result was that WaveNet learned the characteristics of different voices, could make non-speech sounds such as breathing and mouth movements, and could say the same thing in different voices.
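One common way to get "the same thing in different voices" is to condition generation on a speaker identity. The fragment below is a hypothetical illustration of that idea only: the speaker embeddings and weights are random stand-ins, not anything learned by WaveNet.

```python
import numpy as np

rng = np.random.default_rng(1)

# A separate "speaker embedding" is fed alongside the audio context, so the
# same generation procedure can be steered toward different voices.
RECEPTIVE_FIELD = 64
speaker_embeddings = {"speaker_a": rng.normal(size=8), "speaker_b": rng.normal(size=8)}
audio_weights = rng.normal(0, 0.1, RECEPTIVE_FIELD)
speaker_weights = rng.normal(0, 0.1, 8)

def next_sample(context, speaker):
    conditioning = speaker_embeddings[speaker] @ speaker_weights
    return np.tanh(context @ audio_weights + conditioning)

context = np.zeros(RECEPTIVE_FIELD)
# The same (silent) context yields different samples for different speakers.
print(next_sample(context, "speaker_a"), next_sample(context, "speaker_b"))
```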

Despite this exciting advance, the system still requires a huge amount of processing power, which means it will be a while before the technology appears in the likes of Siri.

Google's machine learning unit DeepMind is based in the UK and previously made headlines when its computer beat the world Go champion earlier this year.