A First Look at AWS Machine Learning (2): Text Translation with Translate, Text-to-Speech with Polly, Speech-to-Text with Transcribe

Reposted article. Originally published 2018-08-27 11:05. Categories: Basics / Use Cases / AWS

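Previous article: A First Look at AWS Machine Learning (1): Comprehend - Natural Language Processing Service

These services are straightforward in both function and use, so they are covered together in a single article.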

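1. Text Translation Service: Translate

1.1 Overview

AWS Translate is an AWS machine learning application service that uses advanced machine learning techniques to translate text. It is very simple to use: you provide the input text, and the service returns the output text.

Source text: the text to be translated. It must be UTF-8 encoded.
Output text: the translated text returned by AWS Translate, also UTF-8 encoded.

AWS Translate has two components:

encoder: reads the input text one word at a time and builds a semantic representation of its meaning.
decoder: uses the semantic representation produced by the encoder to generate the translation, one word at a time.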

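AWS Translate uses an attention mechanism to understand context. Attention helps the decoder focus on the most relevant parts of the source text, which helps it translate ambiguous words and phrases.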
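To make the attention idea more concrete, here is a toy sketch of dot-product attention in plain Python. It is not AWS Translate's actual implementation, which is not public; the vectors and the function names `softmax` and `attend` are made up purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Each weight says how strongly the decoder focuses on one source word;
    the context is the weighted average of the encoder states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Hypothetical 2-dimensional encoder states for a three-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([2.0, 0.0], encoder_states)
print(weights)  # the first source word receives the largest weight
```

The decoder state here is most similar to the first encoder state, so that source word dominates the context vector used to produce the next translated word.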

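At the time of writing, Translate only supports translating from a number of languages into English, and from English into a number of target languages. Translate can also detect the language of the input text automatically; it uses Comprehend to perform this language detection.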
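If you want to see the detected language yourself, one option is to call Comprehend directly. The sketch below assumes the boto3 Comprehend client and its `detect_dominant_language` call; the helper names `dominant_language` and `make_comprehend_client` are hypothetical, and this only illustrates language detection, not how Translate invokes Comprehend internally.

```python
def dominant_language(comprehend_client, text):
    """Return (language_code, score) for the most likely language of text."""
    response = comprehend_client.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]

def make_comprehend_client(region_name="us-east-1"):
    """Create a real client; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so dominant_language can be tried with a stub
    return boto3.client("comprehend", region_name=region_name)

# Usage (requires AWS credentials):
# code, score = dominant_language(make_comprehend_client(), "Hello World")
# print(code, score)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object when no AWS account is available.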

Let's compare the results of AWS Translate and Google Translate.

Here is the text of a tweet by Trump:

I am hearing so many great things about the Republican Party’s California Gubernatorial Candidate, John Cox. He is a very successful businessman who is tired of high Taxes & Crime. He will Make California Great Again &  make you proud of your Great State again. Total Endorsement!

Google Translate result:

關於共和黨加州州長候選人約翰考克斯，我聽到了很多很棒的事情。 他是一個非常成功的商人，厭倦了高稅收和犯罪。 他將使加利福尼亞再次偉大，讓你再次為你的偉大國家感到驕傲。 總代言！

AWS Translate result:

我聽到很多關於共和黨加州州長候選人約翰·考克斯的偉大事情。 他是一個非常成功的商人，厭倦了高稅與犯罪。 他將再次使加州成為偉大的國家，讓你再次為你的偉大國家感到驕傲。 完全贊同！

Judging from this one example, the quality of AWS Translate appears to be slightly better than Google's.

1.2 Console Example

The following example translates Chinese text into English:

1.3 CLI Example

aws translate translate-text --region us-east-1 --source-language-code "auto" --target-language-code "zh" --text "I am hearing so many great things about the Republican Party California Gubernatorial Candidate, John Cox. He is a very successful businessman who is tired of high Taxes & Crime. He will Make California Great Again &  make you proud of your Great State again. Total Endorsement"
{
    "TranslatedText": "我聽到很多關於共和黨加州州長候選人約翰·考克斯的偉大事情。 他是一個非常成功的商人，厭倦了高稅與犯罪。 他將再次使加州成為偉大的國家，讓你再次為你的偉大國家感到驕傲。 完全贊同",
    "SourceLanguageCode": "en",
    "TargetLanguageCode": "zh"
}

1.4 API

At the time of writing, the Translate service has only one API: TranslateText.

Request syntax:

{
    "SourceLanguageCode": "string",
    "TargetLanguageCode": "string",
    "Text": "string"
}

Response syntax:

{
    "SourceLanguageCode": "string",
    "TargetLanguageCode": "string",
    "TranslatedText": "string"
}
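Because a single TranslateText request carries one `Text` string, longer documents have to be sent in pieces. The following is a minimal sketch of that idea: it splits text at sentence-ending punctuation so that each piece stays under a UTF-8 byte budget, then calls `translate_text` per piece. The helper names are hypothetical, the sentence splitting is deliberately naive, and the default `max_bytes=5000` is an assumption, so check the current service limits before relying on it.

```python
import re

def split_by_bytes(text, max_bytes):
    """Split text at sentence ends so each chunk fits in max_bytes of UTF-8."""
    # Each match is one sentence plus its trailing punctuation and whitespace.
    sentences = re.findall(r".+?(?:[.!?。！？]+\s*|\Z)", text, flags=re.S)
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds max_bytes")
        if current and len((current + sentence).encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks

def translate_long_text(client, text, target, source="auto", max_bytes=5000):
    """Translate text chunk by chunk with TranslateText and join the results."""
    parts = []
    for chunk in split_by_bytes(text, max_bytes):
        if not chunk.strip():
            parts.append(chunk)  # nothing to translate in pure whitespace
            continue
        result = client.translate_text(
            Text=chunk, SourceLanguageCode=source, TargetLanguageCode=target)
        parts.append(result["TranslatedText"])
    return "".join(parts)

# Usage (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("translate", region_name="us-east-1")
# print(translate_long_text(client, "Hello World. How are you?", "zh"))
```

Translating sentence by sentence loses some cross-sentence context, so larger chunks generally give the service more to work with.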

1.5 Python Example Code

Code:

import boto3

# Create a Translate client in us-east-1.
translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True)

# "auto" asks the service to detect the source language.
result = translate.translate_text(Text="Hello World", SourceLanguageCode="auto", TargetLanguageCode="zh")

print('TranslatedText: ' + result.get('TranslatedText'))
print('SourceLanguageCode: ' + result.get('SourceLanguageCode'))
print('TargetLanguageCode: ' + result.get('TargetLanguageCode'))

Output:

TranslatedText: 您好世界
SourceLanguageCode: en
TargetLanguageCode: zh

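2. Text-to-Speech: Polly

2.1 Overview

A text-to-speech service reads text aloud. Its inputs and outputs are:

Input text: the text for Polly to convert to speech. It can be plain text or SSML (Speech Synthesis Markup Language). SSML allows finer control over things such as volume, speaking rate, and pronunciation.
Output language and voice: Polly supports many languages, and each language offers several voices, such as female and male voices.
Output format: Polly can produce speech in several formats, such as MP3 and PCM.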
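As an illustration of the SSML input described above, the sketch below wraps plain text in a `<prosody>` tag to control speaking rate and volume. The helper name `build_ssml` is hypothetical; the text is XML-escaped because characters such as `&` are not valid inside SSML as-is.

```python
from xml.sax.saxutils import escape

def build_ssml(text, rate="medium", volume="medium"):
    """Wrap plain text in SSML with a prosody tag controlling rate and volume."""
    return '<speak><prosody rate="{}" volume="{}">{}</prosody></speak>'.format(
        rate, volume, escape(text))

ssml = build_ssml("Taxes & Crime", rate="slow", volume="loud")
print(ssml)

# To synthesize it (requires boto3 and AWS credentials):
# import boto3
# polly = boto3.client("polly")
# response = polly.synthesize_speech(
#     Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Joanna")
```

Note the `TextType="ssml"` argument in the commented call: without it the markup would be treated as plain text.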

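A few notable features:

Pronunciation lexicons: a lexicon lets you customize how words are pronounced. You upload the lexicon to AWS and then apply it in SynthesizeSpeech API calls.
Asynchronous speech synthesis: large texts can be synthesized asynchronously in three steps: start a synthesis task, get the task details, and fetch the result from S3. The near-real-time API supports only 3,000 characters, whereas the asynchronous API supports up to 200,000 characters.
SSML support: see the official documentation for details.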
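The three asynchronous steps can be sketched as follows, assuming the boto3 Polly client. The helper name `synthesize_to_s3`, the polling interval, and the bucket you pass in are all illustrative choices rather than anything prescribed by the service.

```python
import time

def synthesize_to_s3(polly_client, text, bucket, voice_id="Joanna",
                     poll_seconds=5, sleep=time.sleep):
    """Start an asynchronous synthesis task, wait for it, and return the S3 URI."""
    # Step 1: start the task; the audio will be written to the given bucket.
    task = polly_client.start_speech_synthesis_task(
        Text=text, OutputFormat="mp3", VoiceId=voice_id,
        OutputS3BucketName=bucket)["SynthesisTask"]
    # Step 2: poll the task details until it finishes.
    while task["TaskStatus"] not in ("completed", "failed"):
        sleep(poll_seconds)
        task = polly_client.get_speech_synthesis_task(
            TaskId=task["TaskId"])["SynthesisTask"]
    if task["TaskStatus"] == "failed":
        raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
    # Step 3: the result can now be fetched from this S3 location.
    return task["OutputUri"]

# Usage (requires boto3, AWS credentials, and a bucket you own):
# import boto3
# uri = synthesize_to_s3(boto3.client("polly"), "A very long text ...", "my-bucket")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` is fine.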

2.2 Console Example

Listen to speech: play the speech directly.
Download MP3: save the speech as an MP3 file and download it.
Synthesize to S3: save the speech output to S3.

2.3 CLI Example

SammydeMacBook-Air:~ Sammy$ aws polly synthesize-speech --output-format mp3 --voice-id Joanna --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' helloworld.mp3
{
    "ContentType": "audio/mpeg",
    "RequestCharacters": "71"
}
SammydeMacBook-Air:~ Sammy$ ls helloworld.mp3
helloworld.mp3

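2.4 API

Polly provides the following APIs:

• SynthesizeSpeech: synthesize speech
• ListLexicons: list pronunciation lexicons
• PutLexicon: create a pronunciation lexicon
• GetLexicon: retrieve a pronunciation lexicon
• DeleteLexicon: delete a pronunciation lexicon
• DescribeVoices: list the available voices
• GetSpeechSynthesisTask: get a speech synthesis task
• ListSpeechSynthesisTasks: list speech synthesis tasks
• StartSpeechSynthesisTask: start a speech synthesis task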
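As an example of the lexicon APIs, the sketch below builds a small lexicon in the W3C Pronunciation Lexicon Specification (PLS) format and uploads it with PutLexicon. The helper names and the example entry (expanding "W3C") are illustrative assumptions; the lexicon name you choose must satisfy the service's naming rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="{lang}">
  <lexeme>
    <grapheme>{grapheme}</grapheme>
    <alias>{alias}</alias>
  </lexeme>
</lexicon>"""

def build_lexicon(grapheme, alias, lang="en-US"):
    """Return a one-entry PLS lexicon that reads grapheme aloud as alias."""
    return PLS_TEMPLATE.format(
        lang=lang, grapheme=escape(grapheme), alias=escape(alias))

def upload_lexicon(polly_client, name, content):
    """Check that the lexicon is well-formed XML, then store it with PutLexicon."""
    ET.fromstring(content.encode("utf-8"))  # raises ParseError if malformed
    polly_client.put_lexicon(Name=name, Content=content)

# Usage (requires boto3 and AWS credentials):
# import boto3
# polly = boto3.client("polly")
# upload_lexicon(polly, "w3c", build_lexicon("W3C", "World Wide Web Consortium"))
# polly.synthesize_speech(Text="I learned about the W3C.", OutputFormat="mp3",
#                         VoiceId="Joanna", LexiconNames=["w3c"])
```

Once stored, the lexicon is applied by passing its name in `LexiconNames` on the synthesis call, as in the commented usage.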

2.5 Python Example Code

from boto3 import Session
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

session = Session(profile_name="default")
polly = session.client("polly")

try:
    text = "To the incredible people of the Great State of Wyoming: Go VOTE TODAY for Foster Friess - He will be a fantastic Governor! Strong on Crime, Borders & 2nd Amendment. Loves our Military & our Vets. He has my complete and total Endorsement!"
    response = polly.synthesize_speech(Text = text, OutputFormat="mp3", VoiceId="Joanna")
except Exception as error:
    print(error)
    sys.exit(-1)

if "AudioStream" in response:
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")
        try:
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            print(error)
            sys.exit(-1)
else:
    print("Could not stream audio")
    sys.exit(-1)

if sys.platform == "win32":
    os.startfile(output)
else:
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])

This code saves the speech to speech.mp3 in the system temporary directory and then opens it with the system's default player.
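The same call can also return PCM instead of MP3. Raw PCM has no header, so most players cannot open it directly; a small sketch for wrapping it in a WAV container with the standard library follows. It assumes Polly's PCM output is signed 16-bit, little-endian, mono audio, and that `sample_rate` matches the rate requested from Polly; the helper name `pcm_to_wav` is hypothetical.

```python
import io
import wave

def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap raw 16-bit mono PCM audio in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)  # must match the synthesized audio
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()

# Usage (requires boto3 and AWS credentials):
# import boto3
# polly = boto3.client("polly")
# response = polly.synthesize_speech(Text="Hello", OutputFormat="pcm",
#                                    SampleRate="16000", VoiceId="Joanna")
# with open("speech.wav", "wb") as f:
#     f.write(pcm_to_wav(response["AudioStream"].read(), sample_rate=16000))
```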

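3. Speech-to-Text Service: Transcribe

3.1 Overview

AWS Transcribe uses machine learning to recognize speech in audio files and convert it to text. At the time of writing, it supports English and Spanish speech. The audio file must be stored in S3, and the output is also saved to S3.

Input audio file: flac, mp3, mp4, and wav formats are supported, and the audio must be no longer than 2 hours.
Language: the language spoken in the audio must be specified.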

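A few notable features:

Speaker identification: Transcribe can distinguish between multiple speakers in one audio file. Between 2 and 10 speakers are supported.
Channel identification: if the audio file has multiple channels, Transcribe can transcribe each channel separately.
Custom vocabulary: helps with words that would otherwise not be recognized, such as uncommon domain-specific terms.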
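These features are switched on through the `Settings` parameter of StartTranscriptionJob. The sketch below builds such a dictionary and enforces the 2 to 10 speaker range mentioned above. The helper name `build_settings` is hypothetical, and treating speaker and channel identification as mutually exclusive reflects my understanding of the API rather than anything stated in this article.

```python
def build_settings(max_speakers=None, channel_identification=False,
                   vocabulary_name=None):
    """Build the Settings dict for a StartTranscriptionJob request."""
    if max_speakers is not None and channel_identification:
        raise ValueError("choose either speaker or channel identification")
    settings = {}
    if max_speakers is not None:
        if not 2 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 2 and 10")
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if channel_identification:
        settings["ChannelIdentification"] = True
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name
    return settings

print(build_settings(max_speakers=2, vocabulary_name="my-terms"))

# Usage (requires boto3 and AWS credentials; names are placeholders):
# transcribe.start_transcription_job(
#     TranscriptionJobName="job", Media={"MediaFileUri": uri},
#     MediaFormat="mp3", LanguageCode="en-US",
#     Settings=build_settings(max_speakers=2))
```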

3.2 Console Example

Create a job:

Job list:

Result:

3.3 CLI Example

(1) Submit a job, using the following request saved as testTranscribeJob.json:

{
    "TranscriptionJobName": "testTranscribe",
    "LanguageCode": "en-US",
    "MediaFormat": "mp3",
    "Media": {
        "MediaFileUri": "https://s3.dualstack.us-east-1.amazonaws.com/*********/hellosammy.mp3"
    }
}

aws transcribe start-transcription-job --region us-east-1 --cli-input-json file://testTranscribeJob.json

(2) List jobs

aws transcribe list-transcription-jobs --region us-east-1 --status IN_PROGRESS
{
    "Status": "IN_PROGRESS",
    "TranscriptionJobSummaries": [
        {
            "TranscriptionJobName": "testTranscribe",
            "CreationTime": 1535338023.662,
            "LanguageCode": "en-US",
            "TranscriptionJobStatus": "IN_PROGRESS",
            "OutputLocationType": "SERVICE_BUCKET"
        }
    ]
}

(3) Once the job completes, the output text can be downloaded from the TranscriptFileUri address included in the job. Part of the content is shown below:

{"jobName":"testTranscribe","accountId":"725348140609","results":{"transcripts":[{"transcript":"Hello, my name is sami. I learned about the w three c on october third last year."}],"items":[{"start_time":"0.0","end_time":"0.59","alternatives":[{"confidence":"0.9023","content":"Hello"}],"type":"pronunciation"},{"alternatives":[{"confidence":null,"content":","}],"type":"punctuation"},{"start_time":"0.7","end_time":"0.88","alternatives":
[{"confidence":"0.9867","content":"last"}],"type":"pronunciation"},{"start_time":"4.69","end_time":"5.07","alternatives":[{"confidence":"0.9867","content":"year"}],"type":"pronunciation"},{"alternatives":[{"confidence":null,"content":"."}],"type":"punctuation"}]},"status":"COMPLETED"}
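To show how this output might be consumed, here is a small sketch that pulls out the full transcript and flags words recognized with low confidence. The helper name `summarize_transcript`, the 0.95 threshold, and the shortened `SAMPLE` document are illustrative; the field names follow the excerpt above.

```python
import json

def summarize_transcript(raw_json, min_confidence=0.95):
    """Return (full_text, uncertain_words) from a Transcribe result document."""
    results = json.loads(raw_json)["results"]
    text = " ".join(t["transcript"] for t in results["transcripts"])
    uncertain = []
    for item in results["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timing or confidence
        best = item["alternatives"][0]
        if float(best["confidence"]) < min_confidence:
            uncertain.append((best["content"], float(item["start_time"])))
    return text, uncertain

# A shortened, hand-made document in the same shape as the excerpt above.
SAMPLE = json.dumps({
    "results": {
        "transcripts": [{"transcript": "Hello, year."}],
        "items": [
            {"start_time": "0.0", "end_time": "0.59", "type": "pronunciation",
             "alternatives": [{"confidence": "0.9023", "content": "Hello"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": None, "content": ","}]},
            {"start_time": "4.69", "end_time": "5.07", "type": "pronunciation",
             "alternatives": [{"confidence": "0.9867", "content": "year"}]},
        ],
    },
})

text, uncertain = summarize_transcript(SAMPLE)
print(text)
print(uncertain)  # words below the threshold, with their start times in seconds
```

Note that confidence and timing values arrive as strings in the JSON, so they are converted with `float` before comparison.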

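3.4 API

StartTranscriptionJob: start a transcription job
ListTranscriptionJobs: list transcription jobs
GetTranscriptionJob: get a transcription job
CreateVocabulary: create a vocabulary
DeleteVocabulary: delete a vocabulary
GetVocabulary: get a vocabulary
ListVocabularies: list vocabularies
UpdateVocabulary: update a vocabulary

3.5 Python Example Code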

import time
import boto3

# Create a Transcribe client using the default credentials and region.
transcribe = boto3.client('transcribe')
job_name = "testTranscribeJob100"
job_uri = "https://s3.dualstack.us-east-1.amazonaws.com/*****/hellosammy.mp3"

# Submit the job; the audio file must already be in S3.
transcribe.start_transcription_job(TranscriptionJobName=job_name, Media={'MediaFileUri': job_uri}, MediaFormat='mp3', LanguageCode='en-US')

# Poll every 5 seconds until the job either completes or fails.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break

    print("Job not ready yet...")
    time.sleep(5)

print(status)
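The loop above waits forever if the job never finishes. A slightly more defensive variant, sketched below, adds a timeout, raises on failure, and returns the TranscriptFileUri directly. The helper name `wait_for_transcript` and the default intervals are assumptions made for illustration.

```python
import time

def wait_for_transcript(transcribe_client, job_name, poll_seconds=5,
                        timeout_seconds=600, sleep=time.sleep):
    """Poll a transcription job and return its TranscriptFileUri when done."""
    waited = 0
    while True:
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        if waited >= timeout_seconds:
            raise TimeoutError("job %s still %s after %d seconds"
                               % (job_name, status, waited))
        sleep(poll_seconds)
        waited += poll_seconds

# Usage (requires boto3 and AWS credentials):
# import boto3
# uri = wait_for_transcript(boto3.client("transcribe"), "testTranscribeJob100")
# print(uri)
```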

References:

AWS Translate, Polly, and Transcribe developer documentation
