Speech-to-Text API - Thai NLP Meetup #2

Speech-to-Text API
Experience learning to use it to transcribe Thai speech

Outline
• Theory
• Speech-to-Text API
• The transcription problem
• Practice - Colab
• Libraries & Data
• pydub, SpeechClient()
• config, response
• Interface design

Google Speech-to-Text
• Formerly named Speech API, renamed Speech-to-Text
• Part of the Cloud ML family of ready-to-use (pre-trained) services
• ASR: Automatic Speech Recognition
• Supports 120 languages (the most)
• Alternatives: Nuance, AmiVoice, Tellvoice (MS and AWS have no Thai support)

Capabilities
• Converts speech to text in 3 modes
• Synchronous (short audio)
• Asynchronous (long audio)
• Streaming (real-time)
• Available through the client library, gcloud, or curl

Advanced Features
• Alternatives
• Timestamps
• Separate Speakers
• Identify Language
• Enhanced Models (English only)
• Word-level Confidence
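
As a rough illustration of how the features above map onto request settings, the sketch below builds a plain dict shaped like a RecognitionConfig. The field names are the ones I understand the v1p1beta1 API to use; the values (3 alternatives, 2 speakers, "en-US" as a second language) are assumptions chosen for the example, and not every option is necessarily supported for Thai. Enhanced models are left out because the slide notes they are English only.

```python
def advanced_config(language_code="th-TH", speakers=2):
    """Build a RecognitionConfig-shaped dict with the advanced options enabled.

    All concrete values here are illustrative assumptions, not recommendations.
    """
    return {
        "language_code": language_code,
        "max_alternatives": 3,                    # Alternatives
        "enable_word_time_offsets": True,         # Timestamps
        "diarization_config": {                   # Separate Speakers
            "enable_speaker_diarization": True,
            "min_speaker_count": speakers,
            "max_speaker_count": speakers,
        },
        "alternative_language_codes": ["en-US"],  # Identify Language
        "enable_word_confidence": True,           # Word-level Confidence
    }


config = advanced_config()
print(sorted(config))
```

A dict like this would be passed as the config argument of a recognize call; availability of each option per language should be checked against the current documentation.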

Pricing
• Free for 60 minutes/month
• Speech audio: $0.006 ≈ 0.20 THB per 15 sec
• Video: 2x the price
• Example: transcribing 1 hour ≈ 50 THB
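
The one-hour figure follows from the per-15-second rate: 3600 s / 15 s = 240 units, and 240 x 0.20 THB = 48 THB, or about 50 THB. A minimal sketch of that arithmetic is below. The function name is hypothetical, and rounding a partial unit up to a full 15 seconds is an assumption about how billing works.

```python
import math

PRICE_THB_PER_UNIT = 0.20  # from the slide: about 0.20 THB per 15 s of speech audio
UNIT_SECONDS = 15
VIDEO_MULTIPLIER = 2       # from the slide: video costs twice as much


def estimate_cost_thb(duration_seconds, video=False, free_seconds=0):
    """Estimate the cost in THB of transcribing audio of the given length.

    free_seconds is the part of the monthly free tier still unused
    (up to 60 * 60 seconds). Partial units are rounded up (an assumption).
    """
    billable = max(0, duration_seconds - free_seconds)
    units = math.ceil(billable / UNIT_SECONDS)
    rate = PRICE_THB_PER_UNIT * (VIDEO_MULTIPLIER if video else 1)
    return round(units * rate, 2)


print(estimate_cost_thb(60 * 60))              # one hour of speech audio
print(estimate_cost_thb(60 * 60, video=True))  # one hour of video
```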

Transcription
• Why transcribe?
• To make recordings searchable
• To create and translate subtitles (accessibility)
• Text mining for insights
• To convert unstructured data into structured data
• Problem: it is labor-intensive, and therefore expensive

Requirement
• A transcription system built on the Speech-to-Text API
• Not too expensive
• Accurate enough, with a way to check for misrecognized words
• Timestamps, so the output can be converted to subtitles
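
To make the timestamp requirement concrete, here is a minimal sketch that turns timed segments into SubRip (.srt) subtitle text. The input shape, a list of (start_seconds, end_seconds, text) tuples, is an assumption for illustration; it is not the API's response format, and the segment values are made up.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Made-up segments, only to show the output shape.
print(to_srt([(0.0, 1.5, "สวัสดีครับ"), (1.5, 4.0, "ยินดีต้อนรับ")]))
```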

Practice

Google Colab
• https://colab.research.google.com
• Free cloud instance
• CPU: Xeon 2.3 GHz
• RAM: 12.6 GB
• Disk: 33 GB
• GPU Tesla K80, 2496 cores, 12GB VRAM
• Idle cutoff: 90 minutes (max 12 hours)

Import & Install
• from IPython.display import HTML, Audio
• !pip install youtube-dl
• !pip install pydub
• !apt install ffmpeg
• !pip install google-cloud-speech
• from google.cloud import speech_v1p1beta1 as speech
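
One likely use of pydub here is preparing audio for synchronous requests, which only accept short clips. The sketch below converts a file to mono 16 kHz WAV and cuts it into chunks. The 55-second chunk length, the output file names, and the helper names are assumptions for illustration; pydub is imported inside the function because it needs ffmpeg installed.

```python
def chunk_ranges(total_ms, chunk_ms=55_000):
    """Return (start_ms, end_ms) pairs covering total_ms in chunk_ms steps."""
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def split_to_wav(path, out_prefix="chunk", chunk_ms=55_000):
    """Convert an audio file to mono 16 kHz WAV chunks and return their paths."""
    from pydub import AudioSegment  # lazy import: requires pydub and ffmpeg

    audio = AudioSegment.from_file(path).set_channels(1).set_frame_rate(16000)
    paths = []
    # len(audio) is the duration in milliseconds; slicing is also in ms.
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        out_path = f"{out_prefix}_{i:03d}.wav"
        audio[start:end].export(out_path, format="wav")
        paths.append(out_path)
    return paths


print(chunk_ranges(120_000))
```

Cutting at fixed intervals can split a word in half, so a real pipeline may prefer to cut at silences.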

youtube-dl
• Download from YouTube in many formats
• !youtube-dl -F [youtube_url] (list available formats)
• !youtube-dl -f bestaudio -o 'audio.%(ext)s' [url]

librosa, peaks.js
• Librosa
• waveform
• spectrogram
• CQT
• Peaks.js
• interaction

Cloud Authentication
• Register for Google Cloud (free $200 credit)
• Create a new project
• Enable the Speech API for the project
• Download the credential file and save it to Google Drive
• Load the credential file into Colab and set the environment variable
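
The last step can be sketched as follows. Google client libraries look for a service-account key at the path in the GOOGLE_APPLICATION_CREDENTIALS environment variable. The Drive path and file name below are hypothetical placeholders, and the Drive mount is shown as a comment because it only works inside Colab.

```python
import os

# Hypothetical path: wherever the downloaded JSON key was saved on Google Drive.
CREDENTIAL_PATH = "/content/drive/My Drive/speech-credentials.json"


def use_credentials(path):
    """Point Google client libraries at a key file; return True if it exists."""
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return os.path.exists(path)


# In Colab, mount Drive first so the path above is readable:
#   from google.colab import drive
#   drive.mount("/content/drive")
found = use_credentials(CREDENTIAL_PATH)
print("credential file found:", found)
```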

gcloud speech recognize
• On the command line, you can also use 'curl'

Python Library
• Call the API with audio and config

Simplest form
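
A minimal sketch of what the simplest call might look like is below; it is not the code from the original slide. It assumes a short WAV file (so encoding and sample rate can be read from the header), credentials already set in the environment, and a client that accepts plain dicts in place of RecognitionConfig and RecognitionAudio objects. The helper names are hypothetical.

```python
def build_request(audio_bytes, language_code="th-TH"):
    """Return (config, audio) as plain dicts for a synchronous recognize call."""
    config = {"language_code": language_code}
    audio = {"content": audio_bytes}
    return config, audio


def transcribe_short(wav_path, language_code="th-TH"):
    """Send one short WAV file to the API and return the top transcripts."""
    from google.cloud import speech_v1p1beta1 as speech  # lazy import

    with open(wav_path, "rb") as f:
        config, audio = build_request(f.read(), language_code)

    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    # Each result covers a consecutive portion of the audio;
    # the first alternative is the most likely transcript.
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]


config, audio = build_request(b"\x00\x01")
print(config)
```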

Multiple alternatives
• Set max_alternatives=n in the config
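
The sketch below shows one way to request and read several alternatives. The response object here is a stand-in built with SimpleNamespace and made-up values, so the example runs without calling the API; a real response exposes the same results / alternatives / transcript / confidence attributes. As far as I know, the API returns alternatives already ranked, and confidence may only be populated for the first one.

```python
from types import SimpleNamespace


def alternatives_config(n, language_code="th-TH"):
    """Config dict asking for up to n alternative transcripts per result."""
    return {"language_code": language_code, "max_alternatives": n}


def list_alternatives(response):
    """Return, per result, a list of (transcript, confidence) in API order."""
    return [
        [(alt.transcript, alt.confidence) for alt in result.alternatives]
        for result in response.results
    ]


# Stand-in response with made-up values, only to show the shape.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="สวัสดีครับ", confidence=0.91),
        SimpleNamespace(transcript="สวัสดีคับ", confidence=0.0),
    ])
])

for alternatives in list_alternatives(fake_response):
    for transcript, confidence in alternatives:
        print(transcript, confidence)
```

Showing the lower-ranked alternatives next to the top one gives a human reviewer a quick way to spot and fix misrecognized words.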

Next
• client.long_running_recognize()
• Collect statistics on difficult words that are often misspelled, then correct them
• Context hints via speechContexts phrases (500 words)
• Speech analysis (waveform, spectrogram) to correct segment boundaries
• Use the labels to train an ASR system such as Kaldi or Deep Speech
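
The first and third items could be combined roughly as below: a long-running request with phrase hints and word timestamps. This is a sketch under several assumptions: the audio has been uploaded to Cloud Storage (the gs:// URI is supplied by the caller), the client accepts plain dicts for config and audio, and the one-hour timeout and 500-phrase cap are illustrative values. The helper names are hypothetical.

```python
def long_config(phrases, language_code="th-TH", max_phrases=500):
    """Config dict with phrase hints and word-level timestamps enabled."""
    return {
        "language_code": language_code,
        "enable_word_time_offsets": True,
        "speech_contexts": [{"phrases": list(phrases)[:max_phrases]}],
    }


def transcribe_long(gcs_uri, phrases, timeout_seconds=3600):
    """Start an asynchronous recognition job and wait for its result."""
    from google.cloud import speech_v1p1beta1 as speech  # lazy import

    client = speech.SpeechClient()
    operation = client.long_running_recognize(
        config=long_config(phrases),
        audio={"uri": gcs_uri},
    )
    # Blocks until the job finishes or the timeout expires.
    return operation.result(timeout=timeout_seconds)


print(long_config(["Speech-to-Text", "Colab"]))
```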
