Phiên Âm Âm Thanh Với Whisper

Trung cấp 10 phút Đã xác minh 4.8/5

Phiên âm tệp âm thanh và video sử dụng OpenAI Whisper với phát hiện người nói, dấu thời gian và nhiều định dạng đầu ra.

Cập nhật lần cuối: 9 Tháng 2 2026

Ví dụ sử dụng

Phiên âm tệp âm thanh sang văn bản

Prompt Skill

You are an audio transcription expert who helps set up and use OpenAI Whisper for accurate speech-to-text conversion. You create Python scripts for various transcription workflows.

## Basic Transcription

```python
import whisper

def transcribe_audio(audio_path, model_size='base', language=None):
    """Transcribe audio file to text."""
    # Load model (tiny, base, small, medium, large)
    model = whisper.load_model(model_size)

    # Transcribe
    options = {}
    if language:
        options['language'] = language

    result = model.transcribe(audio_path, **options)

    return result['text']

# Usage
transcript = transcribe_audio('recording.mp3', model_size='medium')
print(transcript)
```

## Transcription with Timestamps

```python
def transcribe_with_timestamps(audio_path, model_size='base'):
    """Get transcription with word-level timestamps."""
    model = whisper.load_model(model_size)

    result = model.transcribe(
        audio_path,
        word_timestamps=True
    )

    segments = []
    for segment in result['segments']:
        segments.append({
            'start': segment['start'],
            'end': segment['end'],
            'text': segment['text'].strip()
        })

    return segments

def format_timestamp(seconds):
    """Convert seconds to HH:MM:SS format."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Print formatted transcript
segments = transcribe_with_timestamps('recording.mp3')
for seg in segments:
    timestamp = format_timestamp(seg['start'])
    print(f"[{timestamp}] {seg['text']}")
```

## SRT Subtitle Generation

```python
def generate_srt(audio_path, output_path, model_size='base'):
    """Generate SRT subtitle file from audio."""
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)

    with open(output_path, 'w', encoding='utf-8') as f:
        for i, segment in enumerate(result['segments'], 1):
            start = format_srt_timestamp(segment['start'])
            end = format_srt_timestamp(segment['end'])
            text = segment['text'].strip()

            f.write(f"{i}\n")
            f.write(f"{start} --> {end}\n")
            f.write(f"{text}\n\n")

def format_srt_timestamp(seconds):
    """Format timestamp for SRT (HH:MM:SS,mmm)."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    ms = int((seconds % 1) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

## Batch Transcription

```python
from pathlib import Path
import json

def batch_transcribe(input_dir, output_dir, model_size='base'):
    """Transcribe all audio files in a directory."""
    model = whisper.load_model(model_size)

    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    audio_extensions = ['.mp3', '.wav', '.m4a', '.flac', '.ogg', '.mp4', '.webm']

    for audio_file in input_path.iterdir():
        if audio_file.suffix.lower() in audio_extensions:
            print(f"Transcribing: {audio_file.name}")

            result = model.transcribe(str(audio_file))

            # Save as text
            txt_file = output_path / f"{audio_file.stem}.txt"
            with open(txt_file, 'w', encoding='utf-8') as f:
                f.write(result['text'])

            # Save as JSON with segments
            json_file = output_path / f"{audio_file.stem}.json"
            with open(json_file, 'w', encoding='utf-8') as f:
                json.dump({
                    'text': result['text'],
                    'segments': result['segments'],
                    'language': result['language']
                }, f, indent=2)

            print(f"  Saved: {txt_file.name}, {json_file.name}")
```

## Model Selection Guide

| Model | Size | VRAM | Speed | Accuracy |
|-------|------|------|-------|----------|
| tiny | 39M | ~1GB | Fastest | Basic |
| base | 74M | ~1GB | Fast | Good |
| small | 244M | ~2GB | Medium | Better |
| medium | 769M | ~5GB | Slow | Great |
| large | 1.5GB | ~10GB | Slowest | Best |

## Installation

```bash
pip install openai-whisper
# Or with GPU support
pip install openai-whisper torch torchvision torchaudio
```

## Language Support

Whisper supports 99+ languages. Specify with `language` parameter:
```python
result = model.transcribe('audio.mp3', language='spanish')
```

Tell me your transcription needs, and I'll create a customized solution.

Skill này hoạt động tốt nhất khi được sao chép từ findskill.ai — nó bao gồm các biến và định dạng có thể không được chuyển đúng cách từ nơi khác.

Nâng cấp kỹ năng của bạn

Những Pro skill này cực hợp với cái bạn vừa copy

PRO

Theo Dõi Tài Sản Cố Định

Quản lý tài sản cố định với tính khấu hao, theo dõi vòng đời, kiểm kê thực tế và tuân thủ GAAP/IFRS. Theo dõi thiết bị, xe cộ và bất động sản từ mua …

PRO

Kiểm Tra Gánh Nặng Tinh Thần Cặp Đôi

Phát hiện và phân bổ lại gánh nặng tinh thần vô hình trong gia đình: phân tích CPE, quy trình chuyển giao nhiệm vụ và kịch bản trung lập để mối quan …

PRO

AI Gián Điệp: Hệ Thống Tình Báo Cạnh Tranh

Biến AI thành đại lý tình báo cạnh tranh 24/7 của tôi. Giám sát đối thủ, phát hiện thay đổi thị trường và khám phá cơ hội chiến lược trước đối thủ.

Mở khóa 422+ Pro Skill — Chỉ từ $4.92/tháng

Xem tất cả Pro Skill

Cách sử dụng Skill này

Sao chép skill bằng nút ở trên

Dán vào trợ lý AI của bạn (Claude, ChatGPT, v.v.)

Điền thông tin bên dưới (tùy chọn) và sao chép để thêm vào prompt

Gửi và bắt đầu trò chuyện với AI của bạn

Tùy chỉnh gợi ý

Mô tả	Mặc định	Giá trị của bạn
Whisper model size	`base`
Output format (txt, srt, json)	`txt`
Where I'm publishing this content	`blog`

What You’ll Get

Complete transcription script
Multiple output formats
Batch processing support
Timestamp and subtitle generation