FFmpeg 4.0 + SDL 2.0 Notes 03: Playing Sound

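Environment

Background: while studying FFmpeg systematically, I found that the officially recommended tutorial dates from 2015. Many of its APIs have since been deprecated and FFmpeg has moved up a major version, so I am recording my FFmpeg 4.0 + SDL 2.0 learning process here.

Windows 10, VS2019, FFmpeg 4.3.2, SDL 2.0.14

Original tutorial: http://dranger.com/ffmpeg/tutorial03.html

Audio

To play audio, SDL provides SDL_OpenAudio, which opens the audio device. It takes an SDL_AudioSpec, a struct that we fill in with the desired output audio parameters.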

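Before writing any code, a quick look at samples, sample rate and channels. Digital audio is a long stream of samples, each representing one value of the audio waveform. Audio is recorded at a particular sample rate (the number of samples taken per second, in Hz); for example, radio uses 22050 Hz and CD uses 44100 Hz. Most audio uses more than one channel for stereo or surround sound; stereo has two channels, so two samples are played at a time.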
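
To make these numbers concrete, here is a small sketch that turns sample rate, channel count and sample width into a raw PCM data rate. The function name pcmBytesPerSecond is made up for illustration and is not part of FFmpeg or SDL.

```cpp
#include <cstdint>

// Bytes of raw (uncompressed) PCM produced per second of audio.
//   sampleRate     : samples per second, per channel (Hz)
//   channels       : number of channels (2 for stereo)
//   bytesPerSample : 2 for signed 16-bit samples
// Hypothetical helper, for illustration only.
constexpr std::int64_t pcmBytesPerSecond(int sampleRate, int channels, int bytesPerSample) {
    return static_cast<std::int64_t>(sampleRate) * channels * bytesPerSample;
}
```

For CD-style audio (44100 Hz, stereo, 16-bit) this gives 176400 bytes per second, which is the amount of data the audio callback has to deliver every second at that setting.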

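SDL plays audio like this:

Fill in an SDL_AudioSpec with the sample format, sample rate, channel count and other parameters, plus a callback function and userdata. Then call SDL_OpenAudio, which opens the audio device and hands back a second SDL_AudioSpec. That second struct holds the parameters actually in effect, which may differ from the ones we asked for. Finally call SDL_PauseAudio to start playback; from then on SDL repeatedly calls our callback internally, asking for audio data to fill its buffer.

Setting Up the Audio

With those basics in place we can start coding. First find the audio stream and initialize the audio AVCodecContext, exactly as we did for the video stream.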

    AVCodecContext* pAudioCodecCtx = nullptr;
    AVCodecParameters* pAudioCodecPar = nullptr;
    AVCodec* pAudioCodec = nullptr;
    int iAudioStream = -1;

    // find the first video stream and the first audio stream
    for (unsigned i = 0; i < pFormatCtx->nb_streams; ++i) {
        if (iVideoStream == -1 && pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            iVideoStream = i;
        }
        else if (iAudioStream == -1 && pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            iAudioStream = i;
        }
    }
    if (iVideoStream == -1) {
        cout << "couldn't find video stream" << endl;
        return -1;
    }
    if (iAudioStream == -1) {
        cout << "couldn't find audio stream" << endl;
        return -1;
    }

        // initialize the audio codec
        pAudioCodecPar = pFormatCtx->streams[iAudioStream]->codecpar;
        pAudioCodec = avcodec_find_decoder(pAudioCodecPar->codec_id);
        if (pAudioCodec == nullptr) {
            cout << "avcodec_find_decoder failed" << endl;
            return -1;
        }

        // allocate a codec context for avcodec_open2 to use
        pAudioCodecCtx = avcodec_alloc_context3(pAudioCodec);
        if (avcodec_parameters_to_context(pAudioCodecCtx, pAudioCodecPar) < 0) {
            cout << "avcodec_parameters_to_context failed" << endl;
            return -1;
        }

        //open codec
        if (avcodec_open2(pAudioCodecCtx, pAudioCodec, nullptr) < 0) {
            cout << "avcodec_open2 failed" << endl;
            return -1;
        }

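SDL cannot directly play the audio that FFmpeg 4.0 decodes, so we use <libswresample/swresample.h> to convert it. Below we configure and initialize the SwrContext. (A small gripe: some FFmpeg and SDL parameter names neither match each other nor mean what they seem to, which can be baffling for a beginner.)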

    uint64_t inChannelLayout;

    AVSampleFormat outSampleFormat = AV_SAMPLE_FMT_S16;
    int outSampleRate = 0;
    int outSamples = 0;    // samples per channel in one frame
    uint64_t outChannelLayout = AV_CH_LAYOUT_STEREO;  // channel layout: stereo output
    int outChannels = 0;        // channel count
    uint8_t* outBuffer = nullptr;

        // input format
        inChannelLayout = av_get_default_channel_layout(pAudioCodecCtx->channels);  // channel layout

        // output format
        outSampleFormat = AV_SAMPLE_FMT_S16;
        outSampleRate = 44100;   // sample rate
        outSamples = pAudioCodecCtx->frame_size;    // samples per channel in one frame
        outChannelLayout = AV_CH_LAYOUT_STEREO;  // channel layout: stereo output
        outChannels = av_get_channel_layout_nb_channels(outChannelLayout);        // channel count
        outBuffer = (uint8_t*)av_malloc(kMaxAudioFrameSize * 2);

        pSwrCtx = swr_alloc_set_opts(NULL,
            outChannelLayout, outSampleFormat, outSampleRate,
            inChannelLayout, pAudioCodecCtx->sample_fmt, pAudioCodecCtx->sample_rate,
            0, NULL);
        // swr_init returns a negative error code on failure
        if (pSwrCtx == nullptr || swr_init(pSwrCtx) < 0) {
            cout << "swr_init failed" << endl;
            return -1;
        }
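
To give a feel for why a conversion step is needed at all: FFmpeg decoders commonly produce planar float samples (AV_SAMPLE_FMT_FLTP, for example from the AAC decoder), while the SDL setup in this article expects interleaved signed 16-bit samples. The sketch below does only that format and layout change by hand, using the standard library. It is not how libswresample is implemented and it does no sample-rate conversion; the function name planarFloatToInterleavedS16 is invented for illustration, and all planes are assumed to have the same length.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert planar float samples (one vector per channel, nominal range [-1, 1])
// into interleaved signed 16-bit samples: L0 R0 L1 R1 ...
// Illustration only: assumes every plane holds the same number of samples.
std::vector<std::int16_t> planarFloatToInterleavedS16(
        const std::vector<std::vector<float>>& planes) {
    std::vector<std::int16_t> out;
    if (planes.empty()) return out;

    const std::size_t channels = planes.size();
    const std::size_t samples = planes[0].size();
    out.reserve(channels * samples);

    for (std::size_t s = 0; s < samples; ++s) {
        for (std::size_t c = 0; c < channels; ++c) {
            // Clamp first so out-of-range input cannot overflow 16 bits.
            const float v = std::max(-1.0f, std::min(1.0f, planes[c][s]));
            out.push_back(static_cast<std::int16_t>(std::lround(v * 32767.0f)));
        }
    }
    return out;
}
```

In the real player swr_convert does this job, and it additionally handles channel remixing and resampling to the output rate.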

That completes the preparation for decoding and converting with FFmpeg. Next we configure SDL audio playback.

    
typedef struct SDLFFmpegAudioContext {
	SwrContext* pSwrCtx;
	AVCodecContext* pAudioCodecCtx;
	uint8_t* outBuffer;
}SDLFFmpegAudioContext;
    
    
SDL_AudioSpec desiredSpec;
SDL_AudioSpec obtainedSpec;
SDLFFmpegAudioContext* pSDLFFmpegAudioCtx;
	
// our own struct that carries the FFmpeg decoder, the resampler and the output buffer
pSDLFFmpegAudioCtx= (SDLFFmpegAudioContext*)malloc(sizeof(SDLFFmpegAudioContext));
pSDLFFmpegAudioCtx->pAudioCodecCtx = pAudioCodecCtx;
pSDLFFmpegAudioCtx->pSwrCtx = pSwrCtx;
pSDLFFmpegAudioCtx->outBuffer = outBuffer;
    
desiredSpec.freq = 22050;
desiredSpec.format = AUDIO_S16SYS;
desiredSpec.channels = outChannels;
desiredSpec.silence = 0;
desiredSpec.samples = outSamples;
desiredSpec.callback = audioCallback;
desiredSpec.userdata = pSDLFFmpegAudioCtx;
if (SDL_OpenAudio(&desiredSpec, &obtainedSpec) < 0) {
     cout << "SDL_OpenAudio failed:" << SDL_GetError() << endl;
     return -1;
}

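Let's look at these parameters:

freq: the sample rate. Note that it is set to 22050 here, half of the 44100 used as the resampler's output rate above. SDL plays samples at whatever rate the device was opened with, so if the audio sounds slow or low-pitched, set this to the same value as outSampleRate.
format: the sample format, AUDIO_S16SYS. S16 means signed 16-bit, i.e. each sample is a 16-bit signed integer, and SYS means the byte order follows the host system. Audio decoded by FFmpeg 4.0 has to be converted before it is in this format.
channels: the number of audio channels.
silence: the silence value, i.e. the value that fills SDL's buffer to produce silence. Because samples are signed 16-bit integers, 0 is the usual choice.
samples: the size of SDL's audio buffer, in sample frames; it determines how much data each callback asks for. Values usually fall in [512, 8192]; for my test video it was 2048.
callback: the callback function, covered in detail later.
userdata: the user data passed to the callback. The original tutorial only needs to decode, so it passes the AVCodecContext; we also need to convert, so we pass our own SDLFFmpegAudioContext struct carrying the required components.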
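
A few lines of arithmetic show how samples, channels and freq relate to what the callback sees. This is a sketch under the assumption that the device is opened with exactly the requested spec; the helper names are invented for illustration and are not SDL functions.

```cpp
// Bytes SDL typically requests per callback: one buffer of `samples`
// sample frames, each frame holding one sample per channel.
constexpr int callbackBytes(int samples, int channels, int bytesPerSample) {
    return samples * channels * bytesPerSample;
}

// How long one such buffer lasts when played at `freq` Hz, in milliseconds.
constexpr double bufferMillis(int samples, int freq) {
    return 1000.0 * samples / freq;
}

// Relative playback speed when data produced for `dataFreq` Hz is played
// on a device opened at `deviceFreq` Hz (1.0 means normal speed).
constexpr double playbackSpeed(int deviceFreq, int dataFreq) {
    return static_cast<double>(deviceFreq) / dataFreq;
}
```

With 2048 samples, 2 channels and 16-bit samples, each callback asks for 8192 bytes, and at 22050 Hz one buffer lasts roughly 93 ms. playbackSpeed(22050, 44100) is 0.5, which is why the device rate and the resampler's output rate normally need to agree.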

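Blocking Queue

The tutorial uses a blocking queue: the main thread reads audio packets, and the SDL callback decodes them and fills the buffer. Here I wrapped the original C-style code in a simple class. PS: Java ships a blocking queue, ArrayBlockingQueue, in its concurrent package; I once used it to feed NV21 data to a face recognition library. I have to say, Java's ready-made utility classes are far more convenient than what C/C++ offers.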

class PacketQueue {
public:
    PacketQueue()
        :firstPkt_(nullptr),
        lastPkt_(nullptr),
        nbPakcets_(0),
        mutex_(nullptr),
        cond_(nullptr),
        size_(0),
        quit_(0) {

    }
    ~PacketQueue() {
    }

    int init() {
        quit_ = 0;
        mutex_ = SDL_CreateMutex();
        cond_ = SDL_CreateCond();
        return 0;
    }

    int deinit() {
        quit_ = 1;
        SDL_DestroyMutex(mutex_);
        SDL_DestroyCond(cond_);
        return 0;
    }

    int push(AVPacket* packet) {
        if (av_packet_make_refcounted(packet) < 0)
            return -1;
        AVPacketList* node = (AVPacketList*)av_malloc(sizeof(AVPacketList));
        if (!node)return -1;
        node->pkt = *packet;
        node->next = nullptr;

        SDL_LockMutex(mutex_);

        if (!firstPkt_)
            firstPkt_ = node;
        else
            lastPkt_->next = node;
        lastPkt_ = node;
        ++nbPakcets_;
        size_ += packet->size;
        SDL_CondSignal(cond_);

        SDL_UnlockMutex(mutex_);
        return 0;
    }

    int pop(AVPacket* packet, int block) {
        int ret = 0;
        SDL_LockMutex(mutex_);

        for (;;) {
            if (quit_) {
                ret = -1;
                break;
            }
            if (nbPakcets_ > 0) {
                AVPacketList* node = firstPkt_;
                firstPkt_ = firstPkt_->next;
                if (!firstPkt_)lastPkt_ = nullptr;

                *packet = node->pkt;
                --nbPakcets_;
                size_ -= packet->size;
                av_free(node);

                ret = 1;
                break;
            }
            else if (!block) {
                break;
            }
            else {
                SDL_CondWait(cond_, mutex_);
            }
        }

        SDL_UnlockMutex(mutex_);

        return ret;
    }

private:
    AVPacketList* firstPkt_, * lastPkt_;
    int nbPakcets_;
    //the size is not list size, but sum of all packets' size
    int size_;
    SDL_mutex* mutex_;
    SDL_cond* cond_;
    int quit_;
};

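Note the quit_ variable, which breaks the loop so the thread cannot get stuck. SDL automatically catches termination signals, and our application has to react accordingly: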
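
For comparison, the same blocking-queue idea can be sketched with only the C++ standard library. This is not the article's PacketQueue and does not touch AVPacket; BlockingQueue and its member names are invented for illustration. The return convention of pop mirrors the class above: 1 for an item, 0 for empty in non-blocking mode, -1 after quit.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Minimal generic blocking queue with a quit flag (illustrative sketch).
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cond_.notify_one();  // wake one waiting consumer
    }

    // Returns 1 if an item was popped, 0 if the queue is empty and
    // block is false, -1 if quit() has been called.
    int pop(T& out, bool block) {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            if (quit_) return -1;
            if (!items_.empty()) {
                out = std::move(items_.front());
                items_.pop_front();
                return 1;
            }
            if (!block) return 0;
            cond_.wait(lock);  // releases the mutex while sleeping
        }
    }

    // Set the quit flag and wake every waiter so none stays blocked.
    void quit() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cond_.notify_all();
    }

private:
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool quit_ = false;
};
```

One design point worth noting: quit() wakes all waiting consumers after setting the flag, so a thread blocked in pop gets the chance to observe quit and return -1.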

        SDL_PollEvent(&event);
        switch (event.type) {
        case SDL_QUIT:
            gQuit = 1;
            gPacketQueue.deinit();
            SDL_Quit();
            exit(0);
            break;
        default:
            break;
        }

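Feeding Audio Packets

Two initialization steps remain: init sets up the blocking queue, and SDL_PauseAudio(0) actually starts the audio device, which plays silence until we feed it data.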

        gPacketQueue.init();
        SDL_PauseAudio(0);

av_read_frame reads packets, and audio packets are pushed onto the blocking queue.

while (av_read_frame(pFormatCtx, &packet) >= 0) {
		if (packet.stream_index == iVideoStream) {
			...
		}
		else if (packet.stream_index == iAudioStream) {
			gPacketQueue.push(&packet);
		}
		else {
			av_packet_unref(&packet);
		}
		SDL_PollEvent(&event);
		switch (event.type) {
		case SDL_QUIT:
			gQuit = 1;
			gPacketQueue.deinit();
			SDL_Quit();
			exit(0);
			break;
		default:
			break;
		}
	}

Processing Audio Packets

void audioCallback(void* userdata, uint8_t* stream, int len) {
    SDLFFmpegAudioContext* pCtx = (SDLFFmpegAudioContext *)userdata;
    int decodedLen, decodedAudioSize = 0;

    static uint8_t audioBuf[kMaxAudioFrameSize * 3 / 2];
    static unsigned audioBufSize = 0;
    static unsigned audioBufIndex = 0;
    while (len > 0) {
        if (audioBufIndex >= audioBufSize) {
            // everything in audioBuf has been sent; fetch more from the queue
            decodedAudioSize = audioDecodeFrame(pCtx, audioBuf, sizeof(audioBuf));
            if (decodedAudioSize < 0) {
                // no data available: play silence
                audioBufSize = kSDLAudioSize;
                memset(audioBuf, 0, audioBufSize);
            }
            else {
                audioBufSize = decodedAudioSize;
            }
            audioBufIndex = 0;
        }
        decodedLen = audioBufSize - audioBufIndex;
        if (decodedLen > len)
            decodedLen = len;
        memcpy(stream, audioBuf + audioBufIndex, decodedLen);
        len -= decodedLen;
        stream += decodedLen;
        audioBufIndex += decodedLen;
    }
}

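A simple loop handles both decoding and playback. The overall logic:

If audioBuf does not hold enough data to fill SDL's buffer, fetch more decoded audio through audioDecodeFrame.
Copy the decoded audio into SDL's buffer. If the buffer is full, return from the callback and SDL plays the buffered data internally; if it is not yet full, repeat the previous step.
Note that audioBuf and its bookkeeping variables are static, so when one decode yields more data than SDL's buffer can take, the remainder is kept for the next callback.
audioBuf is sized at 1.5 times the largest audio frame FFmpeg will hand us, which leaves plenty of headroom.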
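
The carry-over behaviour described above can be tried out without SDL or FFmpeg. The sketch below reproduces the same fill loop with a stand-in decoder; CallbackFiller and its members are hypothetical names, and the decode function simply returns the next chunk of bytes (an empty chunk meaning nothing is available).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the audio callback's fill loop (illustrative sketch).
class CallbackFiller {
public:
    using Decoder = std::function<std::vector<std::uint8_t>()>;

    explicit CallbackFiller(Decoder decode) : decode_(std::move(decode)) {}

    // Fill exactly `len` bytes of `stream`, like SDL's callback contract.
    void fill(std::uint8_t* stream, int len) {
        while (len > 0) {
            if (index_ >= buf_.size()) {
                // Everything buffered has been sent: decode another chunk.
                buf_ = decode_();
                if (buf_.empty()) {
                    // Nothing decoded: substitute silence for the rest.
                    buf_.assign(static_cast<std::size_t>(len), 0);
                }
                index_ = 0;
            }
            const int n = std::min<int>(len, static_cast<int>(buf_.size() - index_));
            std::memcpy(stream, buf_.data() + index_, static_cast<std::size_t>(n));
            stream += n;
            len -= n;
            index_ += static_cast<std::size_t>(n);
        }
    }

private:
    Decoder decode_;
    std::vector<std::uint8_t> buf_;  // leftover data survives between calls
    std::size_t index_ = 0;          // next unread byte in buf_
};
```

If the decoder yields 3-byte chunks and each callback wants 4 bytes, the first call consumes one full chunk plus one byte of the next, and the second call starts with the two bytes left over. Keeping the buffer and index as members plays the role of the static variables in audioCallback.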

Decoding Audio

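int audioDecodeFrame(SDLFFmpegAudioContext* pCtx, uint8_t* audioBuf, int audioBufSize) {
    static AVPacket packet;
    static AVFrame frame;

    int len = -1;

    // block until a packet is available, or bail out when quitting
    if (gQuit || gPacketQueue.pop(&packet, 1) < 0) {
        return -1;
    }
    if (avcodec_send_packet(pCtx->pAudioCodecCtx, &packet)) {
        cerr << "avcodec_send_packet failed" << endl;
    }
    if (avcodec_receive_frame(pCtx->pAudioCodecCtx, &frame) == 0) {
        // the resampler was configured for interleaved S16 stereo output
        const int outChannels = av_get_channel_layout_nb_channels(AV_CH_LAYOUT_STEREO);
        // out_count is in samples per channel; outBuffer holds kMaxAudioFrameSize * 2 bytes
        const int maxOutSamples = kMaxAudioFrameSize * 2 / (outChannels * 2);
        int samples = swr_convert(pCtx->pSwrCtx, &pCtx->outBuffer, maxOutSamples, (const uint8_t**)frame.data, frame.nb_samples);
        if (samples > 0) {
            len = av_samples_get_buffer_size(nullptr, outChannels, samples, AV_SAMPLE_FMT_S16, 1);
            if (len > audioBufSize)
                len = audioBufSize;
            // copy the converted samples, not the raw decoded frame
            if (len > 0)
                memcpy(audioBuf, pCtx->outBuffer, len);
        }
    }
    if (packet.data)
        av_packet_unref(&packet);
    return len;
}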

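The old decoding API avcodec_decode_audio4 is deprecated; here we use the simpler new pair avcodec_send_packet and avcodec_receive_frame. The flow is straightforward: take a packet from the queue, send it to FFmpeg for decoding, receive the decoded frame, and convert it into a format SDL can play.

That is everything needed for audio playback. Because the audio carries its own sample rate, the sound already plays at normal speed.

Code: https://github.com/onlyandonly/ffmpeg_sdl_player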
