Mixing Audio Streams on iOS in Practice


Background:

A feature in one of the business flows of project xx needs to capture the user's entire operation (screen recording plus audio recording). When I first read the requirement, ReplayKit looked like the best fit: the system screen recorder hands you three streams out of the box, video frames as CMSampleBuffer, app audio as PCM CMSampleBuffer, and microphone audio as PCM CMSampleBuffer, which would have solved the requirement immediately. However, the microphone was already claimed by a third-party VoIP calling SDK, so the microphone data could only be supplied by that SDK, and users kept rejecting the authorization prompt, so the lead vetoed the ReplayKit approach. The audio played inside the app likewise has to be supplied by the business side. As a result, video capture was switched to a timer (CADisplayLink) + snapshot scheme (render the view hierarchy to a UIImage, then convert it to a CVPixelBuffer) to generate the video, while the audio has to be mixed into a single stream before it is pushed so that audio and video stay in sync.

Core code for screen recording with a timer + snapshots

// MARK: - Called from a CADisplayLink timer; converts CGImageRef => CVPixelBufferRef
- (void)snapshotWithImage {
    @autoreleasepool {
        // Take a snapshot of the view via a graphics context
        UIGraphicsBeginImageContextWithOptions(self.recordView.bounds.size, NO, 0);
        // This approach can capture animations, but its limitations are obvious: video/camera
        // preview layers and some system components (the keyboard, etc.) cannot be captured
        [self.recordView drawViewHierarchyInRect:self.recordView.bounds afterScreenUpdates:NO];
        UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();
        // Get the CGImageRef backing the UIImage
        CGImageRef imgRef = image.CGImage;
        // CGImage => bitmap
        NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                 [NSNumber numberWithBool:YES], kCVPixelBufferCGImageCompatibilityKey,
                                 [NSNumber numberWithBool:YES], kCVPixelBufferCGBitmapContextCompatibilityKey,
                                 nil];

        CVPixelBufferRef pixelBuffer = NULL;
        size_t frameWidth = CGImageGetWidth(imgRef);
        size_t frameHeight = CGImageGetHeight(imgRef);
        // Create the CVPixelBuffer
        CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault,
                                              frameWidth,
                                              frameHeight,
                                              kCVPixelFormatType_32ARGB,
                                              (__bridge CFDictionaryRef) options,
                                              &pixelBuffer);

        NSParameterAssert(status == kCVReturnSuccess && pixelBuffer != NULL);
        // Lock the buffer
        CVPixelBufferLockBaseAddress(pixelBuffer, 0);
        // Get the base address
        void *pxdata = CVPixelBufferGetBaseAddress(pixelBuffer);
        NSParameterAssert(pxdata != NULL);
        // Device RGB color space
        CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
        // Create a bitmap context backed by the pixel buffer's memory
        CGContextRef context = CGBitmapContextCreate(pxdata,
                                                     frameWidth,
                                                     frameHeight,
                                                     8,
                                                     CVPixelBufferGetBytesPerRow(pixelBuffer),
                                                     rgbColorSpace,
                                                     (CGBitmapInfo)kCGImageAlphaNoneSkipFirst);
        NSParameterAssert(context);
        // Adjust the transform (identity here)
        CGContextConcatCTM(context, CGAffineTransformIdentity);
        // Draw the image into the pixel buffer
        CGContextDrawImage(context, CGRectMake(0, 0, frameWidth, frameHeight), imgRef);
        // Release the color space
        CGColorSpaceRelease(rgbColorSpace);
        // Hand the buffer out for streaming or display; it could also be written
        // to a local video file with `AVAssetWriter`
        if (pixelBuffer != NULL && self.screenRecordCallback) {
            self.screenRecordCallback(pixelBuffer);
        }
        // Release the context
        CGContextRelease(context);
        // Unlock
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
        // Release the buffer
        CVPixelBufferRelease(pixelBuffer);
    }
}
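
The method above is driven by a CADisplayLink timer, which the article does not show. Below is a minimal sketch, assuming the class holds a `displayLink` property alongside the `snapshotWithImage` method above; the 15 fps cap is an arbitrary choice for illustration:

    // Start/stop the CADisplayLink that drives snapshotWithImage
    - (void)startScreenCapture {
        if (self.displayLink) { return; }
        self.displayLink = [CADisplayLink displayLinkWithTarget:self
                                                       selector:@selector(displayLinkFired:)];
        if (@available(iOS 10.0, *)) {
            self.displayLink.preferredFramesPerSecond = 15; // capture frame rate, chosen arbitrarily
        }
        [self.displayLink addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
    }

    - (void)displayLinkFired:(CADisplayLink *)link {
        // On each tick, grab one frame and hand it to screenRecordCallback via snapshotWithImage
        [self snapshotWithImage];
    }

    - (void)stopScreenCapture {
        [self.displayLink invalidate];
        self.displayLink = nil;
    }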

Prerequisites for mixing

For an introduction to how mixing works, see the article 《使用這個混音技術,你也能與愛豆隔空對唱》.

Not just any two audio streams can be mixed directly. Two audio streams must meet all of the following conditions before they can be mixed (a minimal compatibility check is sketched right after this list):

  • Same format: both must be decompressed to raw PCM.
  • Same sample rate: convert both to a common rate if needed. Mainstream sample rates are 16 kHz, 32 kHz, 44.1 kHz and 48 kHz.
  • Same frame length: frame length is determined by the codec. PCM itself has no notion of a frame, so the developer chooses one; to stay consistent with mainstream audio codecs, 20 ms per frame is recommended.
  • Same bit depth (sample format): every sample must be carried in the same number of bits.
  • Same channel count: both mono, or both stereo. Once format, sample rate, frame length, bit depth and channel count are aligned, the two streams can be mixed.
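
As a rough companion to the checklist above (not from the original article), a minimal sketch of a compatibility check over two AudioStreamBasicDescription values might look like this:

    #import <AudioToolbox/AudioToolbox.h>

    // Minimal sketch: YES when two PCM streams can be mixed directly.
    // Frame length is not part of the ASBD, so it still has to be agreed on separately (e.g. 20 ms).
    static BOOL CanMixPCMStreams(const AudioStreamBasicDescription *a,
                                 const AudioStreamBasicDescription *b) {
        return a->mFormatID == kAudioFormatLinearPCM &&
               b->mFormatID == kAudioFormatLinearPCM &&        // same format: raw PCM
               a->mSampleRate == b->mSampleRate &&             // same sample rate
               a->mBitsPerChannel == b->mBitsPerChannel &&     // same bit depth
               a->mChannelsPerFrame == b->mChannelsPerFrame && // same channel count
               a->mFormatFlags == b->mFormatFlags;             // same sample layout flags (signed/endian/packed)
    }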

Before mixing, processing such as echo cancellation, noise suppression and silence detection may also be needed. Echo cancellation and noise suppression belong to speech pre-processing. Before encoding, the pipeline should run in order: capture, speech pre-processing, pre-mix processing, mixing, and post-mix processing. Voice activity detection (VAD) is optional. For client-side mixing, the captured host voice is mixed with the accompaniment read from an audio file; if VAD detects that the host has stopped speaking for a while, that stretch can skip mixing and use the accompaniment data directly. For simplicity, however, VAD can be skipped entirely: while the host is silent, mixing simply continues with the host's signal at zero amplitude.
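
Since VAD is optional here, the following is only a rough illustration (not from the original article) of the simplest possible energy-based silence check over one frame of 16-bit PCM; the -50 dBFS threshold is an arbitrary assumption, and a production detector (for example WebRTC's VAD) is far more sophisticated:

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    // Naive silence check: RMS of one frame of 16-bit PCM compared against a dBFS threshold.
    static bool FrameIsSilent(const int16_t *samples, size_t count, double thresholdDb) {
        if (count == 0) return true;
        double sum = 0.0;
        for (size_t i = 0; i < count; i++) {
            double s = samples[i] / 32768.0;     // normalize to [-1, 1)
            sum += s * s;
        }
        double rms = sqrt(sum / (double)count);
        double db = 20.0 * log10(rms + 1e-12);   // dBFS; the epsilon guards against log10(0)
        return db < thresholdDb;                 // e.g. thresholdDb = -50.0
    }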

Mixing algorithms

The methods below take the code of a C++ mixing repo as a reference.

  1. Summation: y = a + b + c. The summed values get large and can easily overflow. (The LimAmp / Sum / AAW / ASW helpers used below come from the referenced repo; possible sketches of them follow this list.)

    for (int i = 0; i < NUM; ++i)
    {
      // Summation (NUM is the number of samples per block in the reference code)
      sumBuf[i] = LimAmp(Sum(buf1[i], buf2[i], buf3[i], buf4[i]));
    }
    fwrite(sumBuf, sizeof(Int16), NUM, pMux);
    
    
  2. Averaging: y = (a + b + c) / 3. Works fine for two streams, but the perceived quality drops as the number of sources grows.

    for (int i = 0; i < NUM; ++i)
    {
      // Averaging ("no overflow observed in the logs", per the original note)
      sumBuf[i] = LimAmp(AAW(buf1[i], buf2[i], buf3[i], buf4[i]));
    }
    fwrite(sumBuf, sizeof(Int16), NUM, pMux);
    
    
  3. Self-aligned weighting: louder sources are given a larger weight: y = (sgn(a)*a^2 + sgn(b)*b^2 + sgn(c)*c^2) / (|a| + |b| + |c|)

    for (int i = 0; i < NUM; ++i)
    {
      // Self-aligned weighting
      sumBuf[i] = LimAmp(ASW(buf1[i], buf2[i], buf3[i], buf4[i]));
    }
    fwrite(sumBuf, sizeof(Int16), NUM, pMux);
    
    
  4. Normalized mixing, based on an improved normalized mixing algorithm (改進型歸一化混音算法):

#import <Foundation/Foundation.h> // for SInt16 / UInt32

static void Mix(char **buffers, int number, char *mix_buf, UInt32 sampleCount);

// bufferLength is the number of 16-bit samples per buffer; the mixed result is written into bufferB
static void pcmAudioMix(SInt16 *bufferA, SInt16 *bufferB, UInt32 bufferLength){
    char *sourseFile[2];
    sourseFile[0] = (char *)bufferA;
    sourseFile[1] = (char *)bufferB;
    Mix(sourseFile, 2, (char *)bufferB, bufferLength);
}

// Normalized mixing: sum the sources sample by sample; whenever the sum clips,
// shrink the attenuation factor f, then let f recover slowly back towards 1
static void Mix(char **buffers, int number, char *mix_buf, UInt32 sampleCount){
    int const MAX = 32767;
    int const MIN = -32768;

    double f = 1;   // attenuation factor
    int output;
    for (UInt32 i = 0; i < sampleCount; i++){
        int temp = 0;
        // Sum the i-th 16-bit sample of every source
        for (int j = 0; j < number; j++){
            char *point = buffers[j];
            temp += *(short *)(point + i * 2);
        }
        output = (int)(temp * f);

        // Clamp on overflow and reduce f so the following samples are attenuated
        if (output > MAX){
            f = (double)MAX / (double)(output);
            output = MAX;
        }
        if (output < MIN){
            f = (double)MIN / (double)(output);
            output = MIN;
        }
        // Let f drift back up towards 1
        if (f < 1){
            f += ((double)1 - f) / (double)32;
        }
        *(short *)(mix_buf + i * 2) = (short)output;
    }
}

This is the algorithm the project currently uses. The demo below mixes two local PCM files with it:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fp1;
    FILE *fp2;
    FILE *fpmix;

    int size = 4 * 1024;   // bytes read from each file per iteration
    int numSources = 2;    // number of PCM streams to mix (interleaved 16-bit samples)
    // Local PCM files are read and mixed as streams; live streams have to be handled according to the actual scenario
    NSString *path1 = [[NSBundle mainBundle] pathForResource:@"mic" ofType:@"pcm"];
    NSString *path2 = [[NSBundle mainBundle] pathForResource:@"audio" ofType:@"pcm"];
    // Output path of the mixed PCM file
    NSString *mix_path = [[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject] stringByAppendingPathComponent:@"mix.pcm"];

    // Open the files for reading/writing
    fp1 = fopen([path1 UTF8String], "rb");
    if (fp1 == NULL){
        printf("Open FILE1 failed!");
    }

    fp2 = fopen([path2 UTF8String], "rb");
    if (fp2 == NULL){
        printf("Open FILE2 failed!");
    }

    fpmix = fopen([mix_path UTF8String], "wb");
    if (fpmix == NULL){
        printf("Open MIX_FILE failed!");
    }

    short *src_data1, *src_data2, *mix_data;
    // Allocate the working buffers
    src_data1 = (short *)malloc(size);
    if (src_data1 == NULL){
        printf("Malloc data1 failed!");
    }

    src_data2 = (short *)malloc(size);
    if (src_data2 == NULL){
        printf("Malloc data2 failed!");
    }

    mix_data = (short *)malloc(size);
    if (mix_data == NULL){
        printf("Malloc mix_data failed!");
    }

    int ret1, ret2;
    // Array of pointers to the source buffers, in the layout Mix() expects
    char *sourse_data[2];
    printf("Start mixing!!\n");

    // Read both streams in a loop
    while (1){
        ret1 = (int)fread(src_data1, 1, size, fp1);
        ret2 = (int)fread(src_data2, 1, size, fp2);

        sourse_data[0] = (char *)src_data1;
        sourse_data[1] = (char *)src_data2;

        if (ret1 > 0 && ret2 > 0){
            // Mix the overlapping part; Mix() takes the number of 16-bit samples
            int bytes = ret1 < ret2 ? ret1 : ret2;
            Mix(sourse_data, numSources, (char *)mix_data, bytes / 2);
            fwrite(mix_data, 1, bytes, fpmix);
        }else if (ret1 > 0 && ret2 == 0){
            // File 2 is exhausted; copy the rest of file 1 straight through
            fwrite(src_data1, 1, ret1, fpmix);
        }else if (ret2 > 0 && ret1 == 0){
            // File 1 is exhausted; copy the rest of file 2 straight through
            fwrite(src_data2, 1, ret2, fpmix);
        }else{
            // Both files are empty or fully read
            break;
        }
    }
    printf("Mixing finished!!\n");

    free(src_data1);
    free(src_data2);
    free(mix_data);

    fclose(fp1);
    fclose(fp2);
    fclose(fpmix);

    return 0;
}

  5. An a + b - a*b style implementation found on GitHub. I can't claim to understand the math behind it, but it works well enough:

    #define  MY_INT16_MAX   32767
    #define  MY_INT16_MIN  -32768
    
    // Mixing algorithm
    inline short TPMixSamples(short a, short b)
    {
        int result;
        if (a < 0 && b < 0) {
            // Both negative: the product term pulls the sum back inside the negative range
            result = ((int)a + (int)b) - (((int)a * (int)b) / MY_INT16_MIN);
        } else if (a > 0 && b > 0) {
            // Both positive: the product term pulls the sum back inside the positive range
            result = ((int)a + (int)b) - (((int)a * (int)b) / MY_INT16_MAX);
        } else {
            // Opposite signs (or zero): a plain sum cannot overflow
            result = a + b;
        }
        // Clamp to int16 just in case
        return result > MY_INT16_MAX ? MY_INT16_MAX : (result < MY_INT16_MIN ? MY_INT16_MIN : result);
    }
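
The LimAmp / Sum / AAW / ASW helpers used in methods 1 to 3 belong to the referenced repo and are not shown in the article. The sketch below is just one possible reading of them, reconstructed from the formulas above for four int16 sources; the actual repo may well differ:

    #include <stdint.h>
    #include <stdlib.h>

    // Hard-clamp an accumulated value back into the int16 range
    static int16_t LimAmp(int64_t v) {
        if (v > 32767)  return 32767;
        if (v < -32768) return -32768;
        return (int16_t)v;
    }

    // Method 1: plain summation  y = a + b + c + d
    static int64_t Sum(int16_t a, int16_t b, int16_t c, int16_t d) {
        return (int64_t)a + b + c + d;
    }

    // Method 2: averaging  y = (a + b + c + d) / 4
    static int64_t AAW(int16_t a, int16_t b, int16_t c, int16_t d) {
        return ((int64_t)a + b + c + d) / 4;
    }

    // Method 3: self-aligned weighting  y = (sgn(a)*a^2 + ... + sgn(d)*d^2) / (|a| + |b| + |c| + |d|)
    static int64_t ASW(int16_t a, int16_t b, int16_t c, int16_t d) {
        int64_t num = (int64_t)(a < 0 ? -1 : 1) * a * a
                    + (int64_t)(b < 0 ? -1 : 1) * b * b
                    + (int64_t)(c < 0 ? -1 : 1) * c * c
                    + (int64_t)(d < 0 ? -1 : 1) * d * d;
        int64_t den = (int64_t)abs(a) + abs(b) + abs(c) + abs(d);
        return den == 0 ? 0 : num / den;
    }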
    
    

Commonly used related code snippets

  • ASBD, the audio format description struct (a worked 20 ms frame-size example follows these snippets)
    AudioStreamBasicDescription inputFormat = {0}; // zero-initialize the struct
    inputFormat.mSampleRate = 44100;               // sample rate: samples per second
    inputFormat.mFormatID = kAudioFormatLinearPCM; // format: linear PCM
    inputFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked; // signed integer, native endian, packed
    inputFormat.mChannelsPerFrame = 2;             // channel count
    inputFormat.mFramesPerPacket = 1;              // one frame per packet (uncompressed PCM)
    inputFormat.mBitsPerChannel = 16;              // bit depth
    inputFormat.mBytesPerFrame = inputFormat.mBitsPerChannel / 8 * inputFormat.mChannelsPerFrame; // bytes per frame
    inputFormat.mBytesPerPacket = inputFormat.mBytesPerFrame * inputFormat.mFramesPerPacket;      // bytes per packet

  • Audio CMSampleBufferRef => NSData

    - (void)pushAudioBuffer:(CMSampleBufferRef)sampleBuffer {
        AudioBufferList audioBufferList;
        CMBlockBufferRef blockBuffer;
        
        // Copy the sample buffer's audio into an AudioBufferList (the backing block buffer is retained for us)
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
        
        // Wrap each buffer's raw PCM bytes in NSData and push it downstream
        for (UInt32 y = 0; y < audioBufferList.mNumberBuffers; y++) {
            AudioBuffer audioBuffer = audioBufferList.mBuffers[y];
            void *audio = audioBuffer.mData;
            NSData *data = [NSData dataWithBytes:audio length:audioBuffer.mDataByteSize];
            [self pushAudio:data];
        }
        CFRelease(blockBuffer);
    }
    
  • NSData => Audio CMSampleBufferRef

    - (AudioStreamBasicDescription)getASBD {
        int channels = 2;
        AudioStreamBasicDescription format = {0};
        format.mSampleRate = 44100;
        format.mFormatID = kAudioFormatLinearPCM;
        format.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
        format.mChannelsPerFrame = channels;
        format.mBitsPerChannel = 16;
        format.mFramesPerPacket = 1;
        format.mBytesPerFrame = format.mBitsPerChannel / 8 * format.mChannelsPerFrame;
        format.mBytesPerPacket = format.mBytesPerFrame * format.mFramesPerPacket;
        format.mReserved = 0;
        return format;
    }
    
    - (CMSampleBufferRef)convertAudioSampleWithData:(NSData *)audioData {
        int channels = 2;
        AudioBufferList audioBufferList;
        audioBufferList.mNumberBuffers = 1;
        audioBufferList.mBuffers[0].mNumberChannels = channels;
        audioBufferList.mBuffers[0].mDataByteSize = (UInt32)audioData.length;
        audioBufferList.mBuffers[0].mData = (void *)audioData.bytes;
        
        AudioStreamBasicDescription asbd = [self getASBD];
        CMSampleBufferRef buff = NULL;
        static CMFormatDescriptionRef format = NULL;
        // Each frame lasts 1/44100 s; set a real presentation timestamp here if the caller needs one
        CMSampleTimingInfo timing = {CMTimeMake(1, 44100), kCMTimeZero, kCMTimeInvalid};
        OSStatus error = 0;
        if (format == NULL) {
            error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &asbd, 0, NULL, 0, NULL, NULL, &format);
        }
        
        // Number of frames = byte count / (bytes per sample * channel count)
        CMItemCount numFrames = audioData.length / (2 * channels);
        error = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numFrames, 1, &timing, 0, NULL, &buff);
        
        if (error) {
            NSLog(@"CMSampleBufferCreate returned error: %ld", (long)error);
            return NULL;
        }
        
        error = CMSampleBufferSetDataBufferFromAudioBufferList(buff, kCFAllocatorDefault, kCFAllocatorDefault, 0, &audioBufferList);
        
        if (error) {
            NSLog(@"CMSampleBufferSetDataBufferFromAudioBufferList returned error: %ld", (long)error);
            CFRelease(buff);
            return NULL;
        }
        // Caller owns the returned sample buffer (Create rule) and must CFRelease it
        return buff;
    }
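
Tying the ASBD above back to the 20 ms frame length recommended earlier, here is a quick worked example (not from the original article) of how big one 20 ms block is at 44.1 kHz, 16-bit, stereo, reusing the getASBD method above:

    AudioStreamBasicDescription fmt = [self getASBD];          // 44.1 kHz, 2 channels, 16-bit
    UInt32 framesPer20ms = (UInt32)(fmt.mSampleRate * 0.02);   // 44100 * 0.02 = 882 frames
    UInt32 bytesPer20ms  = framesPer20ms * fmt.mBytesPerFrame; // 882 * 4 = 3528 bytes
    NSLog(@"20 ms block: %u frames, %u bytes", (unsigned)framesPer20ms, (unsigned)bytesPer20ms);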
    

In fact, iOS's lower-level AudioUnit framework can also mix by routing different input and output buses; see AUGraph combined with a RemoteI/O Unit and a Mixer Unit (a rough sketch follows). The limitation is that this path talks to the audio hardware, so it requires the microphone permission, and the speaker side needs the corresponding AVAudioSession configuration. If the use case is just a local file plus microphone recording, the system-provided route is perfectly fine.

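Below is a rough sketch of that AUGraph route (not what this project uses, and reduced to the graph wiring only; error handling, per-bus stream formats and render callbacks are omitted):

    #import <AudioToolbox/AudioToolbox.h>

    // Minimal AUGraph wiring: a MultiChannelMixer feeding a RemoteIO output unit.
    // Each audio source would be attached to one mixer input bus (render callback omitted).
    static void BuildMixerGraph(AUGraph *outGraph, AudioUnit *outMixerUnit) {
        AUGraph graph = NULL;
        NewAUGraph(&graph);

        AudioComponentDescription mixerDesc = {
            .componentType = kAudioUnitType_Mixer,
            .componentSubType = kAudioUnitSubType_MultiChannelMixer,
            .componentManufacturer = kAudioUnitManufacturer_Apple,
        };
        AudioComponentDescription ioDesc = {
            .componentType = kAudioUnitType_Output,
            .componentSubType = kAudioUnitSubType_RemoteIO,
            .componentManufacturer = kAudioUnitManufacturer_Apple,
        };

        AUNode mixerNode, ioNode;
        AUGraphAddNode(graph, &mixerDesc, &mixerNode);
        AUGraphAddNode(graph, &ioDesc, &ioNode);
        AUGraphOpen(graph);

        AudioUnit mixerUnit = NULL;
        AUGraphNodeInfo(graph, mixerNode, NULL, &mixerUnit);

        // Two input buses: e.g. one for the mic/VoIP stream, one for app audio
        UInt32 busCount = 2;
        AudioUnitSetProperty(mixerUnit, kAudioUnitProperty_ElementCount,
                             kAudioUnitScope_Input, 0, &busCount, sizeof(busCount));

        // Mixer output bus 0 -> RemoteIO input element 0 (the speaker path)
        AUGraphConnectNodeInput(graph, mixerNode, 0, ioNode, 0);

        AUGraphInitialize(graph);   // call AUGraphStart(graph) once everything is configured
        *outGraph = graph;
        *outMixerUnit = mixerUnit;
    }
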
When the business scenario is more complex and the data sources are scattered across different SDKs, the only practical option is to do the processing at the data layer, which also keeps the SDKs from fighting each other over control of the system audio session configuration.
