This article is an attempt at hardware H.265 (HEVC) encoding with Video Toolbox, using the iPhone rear camera as the video source. Last year I finished hardware H.264 decoding but never did encoding, which felt like a gap in my skills. I then noticed that the enum : CMVideoCodecType in CMFormatDescription.h provides a kCMVideoCodecType_HEVC value, so, against my better judgment, I tried HEVC hardware encoding on iOS 9.2.
Conclusion: HEVC (H.265) encoding is not available to developers; H.264 (AVC) works.
1. Reading the iPhone rear camera
Tip: the iPhone cannot open the front and rear cameras at the same time, because current SoCs usually have only one video channel. If two AVCaptureSessions are started one after another, the first stops automatically and the second keeps running. You might also think of adding the front and rear cameras to a single AVCaptureSession as two AVCaptureDeviceInputs, but that raises an exception. I have tried both approaches.
On iOS 8 and later, opening the camera requires the user's authorization.
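As a side note, a minimal sketch of requesting that permission could use +[AVCaptureDevice requestAccessForMediaType:completionHandler:] (available since iOS 7); how your app reacts to a denial is up to you:
// Request camera access before building the capture session.
[AVCaptureDevice requestAccessForMediaType:AVMediaTypeVideo
                         completionHandler:^(BOOL granted) {
    if (!granted) {
        // Without this permission the capture session delivers no frames.
        NSLog(@"Camera access was denied by the user.");
    }
}];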
1.1 Selecting a camera
I used an iPhone 6 Plus as the test device. It has two cameras, so the one to use must be selected explicitly; here the rear camera is the data source.
AVCaptureDevice *avCaptureDevice;
NSArray *cameras = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
for (AVCaptureDevice *device in cameras) {
    if (device.position == AVCaptureDevicePositionBack) {
        avCaptureDevice = device;
    }
}
If you simply want the rear camera, the code above can be simplified:
AVCaptureDevice *avCaptureDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
1.2 Opening the camera
Camera capture is managed entirely by the AVCaptureSession class, which keeps the programming model simple: the input is the camera, and the outputs are whatever channels you need, such as the screen.
NSError *error = nil;
AVCaptureDeviceInput *videoInput = [AVCaptureDeviceInput deviceInputWithDevice:avCaptureDevice error:&error];
if (!videoInput) {
    return;
}
AVCaptureSession *avCaptureSession = [[AVCaptureSession alloc] init];
avCaptureSession.sessionPreset = AVCaptureSessionPresetHigh; // AVCaptureSessionPresetHigh is the default, so this line may be omitted
[avCaptureSession addInput:videoInput];
With the input configured, we now configure the output, i.e. the camera's output data format. AVCaptureDevice.formats tells you which pixel formats the current device supports; on an iPhone 6 there are just two defaults, 420f and 420v. To get a specific output format such as 32BGRA, set kCVPixelBufferPixelFormatTypeKey in the AVCaptureVideoDataOutput's videoSettings. The values I have verified to work are listed below; a small sketch for checking what your own device reports follows the list.
- kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, i.e. 420v
- kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, i.e. 420f
- kCVPixelFormatType_32BGRA, for which iOS performs the YUV-to-BGRA conversion internally
YUV 4:2:0 is generally used for SD video and YUV 4:2:2 for HD video, so this restriction is surprising. On the other hand, under the same conditions YUV 4:2:0 takes less computation and less bandwidth than YUV 4:2:2.
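To verify what a particular device actually offers, here is a small diagnostic sketch (my own addition, not required for capture) that dumps AVCaptureDevice.formats; the AVCaptureVideoDataOutput created below also exposes an availableVideoCVPixelFormatTypes property listing the accepted videoSettings values:
// List the formats the selected camera supports (e.g. '420v', '420f').
for (AVCaptureDeviceFormat *format in avCaptureDevice.formats) {
    CMFormatDescriptionRef desc = format.formatDescription;
    FourCharCode subType = CMFormatDescriptionGetMediaSubType(desc);
    CMVideoDimensions dims = CMVideoFormatDescriptionGetDimensions(desc);
    NSLog(@"format %c%c%c%c %d x %d",
          (char)(subType >> 24), (char)(subType >> 16),
          (char)(subType >> 8), (char)subType,
          dims.width, dims.height);
}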
AVCaptureVideoDataOutput *avCaptureVideoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
NSDictionary *settings = @{(__bridge id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange)};
avCaptureVideoDataOutput.videoSettings = settings;
dispatch_queue_t queue = dispatch_queue_create("com.github.michael-lfx.back_camera_io", NULL);
[avCaptureVideoDataOutput setSampleBufferDelegate:self queue:queue];
[avCaptureSession addOutput:avCaptureVideoDataOutput];
Add a preview layer:
AVCaptureVideoPreviewLayer *previewLayer = [AVCaptureVideoPreviewLayer layerWithSession:avCaptureSession];
previewLayer.frame = self.view.bounds;
previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;
[self.view.layer addSublayer:previewLayer];
Start the session:
[avCaptureSession startRunning];
Launch the app and you should see the live camera image.
1.3 Getting camera data from the callback
By default the iPhone 6 Plus captures at 30 fps, which means the delegate method below is called 30 times per second. Let's start by simply printing some information about the camera's output data.
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    if (CVPixelBufferIsPlanar(pixelBuffer)) {
        NSLog(@"kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer");
    }
    CMVideoFormatDescriptionRef desc = NULL;
    CMVideoFormatDescriptionCreateForImageBuffer(NULL, pixelBuffer, &desc);
    CFDictionaryRef extensions = CMFormatDescriptionGetExtensions(desc);
    NSLog(@"extensions = %@", extensions);
    CFRelease(desc); // the description is created here, so release it to avoid leaking once per frame
}
The output looks like this:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer
extensions = {
CVBytesPerRow = 2904;
CVImageBufferColorPrimaries = "ITU_R_709_2";
CVImageBufferTransferFunction = "ITU_R_709_2";
CVImageBufferYCbCrMatrix = "ITU_R_709_2";
Version = 2;
}
With my limited video background: ITU_R_709_2 is the HD video scheme, generally used with YUV 4:2:2, and its YUV-to-RGB conversion matrix differs from the SD one (usually ITU_R_601_4).
CVPixelBufferGetPixelFormatType() returns the camera's output pixel format, which matches the format specified above.
Running on an iPhone 6 with sessionPreset set to AVCaptureSessionPreset640x480 produces the following output:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange -> planar buffer
extensions = {
CVBytesPerRow = 964;
CVImageBufferColorPrimaries = "ITU_R_709_2";
CVImageBufferTransferFunction = "ITU_R_709_2";
CVImageBufferYCbCrMatrix = "ITU_R_601_4";
Version = 2;
}
Let's analyze CVBytesPerRow. The value 964 matches what CVPixelBufferGetBytesPerRow returns, and from the preset we know the Y plane is 640 pixels wide, which matches CVPixelBufferGetWidth and CVPixelBufferGetWidthOfPlane(0).
The CVPixelBufferGetBytesPerRow documentation says:
The number of bytes per row of the image data. For planar buffers, this function returns a rowBytes value such that bytesPerRow * height covers the entire image, including all planes.
So for a planar buffer CVPixelBufferGetBytesPerRow returns a combined rowBytes value covering all planes, here the Y and UV planes. Naively summing the plane widths gives Y + U + V = 640 + (640/2 + 640/2) = 1280, but that calculation is wrong. By the YUV 4:2:0 sampling rule each pixel takes 8 + 2 + 2 = 12 bits, so one row of the image actually occupies
640 x 12 / 8 = 960 bytes,
which still does not equal CVBytesPerRow. Next, let's compute the theoretical size of the whole frame.
CVPixelBufferGetHeight reports a height of 480, so the theoretical frame size is
640 x 480 + (640/2 x 480/2) + (640/2 x 480/2)
= 640 x 480 x 3/2
= 460800 bytes.
Yet CVPixelBufferGetDataSize returns 462728, which clearly does not match. FFmpeg, to speed up memory reads, adds padding to AVFrame.data so that AVFrame.linesize >= AVFrame.width. Does CVPixelBuffer do something similar?
size_t extraColumnsOnLeft;
size_t extraColumnsOnRight;
size_t extraRowsOnTop;
size_t extraRowsOnBottom;
CVPixelBufferGetExtendedPixels(pixelBuffer,
&extraColumnsOnLeft,
&extraColumnsOnRight,
&extraRowsOnTop,
&extraRowsOnBottom);
NSLog(@"extra (left, right, top, bottom) = (%ld, %ld, %ld, %ld)",
extraColumnsOnLeft,
extraColumnsOnRight,
extraRowsOnTop,
extraRowsOnBottom);
All four values printed above are 0, so there are no extended pixels. I leave this question open for now; the sketch below at least makes the per-plane layout visible.
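One plausible explanation is row padding for alignment, which the extended-pixel query does not report. A small diagnostic sketch (my own addition) that prints the per-plane geometry makes this easy to check:
// Dump the actual layout of each plane to see whether rows are padded.
size_t planeCount = CVPixelBufferGetPlaneCount(pixelBuffer);
for (size_t i = 0; i < planeCount; i++) {
    NSLog(@"plane %zu: width = %zu, height = %zu, bytesPerRow = %zu",
          i,
          CVPixelBufferGetWidthOfPlane(pixelBuffer, i),
          CVPixelBufferGetHeightOfPlane(pixelBuffer, i),
          CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, i));
}
NSLog(@"total: bytesPerRow = %zu, dataSize = %zu",
      CVPixelBufferGetBytesPerRow(pixelBuffer),
      CVPixelBufferGetDataSize(pixelBuffer));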
2. Trying HEVC and AVC encoding with VideoToolbox
The H.264 (AVC) profiles and levels supported by the iOS hardware encoder are described in VTCompressionProperties.h and can be summarized as follows:
- Baseline: 1.3, 3.0-3.2, 4.0-4.2, 5.0-5.2, plus an AutoLevel variant
- Main: 3.0-3.2, 4.0-4.2, 5.0-5.2, plus an AutoLevel variant
- Extended: 5.0, plus an AutoLevel variant
- High: 3.0-3.2, 4.0-4.2, 5.0-5.2, plus an AutoLevel variant
The VideoToolbox encoding workflow is:
- Create a compression session
- Prepare to encode
- Encode frame by frame
- Finish encoding
2.1 Creating the compression session
// Use the dimensions of the camera's output image for the encoder.
size_t width = CVPixelBufferGetWidth(pixelBuffer);
size_t height = CVPixelBufferGetHeight(pixelBuffer);
static VTCompressionSessionRef compressionSession;
OSStatus status = VTCompressionSessionCreate(NULL,
                                             (int32_t)width, (int32_t)height,
                                             kCMVideoCodecType_H264,
                                             NULL,  // encoderSpecification
                                             NULL,  // sourceImageBufferAttributes
                                             NULL,  // compressedDataAllocator
                                             &compressionOutputCallback,
                                             NULL,  // outputCallbackRefCon
                                             &compressionSession);
Changing kCMVideoCodecType_H264 to kCMVideoCodecType_HEVC and running on iOS 9.2.1 returns error -12908, kVTCouldNotFindVideoEncoderErr, on both an iPhone 6 Plus and an iPhone 6s Plus: no encoder could be found. It seems iOS 9.2 does not expose an HEVC encoder to developers.
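If you want to check programmatically which encoders a given iOS release ships, VTCopyVideoEncoderList (available since iOS 8) returns the installed encoders; the sketch below (my own addition) simply dumps the list, which should make the absence of an HEVC entry visible:
// Enumerate the video encoders VideoToolbox knows about (iOS 8+).
CFArrayRef encoderList = NULL;
OSStatus listStatus = VTCopyVideoEncoderList(NULL, &encoderList);
if (listStatus == noErr && encoderList != NULL) {
    NSLog(@"installed encoders = %@", (__bridge NSArray *)encoderList);
    CFRelease(encoderList);
}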
The encode output callback is defined as follows:
static void compressionOutputCallback(void * CM_NULLABLE outputCallbackRefCon, void * CM_NULLABLE sourceFrameRefCon,
                                      OSStatus status,
                                      VTEncodeInfoFlags infoFlags,
                                      CM_NULLABLE CMSampleBufferRef sampleBuffer ) {
    if (status != noErr) {
        NSLog(@"%s with status(%d)", __FUNCTION__, (int)status);
        return;
    }
    if (infoFlags == kVTEncodeInfo_FrameDropped) {
        NSLog(@"%s with frame dropped.", __FUNCTION__);
        return;
    }
    /* ------ debugging aids ------ */
    CMFormatDescriptionRef fmtDesc = CMSampleBufferGetFormatDescription(sampleBuffer);
    CFDictionaryRef extensions = CMFormatDescriptionGetExtensions(fmtDesc);
    NSLog(@"extensions = %@", extensions);
    CMItemCount count = CMSampleBufferGetNumSamples(sampleBuffer);
    NSLog(@"samples count = %ld", count);
    /* ====== debugging aids ====== */
    // Push to the network or write to a file here.
}
On a successful encode it prints information like this:
extensions = {
FormatName = "H.264";
SampleDescriptionExtensionAtoms = {
avcC = <014d0028 ffe1000b 274d0028 ab603c01 13f2a001 000428ee 3c30>;
};
}
samples count = 1
A sample count of 1 does not mean the slice count is 1; so far I have not found a parameter configuration that produces a multi-slice stream (multiple I and P slices). A detailed dump of the sampleBuffer looks like this:
CMSampleBuffer 0x126e9fd80 retainCount: 1 allocator: 0x1a227cb68
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
formatDescription = <CMVideoFormatDescription 0x126e9fd50 [0x1a227cb68]> {
mediaType:'vide'
mediaSubType:'avc1'
mediaSpecific: {
codecType: 'avc1' dimensions: 1920 x 1080
}
extensions: {<CFBasicHash 0x126e9eae0 [0x1a227cb68]>{type = immutable dict, count = 2, entries =>
0 : <CFString 0x19dd523e0 [0x1a227cb68]>{contents = "SampleDescriptionExtensionAtoms"} = <CFBasicHash 0x126e9e090 [0x1a227cb68]>{type = immutable dict, count = 1, entries =>
2 : <CFString 0x19dd57c20 [0x1a227cb68]>{contents = "avcC"} = <CFData 0x126e9e1b0 [0x1a227cb68]>{length = 26, capacity = 26, bytes = 0x014d0028ffe1000b274d0028ab603c01 ... a001000428ee3c30} }
2 : <CFString 0x19dd52440 [0x1a227cb68]>{contents = "FormatName"} = H.264} } }
sbufToTrackReadiness = 0x0
numSamples = 1
sampleTimingArray[1] = {
{PTS = {196709596065916/1000000000 = 196709.596}, DTS = {INVALID}, duration = {INVALID}},
}
sampleSizeArray[1] = {
sampleSize = 5707,
}
sampleAttachmentsArray[1] = {
sample 0: DependsOnOthers = false
}
dataBuffer = 0x126e9fc50
To make debugging easier, you can write the H.264 stream to a file and analyze it with tools such as VLC; that is the topic of the second article in this series: iOS VideoToolbox硬編H.265(HEVC)H.264(AVC):2 H264數據寫入文件.
Next, the role of avcC. The avcC data is placed into a CFDictionaryRef and passed to CMVideoFormatDescriptionCreate to create the video format description; with that you can create a decompression session and start decoding.
This also shows that the VideoToolbox encoder outputs H.264 in avcC form, and that VideoToolbox accepts only avcC-formatted H.264. If you receive Annex-B H.264 from the network (often called a raw H.264 stream or elementary stream), it is more convenient to build the format description with CMVideoFormatDescriptionCreateFromH264ParameterSets, and the Annex-B data must be converted to avcC before decoding. This is also why WWDC 2014 session 513, "Direct Access to Video Encoding and Decoding", says VideoToolbox only supports H.264 carried in an MP4 container: as far as I know, when data is written into MP4 the Annex-B start codes are replaced by length fields. This is the most error-prone spot in VideoToolbox hardware decoding; my decoding work last year took so long precisely because I did not yet understand these H.264 details.
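For reference, a sketch of pulling the SPS and PPS out of the encoder's format description with CMVideoFormatDescriptionGetH264ParameterSetAtIndex and prefixing each with an Annex-B start code might look like the following; this helper is my own illustration, not code from the project, and the next article covers the full file-writing path:
// Extract SPS/PPS from the format description and emit them as
// Annex-B NAL units (start code + parameter set).
static NSData *annexBParameterSets(CMFormatDescriptionRef fmtDesc) {
    static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};
    NSMutableData *output = [NSMutableData data];
    size_t parameterSetCount = 0;
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(fmtDesc, 0, NULL, NULL,
                                                       &parameterSetCount, NULL);
    for (size_t i = 0; i < parameterSetCount; i++) {
        const uint8_t *parameterSet = NULL;
        size_t parameterSetSize = 0;
        OSStatus status = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
            fmtDesc, i, &parameterSet, &parameterSetSize, NULL, NULL);
        if (status != noErr) return nil;
        [output appendBytes:startCode length:sizeof(startCode)];
        [output appendBytes:parameterSet length:parameterSetSize];
    }
    return output;
}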
2.2 Preparing to encode
Before encoding starts, you can configure the H.264 profile, level, keyframe interval and other settings; they ultimately show up in the SPS and PPS, which tell the decoder how to decode.
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
// ... plus a series of other properties
OSStatus status = VTCompressionSessionPrepareToEncodeFrames(compressionSession);
if (status != noErr) {
    // FAILED.
}
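As an illustration of what that series of properties might look like for a live-capture scenario, here is a hedged sketch; the keys are from VTCompressionProperties.h, but the values (a keyframe every 60 frames, 30 fps, 2 Mbps) are assumptions of mine, not recommendations:
// Typical live-capture settings: real-time mode, no B-frames,
// a keyframe every 60 frames, an expected 30 fps and a target bitrate.
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(60));
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ExpectedFrameRate, (__bridge CFTypeRef)@(30));
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(2 * 1024 * 1024));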
The second article in this series, iOS VideoToolbox硬編H.265(HEVC)H.264(AVC):2 H264數據寫入文件, explains SPS and PPS in more detail.
2.3 Encoding frame by frame
Before encoding a frame, the pixel buffer's base address is usually locked, and unlocked once the frame has been submitted. The presentation timestamp and duration also need to be supplied.
if (CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    // FAILED.
}
CMTime presentationTimeStamp = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);
CMTime duration = CMSampleBufferGetOutputDuration(sampleBuffer);
status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, NULL, pixelBuffer, NULL);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
Unlike decoding, where VTDecodeFrameFlags can request a synchronous operation, the encoder's callback is asynchronous. Asynchrony improves throughput, but it brings extra work such as putting frames back in order, and makes things like keeping audio encoding in sync more complicated.
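One practical consequence: inside the callback, each sample's own timestamps and keyframe flag have to be read from the sample buffer before muxing or streaming, since the encoder may emit frames in decode order. A sketch of that (assuming it runs inside compressionOutputCallback above) could be:
// Read timing and keyframe information from the encoded sample.
CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime dts = CMSampleBufferGetDecodeTimeStamp(sampleBuffer); // may be invalid when no reordering occurs
BOOL isKeyFrame = NO;
CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
if (attachments != NULL && CFArrayGetCount(attachments) > 0) {
    CFDictionaryRef attachment = (CFDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
    // kCMSampleAttachmentKey_NotSync is absent (or false) for keyframes.
    isKeyFrame = !CFDictionaryContainsKey(attachment, kCMSampleAttachmentKey_NotSync);
}
NSLog(@"keyframe = %d, pts = %.3f, dts = %.3f",
      isKeyFrame, CMTimeGetSeconds(pts), CMTimeGetSeconds(dts));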
2.4 Finishing encoding
When encoding ends, call VTCompressionSessionCompleteFrames to stop encoding and tell the encoder what to do with frames that are already encoded or still pending. Then call VTCompressionSessionInvalidate to tear down the session; skipping this can leave the hardware in a bad state that sometimes requires restarting the phone. Finally, release the VTCompressionSession.
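Put together, a minimal teardown sketch (assuming the static compressionSession from section 2.1) might be:
// Flush all pending frames, then tear the session down.
VTCompressionSessionCompleteFrames(compressionSession, kCMTimeInvalid); // kCMTimeInvalid = complete everything
VTCompressionSessionInvalidate(compressionSession);
CFRelease(compressionSession);
compressionSession = NULL;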
3. Discussion
WWDC 2014 session 513, "Direct Access to Video Encoding and Decoding", mentions that when real-time requirements are loose, multi-pass encoding can give better results. I have not tried it.