Reading MP3 Tag Information with Python

Reposted article. Published 2018-06-22 14:39. Tags: mp3, id3v1, id3v2, python

What is ID3

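MP3 is the most popular audio file format; its full name is MPEG-1 Audio Layer III (often written MPEG Layer III). The format itself, however, has no place for descriptive information about the audio, such as the song title, performer, or album.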

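To fill that gap, Eric Kemp proposed ID3 in 1996 as part of the Studio 3 project. The name is commonly expanded as "Identify an MP3". The idea was to append a small block of data to the end of the audio file holding the song title, artist, album, and similar information. To make the block easy to detect, its length is fixed at 128 bytes. This version is known as ID3v1.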

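In 1997 Michael Mutschler made a small adjustment to the format: the Comment field was shortened by two bytes to make room for a track number. This version is known as ID3v1.1.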

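In 1998, led by Martin Nilsson and Michael Mutschler, a group of contributors began work on ID3v2. This version is structured completely differently from ID3v1: the length of the data is no longer fixed, the tag moves from the end of the file to the beginning, and Unicode support is introduced. The first version of ID3v2 was ID3v2.2, and ID3v2.4 was released in 2000.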

ID3v1

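The tag is attached after the audio data and is 128 bytes long. Each text field holds at most 30 bytes, which means 30 characters in a single-byte encoding and fewer in a multi-byte one.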

Field details

Song Title	30 characters
Artist	30 characters
Album	30 characters
Year	4 characters
Comment	30 characters
Genre	1 byte

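The data always starts with the three characters TAG. Together with the fields above this comes to exactly 128 bytes (3 + 30 + 30 + 30 + 4 + 30 + 1). When the content of a text field, for example Artist, is shorter than 30 characters, the remainder is padded with zero bytes (0x00).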
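The fixed layout maps directly onto Python's struct module. The following is a minimal sketch rather than code from the original script: the names ID3V1_LAYOUT and build_id3v1 are made up for illustration, and the sample field values are arbitrary.

```python
import struct

# "<" disables alignment padding. The fields are: 3-byte "TAG" marker,
# three 30-byte text fields, a 4-byte year, a 30-byte comment, a 1-byte genre.
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")

def build_id3v1(title, artist, album, year, comment, genre):
    """Pack the fields into a 128-byte ID3v1 block (hypothetical helper)."""
    # struct pads every "s" field with zero bytes up to its declared width
    return ID3V1_LAYOUT.pack(b"TAG", title, artist, album, year, comment, genre)

tag = build_id3v1(b"Song", b"Singer", b"Album", b"2018", b"", 12)
print(ID3V1_LAYOUT.size)  # 128
print(len(tag))           # 128

# Unpacking returns the fields in the same order, still zero-padded
marker, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(tag)
print(artist.rstrip(b"\x00"))  # b'Singer'
```

Because the padding comes from the format string, a field that is too long is silently truncated to its width instead of shifting the fields that follow.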

ID3v2

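ID3v2 is a block of data placed in front of the audio data. Each individual item, such as the song title, is called a frame. A frame can hold any type of data; each frame can be up to 16 MB and the whole tag up to 256 MB. Text can be stored as Unicode, which avoids the garbled text that ID3v1 tags are prone to.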

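Placing the tag before the audio data has another advantage: with streaming access, the song information arrives first and can be shown to the user right away.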
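The 256 MB limit follows from how the tag header stores its size. The layout used here comes from the ID3v2 specification, not from the article: a 10-byte header made of the marker "ID3", two version bytes, one flags byte, and four size bytes that each carry only 7 bits (a "synchsafe" integer). The sketch below assumes that layout; the function names are illustrative.

```python
def decode_synchsafe(data):
    """Combine 7-bit bytes into one integer, most significant byte first."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value

def parse_id3v2_header(header):
    """Return (major, revision, flags, tag_size), or None without an ID3v2 tag."""
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    major, revision, flags = header[3], header[4], header[5]
    # The size counts the bytes after the 10-byte header
    return major, revision, flags, decode_synchsafe(header[6:10])

# Hand-built header for an ID3v2.3 tag whose body is 257 bytes long
sample = b"ID3" + bytes([3, 0, 0]) + bytes([0x00, 0x00, 0x02, 0x01])
print(parse_id3v2_header(sample))     # (3, 0, 0, 257)

# Four bytes of 7 bits each give 28 bits, hence the 256 MB ceiling
print(decode_synchsafe(b"\x7f" * 4))  # 268435455
```

In practice the header would be read with something like open(path, "rb").read(10), where path is whatever MP3 file is being inspected.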

Some of its features:

The ID3v2 tag is a container format, just like IFF or PNG files, allowing new frames (chunks) as evolution proceeds.
Residing in the beginning of the audio file makes it suitable for streaming.
Has an 'unsynchronization scheme' to prevent ID3v2-incompatible players from attempting to play the tag.
Maximum tag size is 256 megabytes and maximum frame size is 16 megabytes.
Byte conservative and with the capability to compress data it keeps the files small.
The tag supports Unicode.
Isn't entirely focused on musical audio, but also other types of audio.
Has several new text fields such as composer, conductor, media type, BPM, copyright message, etc. and the possibility to design your own as you see fit.
Can contain lyrics as well as music-synced lyrics (karaoke) in almost any language.
Is able to contain volume, balance, equalizer and reverb settings.
Could be linked to CD-databases such as CDDB and FreeDB.
Is able to contain images and just about any file you want to include.
Supports enciphered information, linked information and weblinks.
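Unicode support works through an encoding byte at the start of every text frame body. The mapping below follows the ID3v2.4 specification (values 2 and 3 do not exist in earlier versions). It is a sketch for a single frame body that has already been extracted; decode_text_frame is a hypothetical helper, not part of the script discussed in this article.

```python
# Encoding byte -> Python codec
TEXT_ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with a byte order mark
    2: "utf-16-be",  # UTF-16 big-endian without a byte order mark
    3: "utf-8",      # UTF-8
}

def decode_text_frame(payload):
    """Decode the body of a text frame such as TIT2 (hypothetical helper)."""
    codec = TEXT_ENCODINGS.get(payload[0])
    if codec is None:
        raise ValueError("unknown text encoding byte: %d" % payload[0])
    # Drop the terminating NUL character if the writer added one
    return payload[1:].decode(codec).rstrip("\x00")

print(decode_text_frame(b"\x00Title\x00"))                           # Title
print(decode_text_frame(b"\x03" + "音樂".encode("utf-8")) == "音樂")   # True
print(decode_text_frame(b"\x01" + "音樂".encode("utf-16")) == "音樂")  # True
```

Since the frame states its own encoding, no guessing is needed, which is the main practical difference from ID3v1.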

Reading ID3 information with Python

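I wrote a Python script that reads ID3v1 information. Two issues came up along the way:
1. ID3v1 has no field that records the text encoding, so the same MP3 can show garbled text when played in different system environments. I plan to cover encoding detection in a separate article.
2. iTunes appears to give priority to the ID3v2 information.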
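The first issue is easy to reproduce without an MP3 file. The sketch below assumes a tagger that stored the text as GBK; the candidate list and the helper decode_with_candidates are illustrative choices, not what the script does (it relies on chardet).

```python
# The bytes a GBK-based tagger might have written into a title field
raw = "歌曲".encode("gbk")

print(raw.decode("gbk") == "歌曲")      # True: the right codec restores the text
print(raw.decode("latin-1") == "歌曲")  # False: same bytes, wrong codec, garbled text

def decode_with_candidates(data, candidates=("utf-8", "gbk", "big5")):
    """Return (text, codec) for the first candidate that decodes cleanly."""
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this fallback never fails
    return data.decode("latin-1"), "latin-1"

text, codec = decode_with_candidates(raw)
print(codec)  # gbk
```

A clean decode does not prove the guess is right: Big5 bytes will often decode as GBK without an error and produce different characters, so the order of the candidates encodes an assumption about where the files came from.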

The script is on GitHub; anyone interested can find it at https://github.com/cocowool/py-id3.

# Read ID3v1 tag information
# chardet is a third-party package (pip install chardet)
import chardet

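# fileObj must be opened in binary mode, e.g. open(path, 'rb')
def parse(fileObj, version = 'v1'):
    # Seek to the end of the file to find out its size
    fileObj.seek(0, 2)
    # An ID3v1 tag is exactly 128 bytes, so a shorter file cannot hold one
    if fileObj.tell() < 128:
        return False
    # The tag occupies the last 128 bytes of the file
    fileObj.seek(-128, 2)
    tag_data = fileObj.read(128)

    # A valid tag starts with the marker b'TAG'
    if tag_data[0:3] != b'TAG':
        return False
    return getTag(tag_data)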

# Detect the encoding and decode
def decodeData(bin_seq):
    result = chardet.detect(bin_seq)
    encoding = result['encoding']
    # chardet reports encoding None when it cannot make a guess
    if encoding and result['confidence'] > 0:
        try:
            return bin_seq.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            return 'Decode Failed'
    return 'Decode Failed'


# Get ID3v1 tag data
def getTag(tag_data):
    # Fields are padded with zero bytes; strip the padding before decoding
    STRIP_CHARS = b"\x00"

    tags = {}

    title = tag_data[3:33].strip(STRIP_CHARS)
    tags['title'] = decodeData(title) if title else ''

    artist = tag_data[33:63].strip(STRIP_CHARS)
    tags['artist'] = decodeData(artist) if artist else ''

    album = tag_data[63:93].strip(STRIP_CHARS)
    tags['album'] = decodeData(album) if album else ''

    # The year is four ASCII digits, so no encoding detection is needed
    tags['year'] = tag_data[93:97].strip(STRIP_CHARS).decode('ascii', errors='replace')

    # TODO: analyze the comment bytes to tell v1 from v1.1
    comment = tag_data[97:127].strip(STRIP_CHARS)
    tags['comment'] = decodeData(comment) if comment else ''

    # The genre is a single byte holding an index into the ID3v1 genre list
    tags['genre'] = tag_data[127]

    return tags

# Set ID3v1 tag data (not implemented yet)
def setTag():
    pass
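The TODO in getTag about telling v1 from v1.1 could be handled along the following lines. The check relies on the ID3v1.1 convention that byte 125 of the tag is zero and byte 126 holds the track number; split_comment_and_track is a hypothetical helper and the sample tags are hand-built.

```python
def split_comment_and_track(tag_data):
    """Return (comment_bytes, track); track is None for a plain ID3v1 tag."""
    # ID3v1.1: a zero byte at offset 125 followed by a non-zero track number
    if tag_data[125] == 0 and tag_data[126] != 0:
        return tag_data[97:125].rstrip(b"\x00"), tag_data[126]
    # Plain ID3v1: the comment uses all 30 bytes
    return tag_data[97:127].rstrip(b"\x00"), None

# 3-byte marker plus 94 empty bytes for title, artist, album and year
prefix = b"TAG" + b"\x00" * 94

v1_tag = prefix + b"plain comment".ljust(30, b"\x00") + b"\x0c"
v11_tag = prefix + b"short".ljust(28, b"\x00") + b"\x00\x07" + b"\x0c"

print(split_comment_and_track(v1_tag))   # (b'plain comment', None)
print(split_comment_and_track(v11_tag))  # (b'short', 7)
```

A plain v1 comment that happens to be exactly 30 bytes long with a zero in its second-to-last position would be misread as v1.1, so the check is a heuristic rather than a guarantee.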


