【Continuously updated】
【index - librosa 0.8.0 documentation】
First: import librosa
load
Read a wav file:
wav, sr = librosa.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=np.float32, res_type='kaiser_best')
1. Load an audio file as a floating point time series.
2. Audio will be automatically resampled to the given rate (default sr=22050).
3. To preserve the native sampling rate of the file, use sr=None.
Any codec supported by soundfile or audioread will work.
In the source code, librosa first tries to decode with soundfile and falls back to audioread if that fails.
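To make "floating point time series" and mono=True concrete, here is a small NumPy sketch (not librosa's code). It assumes 16-bit PCM input and scales it by 2**15 into [-1, 1); with soundfile, librosa already receives floats. It then averages the channels of a (channels, samples) array, which is what librosa.to_mono does for multichannel audio. The stereo samples are made up for illustration.

```python
import numpy as np

def pcm16_to_float(pcm: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Scale signed 16-bit PCM samples to floats in [-1, 1)."""
    # Dividing by 2**15 maps -32768 -> -1.0 and 32767 -> 0.99997
    return pcm.astype(dtype) / (1 << 15)

def to_mono(y: np.ndarray) -> np.ndarray:
    """Downmix a (channels, samples) array by averaging over channels."""
    return np.mean(y, axis=0) if y.ndim > 1 else y

# Hypothetical stereo clip: 2 channels x 4 samples of int16 PCM
stereo = np.array([[0, 16384, -32768, 32767],
                   [0, -16384, -32768, 32767]], dtype=np.int16)
y = to_mono(pcm16_to_float(stereo))
# y is float32, approximately [0.0, 0.0, -1.0, 0.99997]
```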
sampling-rate-conversion
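librosa.load can read an audio file at a specified sampling rate, resampling as needed. How is the resampling filter implemented?
The default resampling type is kaiser_best, which corresponds to the high-quality mode of the `resampy` Python package; see the resampy 0.2.2 documentation (Introduction page).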
resampy is a python module for efficient time-series resampling. It is based on the band-limited sinc interpolation method for sampling rate conversion.
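As a rough answer to the filter question above, the following sketch implements naive band-limited interpolation with a Kaiser-windowed sinc kernel in NumPy. It is not resampy's implementation: resampy works from precomputed filter tables, and half_width and beta here are arbitrary illustrative choices, not the kaiser_best settings. When downsampling, the sinc cutoff is lowered to the new Nyquist frequency so the output does not alias.

```python
import numpy as np

def sinc_resample(x, sr_orig, sr_new, half_width=32, beta=8.6):
    """Naive Kaiser-windowed sinc resampling (illustrative, O(n_out * half_width))."""
    ratio = sr_new / sr_orig
    n_out = int(np.ceil(len(x) * ratio))
    # Cutoff as a fraction of the input Nyquist; below 1 only when downsampling
    cutoff = min(1.0, ratio)
    y = np.zeros(n_out)
    for i in range(n_out):
        t = i / ratio                       # output sample time, in input-sample units
        center = int(np.floor(t))
        n = np.arange(center - half_width + 1, center + half_width + 1)
        n = n[(n >= 0) & (n < len(x))]      # drop taps that fall outside the signal
        d = t - n                           # distance from each input tap to t
        # Continuous Kaiser window evaluated at d, tapering toward |d| = half_width
        w = np.i0(beta * np.sqrt(np.clip(1 - (d / half_width) ** 2, 0, None))) / np.i0(beta)
        # Ideal low-pass (scaled sinc) kernel times the window, applied to the taps
        y[i] = cutoff * np.sum(x[n] * np.sinc(cutoff * d) * w)
    return y

# Usage: downsample a 100 Hz tone from 16 kHz to 8 kHz
sr_orig, sr_new = 16000, 8000
x = np.sin(2 * np.pi * 100 * np.arange(int(0.1 * sr_orig)) / sr_orig)
y = sinc_resample(x, sr_orig, sr_new)
```

Each output sample is a weighted sum of nearby input samples. A longer kernel and a stronger window give better stopband attenuation at higher cost, which is the kind of quality/speed trade-off behind choosing res_type='kaiser_best' versus 'kaiser_fast'.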
cache
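Reference: "librosa之cache" (librosa's cache), daisycolour, Sina Blog (sina.com.cn)
- 10: filter bases, independent of audio data (dct, mel, chroma, constant-q)
- 20: low-level features (cqt, stft, zero-crossings, etc.)
- 30: high-level features (tempo, beats, decomposition, recurrence, etc.)
- 40: post-processing (delta, stack_memory, normalize, sync)
- The default cache level is 10.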
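The level rule can be sketched as follows. This is based on my understanding of librosa's cache manager: caching is off unless the environment variable LIBROSA_CACHE_DIR is set before importing librosa, and a function tagged with level L is cached only when LIBROSA_CACHE_LEVEL (default 10) is at least L. So with the default level, only the level-10 filter bases are cached, while stft (level 20) and istft (level 30) are not. is_cached is a hypothetical helper, not a librosa function.

```python
import os

def is_cached(function_level: int, cache_dir, cache_level: int = 10) -> bool:
    """Hypothetical: would a function tagged `function_level` be cached?"""
    # No cache directory -> caching disabled entirely
    return cache_dir is not None and cache_level >= function_level

# Read the same environment variables librosa is assumed to read at import time,
# e.g. `export LIBROSA_CACHE_DIR=/tmp/librosa_cache LIBROSA_CACHE_LEVEL=20`
cache_dir = os.environ.get("LIBROSA_CACHE_DIR")
cache_level = int(os.environ.get("LIBROSA_CACHE_LEVEL", "10"))

# Levels quoted in these notes
LEVELS = {"filters.mel": 10, "stft": 20, "istft": 30}
for name, level in LEVELS.items():
    print(name, is_cached(level, cache_dir, cache_level))
```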
display
| Function | Description |
|---|---|
| specshow(data[, x_coords, y_coords, x_axis, …]) | Display a spectrogram/chromagram/cqt/etc. |
| waveplot(y[, sr, max_points, x_axis, …]) | Plot the amplitude envelope of a waveform. |
| cmap(data[, robust, cmap_seq, cmap_bool, …]) | Get a default colormap from the given data. |
| TimeFormatter([lag, unit]) | A tick formatter for time axes. |
| NoteFormatter([octave, major]) | Ticker formatter for Notes |
| LogHzFormatter([major]) | Ticker formatter for logarithmic frequency |
| ChromaFormatter | A formatter for chroma axes |
| TonnetzFormatter | A formatter for tonnetz axes |
[1] covers many applications of librosa and also points out that the librosa.display module is not loaded by a plain import librosa; it has to be imported separately:
import librosa.display
waveplot
Plot the amplitude envelope of a waveform.
If y is monophonic, a filled curve is drawn between [-abs(y), abs(y)].
If y is stereo, the curve is drawn between [-abs(y[1]), abs(y[0])], so that the left and right channels are drawn above and below the axis, respectively.
Long signals (duration >= max_points) are down-sampled to at most max_sr before plotting.
librosa.display.waveplot(y, sr=22050, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000, ax=None, **kwargs)
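A NumPy sketch of the downsampling described above, assuming a mono signal. The target rate is capped at max_sr, and each block of hop samples is reduced to its peak absolute value, so the curve filled between -env and env keeps the visible peaks. waveform_envelope is a hypothetical helper that follows my reading of waveplot's documented behavior, not librosa's exact code. Later librosa releases replace waveplot with waveshow.

```python
import numpy as np

def waveform_envelope(y, sr, max_points=50000, max_sr=1000):
    """Hypothetical helper: max-abs envelope of a mono signal and its sample rate."""
    if len(y) <= max_points:
        return np.abs(y), float(sr)
    # Target rate keeps roughly max_points points, never above max_sr
    target_sr = max(1, min(max_sr, (sr * max_points) // len(y)))
    hop = max(1, sr // target_sr)
    n_frames = len(y) // hop
    # Peak absolute amplitude of each non-overlapping block of `hop` samples
    frames = np.abs(y[: n_frames * hop]).reshape(n_frames, hop)
    return frames.max(axis=1), sr / hop

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(10 * sr) / sr)   # 10 s tone: long signal
env, env_sr = waveform_envelope(y, sr)
# To draw it: plt.fill_between(np.arange(len(env)) / env_sr, -env, env)
```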
specshow
Display a spectrogram/chromagram/cqt/etc.
librosa.display.specshow(data, x_coords=None, y_coords=None, x_axis=None, y_axis=None, sr=22050, hop_length=512, fmin=None, fmax=None, tuning=0.0, bins_per_octave=12, ax=None, **kwargs)
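Note: in the source code, sr defaults to 22050 Hz. If the audio is sampled at 8 kHz or 16 kHz, you must pass sr explicitly; otherwise the time and frequency axes are labeled incorrectly.
Different frequency scales can be chosen for the spectrogram with y_axis={'linear', 'log', 'mel', 'cqt_hz', ...}.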
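Why the sr note matters: specshow only receives a matrix, so it converts frame indices to seconds with t = frame * hop_length / sr (the same formula as librosa.frames_to_time with its default n_fft=None) and labels frequencies up to sr / 2. The sketch below, using a hypothetical frame_times helper, shows how 1 s of 16 kHz audio is mislabeled if sr is left at 22050.

```python
import numpy as np

def frame_times(n_frames, sr, hop_length=512):
    """Hypothetical helper: time in seconds of each STFT frame."""
    return np.arange(n_frames) * hop_length / sr

n_frames = 1 + 16000 // 512                 # 1 s of 16 kHz audio, center=True framing
correct = frame_times(n_frames, sr=16000)    # last frame at about 0.992 s
wrong = frame_times(n_frames, sr=22050)      # default sr: last frame at about 0.720 s
# The frequency axis is off too: top label 22050 / 2 = 11025 Hz instead of 8000 Hz.
# Correct call (S_db is a hypothetical dB-scaled spectrogram):
# librosa.display.specshow(S_db, sr=16000, hop_length=512, x_axis='time', y_axis='mel')
```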
feature-extraction
Reference: https://librosa.org/doc/latest/feature.html
melspectrogram
Compute a mel-scaled spectrogram.
librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='reflect', power=2.0, **kwargs)
Example (y and sr as returned by librosa.load):
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
Related: the mel filter bank in librosa.filters:
librosa.filters.mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False, norm='slaney', dtype=np.float32)
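The following NumPy sketch builds a triangular mel filter bank on the Slaney-style mel scale (linear below 1 kHz, logarithmic above) with area normalization, which is my understanding of what htk=False and norm='slaney' select. The helper names are hypothetical; use librosa.filters.mel in practice. A mel spectrogram with power=2.0 is then this matrix multiplied by the power spectrogram np.abs(D)**2.

```python
import numpy as np

F_SP = 200.0 / 3                  # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # 15 mel
LOGSTEP = np.log(6.4) / 27.0

def hz_to_mel_slaney(f):
    f = np.asarray(f, dtype=float)
    log_part = MIN_LOG_MEL + np.log(np.maximum(f, MIN_LOG_HZ) / MIN_LOG_HZ) / LOGSTEP
    return np.where(f >= MIN_LOG_HZ, log_part, f / F_SP)

def mel_to_hz_slaney(m):
    m = np.asarray(m, dtype=float)
    return np.where(m >= MIN_LOG_MEL, MIN_LOG_HZ * np.exp(LOGSTEP * (m - MIN_LOG_MEL)), F_SP * m)

def mel_filterbank(sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, 1 + n_fft // 2)."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0, sr / 2, 1 + n_fft // 2)
    # n_mels + 2 band edges, equally spaced on the mel scale
    mel_f = mel_to_hz_slaney(np.linspace(hz_to_mel_slaney(fmin), hz_to_mel_slaney(fmax), n_mels + 2))
    fdiff = np.diff(mel_f)
    ramps = mel_f[:, None] - fft_freqs[None, :]
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lower = -ramps[i] / fdiff[i]          # rising edge of triangle i
        upper = ramps[i + 2] / fdiff[i + 1]   # falling edge of triangle i
        weights[i] = np.maximum(0, np.minimum(lower, upper))
    # Area normalization: divide each triangle by its bandwidth
    weights *= (2.0 / (mel_f[2:] - mel_f[:-2]))[:, None]
    return weights

M = mel_filterbank(22050, 2048, n_mels=128, fmax=8000)
power_spec = np.ones((1 + 2048 // 2, 10))   # hypothetical power spectrogram, 10 frames
S = M @ power_spec                           # mel spectrogram, shape (128, 10)
```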
stft / istft
Short-time Fourier transform / inverse short-time Fourier transform; see the librosa source code and the blog post [3].
librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')
librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=np.complex64, pad_mode='reflect') # This function caches at level 20.
The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows. This function returns a complex-valued matrix D such that
- np.abs(D[f, t]) is the magnitude of frequency bin f at frame t, and
- np.angle(D[f, t]) is the phase of frequency bin f at frame t.
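A minimal NumPy sketch of what stft computes under its defaults, assuming a periodic Hann window, win_length = n_fft, hop_length = n_fft // 4, and center=True with reflect padding. It omits librosa's dtype handling and block-wise processing, so it illustrates the layout of D rather than replacing librosa.stft.

```python
import numpy as np

def stft(y, n_fft=2048, hop_length=None):
    """Minimal centered STFT; returns D with shape (1 + n_fft // 2, n_frames)."""
    hop_length = n_fft // 4 if hop_length is None else hop_length
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)   # periodic Hann
    # center=True: reflect-pad so frame t is centered at sample t * hop_length
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[t * hop_length: t * hop_length + n_fft] for t in range(n_frames)], axis=1)
    # One-sided DFT of each windowed frame (columns are frames)
    return np.fft.rfft(frames * window[:, None], axis=0)

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s, 440 Hz tone
D = stft(y)
magnitude, phase = np.abs(D), np.angle(D)
# D.shape == (1025, 44); the peak in each frame is bin 41 (41 * sr / 2048 ≈ 441 Hz)
```

The frame count 1 + len(y) // hop_length comes from the center padding, and row f corresponds to frequency f * sr / n_fft.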
librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)
librosa.core.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, dtype=np.float32, length=None) # This function caches at level 30.
Converts a complex-valued spectrogram stft_matrix to time-series y by minimizing the mean squared error between stft_matrix and STFT of y as described in [2] up to Section 2 (reconstruction from MSTFT).
In general, the window function, hop length and other parameters should be the same as in stft, which usually yields perfect reconstruction of a signal from an unmodified stft_matrix.
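For an unmodified STFT, the least-squares reconstruction of [2] reduces to weighted overlap-add: re-window each inverse-FFT frame, sum the overlapping frames, and divide by the summed squared window. The self-contained sketch below repeats a minimal stft, assuming n_fft=2048, hop_length=512 and a periodic Hann window in both directions. It shows the round trip recovering the input up to floating-point error. It illustrates the idea and is not librosa's istft.

```python
import numpy as np

def hann(n_fft):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)   # periodic Hann

def stft(y, n_fft=2048, hop_length=512):
    window = hann(n_fft)
    y = np.pad(y, n_fft // 2, mode="reflect")                     # center=True
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[t * hop_length: t * hop_length + n_fft] for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames * window[:, None], axis=0)

def istft(D, hop_length=512, length=None):
    """Weighted overlap-add with squared-window normalization."""
    n_fft = 2 * (D.shape[0] - 1)
    window = hann(n_fft)
    frames = np.fft.irfft(D, n=n_fft, axis=0) * window[:, None]   # re-window each frame
    n_frames = D.shape[1]
    y = np.zeros(n_fft + hop_length * (n_frames - 1))
    win_sum = np.zeros_like(y)
    for t in range(n_frames):
        start = t * hop_length
        y[start:start + n_fft] += frames[:, t]
        win_sum[start:start + n_fft] += window ** 2
    # Normalize wherever the summed squared window is non-negligible
    nonzero = win_sum > np.finfo(float).tiny
    y[nonzero] /= win_sum[nonzero]
    y = y[n_fft // 2:]                           # drop the center padding at the start
    if length is None:
        return y[: len(y) - n_fft // 2]          # drop the padding at the end
    return np.pad(y, (0, max(0, length - len(y))))[:length]

x = np.random.default_rng(0).standard_normal(22050)
x_hat = istft(stft(x), length=len(x))
```

Passing length trims or zero-pads the output to the original number of samples; without it, the output is cut to a whole number of hops.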
Useful functions
effects.split
librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512)
Split an audio signal into non-silent intervals. See the source code for parameter descriptions.
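A NumPy sketch of the idea behind effects.split with the default ref=np.max. It computes the mean power of each centered frame, converts that to dB relative to the loudest frame, marks frames above -top_db as non-silent, and turns runs of non-silent frames into [start, end) sample intervals. split_nonsilent is a hypothetical helper mirroring that logic; in practice call librosa.effects.split(y, top_db=60).

```python
import numpy as np

def split_nonsilent(y, top_db=60, frame_length=2048, hop_length=512):
    """Hypothetical helper: (n_intervals, 2) array of non-silent [start, end) samples."""
    padded = np.pad(y, frame_length // 2, mode="reflect")        # center the frames
    n_frames = 1 + (len(padded) - frame_length) // hop_length
    mse = np.array([np.mean(padded[t * hop_length: t * hop_length + frame_length] ** 2)
                    for t in range(n_frames)])
    # Power in dB relative to the loudest frame, floored at 1e-10 to avoid log(0)
    db = 10 * np.log10(np.maximum(mse, 1e-10) / np.maximum(mse.max(), 1e-10))
    non_silent = db > -top_db
    # Frame indices where the mask switches on or off
    edges = np.flatnonzero(np.diff(non_silent.astype(int))) + 1
    if non_silent[0]:
        edges = np.r_[0, edges]
    if non_silent[-1]:
        edges = np.r_[edges, len(non_silent)]
    # Frames -> samples, clipped to the signal length
    return np.minimum(edges * hop_length, len(y)).reshape(-1, 2)

sr = 8000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])   # silence, 1 s tone, silence
intervals = split_nonsilent(y)
```

Because frames are 2048 samples long, the detected interval extends somewhat beyond the tone's true boundaries (samples 8000 to 16000).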
References
[1] 桂。, “Audio feature extraction: using the librosa toolkit” (in Chinese), cnblogs.com.
[2] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol. 32, no. 2, pp. 236–243, Apr. 1984.
[3] 凌逆戰, “librosa speech signal processing” (in Chinese), cnblogs.com.