Module: Awaaz::Features
Included in: Awaaz
Defined in: lib/awaaz/features.rb
Overview
Audio Features
Instance Method Summary
- #build_ranges(signal_length, frame_size, hop_length) ⇒ Array<Range>
  Builds a list of sample index ranges for each analysis frame.
- #compute_bandwidth(freqs, magnitude, centroid, power) ⇒ Float
  Computes the bandwidth for a single frame.
- #compute_centroid(freqs, magnitude) ⇒ Float
  Computes the spectral centroid of a single frame.
- #fft(samples) ⇒ Numo::DComplex
  Computes the FFT (Fast Fourier Transform) of each channel in a multi-channel signal using a Hann window.
- #frame_magnitude(frame) ⇒ Numo::DFloat
  Computes the magnitude spectrum of a single frame using an FFT.
- #frame_ranges(array, frame_size: 2048, hop_length: 512) ⇒ Array<(Numo::NArray, Array<Range>)>
  Pads the signal (if necessary) and returns the padded array along with frame index ranges.
- #frames_to_time(frames, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat
  Converts frame indices to time in seconds.
- #frequency_bins(frame_size, sample_rate) ⇒ Numo::DFloat
  Computes the frequency bin centers for an FFT.
- #hann_window(frame_size) ⇒ Numo::DFloat
  Generates a Hann window of given frame size.
- #pad_amount(signal_length, frame_size, hop_length) ⇒ Integer
  Computes how many samples are needed to right-pad a signal so that its length perfectly fits the given frame and hop size.
- #pad_right(array, pad_count, axis: 1, with: 0) ⇒ Numo::NArray
  Pads an array with zeros (or a specified value) along a given axis.
- #prepare_for_fft(samples, frame_size:, hop_length:) ⇒ Array
  Prepares audio samples and parameters for FFT-based feature extraction.
- #rms(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat
  Calculates the RMS (Root Mean Square) energy for each frame in the given audio.
- #rms_overall(samples) ⇒ Float
  Calculates the overall RMS for an entire signal without framing.
- #rolloff_for_frame(spectrum, freqs, threshold) ⇒ Float
  Computes the spectral rolloff for a single frame.
- #spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2) ⇒ Numo::DFloat
  Computes the spectral bandwidth over time for a signal.
- #spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat
  Computes the spectral centroid trajectory of an audio signal.
- #spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2) ⇒ Numo::DFloat
  Computes the spectral flatness of an audio signal.
- #spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85) ⇒ Numo::DFloat
  Computes the spectral rolloff over time for a signal.
- #stft(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::DComplex
  Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.
- #total_frames(signal_length, frame_size, hop_length) ⇒ Integer
  Calculates the total number of frames for a given signal length, frame size, and hop length.
- #zcr(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat
  Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.
- #zcr_for_frame(frame, frame_size) ⇒ Numo::SFloat
  Calculates the zero-crossing rate for a single frame of audio.
- #zcr_overall(samples) ⇒ Numo::SFloat
  Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.
Instance Method Details
#build_ranges(signal_length, frame_size, hop_length) ⇒ Array<Range>
Builds a list of sample index ranges for each analysis frame.
# File 'lib/awaaz/features.rb', line 61
def build_ranges(signal_length, frame_size, hop_length)
  ranges = []
  start = 0
  while start + frame_size <= signal_length
    ranges << (start...(start + frame_size))
    start += hop_length
  end
  ranges
end
#compute_bandwidth(freqs, magnitude, centroid, power) ⇒ Float
Computes the bandwidth for a single frame.
# File 'lib/awaaz/features.rb', line 377
def compute_bandwidth(freqs, magnitude, centroid, power)
  mag_sum = magnitude.sum
  return 0 if mag_sum.zero?

  diff = (freqs - centroid).abs**power
  value = (magnitude * diff).sum / mag_sum
  value**(1.0 / power)
end
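As a rough illustration of the formula above, the following sketch restates it with plain Ruby arrays instead of Numo arrays. The method name `bandwidth_sketch` and the sample values are hypothetical and not part of Awaaz; the centroid is passed in as a known value.

```ruby
# Hypothetical plain-Array restatement of compute_bandwidth (not the library code).
# Bandwidth is the magnitude-weighted p-th moment of the distance to the centroid.
def bandwidth_sketch(freqs, magnitude, centroid, power)
  mag_sum = magnitude.sum
  return 0.0 if mag_sum.zero?

  weighted = freqs.zip(magnitude).sum { |f, m| m * ((f - centroid).abs**power) }
  (weighted / mag_sum)**(1.0 / power)
end

freqs = [0.0, 100.0, 200.0, 300.0, 400.0] # bin centers in Hz (made-up values)
magnitude = [0.0, 1.0, 2.0, 1.0, 0.0]     # symmetric around 200 Hz
bw = bandwidth_sketch(freqs, magnitude, 200.0, 2)
puts bw.round(2)
```

With `power` set to 2 the result behaves like a magnitude-weighted standard deviation around the centroid, here roughly 70.71 Hz.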
#compute_centroid(freqs, magnitude) ⇒ Float
Computes the spectral centroid of a single frame.
The spectral centroid is the “center of mass” of the spectrum and is often associated with the perceived brightness of a sound.
# File 'lib/awaaz/features.rb', line 328
def compute_centroid(freqs, magnitude)
  mag_sum = magnitude.sum
  return 0 if mag_sum.zero?

  (freqs * magnitude).sum / mag_sum
end
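The "center of mass" idea can be checked by hand with a few bins. This is a minimal sketch using plain Ruby arrays rather than Numo; `centroid_sketch` and the bin values are hypothetical and chosen only for illustration.

```ruby
# Hypothetical plain-Array restatement of compute_centroid (not the library code).
def centroid_sketch(freqs, magnitude)
  mag_sum = magnitude.sum
  return 0.0 if mag_sum.zero?

  freqs.zip(magnitude).sum { |f, m| f * m } / mag_sum
end

freqs = [0.0, 100.0, 200.0, 300.0, 400.0] # bin centers in Hz (made-up values)
low_heavy  = [3.0, 1.0, 0.0, 0.0, 0.0]    # magnitude concentrated in low bins
high_heavy = [0.0, 0.0, 0.0, 1.0, 3.0]    # magnitude concentrated in high bins

dark = centroid_sketch(freqs, low_heavy)     # 25.0
bright = centroid_sketch(freqs, high_heavy)  # 375.0
puts "dark=#{dark} bright=#{bright}"
```

A spectrum weighted toward high bins yields a higher centroid, which matches the "brightness" interpretation.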
#fft(samples) ⇒ Numo::DComplex
Computes the FFT (Fast Fourier Transform) of each channel in a multi-channel signal using a Hann window.
# File 'lib/awaaz/features.rb', line 283
def fft(samples)
  window = hann_window(samples.shape[1])
  channels_count = samples.shape[0]
  fft_results = channels_count.times.map do |ch|
    Numo::Pocketfft.fft(samples[ch, true] * window)
  end
  Numo::DComplex[*fft_results]
end
#frame_magnitude(frame) ⇒ Numo::DFloat
Computes the magnitude spectrum of a single frame using an FFT.
# File 'lib/awaaz/features.rb', line 312
def frame_magnitude(frame)
  Numo::Pocketfft.rfft(frame).abs
end
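To make the shape of the result concrete, here is a naive O(n²) DFT in plain Ruby that computes the same kind of magnitude spectrum. It is a sketch only: it assumes the usual real-FFT convention of returning bins 0 through n/2, and `naive_rfft_magnitude` is a hypothetical name, not an Awaaz or Pocketfft method.

```ruby
# Hypothetical naive DFT magnitude for a real-valued frame (illustration only).
# Returns the magnitudes of bins 0..n/2, i.e. (n / 2) + 1 values.
def naive_rfft_magnitude(frame)
  n = frame.size
  (0..(n / 2)).map do |k|
    re = 0.0
    im = 0.0
    frame.each_with_index do |x, i|
      angle = -2 * Math::PI * k * i / n
      re += x * Math.cos(angle)
      im += x * Math.sin(angle)
    end
    Math.hypot(re, im)
  end
end

# An 8-sample cosine that completes exactly 2 cycles lands entirely in bin 2.
frame = Array.new(8) { |i| Math.cos(2 * Math::PI * 2 * i / 8) }
mags = naive_rfft_magnitude(frame)
puts mags.map { |m| m.round(3) }.inspect
```

For this input the magnitude at bin 2 is n/2 = 4 and the other bins are numerically zero.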
#frame_ranges(array, frame_size: 2048, hop_length: 512) ⇒ Array<(Numo::NArray, Array<Range>)>
Pads the signal (if necessary) and returns the padded array along with frame index ranges.
# File 'lib/awaaz/features.rb', line 84
def frame_ranges(array, frame_size: 2048, hop_length: 512)
  raise ArgumentError, "Hop Length can't be less than 1" if hop_length < 1

  amount = pad_amount(array.shape[1], frame_size, hop_length)
  array = pad_right(array, amount) if amount.positive?

  [array, build_ranges(array.shape[1], frame_size, hop_length)]
end
#frames_to_time(frames, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat
Converts frame indices to time in seconds.
This method maps analysis frame indices (or a total frame count) to the corresponding time positions in seconds, similar to `librosa.frames_to_time`.
# File 'lib/awaaz/features.rb', line 485
def frames_to_time(frames, hop_length: 512, sample_rate: 22_050)
  # Accept either a total frame count (Integer) or a [channels, frames] array.
  frames_size = frames.is_a?(Integer) ? frames : frames.shape[1]
  Numo::DFloat[0...frames_size] * hop_length / sample_rate.to_f
end
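The mapping itself is just `index * hop_length / sample_rate`. The sketch below shows it with a plain Ruby range; `frame_times_sketch` is a hypothetical helper that takes a frame count, not the library method.

```ruby
# Hypothetical plain-Ruby equivalent of the frame-to-time mapping.
def frame_times_sketch(frame_count, hop_length: 512, sample_rate: 22_050)
  (0...frame_count).map { |i| i * hop_length / sample_rate.to_f }
end

times = frame_times_sketch(4)
puts times.map { |t| t.round(4) }.inspect
```

With the default hop of 512 samples at 22,050 Hz, consecutive frames are about 23.2 ms apart.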
#frequency_bins(frame_size, sample_rate) ⇒ Numo::DFloat
Computes the frequency bin centers for an FFT.
# File 'lib/awaaz/features.rb', line 301
def frequency_bins(frame_size, sample_rate)
  Numo::DFloat.new((frame_size / 2) + 1).seq * (sample_rate.to_f / frame_size)
end
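The bin centers are evenly spaced from 0 Hz up to the Nyquist frequency. A minimal plain-Ruby sketch of the same arithmetic follows; `frequency_bins_sketch` is a hypothetical name used only here.

```ruby
# Hypothetical plain-Array version of the bin-center computation.
# Bin k sits at k * sample_rate / frame_size, for k = 0..frame_size / 2.
def frequency_bins_sketch(frame_size, sample_rate)
  spacing = sample_rate.to_f / frame_size
  Array.new((frame_size / 2) + 1) { |k| k * spacing }
end

bins = frequency_bins_sketch(2048, 22_050)
puts "count=#{bins.size} spacing=#{bins[1]} last=#{bins.last}"
```

For the default frame size of 2048 at 22,050 Hz this gives 1025 bins spaced about 10.77 Hz apart, ending at 11,025 Hz.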
#hann_window(frame_size) ⇒ Numo::DFloat
Generates a Hann window of given frame size.
A Hann window is commonly used in spectral analysis to reduce spectral leakage before applying an FFT.
# File 'lib/awaaz/features.rb', line 202
def hann_window(frame_size)
  idx = Numo::DFloat.new(frame_size).seq
  0.5 * (1 - Numo::NMath.cos(2 * Math::PI * idx / (frame_size - 1)))
end
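The window formula can be evaluated for a tiny frame to see its shape. This sketch uses plain Ruby and the same symmetric form as the source (dividing by `frame_size - 1`); `hann_sketch` is a hypothetical name.

```ruby
# Hypothetical plain-Array version of the symmetric Hann window.
def hann_sketch(frame_size)
  Array.new(frame_size) do |i|
    0.5 * (1 - Math.cos(2 * Math::PI * i / (frame_size - 1)))
  end
end

window = hann_sketch(5)
puts window.map { |w| w.round(3) }.inspect
```

The window tapers to zero at both ends and peaks at 1.0 in the middle, which is what suppresses the discontinuity at frame boundaries.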
#pad_amount(signal_length, frame_size, hop_length) ⇒ Integer
Computes how many samples are needed to right-pad a signal so that its length perfectly fits the given frame and hop size.
# File 'lib/awaaz/features.rb', line 29
def pad_amount(signal_length, frame_size, hop_length)
  frames = total_frames(signal_length, frame_size, hop_length)
  padded_length = ((frames - 1) * hop_length) + frame_size
  padded_length - signal_length
end
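A worked example may help show how the frame count, padding, and frame ranges fit together. The sketch below restates the arithmetic of `total_frames`, `pad_amount`, and `build_ranges` as top-level plain-Ruby methods so it runs without Numo; the signal length of 5000 samples is an arbitrary choice.

```ruby
# Plain-Ruby restatement of the framing arithmetic (illustration only).
def total_frames(signal_length, frame_size, hop_length)
  ((signal_length - frame_size) / hop_length.to_f).ceil + 1
end

def pad_amount(signal_length, frame_size, hop_length)
  frames = total_frames(signal_length, frame_size, hop_length)
  ((frames - 1) * hop_length) + frame_size - signal_length
end

def build_ranges(signal_length, frame_size, hop_length)
  (0..(signal_length - frame_size)).step(hop_length).map do |start|
    start...(start + frame_size)
  end
end

signal_length = 5000
frames = total_frames(signal_length, 2048, 512)
pad = pad_amount(signal_length, 2048, 512)
ranges = build_ranges(signal_length + pad, 2048, 512)
puts "frames=#{frames} pad=#{pad} last=#{ranges.last}"
```

Here 5000 samples need 7 frames, so 120 samples of padding bring the length to 5120 and the last frame covers samples 3072...5120. The sketch assumes `signal_length >= frame_size`; for shorter signals the formula as written yields a frame count below 1 and no ranges are produced.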
#pad_right(array, pad_count, axis: 1, with: 0) ⇒ Numo::NArray
Pads an array with zeros (or a specified value) along a given axis.
# File 'lib/awaaz/features.rb', line 45
def pad_right(array, pad_count, axis: 1, with: 0)
  channels_count = array.shape.first
  padded_array = Numo::SFloat.new(channels_count, pad_count).fill(with)

  array.concatenate(padded_array, axis: axis)
end
#prepare_for_fft(samples, frame_size:, hop_length:) ⇒ Array
Prepares audio samples and parameters for FFT-based feature extraction.
# File 'lib/awaaz/features.rb', line 229
def prepare_for_fft(samples, frame_size:, hop_length:)
  samples, ranges = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
  window = hann_window(frame_size)
  channels_count = samples.shape[0]
  freqs_size = (frame_size / 2) + 1

  [samples, ranges, window, channels_count, freqs_size]
end
#rms(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat
Calculates the RMS (Root Mean Square) energy for each frame in the given audio.
# File 'lib/awaaz/features.rb', line 102
def rms(samples, frame_size: 2048, hop_length: 512)
  samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
  means = Numo::SFloat.zeros(samples.shape[0], frame_groups.length)

  frame_groups.each_with_index do |frame_range, idx|
    means[true, idx] = samples[true, frame_range].rms(axis: 1)
  end

  means
end
#rms_overall(samples) ⇒ Float
Calculates the overall RMS for an entire signal without framing.
# File 'lib/awaaz/features.rb', line 120
def rms_overall(samples)
  samples.rms
end
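For readers unfamiliar with RMS, the following sketch computes it by hand for a single channel stored as a plain Ruby array, both overall and per frame. The helper names and the tiny frame size are hypothetical and chosen so the numbers are easy to verify; padding is omitted.

```ruby
# Hypothetical plain-Array RMS helpers (single channel, no padding).
def rms_sketch(samples)
  Math.sqrt(samples.sum { |x| x * x } / samples.size.to_f)
end

def framed_rms_sketch(samples, frame_size:, hop_length:)
  (0..(samples.size - frame_size)).step(hop_length).map do |start|
    rms_sketch(samples[start, frame_size])
  end
end

signal = [3.0, 4.0, 0.0, 0.0]
overall = rms_sketch(signal)                                     # sqrt(25 / 4) = 2.5
per_frame = framed_rms_sketch(signal, frame_size: 2, hop_length: 2)
puts "overall=#{overall} per_frame=#{per_frame.map { |v| v.round(3) }}"
```

The per-frame values expose that all the energy sits in the first half of the signal, which the single overall figure hides.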
#rolloff_for_frame(spectrum, freqs, threshold) ⇒ Float
Computes the spectral rolloff for a single frame.
# File 'lib/awaaz/features.rb', line 416
def rolloff_for_frame(spectrum, freqs, threshold)
  total_energy = spectrum.sum
  return 0.0 if total_energy.zero?

  cumsum = spectrum.cumsum
  threshold_energy = threshold * total_energy

  rolloff_bin = cumsum.ge(threshold_energy).where[0]
  rolloff_bin ||= freqs.size - 1

  freqs[rolloff_bin]
end
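The rolloff search amounts to walking a cumulative sum until it crosses a fraction of the total. A minimal plain-Ruby sketch of that idea follows; `rolloff_sketch` and the spectrum values are hypothetical.

```ruby
# Hypothetical plain-Array version of the per-frame rolloff search.
def rolloff_sketch(spectrum, freqs, threshold)
  total = spectrum.sum
  return 0.0 if total.zero?

  running = 0.0
  # First bin whose cumulative sum reaches the threshold share of the total.
  bin = spectrum.index { |v| (running += v) >= threshold * total }
  freqs[bin || (freqs.size - 1)]
end

freqs = [0.0, 100.0, 200.0, 300.0, 400.0] # bin centers in Hz (made-up values)
low = rolloff_sketch([5.0, 4.0, 1.0, 0.0, 0.0], freqs, 0.85)
high = rolloff_sketch([1.0, 1.0, 1.0, 1.0, 6.0], freqs, 0.85)
puts "low=#{low} high=#{high}"
```

In the first spectrum the cumulative sum reaches 85% of the total at the second bin (100 Hz); in the second it is only reached at the last bin (400 Hz).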
#spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2) ⇒ Numo::DFloat
Computes the spectral bandwidth over time for a signal.
# File 'lib/awaaz/features.rb', line 394
def spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2)
  samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
  freqs = frequency_bins(frame_size, sample_rate)
  bandwidth_matrix = Numo::DFloat.zeros(channels_count, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      magnitude = frame_magnitude(samples[ch, range] * window)
      centroid = compute_centroid(freqs, magnitude)
      bandwidth_matrix[ch, frame_idx] = compute_bandwidth(freqs, magnitude, centroid, power)
    end
  end

  bandwidth_matrix
end
#spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat
Computes the spectral centroid trajectory of an audio signal.
This method frames the signal, applies a Hann window, computes the FFT magnitudes, and calculates the centroid for each frame. The result is a time series of centroids.
# File 'lib/awaaz/features.rb', line 354
def spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050)
  samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
  freqs = frequency_bins(frame_size, sample_rate)
  centroid_matrix = Numo::DFloat.zeros(channels_count, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      frame = samples[ch, range] * window
      magnitude = frame_magnitude(frame)
      centroid_matrix[ch, frame_idx] = compute_centroid(freqs, magnitude)
    end
  end

  centroid_matrix
end
#spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2) ⇒ Numo::DFloat
Computes the spectral flatness of an audio signal.
Spectral flatness measures how noise-like a signal is, as opposed to being tone-like. A value closer to 1.0 indicates the spectrum is flat (similar to white noise), while values closer to 0.0 indicate a peaky spectrum (like a sine wave or harmonic-rich signal).
# File 'lib/awaaz/features.rb', line 523
def spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2)
  stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
  stft_matrix = Numo::DFloat.maximum(amin, stft_matrix**power)

  gms = Numo::DFloat::Math.exp(Numo::DFloat::Math.log(stft_matrix).mean(axis: -2))
  ams = stft_matrix.mean(axis: -2)

  gms / ams
end
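Flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below shows that ratio for one frame using plain Ruby arrays; `flatness_sketch` is a hypothetical helper that takes an already-computed power spectrum, and the input values are made up.

```ruby
# Hypothetical plain-Array flatness for a single frame's power spectrum.
def flatness_sketch(power_spectrum, amin: 1e-10)
  # Clip tiny values so the logarithm stays finite, as the amin parameter does.
  clipped = power_spectrum.map { |p| [p, amin].max }
  geometric = Math.exp(clipped.sum { |p| Math.log(p) } / clipped.size.to_f)
  arithmetic = clipped.sum / clipped.size.to_f
  geometric / arithmetic
end

flat = flatness_sketch([1.0, 1.0, 1.0, 1.0])   # equal power in every bin
peaky = flatness_sketch([16.0, 0.0, 0.0, 0.0]) # all power in one bin
puts "flat=#{flat} peaky=#{peaky}"
```

The flat spectrum scores 1.0, while the single-peak spectrum scores very close to 0.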
#spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85) ⇒ Numo::DFloat
Computes the spectral rolloff over time for a signal.
Spectral rolloff is the frequency below which a fixed percentage (threshold) of the total spectral energy is contained.
# File 'lib/awaaz/features.rb', line 440
def spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85)
  stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
  channels, _freqs_size, frames_size = stft_matrix.shape
  freqs = frequency_bins(frame_size, sample_rate)
  rolloff_matrix = Numo::DFloat.zeros(channels, frames_size)

  frames_size.times do |frame_idx|
    channels.times do |ch|
      rolloff_matrix[ch, frame_idx] = rolloff_for_frame(
        stft_matrix[ch, true, frame_idx],
        freqs,
        threshold
      )
    end
  end

  rolloff_matrix
end
#stft(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::DComplex
Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.
This method applies a sliding Hann window to the input signal, computes the FFT for each frame and each channel, and stores the positive frequency bins into a 3D complex-valued matrix.
The resulting STFT matrix has dimensions:
`[channels, frequencies, frames]`
# File 'lib/awaaz/features.rb', line 258
def stft(samples, frame_size: 2048, hop_length: 512)
  samples, ranges, window, channels_count, freqs_size =
    prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
  stft_matrix = Numo::DComplex.zeros(channels_count, freqs_size, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      fft_result = Numo::Pocketfft.fft(samples[ch, range] * window)
      stft_matrix[ch, true, frame_idx] = fft_result[0...freqs_size]
    end
  end

  stft_matrix
end
#total_frames(signal_length, frame_size, hop_length) ⇒ Integer
Calculates the total number of frames for a given signal length, frame size, and hop length.
# File 'lib/awaaz/features.rb', line 15
def total_frames(signal_length, frame_size, hop_length)
  ((signal_length - frame_size) / hop_length.to_f).ceil + 1
end
#zcr(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat
Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.
The zero-crossing rate is the proportion of consecutive sample pairs in a frame across which the signal changes sign (positive to negative or vice versa). It is often used as a simple feature in speech/music analysis.
# File 'lib/awaaz/features.rb', line 142
def zcr(samples, frame_size: 2048, hop_length: 512)
  framed_samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
  n_channels = framed_samples.shape[0]
  zcrs = Numo::SFloat.zeros(n_channels, frame_groups.length)

  frame_groups.each_with_index do |frame_range, idx|
    zcrs[true, idx] = zcr_for_frame(framed_samples[true, frame_range], frame_size)
  end

  zcrs
end
#zcr_for_frame(frame, frame_size) ⇒ Numo::SFloat
Calculates the zero-crossing rate for a single frame of audio.
# File 'lib/awaaz/features.rb', line 167
def zcr_for_frame(frame, frame_size)
  first_part = frame[true, 0...-1]
  second_part = frame[true, 1..-1]

  products = first_part * second_part
  sign_changes = products < 0
  counts = sign_changes.count_true(axis: 1)

  counts / frame_size.to_f
end
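The sign-change test above multiplies each sample by its neighbour and counts negative products. A minimal single-channel sketch with plain Ruby arrays follows; `zcr_sketch` is a hypothetical name, and like the source it divides by the frame length rather than by the number of sample pairs.

```ruby
# Hypothetical plain-Array zero-crossing rate for one frame of one channel.
def zcr_sketch(frame)
  # A negative product means the two neighbouring samples have opposite signs.
  crossings = frame.each_cons(2).count { |a, b| a * b < 0 }
  crossings / frame.size.to_f
end

alternating = zcr_sketch([1.0, -1.0, 1.0, -1.0]) # 3 crossings over 4 samples
steady = zcr_sketch([0.5, 0.7, 0.9, 0.4])        # never changes sign
puts "alternating=#{alternating} steady=#{steady}"
```

Because the test is a strict `< 0`, a pair that includes an exact zero sample is not counted as a crossing.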
#zcr_overall(samples) ⇒ Numo::SFloat
Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.
# File 'lib/awaaz/features.rb', line 191
def zcr_overall(samples)
  ((samples[true, 0...-1] * samples[true, 1..-1]) < 0).count_true(axis: 1) / samples.shape[1].to_f
end