Module: Awaaz::Features

Included in: Awaaz

Defined in: lib/awaaz/features.rb

Overview

Audio Features

Instance Method Summary

#build_ranges(signal_length, frame_size, hop_length) ⇒ Array<Range>

Builds a list of sample index ranges for each analysis frame.
#compute_bandwidth(freqs, magnitude, centroid, power) ⇒ Float

Computes the bandwidth for a single frame.
#compute_centroid(freqs, magnitude) ⇒ Float

Computes the spectral centroid of a single frame.
#fft(samples) ⇒ Numo::DComplex

Computes the FFT (Fast Fourier Transform) of each channel in a multi-channel signal using a Hann window.
#frame_magnitude(frame) ⇒ Numo::DFloat

Computes the magnitude spectrum of a single frame using an FFT.
#frame_ranges(array, frame_size: 2048, hop_length: 512) ⇒ Array<(Numo::NArray, Array<Range>)>

Pads the signal (if necessary) and returns the padded array along with frame index ranges.
#frames_to_time(frames, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat

Convert frame indices to time in seconds.
#frequency_bins(frame_size, sample_rate) ⇒ Numo::DFloat

Computes the frequency bin centers for an FFT.
#hann_window(frame_size) ⇒ Numo::DFloat

Generates a Hann window of given frame size.
#pad_amount(signal_length, frame_size, hop_length) ⇒ Integer

Computes how many samples are needed to right-pad a signal so that its length perfectly fits the given frame and hop size.
#pad_right(array, pad_count, axis: 1, with: 0) ⇒ Numo::NArray

Pads an array with zeros (or a specified value) along a given axis.
#prepare_for_fft(samples, frame_size:, hop_length:) ⇒ Array

Prepares audio samples and parameters for FFT-based feature extraction.
#rms(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat

Calculates the RMS (Root Mean Square) energy for each frame in the given audio.
#rms_overall(samples) ⇒ Float

Calculates the overall RMS for an entire signal without framing.
#rolloff_for_frame(spectrum, freqs, threshold) ⇒ Float

Computes the spectral rolloff for a single frame.
#spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2) ⇒ Numo::DFloat

Computes the spectral bandwidth over time for a signal.
#spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050) ⇒ Numo::DFloat

Computes the spectral centroid trajectory of an audio signal.
#spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2) ⇒ Numo::DFloat

Computes the spectral flatness of an audio signal.
#spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85) ⇒ Numo::DFloat

Computes the spectral rolloff over time for a signal.
#stft(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::DComplex

Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.
#total_frames(signal_length, frame_size, hop_length) ⇒ Integer

Calculates the total number of frames for a given signal length, frame size, and hop length.
#zcr(samples, frame_size: 2048, hop_length: 512) ⇒ Numo::SFloat

Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.
#zcr_for_frame(frame, frame_size) ⇒ Numo::SFloat

Calculates the zero-crossing rate for a single frame of audio.
#zcr_overall(samples) ⇒ Numo::SFloat

Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.

Instance Method Details

#build_ranges(signal_length, frame_size, hop_length) ⇒ `Array<Range>`

Builds a list of sample index ranges for each analysis frame.

Parameters:

signal_length (Integer) —

Number of samples in the (possibly padded) signal.
frame_size (Integer) —

Size of each frame (in samples).
hop_length (Integer) —

Step size between consecutive frames (in samples).

Returns:

(Array<Range>) —

An array where each element is the sample index range for one frame.

# File 'lib/awaaz/features.rb', line 61

def build_ranges(signal_length, frame_size, hop_length)
  ranges = []
  start = 0
  while start + frame_size <= signal_length
    ranges << (start...(start + frame_size))
    start += hop_length
  end
  ranges
end
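To make the framing concrete, the sketch below repeats the loop above as a standalone function and runs it on a 10-sample signal. The input values (10, 4, 2) are chosen only for illustration.

```ruby
# Standalone copy of the loop above, so the example runs without the module.
def build_ranges(signal_length, frame_size, hop_length)
  ranges = []
  start = 0
  while start + frame_size <= signal_length
    ranges << (start...(start + frame_size))
    start += hop_length
  end
  ranges
end

# 10 samples, frames of 4 samples, advancing 2 samples per frame.
ranges = build_ranges(10, 4, 2)
p ranges # => [0...4, 2...6, 4...8, 6...10]
```

Each range is end-exclusive, so consecutive frames here overlap by `frame_size - hop_length` samples.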

#compute_bandwidth(freqs, magnitude, centroid, power) ⇒ `Float`

Computes the bandwidth for a single frame.

Parameters:

freqs (Numo::DFloat) —

Frequency bins (Hz)
magnitude (Numo::DFloat) —

Magnitude spectrum for the frame
centroid (Float) —

Spectral centroid for the frame (Hz)
power (Integer) —

Power/exponent used for bandwidth calculation (commonly 2)

Returns:

(Float) —

Spectral bandwidth for the frame

# File 'lib/awaaz/features.rb', line 377

def compute_bandwidth(freqs, magnitude, centroid, power)
  mag_sum = magnitude.sum
  return 0 if mag_sum.zero?

  diff = (freqs - centroid).abs**power
  value = (magnitude * diff).sum / mag_sum
  value**(1.0 / power)
end
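The same calculation can be followed by hand on a two-bin spectrum. This is a pure-Ruby sketch using plain Arrays instead of Numo arrays; the helper name `bandwidth` and the input values are assumptions made for the example.

```ruby
# Pure-Ruby sketch: magnitude-weighted spread of the frequencies around the centroid.
def bandwidth(freqs, magnitude, centroid, power)
  mag_sum = magnitude.sum
  return 0.0 if mag_sum.zero?

  weighted = freqs.zip(magnitude).sum { |f, m| m * ((f - centroid).abs**power) }
  (weighted / mag_sum)**(1.0 / power)
end

# Two equal peaks at 100 Hz and 300 Hz, centroid at 200 Hz.
puts bandwidth([100.0, 300.0], [1.0, 1.0], 200.0, 2).round(6) # => 100.0
```

With `power` set to 2 the result behaves like a magnitude-weighted standard deviation around the centroid.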

#compute_centroid(freqs, magnitude) ⇒ `Float`

Computes the spectral centroid of a single frame.

The spectral centroid is the “center of mass” of the spectrum and is often associated with the perceived brightness of a sound.

Parameters:

freqs (Numo::DFloat) —

1D array of frequency bin centers.
magnitude (Numo::DFloat) —

1D array of magnitude values corresponding to each frequency bin.

Returns:

(Float) —

The spectral centroid in Hz for the given frame.

# File 'lib/awaaz/features.rb', line 328

def compute_centroid(freqs, magnitude)
  mag_sum = magnitude.sum
  return 0 if mag_sum.zero?

  (freqs * magnitude).sum / mag_sum
end
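As a worked example of the "center of mass" idea, here is a pure-Ruby sketch with plain Arrays in place of Numo arrays. The helper name `centroid` and the four-bin spectrum are assumptions for illustration.

```ruby
# Pure-Ruby sketch: magnitude-weighted mean of the bin frequencies.
def centroid(freqs, magnitude)
  mag_sum = magnitude.sum
  return 0.0 if mag_sum.zero?

  freqs.zip(magnitude).sum { |f, m| f * m } / mag_sum
end

freqs = [0.0, 100.0, 200.0, 300.0]
magnitude = [0.0, 1.0, 1.0, 2.0]
puts centroid(freqs, magnitude) # => 225.0
```

The heavier weight on the 300 Hz bin pulls the centroid above the midpoint of the occupied bins.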

#fft(samples) ⇒ `Numo::DComplex`

Computes the FFT (Fast Fourier Transform) of each channel in a multi-channel signal using a Hann window.

Parameters:

samples (Numo::NArray) —

A 2D array of shape [channels, samples] containing the audio data.

Returns:

(Numo::DComplex) —

A 2D complex array of shape `[channels, samples]` containing the FFT result for each channel.

# File 'lib/awaaz/features.rb', line 283

def fft(samples)
  window = hann_window(samples.shape[1])
  channels_count = samples.shape[0]
  fft_results = channels_count.times.map do |ch|
    Numo::Pocketfft.fft(samples[ch, true] * window)
  end
  Numo::DComplex[*fft_results]
end

#frame_magnitude(frame) ⇒ `Numo::DFloat`

Computes the magnitude spectrum of a single frame using an FFT.

Parameters:

frame (Numo::NArray) —

1D array of audio samples for a single frame.

Returns:

(Numo::DFloat) —

1D array of magnitude values for each FFT bin.

# File 'lib/awaaz/features.rb', line 312

def frame_magnitude(frame)
  Numo::Pocketfft.rfft(frame).abs
end

#frame_ranges(array, frame_size: 2048, hop_length: 512) ⇒ `Array<(Numo::NArray, Array<Range>)>`

Pads the signal (if necessary) and returns the padded array along with frame index ranges.

Parameters:

array (Numo::NArray) —

A 2D array where shape is [channels, samples].
frame_size (Integer) (defaults to: 2048) —

Size of each frame (in samples).
hop_length (Integer) (defaults to: 512) —

Step size between consecutive frames (in samples).

Returns:

(Array<(Numo::NArray, Array<Range>)>) —
- padded signal array
- array of frame index ranges

Raises:

(ArgumentError) —

If hop length is less than 1.

# File 'lib/awaaz/features.rb', line 84

def frame_ranges(array, frame_size: 2048, hop_length: 512)
  raise ArgumentError, "Hop Length can't be less than 1" if hop_length < 1

  amount = pad_amount(array.shape[1], frame_size, hop_length)
  array = pad_right(array, amount) if amount.positive?

  [array, build_ranges(array.shape[1], frame_size, hop_length)]
end

#frames_to_time(frames, hop_length: 512, sample_rate: 22_050) ⇒ `Numo::DFloat`

Convert frame indices to time in seconds.

This method maps analysis frame indices (or a total frame count) to their corresponding time positions in seconds, similar to `librosa.frames_to_time`.

Examples:

Using total frame count

frames_to_time(100, hop_length: 512, sample_rate: 22050)
# => Numo::DFloat[0.0, 0.0232, ..., 2.2988]

Using a spectrogram matrix

samples = Numo::DFloat.new(2, 500) # 2 channels, 500 frames
frames_to_time(samples, hop_length: 512, sample_rate: 22050)
# => Numo::DFloat[0.0, 0.0232, ..., 11.5868]

Parameters:

frames (Integer, Numo::NArray) —

Either a total frame count, or a Numo array of shape (n_channels, n_frames) from which the total number of frames is inferred.
hop_length (Integer) (defaults to: 512) —

Number of audio samples between adjacent frames. Defaults to 512.
sample_rate (Integer) (defaults to: 22_050) —

Sampling rate of the audio signal in Hz. Defaults to 22,050 Hz.

Returns:

(Numo::DFloat) —

A 1-D Numo array of times (in seconds) corresponding to each frame index. If `frames` is an Integer, the return value spans from frame 0 up to `frames - 1`. If `frames` is a Numo array, the return value spans the number of frames inferred from `frames.shape`.

# File 'lib/awaaz/features.rb', line 485

def frames_to_time(frames, hop_length: 512, sample_rate: 22_050)
  # An Integer is taken as the total frame count; otherwise it is inferred from the array shape.
  frames_size = frames.is_a?(Integer) ? frames : frames.shape[1]
  Numo::DFloat[0...frames_size] * hop_length / sample_rate.to_f
end

#frequency_bins(frame_size, sample_rate) ⇒ `Numo::DFloat`

Computes the frequency bin centers for an FFT.

Parameters:

frame_size (Integer) —

The size of the FFT frame (in samples).
sample_rate (Integer) —

The sampling rate of the audio (Hz).

Returns:

(Numo::DFloat) —

1D array of frequency values (Hz) corresponding to FFT bins. Shape: `[frame_size/2 + 1]`.

# File 'lib/awaaz/features.rb', line 301

def frequency_bins(frame_size, sample_rate)
  Numo::DFloat.new((frame_size / 2) + 1).seq * (sample_rate.to_f / frame_size)
end
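The bin spacing is `sample_rate / frame_size`, and only the non-negative frequencies up to the Nyquist frequency are returned. A pure-Ruby sketch with plain Arrays, using a small frame size chosen only to keep the output readable:

```ruby
# Pure-Ruby sketch of the same formula: bin k sits at k * sample_rate / frame_size.
def frequency_bins(frame_size, sample_rate)
  bin_width = sample_rate.to_f / frame_size
  Array.new((frame_size / 2) + 1) { |k| k * bin_width }
end

bins = frequency_bins(8, 8000)
p bins # => [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
```

The last bin lands on the Nyquist frequency, `sample_rate / 2`.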

#hann_window(frame_size) ⇒ `Numo::DFloat`

Generates a Hann window of given frame size.

A Hann window is commonly used in spectral analysis to reduce spectral leakage before applying an FFT.

Parameters:

frame_size (Integer) —

the size of the frame (number of samples per window)

Returns:

(Numo::DFloat) —

the Hann window of length `frame_size`

# File 'lib/awaaz/features.rb', line 202

def hann_window(frame_size)
  idx = Numo::DFloat.new(frame_size).seq
  0.5 * (1 - Numo::NMath.cos(2 * Math::PI * idx / (frame_size - 1)))
end
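The formula above divides by `frame_size - 1`, which gives a symmetric window whose first and last samples are zero. A pure-Ruby sketch with a plain Array, using a 5-sample window only for illustration:

```ruby
# Pure-Ruby sketch of the symmetric Hann window formula used above.
def hann_window(frame_size)
  Array.new(frame_size) do |n|
    0.5 * (1 - Math.cos(2 * Math::PI * n / (frame_size - 1)))
  end
end

window = hann_window(5)
# Rounded to hide floating-point noise in the cosine.
p window.map { |w| w.round(4) } # => [0.0, 0.5, 1.0, 0.5, 0.0]
```

Multiplying a frame by this window tapers both edges to zero before the FFT is taken.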

#pad_amount(signal_length, frame_size, hop_length) ⇒ `Integer`

Computes how many samples are needed to right-pad a signal so that its length perfectly fits the given frame and hop size.

Parameters:

signal_length (Integer) —

Number of samples in the signal.
frame_size (Integer) —

Size of each analysis frame (in samples).
hop_length (Integer) —

Step size between consecutive frames (in samples).

Returns:

(Integer) —

Number of padding samples required.

# File 'lib/awaaz/features.rb', line 29

def pad_amount(signal_length, frame_size, hop_length)
  frames = total_frames(signal_length, frame_size, hop_length)
  padded_length = ((frames - 1) * hop_length) + frame_size
  padded_length - signal_length
end
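A worked example of the padding arithmetic, with both helpers copied from this page so the snippet runs on its own. The signal lengths are arbitrary illustrative values.

```ruby
# Copied from #total_frames: number of frames once the signal is padded.
def total_frames(signal_length, frame_size, hop_length)
  ((signal_length - frame_size) / hop_length.to_f).ceil + 1
end

# Copied from #pad_amount: samples needed to reach the padded length.
def pad_amount(signal_length, frame_size, hop_length)
  frames = total_frames(signal_length, frame_size, hop_length)
  padded_length = ((frames - 1) * hop_length) + frame_size
  padded_length - signal_length
end

puts total_frames(5000, 2048, 512) # => 7
puts pad_amount(5000, 2048, 512)   # => 120
puts pad_amount(5120, 2048, 512)   # => 0
```

For 5000 samples, seven frames need `6 * 512 + 2048 = 5120` samples, so 120 samples of padding are added. A 5120-sample signal already fits and needs none.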

#pad_right(array, pad_count, axis: 1, with: 0) ⇒ `Numo::NArray`

Pads an array with zeros (or a specified value) along a given axis.

Parameters:

array (Numo::NArray) —

The input array (e.g., shape [channels, samples]).
pad_count (Integer) —

Number of padding elements to add.
axis (Integer) (defaults to: 1) —

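Axis along which to concatenate the padding (default: 1 for the time axis). The padding block is always built with shape [channels, pad_count], so the default axis is the one that matches 2D [channels, samples] input.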
with (Numeric) (defaults to: 0) —

Value to pad with (default: 0).

Returns:

(Numo::NArray) —

The padded array.

# File 'lib/awaaz/features.rb', line 45

def pad_right(array, pad_count, axis: 1, with: 0)
  channels_count = array.shape.first
  padded_array = Numo::SFloat.new(channels_count, pad_count).fill(with)

  array.concatenate(padded_array, axis: axis)
end

#prepare_for_fft(samples, frame_size:, hop_length:) ⇒ `Array`

Prepares audio samples and parameters for FFT-based feature extraction.

Examples:

samples, ranges, window, channels_count, freqs_size =
  prepare_for_fft(audio, frame_size: 2048, hop_length: 512)

Parameters:

samples (Numo::NArray) —

Multichannel audio samples as a 2D array (shape: [channels, samples]).
frame_size (Integer) —

Number of samples per frame (FFT window length).
hop_length (Integer) —

Number of samples to shift between consecutive frames.

Returns:

(Array) —
A tuple containing:
- samples [Numo::NArray] : Audio samples, right-padded to fit the frames (the window is returned separately and is not applied here)
- ranges [Array<Range>] : Frame index ranges for iteration
- window [Numo::DFloat] : Hann window for FFT
- channels_count [Integer] : Number of audio channels
- freqs_size [Integer] : Number of FFT frequency bins per frame

# File 'lib/awaaz/features.rb', line 229

def prepare_for_fft(samples, frame_size:, hop_length:)
  samples, ranges = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)
  window = hann_window(frame_size)
  channels_count = samples.shape[0]
  freqs_size = (frame_size / 2) + 1

  [samples, ranges, window, channels_count, freqs_size]
end

#rms(samples, frame_size: 2048, hop_length: 512) ⇒ `Numo::SFloat`

Calculates the RMS (Root Mean Square) energy for each frame in the given audio.

Parameters:

samples (Numo::NArray) —

A 2D array of shape [channels, samples].
frame_size (Integer) (defaults to: 2048) —

Size of each analysis frame (in samples).
hop_length (Integer) (defaults to: 512) —

Step size between consecutive frames (in samples).

Returns:

(Numo::SFloat) —

A 2D array of RMS values with shape [channels, frames].

# File 'lib/awaaz/features.rb', line 102

def rms(samples, frame_size: 2048, hop_length: 512)
  samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)

  means = Numo::SFloat.zeros(samples.shape[0], frame_groups.length)
  frame_groups.each_with_index do |frame_range, idx|
    means[true, idx] = samples[true, frame_range].rms(axis: 1)
  end

  means
end

#rms_overall(samples) ⇒ `Float`

Calculates the overall RMS for an entire signal without framing.

Parameters:

samples (Numo::NArray) —

A 2D or 1D array of samples.

Returns:

(Float) —

RMS value for the entire signal.

# File 'lib/awaaz/features.rb', line 120

def rms_overall(samples)
  samples.rms
end
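For reference, RMS is the square root of the mean of the squared samples. The sketch below spells that out in pure Ruby on a plain Array; the helper name `rms_of` is an assumption for the example and is not part of the module.

```ruby
# Pure-Ruby sketch: square each sample, average, then take the square root.
def rms_of(values)
  Math.sqrt(values.sum { |v| v * v } / values.size.to_f)
end

puts rms_of([3.0, -3.0, 3.0, -3.0]) # => 3.0
```

Because the samples are squared, the sign of each sample does not affect the result.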

#rolloff_for_frame(spectrum, freqs, threshold) ⇒ `Float`

Computes the spectral rolloff for a single frame.

Parameters:

spectrum (Numo::DFloat) —

Magnitude spectrum for the frame
freqs (Numo::DFloat) —

Frequency bins (Hz)
threshold (Float) —

Proportion of spectral energy to retain, between 0 and 1 (e.g., 0.85). This helper has no default; #spectral_rolloff passes its own threshold.

Returns:

(Float) —

Roll-off frequency (Hz) for the frame

# File 'lib/awaaz/features.rb', line 416

def rolloff_for_frame(spectrum, freqs, threshold)
  total_energy = spectrum.sum
  return 0.0 if total_energy.zero?

  cumsum = spectrum.cumsum
  threshold_energy = threshold * total_energy

  rolloff_bin = cumsum.ge(threshold_energy).where[0]
  rolloff_bin ||= freqs.size - 1

  freqs[rolloff_bin]
end
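The cumulative-sum search can be traced by hand on a four-bin spectrum. This is a pure-Ruby sketch with plain Arrays; the helper name `rolloff` and the input values are assumptions for illustration.

```ruby
# Pure-Ruby sketch: return the first bin frequency at which the running
# sum of the spectrum reaches threshold * total.
def rolloff(spectrum, freqs, threshold)
  total = spectrum.sum
  return 0.0 if total.zero?

  target = threshold * total
  running = 0.0
  spectrum.each_with_index do |value, bin|
    running += value
    return freqs[bin] if running >= target
  end
  freqs.last
end

spectrum = [4.0, 3.0, 2.0, 1.0]
freqs = [0.0, 100.0, 200.0, 300.0]
puts rolloff(spectrum, freqs, 0.85) # => 200.0
```

Here the total is 10 and the target is about 8.5. The running sums are 4, 7, and 9, so the third bin (200 Hz) is the first to reach the target.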

#spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2) ⇒ `Numo::DFloat`

Computes the spectral bandwidth over time for a signal.

Parameters:

samples (Numo::DFloat) —

Input samples (channels x samples)
frame_size (Integer) (defaults to: 2048) —

FFT window size (default: 2048)
hop_length (Integer) (defaults to: 512) —

Step size between frames (default: 512)
sample_rate (Integer) (defaults to: 22_050) —

Sampling rate of the audio signal (default: 22050 Hz)
power (Integer) (defaults to: 2) —

Exponent for bandwidth calculation (default: 2)

Returns:

(Numo::DFloat) —

Spectral bandwidth matrix (channels x frames)

# File 'lib/awaaz/features.rb', line 394

def spectral_bandwidth(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, power: 2)
  samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
  freqs = frequency_bins(frame_size, sample_rate)
  bandwidth_matrix = Numo::DFloat.zeros(channels_count, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      magnitude = frame_magnitude(samples[ch, range] * window)
      centroid = compute_centroid(freqs, magnitude)
      bandwidth_matrix[ch, frame_idx] = compute_bandwidth(freqs, magnitude, centroid, power)
    end
  end

  bandwidth_matrix
end

#spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050) ⇒ `Numo::DFloat`

Computes the spectral centroid trajectory of an audio signal.

This method frames the signal, applies a Hann window, computes the FFT magnitudes, and calculates the centroid for each frame. The result is a time series of centroids.

Examples:

centroids = spectral_centroids(samples, frame_size: 1024, hop_length: 256, sample_rate: 44100)
puts centroids.shape # => [channels, n_frames]

Parameters:

samples (Numo::NArray) —

A 2D array of shape [channels, samples].
frame_size (Integer) (defaults to: 2048) —

Size of each analysis frame (default: 2048).
hop_length (Integer) (defaults to: 512) —

Step size between frames in samples (default: 512).
sample_rate (Integer) (defaults to: 22_050) —

Sampling rate of the audio in Hz (default: 22050).

Returns:

(Numo::DFloat) —

2D array of spectral centroids with shape `[channels, n_frames]`.

# File 'lib/awaaz/features.rb', line 354

def spectral_centroids(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050)
  samples, ranges, window, channels_count = prepare_for_fft(samples, frame_size: frame_size, hop_length: hop_length)
  freqs = frequency_bins(frame_size, sample_rate)
  centroid_matrix = Numo::DFloat.zeros(channels_count, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      frame = samples[ch, range] * window
      magnitude = frame_magnitude(frame)
      centroid_matrix[ch, frame_idx] = compute_centroid(freqs, magnitude)
    end
  end

  centroid_matrix
end

#spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2) ⇒ `Numo::DFloat`

Computes the spectral flatness of an audio signal.

Spectral flatness measures how noise-like a signal is, as opposed to being tone-like. A value closer to 1.0 indicates the spectrum is flat (similar to white noise), while values closer to 0.0 indicate a peaky spectrum (like a sine wave or harmonic-rich signal).

Examples:

Compute spectral flatness for an audio clip

samples = Awaaz::Utils::Soundread.new("audio.wav").read
flatness = spectral_flatness(samples, frame_size: 1024, hop_length: 256)
puts flatness.shape

Parameters:

samples (Numo::NArray) —

The input audio samples as a 2D array of shape [channels, samples], as expected by #stft.
frame_size (Integer) (defaults to: 2048) —

The size of each FFT window (frame). Larger sizes give better frequency resolution but worse time resolution.
hop_length (Integer) (defaults to: 512) —

The number of samples to shift between consecutive FFT frames. Smaller values provide more overlap and smoother results.
amin (Float) (defaults to: 1e-10) —

A small floor applied to the power spectrum for numerical stability, preventing log(0) or division by zero.
power (Integer) (defaults to: 2) —

The power to which the magnitude spectrum is raised. Typically 2 to work with power spectrograms.

Returns:

(Numo::DFloat) —

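A Numo::DFloat array containing the spectral flatness value for each frame of each channel. Because the mean is taken over the frequency axis of the `[channels, frequencies, frames]` STFT matrix, the shape is `[channels, frames]`.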

# File 'lib/awaaz/features.rb', line 523

def spectral_flatness(samples, frame_size: 2048, hop_length: 512, amin: 1e-10, power: 2)
  stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
  stft_matrix = Numo::DFloat.maximum(amin, stft_matrix**power)

  gms = Numo::DFloat::Math.exp Numo::DFloat::Math.log(stft_matrix).mean(axis: -2)
  ams = stft_matrix.mean(axis: -2)

  gms / ams
end
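Flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum. The pure-Ruby sketch below applies that ratio to a single frame held in a plain Array; the helper name `flatness` and the two test spectra are assumptions for illustration.

```ruby
# Pure-Ruby sketch for one frame: geometric mean / arithmetic mean,
# with values floored at amin so that log(0) never occurs.
def flatness(power_spectrum, amin: 1e-10)
  clipped = power_spectrum.map { |v| [v, amin].max }
  geometric = Math.exp(clipped.sum { |v| Math.log(v) } / clipped.size)
  arithmetic = clipped.sum / clipped.size
  geometric / arithmetic
end

flat = flatness([1.0, 1.0, 1.0, 1.0])  # equal energy in every bin
peaky = flatness([0.0, 0.0, 4.0, 0.0]) # all energy in a single bin
puts flat.round(6) # => 1.0
puts peaky < 1e-6  # => true
```

A perfectly flat spectrum gives exactly 1.0, while concentrating the energy in one bin drives the ratio toward 0.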

#spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85) ⇒ `Numo::DFloat`

Computes the spectral rolloff over time for a signal.

Spectral rolloff is the frequency below which a fixed percentage (threshold) of the total spectral energy is contained.

Parameters:

samples (Numo::DFloat) —

Input samples (channels x samples)
frame_size (Integer) (defaults to: 2048) —

FFT window size (default: 2048)
hop_length (Integer) (defaults to: 512) —

Step size between frames (default: 512)
sample_rate (Integer) (defaults to: 22_050) —

Sampling rate of the audio signal (default: 22050 Hz)
threshold (Float) (defaults to: 0.85) —

Proportion of spectral energy to retain (default: 0.85)

Returns:

(Numo::DFloat) —

Spectral rolloff matrix (channels x frames)

# File 'lib/awaaz/features.rb', line 440

def spectral_rolloff(samples, frame_size: 2048, hop_length: 512, sample_rate: 22_050, threshold: 0.85)
  stft_matrix = stft(samples, frame_size: frame_size, hop_length: hop_length).abs
  channels, _freqs_size, frames_size = stft_matrix.shape
  freqs = frequency_bins(frame_size, sample_rate)

  rolloff_matrix = Numo::DFloat.zeros(channels, frames_size)

  frames_size.times do |frame_idx|
    channels.times do |ch|
      rolloff_matrix[ch, frame_idx] = rolloff_for_frame(
        stft_matrix[ch, true, frame_idx], freqs, threshold
      )
    end
  end

  rolloff_matrix
end

#stft(samples, frame_size: 2048, hop_length: 512) ⇒ `Numo::DComplex`

Computes the Short-Time Fourier Transform (STFT) of a multi-channel signal.

This method applies a sliding Hann window to the input signal, computes the FFT for each frame and each channel, and stores the positive frequency bins into a 3D complex-valued matrix.

The resulting STFT matrix has dimensions:

`[channels, frequencies, frames]`

Examples:

Compute STFT for mono audio

samples = Numo::DFloat[[0.0, 1.0, 0.0, -1.0, ...]] # shape: [1, num_samples]
stft_matrix = stft(samples, frame_size: 1024, hop_length: 256)

Parameters:

samples (Numo::NArray) —

a 2D array of shape [channels, samples] containing the audio data.
frame_size (Integer) (defaults to: 2048) —

the size of each FFT frame (default: 2048)
hop_length (Integer) (defaults to: 512) —

the number of samples between successive frames (default: 512)

Returns:

(Numo::DComplex) —

a 3D array of shape `[channels, (frame_size / 2 + 1), frames]` containing the complex STFT values

# File 'lib/awaaz/features.rb', line 258

def stft(samples, frame_size: 2048, hop_length: 512)
  samples, ranges, window, channels_count, freqs_size = prepare_for_fft(samples, frame_size: frame_size,
                                                                                 hop_length: hop_length)
  stft_matrix = Numo::DComplex.zeros(channels_count, freqs_size, ranges.size)

  ranges.each_with_index do |range, frame_idx|
    channels_count.times do |ch|
      fft_result = Numo::Pocketfft.fft(samples[ch, range] * window)
      stft_matrix[ch, true, frame_idx] = fft_result[0...freqs_size]
    end
  end

  stft_matrix
end
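To illustrate why only `frame_size / 2 + 1` bins are stored per frame, the sketch below transforms a single real-valued frame and keeps the non-negative frequency bins. It is a pure-Ruby sketch under stated assumptions: `naive_dft` is a hypothetical O(n²) stand-in for `Numo::Pocketfft.fft`, and no Hann window is applied so the result stays easy to check by hand.

```ruby
# Naive O(n^2) DFT; stands in for Numo::Pocketfft.fft in this sketch only.
def naive_dft(frame)
  n = frame.size
  Array.new(n) do |k|
    frame.each_with_index.sum do |x, t|
      x * Complex.polar(1.0, -2 * Math::PI * k * t / n)
    end
  end
end

frame_size = 8
# One full cosine cycle across the frame, so the energy sits in bin 1.
frame = Array.new(frame_size) { |t| Math.cos(2 * Math::PI * t / frame_size) }

freqs_size = (frame_size / 2) + 1
positive_bins = naive_dft(frame)[0...freqs_size]
magnitudes = positive_bins.map { |c| c.abs.round(6) }
p magnitudes # => [0.0, 4.0, 0.0, 0.0, 0.0]
```

For real input the remaining bins mirror the ones kept here, which is why the STFT matrix discards them.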

#total_frames(signal_length, frame_size, hop_length) ⇒ `Integer`

Calculates the total number of frames for a given signal length, frame size, and hop length.

Parameters:

signal_length (Integer) —

Number of samples in the signal.
frame_size (Integer) —

Size of each analysis frame (in samples).
hop_length (Integer) —

Step size between consecutive frames (in samples).

Returns:

(Integer) —

The total number of frames.

# File 'lib/awaaz/features.rb', line 15

def total_frames(signal_length, frame_size, hop_length)
  ((signal_length - frame_size) / hop_length.to_f).ceil + 1
end

#zcr(samples, frame_size: 2048, hop_length: 512) ⇒ `Numo::SFloat`

Calculates the zero-crossing rate (ZCR) of an audio signal frame-by-frame.

The zero-crossing rate is the proportion of consecutive samples in a frame where the signal changes sign (positive to negative or vice versa). It is often used as a simple feature in speech/music analysis.

Examples:

# Stereo signal: 2 channels, 44100 samples
zcr_values = zcr(samples, frame_size: 2048, hop_length: 512)
puts zcr_values.shape  # => [2, n_frames]

Parameters:

samples (Numo::NArray) —

2D array of audio samples. Shape: [n_channels, n_samples].
frame_size (Integer) (defaults to: 2048) —

Size of each analysis frame in samples. Default: 2048.
hop_length (Integer) (defaults to: 512) —

Step size between successive frames in samples. Default: 512.

Returns:

(Numo::SFloat) —

2D array of zero-crossing rates per frame for each channel. Shape: [n_channels, n_frames].

# File 'lib/awaaz/features.rb', line 142

def zcr(samples, frame_size: 2048, hop_length: 512)
  framed_samples, frame_groups = frame_ranges(samples, frame_size: frame_size, hop_length: hop_length)

  n_channels = framed_samples.shape[0]
  zcrs = Numo::SFloat.zeros(n_channels, frame_groups.length)

  frame_groups.each_with_index do |frame_range, idx|
    zcrs[true, idx] = zcr_for_frame(framed_samples[true, frame_range], frame_size)
  end

  zcrs
end

#zcr_for_frame(frame, frame_size) ⇒ `Numo::SFloat`

Calculates the zero-crossing rate for a single frame of audio.

Examples:

frame = samples[true, 0...2048]
single_frame_zcr = zcr_for_frame(frame, 2048)
puts single_frame_zcr  # => Numo::SFloat[0.15, 0.12]

Parameters:

frame (Numo::NArray) —

2D array containing audio samples for a single frame. Shape: [n_channels, frame_size].
frame_size (Integer) —

Number of samples in the frame.

Returns:

(Numo::SFloat) —

1D array of zero-crossing rates for each channel in the frame. Shape: [n_channels].

# File 'lib/awaaz/features.rb', line 167

def zcr_for_frame(frame, frame_size)
  first_part = frame[true, 0...-1]
  second_part = frame[true, 1..-1]
  products = first_part * second_part

  sign_changes = products < 0
  counts = sign_changes.count_true(axis: 1)

  counts / frame_size.to_f
end
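The sign-change test above multiplies each sample by its successor and counts the negative products. A single-channel, pure-Ruby sketch of the same idea follows; the helper name `zcr_of` and the sample values are assumptions for illustration.

```ruby
# Pure-Ruby sketch for one channel: count adjacent pairs whose product is
# negative, then divide by the frame length.
def zcr_of(frame)
  crossings = frame.each_cons(2).count { |a, b| a * b < 0 }
  crossings / frame.size.to_f
end

puts zcr_of([1.0, -1.0, 1.0, -1.0]) # => 0.75
puts zcr_of([1.0, 2.0, 3.0, 4.0])   # => 0.0
```

A frame of N samples has N - 1 adjacent pairs but the count is divided by N, so the rate tops out at (N - 1) / N. A pair that includes an exact zero has a product of 0 and is not counted.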

#zcr_overall(samples) ⇒ `Numo::SFloat`

Calculates the overall zero-crossing rate (ZCR) of an entire audio signal.

Examples:

# Stereo signal: 2 channels, 44100 samples
overall_zcr = zcr_overall(samples)
puts overall_zcr.shape  # => [2]

Parameters:

samples (Numo::NArray) —

2D array of audio samples. Shape: [n_channels, n_samples].

Returns:

(Numo::SFloat) —

1D array containing the overall ZCR for each channel. Shape: [n_channels].

# File 'lib/awaaz/features.rb', line 191

def zcr_overall(samples)
  ((samples[true, 0...-1] * samples[true, 1..-1]) < 0).count_true(axis: 1) / samples.shape[1].to_f
end
