Chord Detection

The chord detection pipeline transforms symbolic note events from Basic Pitch into meaningful chord progressions through multiple processing stages.

Pipeline Overview

Audio Input → Note Detection → Chord Assembly → Post-Processing → Final Output
     ↓              ↓                ↓                ↓              ↓
  WAV/MP3    Basic Pitch ML   Pitch Classes    Smoothing    Chord Symbols
               Note Events     Clustering      Confidence      + Timing

Note Detection Stage

Basic Pitch Integration

The pipeline uses Spotify’s Basic Pitch model, a compact convolutional architecture designed for automatic music transcription in low resource settings:

Model Operation:

  • Operates on a constant Q transform with three bins per semitone

  • Forms approximation of harmonic CQT by vertically shifting spectrogram

  • Produces three time-frequency maps: onsets, sustained notes, multi-pitch activity

  • Onset peaks extracted and matched to sustained activity

  • Notes shorter than ~120ms are removed

Processing Steps:

  1. Audio preprocessing by Basic Pitch model

  2. CQT feature extraction with harmonic alignment

  3. Model inference producing time-frequency maps

  4. Note event extraction with onset/pitch/duration

  5. Post-processing to remove brief notes

Output Format:

{
    "onset": 1.024,      # Start time in seconds
    "offset": 1.536,     # End time in seconds
    "pitch": 64,         # MIDI note number
    "confidence": 0.92   # Detection confidence
}

Chord Assembly Algorithm

Once symbolic events arrive from the server, they are grouped into time windows for chord analysis:

Time Window Clustering

Notes are grouped into time windows for chord analysis:

# Notes are grouped into time windows based on temporal proximity
# Root candidates are tested against pitch class templates
# Chord types from Dictionary are matched against detected pitch classes
# Median filtering with window size 3 is applied for smoothing

Pitch Class Extraction

Convert MIDI notes to pitch classes (0-11):

def extract_pitch_classes(notes):
    """Extract pitch classes from note events."""
    pitch_classes = {}
    for note in notes:
        pc = note.pitch % 12
        weight = note.confidence * note.duration
        pitch_classes[pc] = pitch_classes.get(pc, 0) + weight
    return pitch_classes

Root Detection

Identify the most likely root note:

Algorithm:

  1. Bass Note Priority: Lowest note gets extra weight

  2. Pitch Class Histogram: Count occurrences weighted by confidence

  3. Harmonic Template Matching: Compare against known chord patterns

  4. Context Awareness: Consider previous/next chords

def detect_root(pitch_classes, bass_note=None):
    """Detect chord root from pitch classes."""
    candidates = []

    for root in range(12):
        score = 0
        # Check for major triad
        if root in pitch_classes:
            score += pitch_classes[root] * 2  # Root weight
        if (root + 4) % 12 in pitch_classes:
            score += pitch_classes[(root + 4) % 12]  # Major third
        if (root + 7) % 12 in pitch_classes:
            score += pitch_classes[(root + 7) % 12]  # Fifth

        candidates.append((root, score))

    # Bass note bonus
    if bass_note is not None:
        bass_pc = bass_note % 12
        for i, (root, score) in enumerate(candidates):
            if root == bass_pc:
                candidates[i] = (root, score * 1.5)

    return max(candidates, key=lambda x: x[1])[0]

Chord Quality Detection

Determine chord type from pitch classes:

Chord Templates:

CHORD_TEMPLATES = {
    'major': [0, 4, 7],
    'minor': [0, 3, 7],
    'dim': [0, 3, 6],
    'aug': [0, 4, 8],
    'maj7': [0, 4, 7, 11],
    'dom7': [0, 4, 7, 10],
    'min7': [0, 3, 7, 10],
    'dim7': [0, 3, 6, 9],
    # ... more templates
}

Matching Algorithm:

def detect_chord_quality(pitch_classes, root):
    """Match pitch classes against chord templates."""
    best_match = None
    best_score = 0

    # Transpose pitch classes relative to root
    relative_pcs = transpose_pitch_classes(pitch_classes, -root)

    for chord_type, template in CHORD_TEMPLATES.items():
        score = calculate_template_match(relative_pcs, template)
        if score > best_score:
            best_score = score
            best_match = chord_type

    return best_match, best_score

Post-Processing

Temporal Smoothing

A median filtering step with window size 3 removes brief fluctuations in chord detection:

def smooth_chords(chords, window_size=3):
    """Apply median filter to smooth chord sequence."""
    smoothed = []

    for i in range(len(chords)):
        window_start = max(0, i - window_size // 2)
        window_end = min(len(chords), i + window_size // 2 + 1)
        window = chords[window_start:window_end]

        # Vote on most common chord in window
        chord_votes = {}
        for chord in window:
            key = (chord.root, chord.quality)
            weight = chord.confidence * chord.duration
            chord_votes[key] = chord_votes.get(key, 0) + weight

        # Select winning chord
        best_chord = max(chord_votes, key=chord_votes.get)
        smoothed.append(best_chord)

    return smoothed

Confidence Scoring

Calculate overall confidence for detected chords:

Factors:

  1. Note Detection Confidence: Average confidence of constituent notes

  2. Template Match Score: How well pitch classes match chord template

  3. Temporal Stability: Consistency with neighboring chords

  4. Harmonic Context: Likelihood given key and progression

def calculate_chord_confidence(chord, notes, context):
    """Calculate confidence score for detected chord."""
    # Note confidence
    note_conf = np.mean([n.confidence for n in notes])

    # Template match confidence
    template_conf = chord.template_score

    # Temporal stability
    stability = calculate_stability(chord, context.previous, context.next)

    # Harmonic likelihood
    harmonic_conf = calculate_harmonic_likelihood(chord, context.key)

    # Weighted average
    weights = [0.3, 0.3, 0.2, 0.2]
    scores = [note_conf, template_conf, stability, harmonic_conf]

    return np.average(scores, weights=weights)

Chord Filtering

Remove low-confidence or spurious detections:

def filter_chords(chords, min_confidence=0.5, min_duration=0.1):
    """Filter out low-quality chord detections."""
    filtered = []

    for chord in chords:
        if chord.confidence >= min_confidence and \
           chord.duration >= min_duration:
            filtered.append(chord)
        elif filtered and chord.duration < min_duration:
            # Extend previous chord
            filtered[-1].end_time = chord.end_time

    return filtered

Advanced Techniques

Jazz Chord Extensions

Detect extended and altered chords:

  • 9ths, 11ths, 13ths

  • Altered tensions (♭9, ♯11, etc.)

  • Slash chords (inversions)

  • Polychords

Borrowed Chords

Identify chords from parallel modes:

  • Modal interchange

  • Secondary dominants

  • Neapolitan chords

  • Augmented sixth chords

Voice Leading Analysis

Track individual voice movements:

  • Smooth voice leading detection

  • Parallel motion identification

  • Contrary motion analysis

Performance Metrics

The chord detection pipeline includes:

  • Server Processing: Advanced chord inference with Krumhansl-Schmuckler key detection

  • Three Analysis Modes: HTTP Server (primary), CoreML Local, Simulation

  • Robust Audio Loading: Multiple fallback methods to handle various formats

  • Audio Preprocessing: Padding and format conversion as needed

  • Median Filtering: Window size 3 for chord smoothing

Limitations

Current limitations include:

  • Difficulty with dense orchestral arrangements

  • Challenges with extreme registers

  • Reduced accuracy for chromatic passages

  • Lower performance on atonal music

Future Improvements

Planned enhancements:

  • Deep learning chord recognition model

  • Genre-specific templates

  • User-guided correction

  • Real-time adaptation

  • Microtonal support