================ Chord Detection ================ The chord detection pipeline transforms symbolic note events from Basic Pitch into meaningful chord progressions through multiple processing stages. Pipeline Overview ================= .. code-block:: text Audio Input → Note Detection → Chord Assembly → Post-Processing → Final Output ↓ ↓ ↓ ↓ ↓ WAV/MP3 Basic Pitch ML Pitch Classes Smoothing Chord Symbols Note Events Clustering Confidence + Timing Note Detection Stage ==================== Basic Pitch Integration ----------------------- The pipeline uses Spotify's Basic Pitch model, a compact convolutional architecture designed for automatic music transcription in low resource settings: **Model Operation:** * Operates on a constant Q transform with three bins per semitone * Forms approximation of harmonic CQT by vertically shifting spectrogram * Produces three time-frequency maps: onsets, sustained notes, multi-pitch activity * Onset peaks extracted and matched to sustained activity * Notes shorter than ~120ms are removed **Processing Steps:** 1. Audio preprocessing by Basic Pitch model 2. CQT feature extraction with harmonic alignment 3. Model inference producing time-frequency maps 4. Note event extraction with onset/pitch/duration 5. Post-processing to remove brief notes **Output Format:** .. code-block:: python { "onset": 1.024, # Start time in seconds "offset": 1.536, # End time in seconds "pitch": 64, # MIDI note number "confidence": 0.92 # Detection confidence } Chord Assembly Algorithm ======================== Once symbolic events arrive from the server, they are grouped into time windows for chord analysis: Time Window Clustering ---------------------- Notes are grouped into time windows for chord analysis: .. code-block:: python # Notes are grouped into time windows based on temporal proximity # Root candidates are tested against pitch class templates # Chord types from Dictionary are matched against detected pitch classes # Median filtering with window size 3 is applied for smoothing Pitch Class Extraction ---------------------- Convert MIDI notes to pitch classes (0-11): .. code-block:: python def extract_pitch_classes(notes): """Extract pitch classes from note events.""" pitch_classes = {} for note in notes: pc = note.pitch % 12 weight = note.confidence * note.duration pitch_classes[pc] = pitch_classes.get(pc, 0) + weight return pitch_classes Root Detection -------------- Identify the most likely root note: **Algorithm:** 1. **Bass Note Priority**: Lowest note gets extra weight 2. **Pitch Class Histogram**: Count occurrences weighted by confidence 3. **Harmonic Template Matching**: Compare against known chord patterns 4. **Context Awareness**: Consider previous/next chords .. code-block:: python def detect_root(pitch_classes, bass_note=None): """Detect chord root from pitch classes.""" candidates = [] for root in range(12): score = 0 # Check for major triad if root in pitch_classes: score += pitch_classes[root] * 2 # Root weight if (root + 4) % 12 in pitch_classes: score += pitch_classes[(root + 4) % 12] # Major third if (root + 7) % 12 in pitch_classes: score += pitch_classes[(root + 7) % 12] # Fifth candidates.append((root, score)) # Bass note bonus if bass_note is not None: bass_pc = bass_note % 12 for i, (root, score) in enumerate(candidates): if root == bass_pc: candidates[i] = (root, score * 1.5) return max(candidates, key=lambda x: x[1])[0] Chord Quality Detection ----------------------- Determine chord type from pitch classes: **Chord Templates:** .. code-block:: python CHORD_TEMPLATES = { 'major': [0, 4, 7], 'minor': [0, 3, 7], 'dim': [0, 3, 6], 'aug': [0, 4, 8], 'maj7': [0, 4, 7, 11], 'dom7': [0, 4, 7, 10], 'min7': [0, 3, 7, 10], 'dim7': [0, 3, 6, 9], # ... more templates } **Matching Algorithm:** .. code-block:: python def detect_chord_quality(pitch_classes, root): """Match pitch classes against chord templates.""" best_match = None best_score = 0 # Transpose pitch classes relative to root relative_pcs = transpose_pitch_classes(pitch_classes, -root) for chord_type, template in CHORD_TEMPLATES.items(): score = calculate_template_match(relative_pcs, template) if score > best_score: best_score = score best_match = chord_type return best_match, best_score Post-Processing =============== Temporal Smoothing ------------------ A median filtering step with window size 3 removes brief fluctuations in chord detection: .. code-block:: python def smooth_chords(chords, window_size=3): """Apply median filter to smooth chord sequence.""" smoothed = [] for i in range(len(chords)): window_start = max(0, i - window_size // 2) window_end = min(len(chords), i + window_size // 2 + 1) window = chords[window_start:window_end] # Vote on most common chord in window chord_votes = {} for chord in window: key = (chord.root, chord.quality) weight = chord.confidence * chord.duration chord_votes[key] = chord_votes.get(key, 0) + weight # Select winning chord best_chord = max(chord_votes, key=chord_votes.get) smoothed.append(best_chord) return smoothed Confidence Scoring ------------------ Calculate overall confidence for detected chords: **Factors:** 1. **Note Detection Confidence**: Average confidence of constituent notes 2. **Template Match Score**: How well pitch classes match chord template 3. **Temporal Stability**: Consistency with neighboring chords 4. **Harmonic Context**: Likelihood given key and progression .. code-block:: python def calculate_chord_confidence(chord, notes, context): """Calculate confidence score for detected chord.""" # Note confidence note_conf = np.mean([n.confidence for n in notes]) # Template match confidence template_conf = chord.template_score # Temporal stability stability = calculate_stability(chord, context.previous, context.next) # Harmonic likelihood harmonic_conf = calculate_harmonic_likelihood(chord, context.key) # Weighted average weights = [0.3, 0.3, 0.2, 0.2] scores = [note_conf, template_conf, stability, harmonic_conf] return np.average(scores, weights=weights) Chord Filtering --------------- Remove low-confidence or spurious detections: .. code-block:: python def filter_chords(chords, min_confidence=0.5, min_duration=0.1): """Filter out low-quality chord detections.""" filtered = [] for chord in chords: if chord.confidence >= min_confidence and \ chord.duration >= min_duration: filtered.append(chord) elif filtered and chord.duration < min_duration: # Extend previous chord filtered[-1].end_time = chord.end_time return filtered Advanced Techniques =================== Jazz Chord Extensions --------------------- Detect extended and altered chords: * 9ths, 11ths, 13ths * Altered tensions (♭9, ♯11, etc.) * Slash chords (inversions) * Polychords Borrowed Chords --------------- Identify chords from parallel modes: * Modal interchange * Secondary dominants * Neapolitan chords * Augmented sixth chords Voice Leading Analysis ---------------------- Track individual voice movements: * Smooth voice leading detection * Parallel motion identification * Contrary motion analysis Performance Metrics =================== The chord detection pipeline includes: * **Server Processing**: Advanced chord inference with Krumhansl-Schmuckler key detection * **Three Analysis Modes**: HTTP Server (primary), CoreML Local, Simulation * **Robust Audio Loading**: Multiple fallback methods to handle various formats * **Audio Preprocessing**: Padding and format conversion as needed * **Median Filtering**: Window size 3 for chord smoothing Limitations =========== Current limitations include: * Difficulty with dense orchestral arrangements * Challenges with extreme registers * Reduced accuracy for chromatic passages * Lower performance on atonal music Future Improvements =================== Planned enhancements: * Deep learning chord recognition model * Genre-specific templates * User-guided correction * Real-time adaptation * Microtonal support