Chord Detection
The chord detection pipeline transforms symbolic note events from Basic Pitch into meaningful chord progressions through multiple processing stages.
Pipeline Overview
Audio Input → Note Detection → Chord Assembly → Post-Processing → Final Output
↓ ↓ ↓ ↓ ↓
WAV/MP3 Basic Pitch ML Pitch Classes Smoothing Chord Symbols
Note Events Clustering Confidence + Timing
Note Detection Stage
Basic Pitch Integration
The pipeline uses Spotify’s Basic Pitch model, a compact convolutional architecture designed for automatic music transcription in low resource settings:
Model Operation:
Operates on a constant Q transform with three bins per semitone
Forms approximation of harmonic CQT by vertically shifting spectrogram
Produces three time-frequency maps: onsets, sustained notes, multi-pitch activity
Onset peaks extracted and matched to sustained activity
Notes shorter than ~120ms are removed
Processing Steps:
Audio preprocessing by Basic Pitch model
CQT feature extraction with harmonic alignment
Model inference producing time-frequency maps
Note event extraction with onset/pitch/duration
Post-processing to remove brief notes
Output Format:
{
"onset": 1.024, # Start time in seconds
"offset": 1.536, # End time in seconds
"pitch": 64, # MIDI note number
"confidence": 0.92 # Detection confidence
}
Chord Assembly Algorithm
Once symbolic events arrive from the server, they are grouped into short time windows for chord analysis:
Time Window Clustering
Notes are grouped into time windows for chord analysis:
# Notes are grouped into time windows (approximately 49 short windows)
# Root candidates are tested against pitch class templates
# Chord types from Dictionary are matched against detected pitch classes
Pitch Class Extraction
Convert MIDI notes to pitch classes (0-11):
def extract_pitch_classes(notes):
"""Extract pitch classes from note events."""
pitch_classes = {}
for note in notes:
pc = note.pitch % 12
weight = note.confidence * note.duration
pitch_classes[pc] = pitch_classes.get(pc, 0) + weight
return pitch_classes
Root Detection
Identify the most likely root note:
Algorithm:
Bass Note Priority: Lowest note gets extra weight
Pitch Class Histogram: Count occurrences weighted by confidence
Harmonic Template Matching: Compare against known chord patterns
Context Awareness: Consider previous/next chords
def detect_root(pitch_classes, bass_note=None):
"""Detect chord root from pitch classes."""
candidates = []
for root in range(12):
score = 0
# Check for major triad
if root in pitch_classes:
score += pitch_classes[root] * 2 # Root weight
if (root + 4) % 12 in pitch_classes:
score += pitch_classes[(root + 4) % 12] # Major third
if (root + 7) % 12 in pitch_classes:
score += pitch_classes[(root + 7) % 12] # Fifth
candidates.append((root, score))
# Bass note bonus
if bass_note is not None:
bass_pc = bass_note % 12
for i, (root, score) in enumerate(candidates):
if root == bass_pc:
candidates[i] = (root, score * 1.5)
return max(candidates, key=lambda x: x[1])[0]
Chord Quality Detection
Determine chord type from pitch classes:
Chord Templates:
CHORD_TEMPLATES = {
'major': [0, 4, 7],
'minor': [0, 3, 7],
'dim': [0, 3, 6],
'aug': [0, 4, 8],
'maj7': [0, 4, 7, 11],
'dom7': [0, 4, 7, 10],
'min7': [0, 3, 7, 10],
'dim7': [0, 3, 6, 9],
# ... more templates
}
Matching Algorithm:
def detect_chord_quality(pitch_classes, root):
"""Match pitch classes against chord templates."""
best_match = None
best_score = 0
# Transpose pitch classes relative to root
relative_pcs = transpose_pitch_classes(pitch_classes, -root)
for chord_type, template in CHORD_TEMPLATES.items():
score = calculate_template_match(relative_pcs, template)
if score > best_score:
best_score = score
best_match = chord_type
return best_match, best_score
Post-Processing
Temporal Smoothing
A small smoothing step removes brief fluctuations in chord detection:
def smooth_chords(chords, window_size=3):
"""Apply median filter to smooth chord sequence."""
smoothed = []
for i in range(len(chords)):
window_start = max(0, i - window_size // 2)
window_end = min(len(chords), i + window_size // 2 + 1)
window = chords[window_start:window_end]
# Vote on most common chord in window
chord_votes = {}
for chord in window:
key = (chord.root, chord.quality)
weight = chord.confidence * chord.duration
chord_votes[key] = chord_votes.get(key, 0) + weight
# Select winning chord
best_chord = max(chord_votes, key=chord_votes.get)
smoothed.append(best_chord)
return smoothed
Confidence Scoring
Calculate overall confidence for detected chords:
Factors:
Note Detection Confidence: Average confidence of constituent notes
Template Match Score: How well pitch classes match chord template
Temporal Stability: Consistency with neighboring chords
Harmonic Context: Likelihood given key and progression
def calculate_chord_confidence(chord, notes, context):
"""Calculate confidence score for detected chord."""
# Note confidence
note_conf = np.mean([n.confidence for n in notes])
# Template match confidence
template_conf = chord.template_score
# Temporal stability
stability = calculate_stability(chord, context.previous, context.next)
# Harmonic likelihood
harmonic_conf = calculate_harmonic_likelihood(chord, context.key)
# Weighted average
weights = [0.3, 0.3, 0.2, 0.2]
scores = [note_conf, template_conf, stability, harmonic_conf]
return np.average(scores, weights=weights)
Chord Filtering
Remove low-confidence or spurious detections:
def filter_chords(chords, min_confidence=0.5, min_duration=0.1):
"""Filter out low-quality chord detections."""
filtered = []
for chord in chords:
if chord.confidence >= min_confidence and \
chord.duration >= min_duration:
filtered.append(chord)
elif filtered and chord.duration < min_duration:
# Extend previous chord
filtered[-1].end_time = chord.end_time
return filtered
Advanced Techniques
Jazz Chord Extensions
Detect extended and altered chords:
9ths, 11ths, 13ths
Altered tensions (♭9, ♯11, etc.)
Slash chords (inversions)
Polychords
Borrowed Chords
Identify chords from parallel modes:
Modal interchange
Secondary dominants
Neapolitan chords
Augmented sixth chords
Voice Leading Analysis
Track individual voice movements:
Smooth voice leading detection
Parallel motion identification
Contrary motion analysis
Performance Metrics
The chord detection pipeline achieves:
Accuracy: ~75% on pop/rock music
Latency: <2 seconds for 3-minute song
Precision: 85% for major/minor triads
Recall: 70% for complex jazz chords
Limitations
Current limitations include:
Difficulty with dense orchestral arrangements
Challenges with extreme registers
Reduced accuracy for chromatic passages
Lower performance on atonal music
Future Improvements
Planned enhancements:
Deep learning chord recognition model
Genre-specific templates
User-guided correction
Real-time adaptation
Microtonal support