Technical Background
This section provides the technical foundation behind Amadeus, detailing the research and engineering decisions that inform the current implementation.
Automatic Chord Recognition Overview
Automatic chord recognition (ACR) is a well-established field in music information retrieval that aims to identify chord symbols from audio signals. The task involves analysing complex polyphonic audio and reducing it to symbolic harmonic information that musicians can understand and use.
Historical Context
Traditional Approaches
Early ACR systems typically followed a pattern recognition approach:
Feature Extraction: Convert audio to pitch class profiles (chromagrams)
Template Matching: Compare features against known chord templates
Post-processing: Apply smoothing and context-aware corrections
These methods worked reasonably well for simple musical textures but struggled with:
Dense polyphonic arrangements
Overlapping harmonies
Non-harmonic tones
Varying timbres and dynamics
Machine Learning Era
The introduction of machine learning, particularly deep learning, has significantly improved ACR performance. Modern systems can learn complex patterns from large datasets and generalise better across different musical styles and recording conditions.
Basic Pitch Model
Architecture Overview
Amadeus uses Spotify’s Basic Pitch model, specifically chosen for its lightweight design and robust performance. The model is a compact convolutional architecture designed for automatic music transcription in resource-constrained settings.
Key Characteristics:
Convolutional Neural Network: Optimised for time-frequency pattern recognition
Multi-task Learning: Simultaneously detects onsets, frames, and velocity
Compact Design: Suitable for deployment in mobile applications
Genre Agnostic: Trained on diverse musical content
Technical Implementation
Constant-Q Transform Frontend
The model operates on a constant-Q transform (CQT) with three bins per semitone. This provides:
Logarithmic frequency spacing that matches musical perception
High frequency resolution in lower registers
Compact representation suitable for neural network processing
Harmonic CQT Approximation
Basic Pitch forms an approximation of a harmonic CQT by vertically shifting the spectrogram to align harmonically related frequencies. This technique:
Emphasises harmonic relationships in the input representation
Provides the model access to local patterns reflecting pitched sound structure
Reduces the complexity of learning harmonic relationships
Multi-Stream Output
The model produces three time-frequency maps:
Onset Map: Indicates note beginnings
Frame Map: Shows sustained note activity
Velocity Map: Estimates note intensities
Post-Processing Pipeline
Onset peaks are extracted from the onset map
Peaks are matched to sustained activity in the frame map
Notes shorter than approximately 120ms are filtered out
Output format: symbolic note events with onset time, pitch (MIDI), and duration
Why Basic Pitch for Amadeus?
Temporary Solution
It’s crucial to understand that Basic Pitch integration represents a temporary solution for the current development phase. The choice was pragmatic rather than aspirational:
Advantages for Rapid Prototyping:
Proven Performance: Extensively tested across diverse musical content
Ready Deployment: Available as a Python package with minimal setup
Consistent Output: Reliable symbolic note event format
Documentation: Well-documented API and usage patterns
Strategic Flexibility:
The architecture of Amadeus deliberately separates transcription from harmonic analysis. This design choice means:
The transcription layer can be replaced without affecting other components
Different models can be A/B tested easily
Custom models can be integrated when resources permit
The system remains model-agnostic at the architectural level
Limitations and Future Directions
Current Limitations
While Basic Pitch provides a solid foundation, it has known limitations:
Complex Textures: Performance degrades with dense instrumental arrangements
Extreme Registers: Less accurate in very high or low frequency ranges
Percussive Content: Not optimised for non-pitched instruments
Real-time Constraints: Not designed for low-latency applications
Research Directions (2026+)
Custom Model Development
Future development will focus on training custom models specifically for chord recognition:
Domain-Specific Training: Models trained on chord-focused datasets
Multi-Task Learning: Joint training on transcription and harmonic analysis
Efficient Architectures: Mobile-optimised designs for on-device inference
Source Separation Integration
The planned source separation component addresses current texture complexity issues:
Harmonic Isolation: Extract chord-carrying instruments from mixes
Stem-Based Analysis: Analyse individual instrument groups separately
User Control: Allow musicians to focus on specific harmonic content
Engineering Considerations
Mobile Deployment Challenges
Deploying ACR on mobile devices presents unique constraints:
Computational Limits
Limited processing power compared to servers
Battery consumption considerations
Memory constraints for model storage
Real-time performance requirements
Model Optimisation Techniques
Quantisation: Reduce model precision while maintaining accuracy
Pruning: Remove unnecessary model parameters
Knowledge Distillation: Train smaller models from larger teachers
Hardware Acceleration: Leverage GPU/NPU capabilities
Current Server-Based Approach
The decision to deploy Basic Pitch server-side rather than on-device was driven by:
Model Size: Basic Pitch requires significant storage space
Consistency: Ensure identical results across all devices
Flexibility: Easy model updates without app store approval
Development Speed: Faster iteration during research phase
Quality Assurance
Evaluation Metrics
ACR systems are typically evaluated using:
Chord-Level Metrics
Chord Accuracy: Percentage of correctly identified chords
Root Accuracy: Accuracy of chord root identification
Quality Accuracy: Accuracy of chord quality (major, minor, etc.)
Time-Aware Metrics
Weighted Chord Accuracy: Accuracy weighted by chord duration
Segmentation Accuracy: Correctness of chord boundary detection
Overlap Metrics: Intersection over union for temporal segments
Perceptual Evaluation
Musician Studies: Subjective evaluation by human experts
Practice Utility: Effectiveness for musical practice and learning
Error Analysis: Classification of failure modes and their impact
Performance Characteristics
Current System Performance
Based on informal testing with the current Basic Pitch implementation:
Accuracy by Genre:
Pop/Rock: ~70-75% chord accuracy on clear recordings
Jazz Standards: ~60-65% accuracy (complex harmony challenges)
Classical: Variable (50-70% depending on texture complexity)
Folk/Acoustic: ~75-80% accuracy (simpler arrangements)
Temporal Resolution:
Processing Speed: ~2-5 seconds for 3-minute audio file
Latency: Server round-trip typically <3 seconds
Chord Granularity: Minimum chord duration ~0.5 seconds
Known Failure Modes:
Dense orchestral arrangements
Heavily distorted recordings
Atonal or chromatic music
Solo percussion
Future Research Integration
N8 CiR Collaboration
The project benefits from upcoming research funding through N8 CiR (N8 Centre of Excellence in Computationally Intensive Research). This collaboration will enable:
Algorithm Development: Custom ACR algorithms optimised for mobile deployment
Dataset Creation: Curated training data for chord recognition tasks
Evaluation Framework: Systematic testing across musical genres and recording conditions
Publication Pipeline: Research contributions to the ACR field
This academic partnership ensures that Amadeus will evolve beyond the current prototype towards a research-informed, production-ready system that advances the state of the art in mobile music analysis.