Technical Background

This section describes the technical foundation of Amadeus and the research and engineering decisions behind the current implementation.

Automatic Chord Recognition Overview

Automatic chord recognition (ACR) is a well-established field in music information retrieval that aims to identify chord symbols from audio signals. The task involves analysing complex polyphonic audio and reducing it to symbolic harmonic information that musicians can understand and use.

Historical Context

Traditional Approaches

Early ACR systems typically followed a three-stage pattern recognition pipeline:

Feature Extraction: Convert audio to pitch class profiles (chromagrams)
Template Matching: Compare features against known chord templates
Post-processing: Apply smoothing and context-aware corrections
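
The three stages above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular system: it assumes a 12-bin chroma vector has already been computed per frame, uses binary major/minor triad templates with cosine similarity, and smooths labels with a simple majority vote. All function names are hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_templates():
    """Binary 12-dimensional templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            vec = [0.0] * 12
            for interval in intervals:
                vec[(root + interval) % 12] = 1.0
            templates[f"{PITCH_CLASSES[root]}:{quality}"] = vec
    return templates


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_chord(chroma, templates):
    """Template matching: return the label whose template is most similar."""
    return max(templates, key=lambda name: cosine(chroma, templates[name]))


def smooth_labels(labels, width=3):
    """Post-processing: majority vote over a sliding window of frame labels."""
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(max(window, key=window.count))
    return smoothed


# Usage: a chroma frame with energy on C, E and G matches the C major template.
templates = build_templates()
label = match_chord([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], templates)
```

Binary templates make the weaknesses listed below easy to see: any extra energy from non-harmonic tones or overlapping instruments pulls the chroma vector towards the wrong template.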

These methods worked reasonably well for simple musical textures but struggled with:

Dense polyphonic arrangements
Overlapping harmonies
Non-harmonic tones
Varying timbres and dynamics

Machine Learning Era

The introduction of machine learning, particularly deep learning, has significantly improved ACR performance. Modern systems can learn complex patterns from large datasets and generalise better across different musical styles and recording conditions.

Basic Pitch Model

Architecture Overview

Amadeus uses Spotify’s Basic Pitch model, specifically chosen for its lightweight design and robust performance. The model is a compact convolutional architecture designed for automatic music transcription in resource-constrained settings.

Key Characteristics:

Convolutional Neural Network: Optimised for time-frequency pattern recognition
Multi-task Learning: Jointly predicts note onsets, note activations (frames), and multi-pitch contours
Compact Design: Suitable for deployment in mobile applications
Genre Agnostic: Trained on diverse musical content

Technical Implementation

Constant-Q Transform Frontend

The model operates on a constant-Q transform (CQT) with three bins per semitone. This provides:

Logarithmic frequency spacing that matches musical perception
High frequency resolution in lower registers
Compact representation suitable for neural network processing
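
To make the logarithmic spacing concrete, the sketch below computes the centre frequency of each bin for a CQT with three bins per semitone. The lowest frequency of 27.5 Hz (A0) is an assumption made for illustration, and the function names are hypothetical.

```python
BINS_PER_SEMITONE = 3                      # from the text above
BINS_PER_OCTAVE = 12 * BINS_PER_SEMITONE   # 36 bins per octave


def cqt_bin_frequency(k, f_min=27.5):
    """Centre frequency in Hz of bin k; f_min is an assumed lowest frequency."""
    return f_min * 2.0 ** (k / BINS_PER_OCTAVE)


def bin_spacing(k, f_min=27.5):
    """Distance in Hz from bin k to bin k + 1."""
    return cqt_bin_frequency(k + 1, f_min) - cqt_bin_frequency(k, f_min)


# Usage: moving up 36 bins doubles the frequency (one octave).
one_octave_up = cqt_bin_frequency(36)
```

Because the ratio between neighbouring bins is constant, the spacing in Hz is smallest at the bottom of the range, which is where the finer resolution in lower registers comes from.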

Harmonic CQT Approximation

Basic Pitch forms an approximation of a harmonic CQT by vertically shifting the spectrogram to align harmonically related frequencies. This technique:

Emphasises harmonic relationships in the input representation
Provides the model access to local patterns reflecting pitched sound structure
Reduces the complexity of learning harmonic relationships
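
The shifting idea can be illustrated on a single spectrogram frame. This is a simplified sketch rather than Basic Pitch's own code: it assumes 36 bins per octave, a hypothetical choice of harmonics, and plain Python lists in place of tensors. Shifting by round(36 * log2(h)) bins places the energy of harmonic h at the bin of its fundamental.

```python
import math

BINS_PER_OCTAVE = 36  # three bins per semitone


def harmonic_shift(h):
    """Number of bins separating a frequency f from h * f on the log axis."""
    return round(BINS_PER_OCTAVE * math.log2(h))


def shift_frame(frame, shift):
    """Return a copy where out[k] = frame[k + shift], zero-padded at the edges."""
    n = len(frame)
    return [frame[k + shift] if 0 <= k + shift < n else 0.0 for k in range(n)]


def harmonic_stack(frame, harmonics=(0.5, 1, 2, 3)):
    """One shifted copy of the frame per harmonic (the harmonics are assumed)."""
    return [shift_frame(frame, harmonic_shift(h)) for h in harmonics]


# Usage: a tone at bin 10 with partials at the 2nd and 3rd harmonics.
frame = [0.0] * 120
for partial_bin in (10, 10 + harmonic_shift(2), 10 + harmonic_shift(3)):
    frame[partial_bin] = 1.0
channels = harmonic_stack(frame, harmonics=(1, 2, 3))
```

After stacking, every channel has its peak at bin 10, so a small convolutional kernel looking across channels at one position sees the whole harmonic series of the note.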

Multi-Stream Output

The model produces three time-frequency maps:

Onset Map: Indicates note beginnings
Frame Map: Shows sustained note activity
Contour Map: Estimates frame-level multi-pitch activity at three bins per semitone

Post-Processing Pipeline

Onset peaks are extracted from the onset map
Peaks are matched to sustained activity in the frame map
Notes shorter than approximately 120ms are filtered out
Output format: symbolic note events with onset time, pitch (MIDI), and duration
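
The pipeline above can be approximated as follows. This is a deliberately simplified sketch, not the Basic Pitch implementation: the thresholds, the frame duration, the lowest MIDI pitch and all names are assumptions, and the maps are plain nested lists indexed as [frame][pitch].

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    onset_s: float      # onset time in seconds
    midi_pitch: int     # MIDI note number
    duration_s: float   # duration in seconds


def extract_notes(onsets, frames, frame_s, lowest_midi=21,
                  onset_thresh=0.5, frame_thresh=0.3, min_dur_s=0.12):
    """Pick onset peaks, follow the frame map, and drop very short notes."""
    notes = []
    n_frames = len(onsets)
    n_pitches = len(onsets[0]) if n_frames else 0
    for p in range(n_pitches):
        for t in range(n_frames):
            value = onsets[t][p]
            prev = onsets[t - 1][p] if t > 0 else 0.0
            nxt = onsets[t + 1][p] if t + 1 < n_frames else 0.0
            # Step 1: a local maximum above the onset threshold is a peak.
            if not (value >= onset_thresh and value > prev and value >= nxt):
                continue
            # Step 2: extend the note while the frame map stays active.
            end = t
            while end < n_frames and frames[end][p] >= frame_thresh:
                end += 1
            # Step 3: discard notes shorter than the minimum duration.
            duration = (end - t) * frame_s
            if duration >= min_dur_s:
                notes.append(NoteEvent(t * frame_s, lowest_midi + p, duration))
    notes.sort(key=lambda n: (n.onset_s, n.midi_pitch))
    return notes


# Usage: two pitches, 30 frames of 10 ms each.
onsets = [[0.0, 0.0] for _ in range(30)]
frames = [[0.0, 0.0] for _ in range(30)]
onsets[2][0] = 0.9                      # long note on pitch 0
for t in range(2, 22):
    frames[t][0] = 0.8
onsets[5][1] = 0.9                      # 50 ms blip on pitch 1, filtered out
for t in range(5, 10):
    frames[t][1] = 0.8
notes = extract_notes(onsets, frames, frame_s=0.01)
```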

Why Basic Pitch for Amadeus?

Temporary Solution

The Basic Pitch integration is a temporary solution for the current development phase. The choice was pragmatic rather than aspirational:

Advantages for Rapid Prototyping:

Proven Performance: Extensively tested across diverse musical content
Ready Deployment: Available as a Python package with minimal setup
Consistent Output: Reliable symbolic note event format
Documentation: Well-documented API and usage patterns

Strategic Flexibility:

The architecture of Amadeus deliberately separates transcription from harmonic analysis. This design choice means:

The transcription layer can be replaced without affecting other components
Different models can be A/B tested easily
Custom models can be integrated when resources permit
The system remains model-agnostic at the architectural level
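
One way to express this separation in code is a small interface that every transcription backend implements. The sketch below is an assumption about how such a boundary could look, not the actual Amadeus code: Transcriber, NoteEvent, FixedTranscriber and pitch_classes are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Protocol, Set


@dataclass(frozen=True)
class NoteEvent:
    onset_s: float
    midi_pitch: int
    duration_s: float


class Transcriber(Protocol):
    """Anything that turns an audio file into symbolic note events."""

    def transcribe(self, audio_path: str) -> List[NoteEvent]:
        ...


class FixedTranscriber:
    """Stand-in backend that returns canned notes; handy for tests."""

    def __init__(self, notes: List[NoteEvent]) -> None:
        self._notes = list(notes)

    def transcribe(self, audio_path: str) -> List[NoteEvent]:
        return list(self._notes)


def pitch_classes(transcriber: Transcriber, audio_path: str) -> Set[int]:
    """Downstream harmonic step that depends only on the interface."""
    return {note.midi_pitch % 12 for note in transcriber.transcribe(audio_path)}


# Usage: swapping the backend does not change the harmonic analysis code.
backend = FixedTranscriber([
    NoteEvent(0.0, 60, 1.0),   # C4
    NoteEvent(0.0, 64, 1.0),   # E4
    NoteEvent(0.0, 67, 1.0),   # G4
])
classes = pitch_classes(backend, "song.wav")
```

A Basic Pitch backend, a custom model, or a test double would each implement the same transcribe method, which is what makes A/B testing between models straightforward.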

Limitations and Future Directions

Current Limitations

While Basic Pitch provides a solid foundation, it has known limitations:

Complex Textures: Performance degrades with dense instrumental arrangements
Extreme Registers: Less accurate in very high or low frequency ranges
Percussive Content: Not optimised for non-pitched instruments
Real-time Constraints: Not designed for low-latency applications

Research Directions (2026+)

Custom Model Development

Future development will focus on training custom models specifically for chord recognition:

Domain-Specific Training: Models trained on chord-focused datasets
Multi-Task Learning: Joint training on transcription and harmonic analysis
Efficient Architectures: Mobile-optimised designs for on-device inference

Source Separation Integration

The planned source separation component addresses current texture complexity issues:

Harmonic Isolation: Extract chord-carrying instruments from mixes
Stem-Based Analysis: Analyse individual instrument groups separately
User Control: Allow musicians to focus on specific harmonic content

Engineering Considerations

Mobile Deployment Challenges

Deploying ACR on mobile devices presents unique constraints:

Computational Limits

Limited processing power compared to servers
Battery consumption considerations
Memory constraints for model storage
Real-time performance requirements

Model Optimisation Techniques

Quantisation: Store weights and activations at lower numerical precision (for example 8-bit integers) with little loss of accuracy
Pruning: Remove unnecessary model parameters
Knowledge Distillation: Train smaller models from larger teachers
Hardware Acceleration: Leverage GPU/NPU capabilities
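
As an illustration of the first technique, the sketch below applies symmetric 8-bit quantisation to a list of weights and reconstructs them. It is a toy example under simple assumptions (one scale for the whole tensor, no calibration data); production toolchains typically do considerably more. The function names are hypothetical.

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantised]


# Usage: each weight now needs one byte instead of four.
weights = [0.5, -1.0, 0.25, 0.0]
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)
```

The reconstruction error per weight is at most half the scale factor, which is why accuracy often degrades only slightly when weight ranges are well behaved.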

Current Server-Based Approach

The decision to deploy Basic Pitch server-side rather than on-device was driven by:

App Footprint: Bundling the model and its inference runtime would add to app size, even though the model itself is compact
Consistency: Ensure identical results across all devices
Flexibility: Easy model updates without app store approval
Development Speed: Faster iteration during research phase

Quality Assurance

Evaluation Metrics

ACR systems are typically evaluated using:

Chord-Level Metrics

Chord Accuracy: Percentage of correctly identified chords
Root Accuracy: Accuracy of chord root identification
Quality Accuracy: Accuracy of chord quality (major, minor, etc.)

Time-Aware Metrics

Weighted Chord Accuracy: Accuracy weighted by chord duration
Segmentation Accuracy: Correctness of chord boundary detection
Overlap Metrics: Intersection over union for temporal segments
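
The time-aware metrics can be computed directly from labelled segments. The sketch below is a minimal version under stated assumptions: segments are (start, end, label) tuples in seconds, labels are compared by exact string equality, and the estimated segments do not overlap one another. The function names are hypothetical.

```python
def weighted_chord_accuracy(reference, estimate):
    """Fraction of reference time during which the estimated label matches."""
    total = sum(end - start for start, end, _ in reference)
    if total <= 0:
        return 0.0
    correct = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and r_label == e_label:
                correct += overlap
    return correct / total


def segment_iou(a, b):
    """Intersection over union of two (start, end) time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Usage: the estimate detects the chord change half a second late.
reference = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "G:maj")]
estimate = [(0.0, 2.5, "C:maj"), (2.5, 4.0, "G:maj")]
accuracy = weighted_chord_accuracy(reference, estimate)   # 3.5 s of 4.0 s correct
iou = segment_iou((2.0, 4.0), (2.5, 4.0))                 # 1.5 s over 2.0 s
```

Weighting by duration means a late boundary costs only the mislabelled half second, whereas an unweighted per-chord count would treat a brief passing chord and a four-bar chord as equally important.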

Perceptual Evaluation

Musician Studies: Subjective evaluation by human experts
Practice Utility: Effectiveness for musical practice and learning
Error Analysis: Classification of failure modes and their impact

Performance Characteristics

Current System Performance

Based on informal testing with the current Basic Pitch implementation:

Accuracy by Genre:

Pop/Rock: ~70-75% chord accuracy on clear recordings
Jazz Standards: ~60-65% accuracy (complex harmony challenges)
Classical: Variable (50-70% depending on texture complexity)
Folk/Acoustic: ~75-80% accuracy (simpler arrangements)

Speed and Temporal Resolution:

Processing Speed: ~2-5 seconds for a 3-minute audio file
Latency: Server round-trip typically <3 seconds
Chord Granularity: Minimum chord duration ~0.5 seconds

Known Failure Modes:

Dense orchestral arrangements
Heavily distorted recordings
Atonal or chromatic music
Solo percussion

Future Research Integration

N8 CiR Collaboration

The project benefits from upcoming research funding through N8 CiR (N8 Centre of Excellence in Computationally Intensive Research). This collaboration will enable:

Algorithm Development: Custom ACR algorithms optimised for mobile deployment
Dataset Creation: Curated training data for chord recognition tasks
Evaluation Framework: Systematic testing across musical genres and recording conditions
Publication Pipeline: Research contributions to the ACR field

This academic partnership is intended to move Amadeus beyond the current prototype towards a research-informed, production-ready system that advances the state of the art in mobile music analysis.