Design Rationale

Project Evolution

Amadeus has undergone significant architectural changes and experimentation since the start of the project in early August. This page documents the key design decisions and the rationale behind each.

Early Experiments - NMF, CQT and an Initial Plan for Live Detection

The project began with the ambition of detecting chords in real time. The first prototypes applied non-negative matrix factorisation (NMF) to short windows of audio. The method worked in simple cases but broke down as soon as the texture became denser: it often produced incomplete or unstable decompositions and was too unpredictable to form the core of an application intended for musicians.

I then tested a constant-Q transform (CQT) front end that produced chroma features. This was more stable but still struggled with dense mixes and did not deliver the clarity required for reliable chord identification. These experiments made clear that real-time multi-pitch tracking is difficult to do well, especially on mobile devices, and that most musicians gain more from analysing recorded material than from attempting to track harmony while playing.

Later Shift to File-Based Analysis

At this stage the project shifted to a file-based system with a clean separation between transcription and harmonic analysis. This choice allowed evaluation of different transcription methods without disturbing the rest of the codebase.

The decision to move away from real-time detection was based on several factors:

Technical Complexity: Real-time polyphonic tracking proved difficult to implement reliably, especially on mobile devices
User Needs: Most musicians gain more from analysing recorded material than from attempting to track harmony while playing
System Stability: File-based processing allows for more robust analysis and better user experience

Incorporating a Lightweight Transcription Model

For the present submission the app uses Basic Pitch by Spotify, a compact convolutional architecture designed for automatic music transcription in low-resource settings. The model operates on a constant-Q transform with three bins per semitone and approximates a harmonic CQT by vertically shifting the spectrogram so that harmonically related frequencies align.

This alignment gives the model access to local patterns that reflect the structure of pitched sound. It produces three time-frequency maps that indicate onsets, sustained notes and multi-pitch activity. Onset peaks are extracted and matched to sustained activity by a post-processing procedure. Notes shorter than roughly 120 ms are removed. The output is a set of note events defined by onset time, pitch and duration.
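The final filtering step can be sketched in Swift as follows. This is a minimal illustration, not Basic Pitch's own post-processing: the `NoteEvent` type, its field names and the `removingShortNotes` helper are hypothetical, and the 0.12 s default simply mirrors the approximate figure quoted above.

```swift
/// Hypothetical note event; the field names are assumptions, not the app's actual types.
struct NoteEvent: Equatable {
    let onset: Double      // seconds from the start of the file
    let pitch: Int         // MIDI note number
    let duration: Double   // seconds
}

/// Keeps only the notes that last at least `minimumDuration` seconds.
func removingShortNotes(_ notes: [NoteEvent], minimumDuration: Double = 0.12) -> [NoteEvent] {
    return notes.filter { $0.duration >= minimumDuration }
}

let raw = [
    NoteEvent(onset: 0.00, pitch: 60, duration: 0.50),
    NoteEvent(onset: 0.10, pitch: 64, duration: 0.05),  // shorter than the threshold, dropped
    NoteEvent(onset: 0.20, pitch: 67, duration: 0.30)
]
let cleaned = removingShortNotes(raw)
```

Keeping the threshold as a parameter makes it easy to experiment with stricter or looser filtering for different kinds of recordings.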

Important Note on Temporary Implementation

This is a temporary solution used only for this stage of development. The architecture of Amadeus does not depend on Basic Pitch itself but on the fact that it produces symbolic note events in a consistent format. This makes it possible to replace the transcription layer with a custom model or a more advanced method when time and resources allow.
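One way to express this independence is a small protocol that any transcription back end can satisfy. The sketch below is an assumption about how such a boundary could look, not the code in Amadeus: `Transcriber`, `FixedTranscriber`, `pitchClasses(in:sampleRate:using:)` and the `NoteEvent` fields are all hypothetical names.

```swift
/// Hypothetical note event shared by every transcription back end.
struct NoteEvent: Equatable {
    let onset: Double      // seconds
    let pitch: Int         // MIDI note number
    let duration: Double   // seconds
}

/// Hypothetical contract for the transcription layer: audio in, note events out.
protocol Transcriber {
    func transcribe(samples: [Float], sampleRate: Double) -> [NoteEvent]
}

/// Stand-in back end that returns canned notes; a real one would run a model.
struct FixedTranscriber: Transcriber {
    let notes: [NoteEvent]
    func transcribe(samples: [Float], sampleRate: Double) -> [NoteEvent] {
        return notes
    }
}

/// Downstream code sees only the protocol, so the back end can be swapped freely.
func pitchClasses(in samples: [Float], sampleRate: Double, using transcriber: Transcriber) -> Set<Int> {
    let notes = transcriber.transcribe(samples: samples, sampleRate: sampleRate)
    return Set(notes.map { $0.pitch % 12 })
}

let stub = FixedTranscriber(notes: [
    NoteEvent(onset: 0, pitch: 60, duration: 1),  // C4
    NoteEvent(onset: 0, pitch: 76, duration: 1)   // E5
])
let classes = pitchClasses(in: [], sampleRate: 44_100, using: stub)
```

A stub of this kind also lets the harmonic analysis be exercised in tests without running any model.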

Chord Assembly Algorithm

Custom Post-Processing

Basic Pitch outputs note events rather than chord labels, so chords are assembled from those events with custom logic instead of an off-the-shelf chord recogniser:

Rationale:

Domain Control: Fine-tune for our use cases
Transparency: Understandable algorithm
Flexibility: Easy to modify rules
Integration: Better iOS integration

Algorithm Features:

Pitch class histograms
Confidence weighting
Temporal smoothing
Root note detection
Chord quality inference
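As an illustration of the first two features, a confidence-weighted pitch class histogram might be built as below. This is a sketch under stated assumptions: the `WeightedNote` type, its `confidence` field and the choice to weight each note by duration times confidence are hypothetical, not a description of the shipped algorithm.

```swift
/// Hypothetical note with a confidence score in 0...1 (an assumed field).
struct WeightedNote {
    let pitch: Int         // MIDI note number
    let duration: Double   // seconds
    let confidence: Double
}

/// Builds a 12-bin histogram in which each note adds duration * confidence
/// to the bin of its pitch class (0 = C, 1 = C#, ... 11 = B).
func pitchClassHistogram(_ notes: [WeightedNote]) -> [Double] {
    var bins = [Double](repeating: 0, count: 12)
    for note in notes {
        let pitchClass = ((note.pitch % 12) + 12) % 12  // safe for negative input
        bins[pitchClass] += note.duration * note.confidence
    }
    return bins
}

let window = [
    WeightedNote(pitch: 48, duration: 1.0, confidence: 1.0),  // C3
    WeightedNote(pitch: 64, duration: 1.0, confidence: 0.5),  // E4
    WeightedNote(pitch: 67, duration: 0.5, confidence: 1.0),  // G4
    WeightedNote(pitch: 72, duration: 0.5, confidence: 1.0)   // C5
]
let histogram = pitchClassHistogram(window)
```

Octave information is deliberately discarded here, so C3 and C5 both contribute to the same bin.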

Music Theory Library

Standalone Feature

The theory library operates independently of the machine-learning transcription layer:

Benefits:

Educational Value: Learning resource
Offline Access: No server required
Reference Tool: Quick lookups
User Engagement: Increases app value

Implementation:

Static Swift data structures
Programmatic chord generation
Interactive visualizations
Audio synthesis for playback
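Programmatic chord generation from static data can be sketched as follows. The table contents are standard interval patterns, but the names `chordIntervals`, `noteNames` and `chordTones(root:quality:)` are hypothetical, and the sharps-only spelling is a simplification made for brevity.

```swift
/// Pitch class names, sharps only (a simplification; enharmonic spelling is ignored).
let noteNames = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

/// Static table of chord qualities as semitone offsets above the root.
let chordIntervals: [String: [Int]] = [
    "maj":  [0, 4, 7],
    "min":  [0, 3, 7],
    "dim":  [0, 3, 6],
    "aug":  [0, 4, 8],
    "7":    [0, 4, 7, 10],
    "maj7": [0, 4, 7, 11],
    "min7": [0, 3, 7, 10]
]

/// Returns the note names of a chord, or nil for an unknown quality or invalid root.
func chordTones(root: Int, quality: String) -> [String]? {
    guard (0..<12).contains(root), let intervals = chordIntervals[quality] else {
        return nil
    }
    return intervals.map { noteNames[(root + $0) % 12] }
}

let aMinor7 = chordTones(root: 9, quality: "min7")
let dMajor = chordTones(root: 2, quality: "maj")
```

Because every chord is derived from a root and an interval pattern, adding a new quality only requires one more table entry.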

Future Development (2026)

Source Separation Integration

Testing showed that the system performs well on recordings with relatively sparse textures but struggles when several instruments mask one another. This motivates the introduction of a source separation stage that extracts a cleaner harmonic component before transcription. In the app, the unused Live Detection view would be replaced by a stem preview and selection interface.

Since the rest of the pipeline expects only symbolic note events, this addition fits naturally into the current design.

Architectural Foundation

Amadeus is arranged as three independent layers:

Transcription Layer: Converts audio into symbolic events
Analysis Layer: Interprets these events as harmony
SwiftUI Views: Present the results

This structure is a direct consequence of the failed spectral prototypes and now provides a stable foundation for future work.

Technical Foundations

Harmonic Analysis Pipeline

Once the symbolic events arrive from the server, they are grouped into short time windows. For each window, root candidates are tested and the pitch classes are matched against templates for the chord types stored in the Dictionary. A small smoothing step then removes brief fluctuations. Key estimation is performed on a pitch class profile aggregated over the whole piece.
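The matching and smoothing steps can be illustrated with a small sketch. Everything here is an assumption made for the example: the three templates, the scoring rule (weight on chord tones minus weight off them), the single-window smoothing rule, and the names `bestChord(for:)` and `smoothed(_:)`.

```swift
let noteNames = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

/// Hypothetical chord templates as semitone offsets above the root.
let templates: [(name: String, intervals: [Int])] = [
    (name: "maj", intervals: [0, 4, 7]),
    (name: "min", intervals: [0, 3, 7]),
    (name: "dim", intervals: [0, 3, 6])
]

/// Tries every root with every template and returns the best-scoring label.
/// Score = histogram weight on chord tones minus weight on all other pitch classes.
func bestChord(for histogram: [Double]) -> String? {
    precondition(histogram.count == 12, "expected one bin per pitch class")
    let total = histogram.reduce(0, +)
    guard total > 0 else { return nil }  // empty window, no chord
    var best: (label: String, score: Double)? = nil
    for root in 0..<12 {
        for template in templates {
            let inside = template.intervals.reduce(0.0) { $0 + histogram[(root + $1) % 12] }
            let score = inside - (total - inside)
            if score > (best?.score ?? -Double.infinity) {
                best = (label: noteNames[root] + template.name, score: score)
            }
        }
    }
    return best?.label
}

/// Replaces a label that lasts a single window when both neighbours agree with each other.
func smoothed(_ labels: [String]) -> [String] {
    guard labels.count >= 3 else { return labels }
    var result = labels
    for i in 1..<(labels.count - 1) {
        if labels[i - 1] == labels[i + 1] && labels[i] != labels[i - 1] {
            result[i] = labels[i - 1]
        }
    }
    return result
}

var histogram = [Double](repeating: 0, count: 12)
histogram[0] = 1.5   // C
histogram[4] = 0.5   // E
histogram[7] = 0.5   // G
let label = bestChord(for: histogram)
let labels = smoothed(["Cmaj", "Cmaj", "Amin", "Cmaj", "Gmaj"])
```

Penalising weight outside the template discourages the matcher from explaining a window with a chord that ignores prominent notes.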

Transposition is carried out by shifting the pitch classes modulo twelve before regenerating the labels. The pipeline is kept separate from the view layer so that any improvement in the transcription model automatically flows into the rest of the system.
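Transposition modulo twelve can be sketched as follows. The `Chord` type and its fields are hypothetical and the sharps-only spelling is a simplification; only the modular arithmetic reflects the description above.

```swift
let noteNames = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

/// Shifts a pitch class by `semitones`, wrapping into 0..<12 even for negative shifts.
func transpose(_ pitchClass: Int, by semitones: Int) -> Int {
    return (((pitchClass + semitones) % 12) + 12) % 12
}

/// Hypothetical chord representation: a root pitch class plus a quality suffix.
struct Chord: Equatable {
    let root: Int          // 0 = C ... 11 = B
    let quality: String    // "" for major, "m" for minor, and so on

    /// The label is regenerated from the root rather than edited as text.
    var label: String { return noteNames[root] + quality }

    func transposed(by semitones: Int) -> Chord {
        return Chord(root: transpose(root, by: semitones), quality: quality)
    }
}

let progression = [
    Chord(root: 0, quality: ""),   // C
    Chord(root: 9, quality: "m"),  // Am
    Chord(root: 5, quality: ""),   // F
    Chord(root: 7, quality: "")    // G
]
// Transpose down a minor third (3 semitones).
let transposedLabels = progression.map { $0.transposed(by: -3).label }
```

The extra `+ 12` matters because Swift's `%` operator keeps the sign of the dividend, so a downward shift would otherwise produce a negative pitch class.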

Technical Debt

Areas for improvement:

Error handling robustness
Comprehensive test coverage
Performance profiling
Documentation completeness
Accessibility features

Validation

The current architecture has been validated through:

Successful processing of diverse audio
Positive user feedback on accuracy
Reasonable response times (under 5 s)
Stable operation across devices
Clear upgrade path for enhancements