The proposed architecture addresses the time-frequency trade-off by using two different resolutions: one for signal components concentrated in frequency (partial trackers) and one for components concentrated in time (onset detector). Each partial tracker tracks one partial tone originating from an onset. The trackers are realized as frequency-locked loops with gammatone tracking filters of variable bandwidth, and their number varies with the extracted properties of the input signal. The second kind of module in the architecture is the so-called master module, which creates or deletes partial trackers when an onset or offset is detected. Central to this module is a wavelet transformer performing the preprocessing required by the onset detector and a noise floor estimator. After an onset is detected by threshold crossing, new partials are roughly localized in frequency by analyzing the wavelet filter bank output. The system then steps back to the onset location and runs a second pass that takes the newly gained information into account. Each partial tracker continuously reports its partial tone back to the master module, where an overall residual signal is formed by subtracting the reported partials from the input signal. This adaptive feedback cancellation mechanism facilitates noise floor estimation, onset detection, and the separation of partials lying close to each other in frequency. Automated threshold adaptation and continuous noise floor estimation keep the rate of false onset alarms low: the overall system threshold is continuously updated from previous signal onsets and noise floor estimates. Through this design, the system components (partial trackers, onset detector, and noise floor estimator) do not operate independently of each other. Instead, they cooperate, each taking advantage of the insight acquired by the others.
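The interplay between feedback cancellation and onset detection can be illustrated with a small sketch. It is not the architecture's implementation: the wavelet preprocessing is replaced here by a plain frame-wise RMS level, and the threshold adaptation by a fixed multiple of an exponentially averaged noise floor. The function names (`residual`, `detect_onsets`) and all parameters (`frame`, `factor`, `alpha`) are assumptions made for illustration only.

```python
import math

def residual(input_signal, partials):
    """Subtract the sum of all reported partial tones from the input, sample by sample."""
    return [x - sum(p[n] for p in partials) for n, x in enumerate(input_signal)]

def detect_onsets(signal, frame=64, factor=4.0, alpha=0.9):
    """Return start indices of frames whose RMS level rises above factor * noise_floor.

    Simplification (assumed, not from the source): the noise floor is an
    exponential average of the RMS of frames that stay below the threshold,
    seeded with the RMS of the first frame.
    """
    onsets = []
    noise_floor = None
    above = False
    for start in range(0, len(signal) - frame + 1, frame):
        chunk = signal[start:start + frame]
        rms = math.sqrt(sum(v * v for v in chunk) / frame)
        if noise_floor is None:
            noise_floor = max(rms, 1e-12)  # seed; avoid a zero threshold
            continue
        if rms > factor * noise_floor:
            if not above:                  # report only the upward crossing
                onsets.append(start)
            above = True
        else:
            above = False
            noise_floor = alpha * noise_floor + (1 - alpha) * rms
    return onsets

# Hypothetical test signal: a strong, already tracked 440 Hz partial,
# a weak background component, and a new 1000 Hz partial starting at sample 1024.
fs, n_samples, onset_at = 8000, 2048, 1024
tracked = [math.sin(2 * math.pi * 440 * n / fs) for n in range(n_samples)]
background = [0.01 * math.sin(2 * math.pi * 3000 * n / fs) for n in range(n_samples)]
new_tone = [0.2 * math.sin(2 * math.pi * 1000 * n / fs) if n >= onset_at else 0.0
            for n in range(n_samples)]
mixture = [t + b + x for t, b, x in zip(tracked, background, new_tone)]

onsets_raw = detect_onsets(mixture)                             # strong partial masks the onset
onsets_residual = detect_onsets(residual(mixture, [tracked]))   # onset is exposed
print(onsets_raw, onsets_residual)
```

In this toy setting the weak new partial stays hidden as long as the strong partial is present in the detector input, and becomes detectable once the tracked partial has been subtracted. This mirrors, in a much simplified form, why forming a residual helps onset detection.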
Although the development of the architecture was guided by signal-theoretic considerations rather than physiological or psychoacoustic findings, the resulting system bears some similarity to the human auditory system, most notably in temporal and spectral masking. The system's ability to localize signal components precisely in time and frequency is examined in various experiments. Several examples are given to further illustrate the capabilities of the architecture.