Publications | Khaled Koutini

2023

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

Florian Schmid, Khaled Koutini, and Gerhard Widmer

In 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023

2022

Learning General Audio Representations With Large-Scale Training of Patchout Audio Transformers

Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, and Gerhard Widmer

NeurIPS challenge, Holistic Evaluation of Audio Representations (HEAR). Proceedings of Machine Learning Research, Jun 2022

Efficient Training of Audio Transformers with Patchout

Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, and Gerhard Widmer

In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Jun 2022

CP-JKU Submission to DCASE22: Distilling Knowledge for Low-Complexity Convolutional Neural Networks From a Patchout Audio Transformer

Florian Schmid, Shahed Masoudian, Khaled Koutini, and Gerhard Widmer

Jun 2022

Knowledge Distillation from Transformers for Low-Complexity Acoustic Scene Classification

Florian Schmid, Shahed Masoudian, Khaled Koutini, and Gerhard Widmer

In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2022 Workshop, Nov 2022

2021

Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Nov 2021

Over-Parameterization and Generalization in Audio Classification

Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, and Gerhard Widmer

In the International Conference on Machine Learning (ICML) Workshop on Overparameterization: Pitfalls and Opportunities, Nov 2021

CPJKU Submission to DCASE21: Cross-Device Audio Scene Classification with Wide Sparse Frequency-Damped CNNs

Khaled Koutini, Jan Schlüter, and Gerhard Widmer

Jun 2021

2020

CP-JKU Submissions to DCASE’20: Low-Complexity Cross-Device Acoustic Scene Classification with RF-Regularized CNNs

Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh, and Gerhard Widmer

Jun 2020

Receptive-Field Regularized CNNs for Music Classification and Tagging

Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, and Gerhard Widmer

CoRR, Jun 2020

Low-Complexity Models for Acoustic Scene Classification Based on Receptive Field Regularization and Frequency Damping

Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh, and Gerhard Widmer

In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop, Nov 2020

Deep neural networks are known to be very demanding in terms of computing and memory requirements. With the ever-increasing use of embedded systems and mobile devices with limited resource budgets, designing low-complexity models without sacrificing too much of their predictive performance has gained great importance. In this work, we investigate and compare several well-known methods to reduce the number of parameters in neural networks. We further put these into the context of a recent study on the effect of the Receptive Field (RF) on a model’s performance, and empirically show that we can achieve high-performing low-complexity models by applying specific restrictions on the RFs, in combination with parameter reduction methods. Additionally, we propose a filter-damping technique for regularizing the RF of models without altering their architecture or changing their parameter counts. We show that incorporating this technique improves performance in various low-complexity settings such as pruning and decomposed convolution. Using our proposed filter damping, we achieved 1st rank in the DCASE-2020 Challenge task on Low-Complexity Acoustic Scene Classification.

2019

The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, and Gerhard Widmer

In 27th European Signal Processing Conference, EUSIPCO 2019, A Coruña, Spain, September 2-6, 2019, Nov 2019

Receptive-Field-Regularized CNN Variants for Acoustic Scene Classification

Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer

In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop, Oct 2019

Acoustic Scene Classification with Reject Option Based on ResNets

Bernhard Lehner, Khaled Koutini, Christopher Schwarzlmüller, Thomas Gallien, and Gerhard Widmer

Jun 2019

Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, and Gerhard Widmer

In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop, Oct 2019

Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

Khaled Koutini, Shreyan Chowdhury, Verena Haunschmid, Hamid Eghbal-zadeh, and Gerhard Widmer

In Proceedings of the MediaEval 2019 Workshop, Sophia Antipolis, France, 27-30 October 2019, Oct 2019

CP-JKU submissions to DCASE’19: Acoustic Scene Classification and Audio Tagging with Receptive-Field-Regularized CNNs

Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer

Jun 2019

In this report, we detail the CP-JKU submissions to the DCASE-2019 challenge Task 1 (acoustic scene classification) and Task 2 (audio tagging with noisy labels and minimal supervision). In all of our submissions, we use fully convolutional deep neural network architectures that are regularized with Receptive Field (RF) adjustments. We adjust the RF of variants of ResNet and DenseNet architectures to best fit the various audio processing tasks that use spectrogram features as input. Additionally, we propose novel CNN layers such as Frequency-Aware CNNs, and new noise compensation techniques such as Adaptive Weighting for Learning from Noisy Labels, to cope with the complexities of each task. We prepared all of our submissions without the use of any external data. Our focus in this year’s submissions is to provide the best-performing single-model submission, using our proposed approaches.

2018

Iterative Knowledge Distillation in R-CNNs for Weakly-Labeled Semi-Supervised Sound Event Detection

Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer

In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop, Nov 2018

2017

Classifying Short Acoustic Scenes with I-Vectors and CNNs: Challenges and Optimisations for the 2017 DCASE ASC Task

Bernhard Lehner, Hamid Eghbal-zadeh, Matthias Dorfer, Filip Korzeniowski, Khaled Koutini, and Gerhard Widmer

Jun 2017

This report describes the CP-JKU team’s submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2017 challenge, and discusses some observations we made about the data and the classification setup. Our approach is based on the methodology that achieved ranks 1 and 2 in the 2016 ASC challenge: a fusion of i-vector modelling using MFCC features derived from left and right audio channels, and deep convolutional neural networks (CNNs) trained on raw spectrograms. The data provided for the 2017 ASC task presented some new challenges, in particular audio stimuli of very short duration. These will be discussed in detail, and our measures for addressing them will be described. The result of our experiments is a classification system that achieves classification accuracies of around 90% on the provided development data, as estimated via the prescribed four-fold cross-validation scheme. On the unseen evaluation data, our best-performing method achieved 73.8% accuracy and 5th place in the team ranking.

MediaEval 2017 AcousticBrainz Genre Task: Multilayer Perceptron Approach

Khaled Koutini, Alina Imenina, Matthias Dorfer, Alexander Gruber, and Markus Schedl

In Proceedings of the MediaEval 2017 Workshop co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), Dublin, Ireland, September 13-15, 2017, Jun 2017