Papers
What follows is a list of 700 papers that mention Freesound or use Freesound data for research. This list is created automatically by finding articles that cite one of the main Freesound reference papers; some entries have also been added manually. Papers are sorted by year of publication and then alphabetically by first author. If you have a paper that should be on the list but is not, please send us an email at freesound@freesound.org.
2023 (89)
- . Microphone-Based Context Awareness And Coverage Planner For A Service Robot Using Deep Learning Techniques. Mathematics (2023).
- Ambuj Mehrish, Navonil Majumder, Rishabh Bhardwaj, Soujanya Poria. A Review Of Deep Learning Techniques For Speech Processing. Information Fusion (2023).
- Anam Bansal, N. Garg. Environmental Sound Classification Using Hybrid Ensemble Model. Procedia Computer Science (2023).
- Angélica S. Z. Suárez, Clément Laroche, L. Clemmensen, Sneha Das. On Crowdsourcing-Design With Comparison Category Rating For Evaluating Speech Enhancement Algorithms. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, B. Raj. Approach To Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization And Constant-Q Transforms. ArXiv (2023).
- Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha. UNFUSED: Unsupervised Finetuning Using Self Supervised Distillation. ArXiv (2023).
- B. Weck, Xavier Serra. Data Leakage In Cross-Modal Retrieval Training: A Case Study. ArXiv (2023).
- Bac Nguyen, S. Uhlich, Fabien Cardinaux. Improving Self-Supervised Learning For Audio Representations By Feature Diversity And Decorrelation. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Dianwen Ng, Ruixiong Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang, Yukun Ma, Chongjia Ni, E. Chng, B. Ma. Dehubert: Disentangling Noise In A Self-Supervised Model For Robust Speech Recognition. ArXiv (2023).
- Dianwen Ng, Ruixiu Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang, Yukun Ma, Chongjia Ni, E. Chng, Bin Ma. De’Hubert: Disentangling Noise In A Self-Supervised Model For Robust Speech Recognition. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- E. Thoret, S. Ystad, R. Kronland-Martinet. Hearing As Adaptive Cascaded Envelope Interpolation. Communications Biology (2023).
- Fuhu Song, Jifeng Hu, Che Wang, Jiao Huang, Haowen Zhang, Yi Wang. Cross-Modal Audio-Text Retrieval Via Sequential Feature Augmentation. CACML (2023).
- G. Peruzzi, A. Pozzebon, Mattia Van Der Meer. Fight Fire With Fire: Detecting Forest Fires With Embedded Machine Learning Models Dealing With Audio And Images On Low Power IoT Devices. Sensors (2023).
- H. Tran, J. Hong, Hyeryung Jang, Jinhwan Jung, Jongmok Kim, Joonki Hong, Minji Lee, J. Kim, C. Kushida, Dongheon Lee, Daewoo Kim, I. Yoon. Prediction Of Sleep Stages Via Deep Learning Using Smartphone Audio Recordings In Home Environments: Model Development And Validation. Journal of Medical Internet Research (2023).
- Haitao Xu, L. Wei, Jie Zhang, Jianming Yang, Yannan Wang, Tian Gao, Xin Fang, Lirong Dai. A Multi-Scale Feature Aggregation Based Lightweight Network For Audio-Visual Speech Enhancement. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Han Liu, H. Liu. When Evil Calls: Targeted Adversarial Voice Over IP Network (2023).
- Han Yin, Jisheng Bai, Mou Wang, S. Huang, Yafei Jia, Jianfeng Chen. Convolutional Recurrent Neural Network With Attention For 3D Speech Enhancement (2023).
- Han Yin, Jisheng Bai, S. Huang, Mou Wang, Yafei Jia, Jianfeng Chen. Two-Stage Autoencoder Neural Network For 3D Speech Enhancement. ArXiv (2023).
- Ho-Hsiang Wu, Oriol Nieto, J. Bello, J. Salamon. Audio-Text Models Do Not Yet Leverage Natural Language. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Irene Martín-Morató, A. Mesaros. Strong Labeling Of Sound Events Using Crowdsourced Weak Labels And Annotator Competence Estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
- Jaesung Huh, Jacob Chalk, E. Kazakos, D. Damen, A. Zisserman. EPIC-SOUNDS: A Large-Scale Dataset Of Actions That Sound. ArXiv (2023).
- Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang. Adapting Language-Audio Models As Few-Shot Audio Learners. ArXiv (2023).
- Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, B. Raj. Improving Perceptual Quality, Intelligibility, And Acoustics On VoIP Platforms. ArXiv (2023).
- Junhong Shen, Liam Li, L. Dery, Corey Staten, M. Khodak, Graham Neubig, Ameet S. Talwalkar. Cross-Modal Fine-Tuning: Align Then Refine. ArXiv (2023).
- Junhong Shen, Liam Li, L. Dery, Corey Staten, M. Khodak, Graham Neubig, Ameet Talwalkar. Cross-Modal Fine-Tuning: Align Then Refine. ArXiv (2023).
- Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park. VIFS: An End-To-End Variational Inference For Foley Sound Synthesis (2023).
- Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, W. Tseng, Shang-Wen Li, Hung-yi Lee. SpeechPrompt v2: Prompt Tuning For Speech Classification Tasks (2023).
- Karen Gissell Rosero Jacome, Felipe Grijalva, B. Masiero. Sound Events Localization And Detection Using Bio-Inspired Gammatone Filters And Temporal Convolutional Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
- Keunwoo Choi, Jae-Yeol Im, L. Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, M. Lagrange, Shinosuke Takamichi. Foley Sound Synthesis At The DCASE 2023 Challenge. ArXiv (2023).
- Krishna Teja Chitty-Venkata, M. Emani, V. Vishwanath, Arun Somani. Neural Architecture Search Benchmarks: Insights And Survey. IEEE Access (2023).
- L. Turchet, Carlo Zanotto, J. Pauwels. “Give Me Happy Pop Songs In C Major And With A Fast Tempo”: A Vocal Assistant For Content-Based Queries To Online Music Repositories. International Journal of Human-Computer Studies (2023).
- L. Turchet, M. Lagrange, C. Rottondi, György Fazekas, Nils Peters, J. Ostergaard, F. Font, T. Backstrom, C. Fischione. The Internet Of Sounds: Convergent Trends, Insights, And Future Directions. IEEE Internet of Things Journal (2023).
- Luciano S. Martínez Rau, José O. Chelotti, M. Ferrero, J. Galli, S. Utsumi, A. Planisich, H. Rufiner, L. Giovanini. A Noise-Robust Acoustic Method For Recognition Of Foraging Activities Of Grazing Cattle. ArXiv (2023).
- Mahmoud Salhab, H. Harmanani. AraSpot: Arabic Spoken Command Spotting. ArXiv (2023).
- Marek Kadlcík, Adam Hájek, Jürgen Kieslich, Radoslaw Winiecki. A Whisper Transformer For Audio Captioning Trained With Synthetic Captions And Transfer Learning. ArXiv (2023).
- Meelan Bandara, Roshinie Jayasundara, Isuru Ariyarathne, D. Meedeniya, Charith Perera. Forest Sound Classification Dataset: FSC22. Sensors (2023).
- Michael Nigro, S. Krishnan. SARdBScene: Dataset And ResNet Baseline For Audio Scene Source Counting And Analysis. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Mimoun Lamrini, M. Chkouri, A. Touhafi. Evaluating The Performance Of Pre-Trained Convolutional Neural Network For Audio Classification On Embedded Systems For Anomaly Detection In Smart Cities. Sensors (2023).
- Muhammad Mamunur Rashid, Guiqing Li, Chengrui Du. Nonspeech7K Dataset: Classification And Analysis Of Human Non‐Speech Sound. IET Signal Processing (2023).
- N. Shashaank, Berker Banar, M. Izadi, J. Kemmerer, Shuo Zhang, Chuanzeng Huang. HiSSNet: Sound Event Detection And Speaker Identification Via Hierarchical Prototypical Networks For Low-Resource Headphones. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Nikhil Singh, Chih-Wei Wu, Iroro Orife, M. Kalayeh. Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs For Audiovisual Representation Learning. ArXiv (2023).
- Paul Primus, G. Widmer. On Frequency-Wise Normalizations For Better Recording Device Generalization In Audio Spectrogram Transformers. ArXiv (2023).
- Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou. ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities. ArXiv (2023).
- Peyman Goli, S. van de Par. Deep Learning-Based Speech Specific Source Localization By Using Binaural And Monaural Microphone Arrays In Hearing Aids. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
- Prateek Verma, C. Chafe. A Content Adaptive Learnable Time-Frequency Representation For Audio Signal Processing. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Prateek Verma, C. Chafe. Content Adaptive Front End For Audio Signal Processing (2023).
- Qiu-shi Zhu, J. Zhang, Zitian Zhang, Lirong Dai. A Joint Speech Enhancement And Self-Supervised Representation Learning Framework For Noise-Robust Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
- Qiuqiang Kong, K. Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, S. Dubnov, Mark D. Plumbley. Universal Source Separation With Weakly Labelled Data. ArXiv (2023).
- R. Serizel, Samuele Cornell, Nicolas Turpault. Performance Above All? Energy Consumption Vs. Performance, A Study On Sound Event Detection With Heterogeneous Data. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- R. Serizel, Samuele Cornell, Nicolas Turpault. Performance Above All? Energy Consumption Vs. Performance For Machine Listening, A Study On DCASE Task 4 Baseline (2023).
- Rajapantula Kranthi, Vasundhara. A Robust Adaptive Filter For Diffusion Strategy-Based Distributed Active Noise Control. IETE Journal of Research (2023).
- Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth S. Narayanan. A Dataset For Audio-Visual Sound Event Detection In Movies. ArXiv (2023).
- Rishabh Garg, Ruohan Gao, K. Grauman. Visually-Guided Audio Spatialization In Video With Geometry-Aware Multi-Task Learning. International Journal of Computer Vision (2023).
- Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra. ImageBind: One Embedding Space To Bind Them All. ArXiv (2023).
- Ruchika Chavhan, H. Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy M. Hospedales. Amortised Invariance Learning For Contrastive Self-Supervision. ICLR (2023).
- Ruchika Chavhan, Henry G. R. Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy M. Hospedales. Amortised Invariance Learning For Contrastive Self-Supervision (2023).
- S. Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang. Dynamic Kernel Convolution Network With Scene-Dedicate Training For Sound Event Localization And Detection (2023).
- Saksham Singh Kushwaha, Magdalena Fuentes. A Multimodal Prototypical Approach For Unsupervised Sound Classification (2023).
- Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola García, Yoshiki Masuyama, Zhongqiu Wang, S. Squartini, S. Khudanpur. The CHiME-7 DASR Challenge: Distant Meeting Transcription With Multiple Devices In Diverse Scenarios. ArXiv (2023).
- Sandipana Dowerah, R. Serizel, D. Jouvet, Mohammad MohammadAmini, D. Matrouf. Joint Optimization Of Diffusion Probabilistic-Based Multichannel Speech Enhancement With Far-Field Speaker Verification. 2022 IEEE Spoken Language Technology Workshop (SLT) (2023).
- Sarthak Yadav, S. Theodoridis, Lars Kai Hansen, Z. Tan. Masked Autoencoders With Multi-Window Attention Are Better Audio Learners (2023).
- Seong-Gyun Leem, D. Fulford, J. Onnela, David E Gard, C. Busso. Computation And Memory Efficient Noise Adaptation Of Wav2Vec2.0 For Noisy Speech Emotion Recognition With Skip Connection Adapters (2023).
- Shayan Gharib, Minh Tran, Diep Luong, K. Drossos, T. Virtanen. Adversarial Representation Learning For Robust Privacy Preservation In Audio. ArXiv (2023).
- Shuai Tao, Himavanth Reddy, J. Jensen, M. G. Christensen. Frequency Bin-Wise Single Channel Speech Presence Probability Estimation Using Multiple DNNs. ArXiv (2023).
- Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Ming-Ting Sun, Xinxin Zhu, J. Liu. VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model And Dataset. ArXiv (2023).
- Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang. Pengi: An Audio Language Model For Audio Tasks. ArXiv (2023).
- Sunghyun Kim, Yong-Hoon Choi. WaveBYOL: Self-Supervised Learning For Audio Representation From Raw Waveforms. IEEE Access (2023).
- Swapnil Bhosale, Rupayan Chakraborty, S. Kopparapu. A Novel Metric For Evaluating Audio Caption Similarity. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Tanmay Khandelwal, Rohan Kumar Das. A Multi-Task Learning Framework For Sound Event Detection Using High-Level Acoustic Characteristics Of Sounds (2023).
- Vasudha Kowtha, Miquel Espi Marques, Jonathan Huang, Yichi Zhang, C. Avendaño. Learning To Detect Novel And Fine-Grained Acoustic Sequences Using Pretrained Audio Representations. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Vu Linh Le, Daewoo Kim, Eunsung Cho, Hyeryung Jang, Roben Delos Reyes, Hyunggug Kim, Dongheon Lee, I. Yoon, Joonki Hong, J. Kim. Real-Time Detection Of Sleep Apnea Based On Breathing Sounds And Prediction Reinforcement Using Home Noises: Algorithm Development And Validation. Journal of Medical Internet Research (2023).
- Wataru Kawabe, Yuri Nakao, Akihisa Shitara, Yusuke Sugano. Technical Understanding From IML Hands-On Experience: A Study Through A Public Event For Science Museum Visitors. ArXiv (2023).
- Wei-xin Xie, Yanxiong Li, Qianhua He, Wenchang Cao. Few-Shot Class-Incremental Audio Classification Via Discriminative Prototype Learning. Expert Systems with Applications (2023).
- Xian Li, Nian Shao, Xiaofei Li. Self-Supervised Audio Teacher-Student Transformer For Both Clip-Level And Frame-Level Tasks. ArXiv (2023).
- Xiao-Yuan Guo, Chun-Xian Gao, Hui Liu. Voice Activity Detection In The Presence Of Transient Based On Graph. EURASIP Journal on Audio, Speech, and Music Processing (2023).
- Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang. WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset For Audio-Language Multimodal Research. ArXiv (2023).
- Xiyuxing Zhang, Yuntao Wang, Jingru Zhang, Yaqing Yang, Shwetak N. Patel, Yuanchun Shi. EarCough: Enabling Continuous Subject Cough Event Detection On Hearables. CHI Extended Abstracts (2023).
- Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang. BuboGPT: Enabling Visual Grounding In Multi-Modal LLMs (2023).
- Youngjun Heo, Sunggu Lee. Supervised Contrastive Learning For Voice Activity Detection. Electronics (2023).
- Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass. Listen, Think, And Understand. ArXiv (2023).
- Yuancheng Wang, Zeqian Ju, Xuejiao Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao. AUDIT: Audio Editing By Following Instructions With Latent Diffusion Models. ArXiv (2023).
- Yuhang He, A. Markham. Soundsynp: Sound Source Detection From Raw Waveforms With Multi-Scale Synperiodic Filterbanks. AISTATS (2023).
- Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Y. Yamashita. Environmental Sound Conversion From Vocal Imitations And Sound Event Labels. ArXiv (2023).
- Yunhao Chen, Yunjie Zhu, Zihui Yan, Jian Shen, Zhen Ren, Yifan Huang. Data Augmentation For Environmental Sound Classification Using Diffusion Probabilistic Model With Top-K Selection Discriminator. ArXiv (2023).
- Yusun Shul, Byeongil Ko, Jung-Woo Choi. Divided Spectro-Temporal Attention For Sound Event Localization And Detection In Real Scenes For DCASE2023 Challenge (2023).
- Zhenze Xie, Xinquan Liang, Canale Roberto. Learning-Based Robotic Grasping: A Review. Frontiers in Robotics and AI (2023).
- Zhepei Wang, Cem Subakan, K. Subramani, Junkai Wu, T. Tavares, Fabio Ayres, P. Smaragdis. Unsupervised Improvement Of Audio-Text Cross-Modal Representations. ArXiv (2023).
- Zhongqiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeonghak Kim, Shinji Watanabe. Neural Speech Enhancement With Very Low Algorithmic Latency And Complexity Via Integrated Full- And Sub-Band Modeling. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
- Zirun Zhu, Hemin Yang, M. Tang, Ziyi Yang, S. Eskimez, Huaming Wang. Real-Time Audio-Visual End-To-End Speech Enhancement. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023).
2022 (176)
- A. Laptev, Boris Ginsburg. Fast Entropy-Based Methods Of Word-Level Confidence Estimation For End-To-End Automatic Speech Recognition. 2022 IEEE Spoken Language Technology Workshop (SLT) (2022).
- A. Madhu, S. K. EnvGAN: A GAN-Based Augmentation To Improve Environmental Sound Classification. Artificial Intelligence Review (2022).
- A. Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, T. Virtanen. STARSS22: A Dataset Of Spatial Recordings Of Real Scenes With Spatiotemporal Annotations Of Sound Events. ArXiv (2022).
- A. Pompili, Tiago Luís, Nuno Monteiro, João Miranda, Carlos Mendes, S. Paulo. On The Detection Of Acoustic Events For Public Security: The Challenges Of The Counter-Terrorism Domain. IberSPEECH 2022 (2022).
- Ahmed Omran, Neil Zeghidour, Zalán Borsos, F. D. C. Quitry, M. Slaney, M. Tagliasacchi. Disentangling Speech From Surroundings In A Neural Audio Codec. ArXiv (2022).
- Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, M. Slaney, M. Tagliasacchi. Disentangling Speech From Surroundings With Neural Embeddings. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022).
- Alexander Ponomarchuk, Ilya Burenko, Elian Malkin, Ivan Nazarov, Vladimir Kokh, Manvel Avetisian, Leonid Zhukov. Project Achoo: A Practical Model And Application For COVID-19 Detection From Recordings Of Breath, Voice, And Cough. IEEE Journal of Selected Topics in Signal Processing (2022).
- Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. High Fidelity Neural Audio Compression. ArXiv (2022).
- Alison B. Ma, Alexander Lerch. Representation Learning For The Automatic Indexing Of Sound Effects Libraries (2022).
- Ammar Ahmed, Y. Serrestou, K. Raoof, J. Diouris. Empirical Mode Decomposition-Based Feature Extraction For Environmental Sound Classification. Sensors (2022).
- Ana Elisa Méndez Méndez, M. Cartwright, J. Bello, O. Nov. Eliciting Confidence For Improving Crowdsourced Audio Annotations. Proceedings of the ACM on Human-Computer Interaction (2022).
- Ana Filipa Rodrigues Nogueira, Hugo S. Oliveira, J. Machado, J. M. R. Tavares. Sound Classification And Processing Of Urban Environments: A Systematic Literature Review. Sensors (2022).
- Anam Bansal, N. Garg. Environmental Sound Classification: A Descriptive Review Of The Literature. Intelligent Systems with Applications (2022).
- Andong Li, Guochen Yu, C. Zheng, Wenzhe Liu, Xiaodong Li. A General Unfolding Speech Enhancement Method Motivated By Taylor's Theorem (2022).
- Arsha Nagrani, P. H. Seo, Bryan Seybold, Anja Hauth, Santiago Manén, Chen Sun, C. Schmid. Learning Audio-Video Modalities From Image Captions. ArXiv (2022).
- Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha. SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training. ArXiv (2022).
- Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha. SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022).
- B. Weck, Miguel Pérez Fernández, Holger Kirchhoff, Xavier Serra. Matching Text And Audio Embeddings: Exploring Transfer-Learning Strategies For Language-Based Audio Retrieval. DCASE (2022).
- Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, Huaming Wang. CLAP: Learning Audio Concepts From Natural Language Supervision. ArXiv (2022).
- Byeongil Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park. Data Augmentation And Squeeze-And-Excitation Network On Multiple Dimension For Sound Event Localization And Detection In Real Scenes (2022).
- Calum Heggan, S. Budgett, Timothy M. Hospedales, Mehrdad Yaghoobi. MetaAudio: A Few-Shot Audio Classification Benchmark. ArXiv (2022).
- Carlo Aironi, Samuele Cornell, E. Principi, S. Squartini. Graph Node Embeddings For Ontology-Aware Sound Event Classification: An Evaluation Study. 2022 30th European Signal Processing Conference (EUSIPCO) (2022).
- Carlotta Anemuller, O. Thiergart, Emanuël Habets. A Data-Driven Approach To Audio Decorrelation. IEEE Signal Processing Letters (2022).
- Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao. NASTAR: Noise Adaptive Speech Enhancement With Target-Conditional Resampling. ArXiv (2022).
- Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, N. Harada, K. Kashino. Introducing Auxiliary Text Query-Modifier To Content-Based Audio Retrieval. ArXiv (2022).
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, N. Harada, K. Kashino. Masked Spectrogram Modeling Using Masked Autoencoders For Learning General-Purpose Audio Representation. ArXiv (2022).
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, N. Harada, K. Kashino. BYOL For Audio: Exploring Pre-Trained General-Purpose Audio Representations. ArXiv (2022).
- Daniel Lin. Contrastive Feature Learning For Audio Classification (2022).
- Darius Petermann, G. Wichern, A. Subramanian, Zhong-Qiu Wang, Jonathan Le Roux. Tackling The Cocktail Fork Problem For Separation And Transcription Of Real-World Soundtracks. ArXiv (2022).
- David Schindler, S. Spors, Burcu Demiray, Frank Krüger. Automatic Behavior Assessment From Uncontrolled Everyday Audio Recordings By Deep Learning. Sensors (2022).
- Dianwen Ng, Jia Qi Yip, Tanmay Surana, Zhao Yang, Chong Zhang, Yukun Ma, Chongjia Ni, Chng Eng Siong, B. Ma. I2CR: Improving Noise Robustness On Keyword Spotting Using Inter-Intra Contrastive Regularization. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2022).
- Diego de Benito-Gorrón, Kateřina Žmolíková, D. Toledano. Source Separation For Sound Event Detection In Domestic Environments Using Jointly Trained Models. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC) (2022).
- E. Guizzo, C. Marinoni, Marco Pennese, Xinlei Ren, Xiguang Zheng, Chen Zhang, B. Masiero, A. Uncini, D. Comminiello. L3DAS22 Challenge: Learning 3D Audio Sources In A Real Office Environment. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022).
- E. Guizzo, C. Marinoni, Marco Pennese, Xinlei Ren, Xiguang Zheng, Chen Zhang, B. Masiero, A. Uncini, D. Comminiello. L3DAS22 Challenge: Learning 3D Audio Sources In A Real Office Environment (2022).
- Efthymios Tzinis, G. Wichern, P. Smaragdis, Jonathan Le Roux. Optimal Condition Training For Target Source Separation. ArXiv (2022).
- Efthymios Tzinis, Yossi Adi, V. Ithapu, Buye Xu, P. Smaragdis, Anurag Kumar. Remixit: Continual Self-Training Of Speech Enhancement Models Via Bootstrapped Remixing. IEEE Journal of Selected Topics in Signal Processing (2022).
- Efthymios Tzinis, Yossi Adi, V. Ithapu, Buye Xu, P. Smaragdis, Anurag Kumar. Remixit: Continual Self-Training Of Speech Enhancement Models Via Bootstrapped Remixing (2022).
- Eleonora Grassucci, Gioia Mancini, Christian Brignone, A. Uncini, D. Comminiello. Dual Quaternion Ambisonics Array For Six-Degree-Of-Freedom Acoustic Representation. ArXiv (2022).
- Emilian Postolache, Jordi Pons, Santiago Pascual, J. Serrà. Adversarial Permutation Invariant Training For Universal Sound Separation. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022).
- Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serra. Adversarial Permutation Invariant Training For Universal Sound Separation. ArXiv (2022).
- Enric Gusó, Jordi Pons, Santiago Pascual, J. Serrà. On Loss Functions And Evaluation Metrics For Music Source Separation (2022).
- Felix Kreuk, Gabriel Synnaeve, A. Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi. AudioGen: Textually Guided Audio Generation. ICLR (2022).
- Felix Kreuk, Gabriel Synnaeve, A. Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi. AudioGen: Textually Guided Audio Generation. ArXiv (2022).
- Francesca Incitti, Federico Urli, L. Snidaro. Beyond Word Embeddings: A Survey. Information Fusion (2022).
- Francesca Ronchini, R. Serizel. A Benchmark Of State-Of-The-Art Sound Event Detection Systems Evaluated On Synthetic Soundscapes. ArXiv (2022).
- Francesca Ronchini, Samuele Cornell, R. Serizel, Nicolas Turpault, Eduardo Fonseca, D. Ellis. Description And Analysis Of Novelties Introduced In DCASE Task 4 2022 On The Baseline System. DCASE (2022).
- Gasser Elbanna, Neil Scheidwasser-Clow, M. Kegler, P. Beckmann, Karl El Hajal, M. Cernak. BYOL-S: Learning Self-Supervised Speech Representations By Bootstrapping. ArXiv (2022).
- Gasser Elbanna, Neil Scheidwasser-Clow, M. Kegler, P. Beckmann, Karl El Hajal, M. Cernak. BYOL-S: Learning Self-Supervised Speech Representations By Bootstrapping (2022).
- Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, S. Belongie. Exploring Fine-Grained Audiovisual Categorization With The Ssw60 Dataset. ArXiv (2022).
- H. Jleed, M. Bouchard. Incremental Multiclass Open-Set Audio Recognition. International Journal of Advances in Intelligent Informatics (2022).
- H. Taherian, S. Eskimez, Takuya Yoshioka. Breaking The Trade-Off In Personalized Speech Enhancement With Cross-Task Knowledge Distillation. ArXiv (2022).
- Han Liu, H. Liu. When Evil Calls: Targeted Adversarial Voice Over IP Network (2022).
- Han Liu, Zhiyuan Yu, Mingming Zha, Xiaofeng Wang, W. Yeoh, Yevgeniy Vorobeychik, Ning Zhang. When Evil Calls: Targeted Adversarial Voice Over IP Network. CCS (2022).
- Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley. Ontology-Aware Learning And Evaluation For Audio Tagging. ArXiv (2022).
- Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley. Learning The Spectrogram Temporal Resolution For Audio Classification. ArXiv (2022).
- Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang. UniKW-AT: Unified Keyword Spotting And Audio Tagging. INTERSPEECH (2022).
- Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang. An Empirical Study Of Weakly Supervised Audio Tagging Embeddings For General Audio Representations. Odyssey (2022).
- Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang. Pseudo Strong Labels For Large Scale Weakly Supervised Audio Tagging. ICASSP (2022).
- Helin Wang, Dongchao Yang, Chao Weng, Jia-yi Yu, Yuexian Zou. Improving Target Sound Extraction With Timestamp Information. ArXiv (2022).
- Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, B. Raj, Rita Singh. Describing Emotions With Acoustic Property Prompts For Speech Emotion Recognition. ArXiv (2022).
- Hoang-Thi Nguyen-Vo, Huy Nguyen-Gia, Hoan-Duy Nguyen-Tran, Hoang Pham-Minh, Hung Vo-Thanh, Hao Do-Duc. MarbleNet: A Deep Neural Network Solution For Vietnamese Voice Activity Detection. 2022 9th NAFOSTED Conference on Information and Computer Science (NICS) (2022).
- Huang Xie, O. Räsänen, T. Virtanen. On Negative Sampling For Contrastive Audio-Text Retrieval. ArXiv (2022).
- Huang Xie, Samuel Lipping, T. Virtanen. Language-Based Audio Retrieval Task In DCASE 2022 Challenge. DCASE (2022).
- Huang Xie, Samuel Lipping, T. Virtanen. DCASE 2022 Challenge Task 6B: Language-Based Audio Retrieval Technical (2022).
- Huang Xie, Samuel Lipping, T. Virtanen. DCASE 2022 Challenge Task 6B: Language-Based Audio Retrieval (2022).
- Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, M. Tang, Jong Won Shin, Shujie Liu. Exploring WavLM On Speech Enhancement. 2022 IEEE Spoken Language Technology Workshop (SLT) (2022).
- Il-Young Jeong, Jeongsoon Park. CochlScene: Acquisition Of Acoustic Scene Data Using Crowdsourcing. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2022).
- J. Rulff, Fábio Miranda, Maryam Hosseini, Marcos Lage, M. Cartwright, Graham Dove, J. Bello, Cláudio T. Silva. Urban Rhapsody: Large‐Scale Exploration Of Urban Soundscapes. Comput. Graph. Forum (2022).
- J. Rulff, Fábio Miranda, Maryam Hosseini, Marcos Lage, M. Cartwright, Graham Dove, J. Bello, Cláudio T. Silva. Urban Rhapsody: Large-Scale Exploration Of Urban Soundscapes. ArXiv (2022).
- Janek Ebbers, R. Serizel, Reinhold Haeb-Umbach. Threshold Independent Evaluation Of Sound Event Detection Scores. ArXiv (2022).
- Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, J. Yang. Sound Event Localization And Detection For Real Spatial Sound Scenes: Event-Independent Network And Data Augmentation Chains. DCASE (2022).
- Jingdong Li, Yuanyuan Zhu, Dawei Luo, Yun Liu, Guohui Cui, Zhaoxia Li. The PCG-AIID System For L3DAS22 Challenge: MIMO And MISO Convolutional Recurrent Network For Multi Channel Speech Enhancement And Speech Recognition (2022).
- Jinhua Liang, Huy Phan, Emmanouil Benetos. Learning From Taxonomy: Multi-Label Few-Shot Classification For Everyday Sound Recognition. ArXiv (2022).
- Jinhua Liang, Huy Phan, Emmanouil Benetos. Leveraging Label Hierachies For Few-Shot Everyday Sound Recognition. DCASE (2022).
- Johann Kay Ann Tan, Y. Hasegawa, S. Lau. A Comprehensive Environmental Sound Categorization Scheme Of An Urban City. Applied Acoustics (2022).
- Jonathan Svirsky, O. Lindenbaum. SG-VAD: Stochastic Gates Based Speech Activity Detection. ArXiv (2022).
- Joseph P. Turian, Jordie Shier, H. Khan, B. Raj, Björn Schuller, C. Steinmetz, C. Malloy, G. Tzanetakis, Gissel Velarde, K. McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, J. Salamon, P. Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk. HEAR 2021: Holistic Evaluation Of Audio Representations. ArXiv (2022).
- Joseph P. Turian, Jordie Shier, H. Khan, B. Raj, Björn Schuller, C. Steinmetz, C. Malloy, G. Tzanetakis, Gissel Velarde, K. McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, J. Salamon, P. Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk. HEAR: Holistic Evaluation Of Audio Representations (2022).
- Ju-ho Kim, Ju-Sung Heo, Hyun-seo Shin, Chanmann Lim, Ha-jin Yu. Integrated Parameter-Efficient Tuning For General-Purpose Audio Models. ArXiv (2022).
- Julia Berezutskaya, L. Ambrogioni, N. Ramsey, M. Gerven. Towards Naturalistic Speech Decoding From Intracranial Brain Data. 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (2022).
- Jun Shen, M. Khodak, Ameet S. Talwalkar. Efficient Architecture Search For Diverse Tasks. ArXiv (2022).
- Karn Nichakarn Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, W. Gan. Autonomous In-Situ Soundscape Augmentation Via Joint Selection Of Masker And Gain. ArXiv (2022).
- Karn Nichakarn Watcharasupat, Kenneth Ooi, Bhan Lam, Trevor Wong, Zhen-Ting Ong, W. Gan. Autonomous In-Situ Soundscape Augmentation Via Joint Selection Of Masker And Gain. IEEE Signal Processing Letters (2022).
- Kenneth Ooi, Bhan Lam, J. Hong, Karn Nichakarn Watcharasupat, Zhen-Ting Ong, W. Gan. Singapore Soundscape Site Selection Survey (S5): Identification Of Characteristic Soundscapes Of Singapore Via Weighted K-Means Clustering. Sustainability (2022).
- Kenneth Ooi, Zhen-Ting Ong, Karn Nichakarn Watcharasupat, Bhan Lam, J. Hong, Woon-Seng Gan. Araus: A Large-Scale Dataset And Baseline Models Of Affective Responses To Augmented Urban Soundscapes. ArXiv (2022).
- Kenneth Ooi, Zhen-Ting Ong, Karn Nichakarn Watcharasupat, Bhan Lam, J. Hong, Woon-Seng Gan. Araus: A Large-Scale Dataset And Baseline Models Of Affective Responses To Augmented Urban Soundscapes. IEEE Transactions on Affective Computing (2022).
- Kevin Kilgour, Beat Gfeller, Qingqing Huang, A. Jansen, Scott Wisdom, M. Tagliasacchi. Text-Driven Separation Of Arbitrary Sounds. ArXiv (2022).
- Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, G. Widmer. Learning General Audio Representations With Large-Scale Training Of Patchout Audio Transformers. ArXiv (2022).
- Kohei Suzuki, Shoki Sakamoto, T. Taniguchi, H. Kameoka. Speak Like A Dog: Human To Non-Human Creature Voice Conversion. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2022).
- Kohei Suzuki, Shoki Sakamoto, T. Taniguchi, H. Kameoka. Speak Like A Dog: Human To Non-Human Creature Voice Conversion (2022).
- Kuan-Po Huang, Yu-Kuan Fu, Tsung-Yuan Hsu, Fabian Ritter Gutierrez, Fan Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee. Improving Generalizability Of Distilled Self-Supervised Speech Processing Models Under Distorted Settings. 2022 IEEE Spoken Language Technology Workshop (SLT) (2022).
- Kuan-Po Huang, Yuanbin Fu, Yu Zhang, Hung-yi Lee. Improving Distortion Robustness Of Self-Supervised Speech Processing Tasks With Domain Adaptation. ArXiv (2022).
- L. Delebecque, R. Serizel, Nicolas Furnon. Towards An Efficient Computation Of Masks For Multichannel Speech Enhancement (2022).
- L. Turchet, Marco Carraro, Matteo Tomasetti. Freesoundvr: Soundscape Composition In Virtual Reality Using Online Sound Repositories. Virtual Reality (2022).
- Luke Dzwonczyk. Source Separation Methods For Computer-Assisted Orchestration (2022).
- Léo Cances, E. Labbé, Thomas Pellegrini. Comparison Of Semi-Supervised Deep Learning Algorithms For Audio Classification. EURASIP Journal on Audio, Speech, and Music Processing (2022).
- M. Abdollahi, R. Serizel, A. Rakotomamonjy, G. Gasso. Integrating Isolated Examples With Weakly-Supervised Sound Event Detection: A Direct Approach. DCASE (2022).
- M. Neri, F. Battisti, A. Neri, M. Carli. Sound Event Detection For Human Safety And Security In Noisy Environments. IEEE Access (2022).
- Madhurananda Pahar, M. Klopper, B. Reeve, R. Warren, G. Theron, A. Diacon, T. Niesler. Automatic Tuberculosis And Covid-19 Cough Classification Using Deep Learning. 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET) (2022).
- Madhurananda Pahar, M. Klopper, Byron Reeve, R. Warren, G. Theron, A. Diacon, T. Niesler. Automatic Tuberculosis And Covid-19 Cough Classification Using Deep Learning. ArXiv (2022).
- Manthan Thakker, S. Eskimez, T. Yoshioka, Huaming Wang. Fast Real-Time Personalized Speech Enhancement: End-To-End Enhancement Network (E3Net) And Knowledge Distillation. ArXiv (2022).
- Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, K. Kinoshita, Yasunori Ohishi, S. Araki. Soundbeam: Target Sound Extraction Conditioned On Sound-Class Labels And Enrollment Clues For Increased Performance And Continuous Learning. ArXiv (2022).
- Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, K. Kinoshita, Yasunori Ohishi, S. Araki. Soundbeam: Target Sound Extraction Conditioned On Sound-Class Labels And Enrollment Clues For Increased Performance And Continuous Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022).
- Masato Hagiwara. Aves: Animal Vocalization Encoder Based On Self-Supervision. ArXiv (2022).
- Mashrur M. Morshed, Ahmad Omar Ahsan, Hasan Mahmud, Md. Kamrul Hasan. Learning Audio Representations With Mlps. ArXiv (2022).
- Matthew Groh, Aruna Sankaranarayanan, Nikhil Singh, Dong Young Kim, A. Lippman, Rosalind W. Picard. Human Detection Of Political Speech Deepfakes Across Transcripts, Audio, And Video (2022).
- Michela Cantarini, L. Gabrielli, S. Squartini. Few-Shot Emergency Siren Detection. Sensors (2022).
- Michelle Charette, Elizabeth Lima, Denielle Elliott. Sonic Stories, Sensory Ethnography, And Listening With An Injured Mind. Multimodality & Society (2022).
- Mohammad MohammadAmini, D. Matrouf, J. Bonastre, Sandipana Dowerah, R. Serizel, D. Jouvet. A Comprehensive Exploration Of Noise Robustness And Noise Compensation In Resnet And Tdnn-Based Speaker Recognition Systems (2022).
- Mohammad MohammadAmini, D. Matrouf, J. Bonastre, Sandipana Dowerah, R. Serizel, D. Jouvet. Learning Noise Robust Resnet-Based Speaker Embedding For Speaker Recognition. Odyssey (2022).
- Moreno La Quatra, L. Vaiani, Alkis Koudounas, Luca Cagliero, P. Garza, Elena Baralis. How Much Attention Should We Pay To Mosquitoes?. ACM Multimedia (2022).
- Muhammad Asif, Muhammad Usaid, Munaf Rashid, Tabarka Rajab, S. Hussain, Sarwar Wasi. Large-Scale Audio Dataset For Emergency Vehicle Sirens And Road Noises. Scientific Data (2022).
- Nico M. Schmidt, Jordi Pons, M. Miron. Podcastmix: A Dataset For Separating Music And Speech In Podcasts. ArXiv (2022).
- Nikhil Singh, Guillermo Bernal, D. Savchenko, Elena L. Glassman. A Selective Summary Of Where To Hide A Stolen Elephant: Leaps In Creative Writing With Multimodal Machine Intelligence. IN2WRITING (2022).
- Oleg Rybakov, M. Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang. Real Time Spectrogram Inversion On Mobile Phone (2022).
- Oleg Rybakov, M. Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy. Real Time Spectrogram Inversion On Mobile Phone. ArXiv (2022).
- P. Tremblay, Gerard Roma, Owen Green. Enabling Programmatic Data Mining As Musicking: The Fluid Corpus Manipulation Toolkit. Computer Music Journal (2022).
- Pranay Manocha, Zeyu Jin, A. Finkelstein. Sqapp: No-Reference Speech Quality Assessment Via Pairwise Preference (2022).
- Pritam Sarkar, A. Etemad. Xkd: Cross-Modal Knowledge Distillation With Domain Alignment For Video Representation Learning. ArXiv (2022).
- Pritam Sarkar, A. Etemad. Xkd: Cross-Modal Knowledge Distillation With Domain Alignment For Video Representation Learning (2022).
- Qingqing Huang, A. Jansen, Joonseok Lee, R. Ganti, Judith Yue Li, D. Ellis. Mulan: A Joint Embedding Of Music Audio And Natural Language (2022).
- Qiu-shi Zhu, J. Zhang, Zitian Zhang, Lirong Dai. Joint Training Of Speech Enhancement And Self-Supervised Model For Noise-Robust Asr. ArXiv (2022).
- Qiu-shi Zhu, Jie Zhang, Zi-qiang Zhang, Ming Wu, Xin Fang, Lirong Dai. A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning For Automatic Speech Recognition (2022).
- R. B. Singh, H. Zhuang. Measurements, Analysis, Classification, And Detection Of Gunshot And Gunshot-Like Sounds. Sensors (2022).
- R. Biswas, K. Nathwani. Optimal Near-End Speech Intelligibility Improvement Using Clpso-Based Voice Transformation In Realistic Noisy Environments. Circuits, Systems, and Signal Processing (2022).
- Rajapantula Kranthi, Vasundhara. Distributed Active Noise Control Based On Inverse Tangent Robust Least Mean Logarithmic Square. 2022 IEEE International Symposium on Smart Electronic Systems (iSES) (2022).
- Roberto San Millán-Castillo, L. Martino, E. Morgado, F. Llorente. An Exhaustive Variable Selection Study For Linear Models Of Soundscape Emotions: Rankings And Gibbs Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022).
- Robin Scheibler, Tatsuya Komatsu, Yusuke Fujita, Michael Hentschel. On Sorting And Padding Multiple Targets For Sound Event Localization And Detection With Permutation Invariant And Location-Based Training. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2022).
- S. Budgett, Mehrdad Yaghoobi. Metaaudio: A Few-Shot Audio Classification Benchmark (2022).
- S. Eskimez, Takuya Yoshioka, Alex Ju, M. Tang, Tanel Pärnamaa, Huaming Wang. Real-Time Joint Personalized Speech Enhancement And Acoustic Echo Cancellation With E3Net. ArXiv (2022).
- Samuel Lipping, Parthasaarathy Sudarsanam, K. Drossos, T. Virtanen. Clotho-Aqa: A Crowdsourced Dataset For Audio Question Answering. ArXiv (2022).
- Sandeep Reddy Kothinti, Dimitra Emmanouilidou. Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-Centric Performance Metrics. ArXiv (2022).
- Sandipana Dowerah, R. Serizel, D. Jouvet, Mohammad MohammadAmini, D. Matrouf. How To Leverage Dnn-Based Speech Enhancement For Multi-Channel Speaker Verification?. ArXiv (2022).
- Sandipana Dowerah, R. Serizel, D. Jouvet, Mohammad MohammadAmini, D. Matrouf. Compensating Noise And Reverberation In Far-Field Multichannel Speaker Verification (2022).
- Shrishail Baligar, S. Newsam. Cossd - An End-To-End Framework For Multi-Instance Source Separation And Detection. 2022 30th European Signal Processing Conference (EUSIPCO) (2022).
- Shubo Lv, Yihui Fu, Yukai Jv, Linfu Xie, Weixin Zhu, Wei Rao, Yannan Wang. Spatial-Dccrn: Dccrn Equipped With Frame-Level Angle Feature And Hybrid Filtering For Multi-Channel Speech Enhancement. 2022 IEEE Spoken Language Technology Workshop (SLT) (2022).
- Shuozhen Yang, Long Zhang, Yuhua Wei, Hengyuan Zhang. Multi-Scale Convolution For Sound Event Detection Technology. 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC) (2022).
- Shwetank Choudhary, C. Karthik, Punuru Sri Lakshmi, Sumit Kumar. Lean: Light And Efficient Audio Classification Network. 2022 IEEE 19th India Council International Conference (INDICON) (2022).
- Slawomir Kapka, J. Tkaczuk. Coloc: Conditioned Localizer And Classifier For Sound Event Localization And Detection. DCASE (2022).
- Sreyan Ghosh, Ashish Seth, S. Umesh. Delores: Decorrelating Latent Spaces For Low-Resource Audio Representation Learning. ArXiv (2022).
- Sunghyun Yoon. Reflection Of Conditional Independence Structure To Noise Variability For Noise Robust Text Dependent Speaker Verification. IEEE Access (2022).
- Swapnil Bhosale, Rupayan Chakraborty, S. Kopparapu. Automatic Audio Captioning Using Attention Weighted Event Based Embeddings. ArXiv (2022).
- Swapnil Bhosale, Rupayan Chakraborty, S. Kopparapu. Text-To-Audio Grounding Based Novel Metric For Evaluating Audio Caption Similarity. ArXiv (2022).
- T. K. Chan, R. Das. Cross-Stitch Network With Adaptive Loss Weightage For Sound Event Localization And Detection. L3DAS22: Machine Learning for 3D Audio Signal Processing (2022).
- Takuya Koumura, Hiroki Terashima, S. Furukawa. Human-Like Modulation Sensitivity Emerging Through Optimization To Natural Sound Recognition. The Journal of Neuroscience (2022).
- Tara Vanhatalo, P. Legrand, M. Desainte-Catherine, P. Hanna, Antoine Brusco, Guillaume Pille, Yann Bayle. A Review Of Neural Network-Based Emulation Of Guitar Amplifiers. Applied Sciences (2022).
- Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsung-Yuan Hsu, Hung-yi Lee. The Efficacy Of Self-Supervised Speech Models For Audio Representations (2022).
- Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsung-Yuan Hsu, Hung-yi Lee. The Ability Of Self-Supervised Speech Models For Audio Representations. ArXiv (2022).
- Xiaokang Zhao, Qiu-shi Zhu, J. Zhang. Speech Enhancement Using Self-Supervised Pre-Trained Model And Vector Quantization. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2022).
- Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang. Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022).
- Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang. Automated Audio Captioning: An Overview Of Recent Progress And New Challenges. EURASIP Journal on Audio, Speech, and Music Processing (2022).
- Xuenan Xu, Mengyue Wu, K. Yu. A Comprehensive Survey Of Automated Audio Captioning. ArXiv (2022).
- Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe. Towards Low-Distortion Multi-Channel Speech Enhancement: The Espnet-Se Submission To The L3Das22 Challenge (2022).
- Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhongqiu Wang, Yu Tsao, Y. Qian, Shinji Watanabe. Espnet-Se++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding. ArXiv (2022).
- Yu Wang, M. Cartwright, J. Bello. Active Few-Shot Learning For Sound Event Detection. INTERSPEECH (2022).
- Yuan Gong, Jingbo Yu, James R. Glass. Vocalsound: A Dataset For Improving Human Vocal Sounds Recognition. ICASSP (2022).
- Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James R. Glass. Cmkd: Cnn/Transformer-Based Cross-Model Knowledge Distillation For Audio Classification. ArXiv (2022).
- Yun Jung Lee, Hwayeon Joh, Suhyeon Yoo, U. Oh. Accesscomics2: Understanding The User Experience Of An Accessible Comic Book Reader For Blind People With Textual Sound Effects. ACM Transactions on Accessible Computing (2022).
- Yunjung Lee, Hwayeon Joh, Suhyeon Yoo, U. Oh. Accesscomics2: Understanding The User Experience Of An Accessible Comic Book Reader For Blind People With Textual Sound Effects. ACM Transactions on Accessible Computing (2022).
- Yusong Wu, K. Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, S. Dubnov. Large-Scale Contrastive Language-Audio Pretraining With Feature Fusion And Keyword-To-Caption Augmentation. ArXiv (2022).
- Yusong Wu, K. Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, S. Dubnov. Large-Scale Contrastive Language-Audio Pretraining With Feature Fusion And Keyword-To-Caption Augmentation. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022).
- Zexu Pan, G. Wichern, François G. Germain, A. Subramanian, Jonathan Le Roux. Towards End-To-End Speaker Diarization In The Wild. ArXiv (2022).
- Zhong-Qiu Wang, G. Wichern, Shinji Watanabe, Jonathan Le Roux. Stft-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022).
- Zhong-Qiu Wang, G. Wichern, Shinji Watanabe, Jonathan Le Roux. Stft-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. ArXiv (2022).
- Zhongqiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeonghak Kim, Shinji Watanabe. Tf-Gridnet: Integrating Full- And Sub-Band Modeling For Speech Separation. ArXiv (2022).
- Zhongqiu Wang, Shinji Watanabe. Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction. IEEE Signal Processing Letters (2022).
- Zubayer Islam, M. Abdel-Aty. Deep Convolutional Neural Network For Roadway Incident Surveillance Using Audio Data. ArXiv (2022).
2021 (134)
- . Revisiting Transposed Convolutions For Interpreting Raw Waveform Sound Event Recognition Cnns By Sonification (2021).
- A. Aleluia, G. Cabral. Rapid Prototyping: Using Wizard Of Oz To Emulate Machine Learning Features For Interactive Artistic Applications. Anais do XVIII Simpósio Brasileiro de Computação Musical (SBCM 2021) (2021).
- A. Copiaco, C. Ritz, S. Fasciani, N. Abdulaziz. Dasee A Synthetic Database Of Domestic Acoustic Scenes And Events In Dementia Patients Environment. ArXiv (2021).
- A. Correya, Jorge Marcos-Fernández, Luis Joglar-Ongay, Pablo Alonso-Jiménez, X. Serra, D. Bogdanov. Audio And Music Analysis On The Web Using Essentia.Js. Trans. Int. Soc. Music. Inf. Retr. (2021).
- A. Jensenius. Best Versus Good Enough Practices For Open Music Research. Empirical Musicology Review (2021).
- A. Madhu, S. Kumaraswamy. Envgan: Adversarial Synthesis Of Environmental Sounds For Data Augmentation. ArXiv (2021).
- A. P. Mishra, N. S. Harper, J. Schnupp. Exploring The Distribution Of Statistical Feature Parameters For Natural Sound Textures. PloS one (2021).
- A. S. Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie. Audio Retrieval With Natural Language Queries: A Benchmark Study. IEEE Transactions on Multimedia (2021).
- A. Shams, M. Raihan, Md. Mohi Uddin Khan, Ocean Monjur, Rahat Bin Preo. Telehealthcare And Telepathology In Pandemic: A Noninvasive, Low-Cost Micro-Invasive And Multimodal Real-Time Online Application For Early Diagnosis Of Covid-19 Infection (Preprint) (2021).
- Aaron Valero Puche, Sukhan Lee. Caesynth: Real-Time Timbre Interpolation And Pitch Control With Conditional Autoencoders. 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) (2021).
- Abdulaziz Saleh Ba Wazir, H. A. Karim, Mohd Haris Lye Abdullah, Nouar AlDahoul, Sarina Mansor, M. F. A. Fauzi, John See, Ahmad Syazwan Naim. Design And Implementation Of Fast Spoken Foul Language Recognition With Different End-To-End Deep Neural Network Architectures. Sensors (2021).
- Adrián Barahona-Ríos, Tom Collins. Specsingan: Sound Effect Variation Synthesis Using Single-Image Gans. ArXiv (2021).
- Alexander Ponomarchuk, I. Burenko, Elian Malkin, I. Nazarov, V. Kokh, Manvel Avetisian, L. Zhukov. Project Achoo: A Practical Model And Application For Covid-19 Detection From Recordings Of Breath, Voice, And Cough. IEEE Journal of Selected Topics in Signal Processing (2021).
- Alexander Ponomarchuk, I. Burenko, Elian Malkin, Ivan Nazarov, V. Kokh, Manvel Avetisian, L. Zhukov. Project Achoo: A Practical Model And Application For Covid-19 Detection From Recordings Of Breath, Voice, And Cough. ArXiv (2021).
- Andreea-Maria Oncescu, A. S. Koepke, João F. Henriques, Zeynep Akata, Samuel Albanie. Audio Retrieval With Natural Language Queries. Interspeech 2021 (2021).
- Anis Haron. Tone Color: Computational Classification Of Timbre Ordering (2021).
- Anna Xambó. A Live Coding Session With The Cloud And A Virtual Agent (2021).
- Anna Xambó, Gerard Roma, Sam Roig, Eduard Solaz. Live Coding With The Cloud And A Virtual Agent (2021).
- Archiki Prasad, P. Jyothi, R. Velmurugan. An Investigation Of End-To-End Models For Robust Speech Recognition. ArXiv (2021).
- Ariane Stolfi, D. P. S. D. Novais. Improvisation In Isolation: Quarentena Liv(R)E And Noise Symphony With The Playsound Online Music Making Tool (2021).
- Aswin Sivaraman, Minje Kim. Efficient Personalized Speech Enhancement Through Self-Supervised Learning. IEEE Journal of Selected Topics in Signal Processing (2021).
- Aswin Sivaraman, Sunwoo Kim, Minje Kim. Personalized Speech Enhancement Through Self-Supervised Data Augmentation And Purification. Interspeech 2021 (2021).
- B. Weck, Xavier Favory, Konstantinos Drossos, X. Serra. Evaluating Off-The-Shelf Machine Listening And Natural Language Models For Automated Audio Captioning. ArXiv (2021).
- Chandan K. A. Reddy, Vishak Gopa, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, R. Aichner. Musicnet: Compact Convolutional Neural Network For Real-Time Background Music Detection. ArXiv (2021).
- Chandan K.A. Reddy, Vishak Gopa, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, R. Aichner. Musicnet: Compact Convolutional Neural Network For Real-Time Background Music Detection. ArXiv (2021).
- Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda. Noisy-To-Noisy Voice Conversion Framework With Denoising Model. ArXiv (2021).
- Xi Chen, Yupeng Shi, Wei Xiao, Tingzhao Wu, Meng Wang, Shidong Shang, N. Zheng, Q. Meng. A Cascaded Speech Enhancement For Hearing Aids In Noisy-Reverberant Conditions (2021).
- D. Arteaga, J. Pons. Multichannel-Based Learning For Audio Object Extraction. ArXiv (2021).
- D. Arteaga, Jordi Pons. Multichannel-Based Learning For Audio Object Extraction. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021).
- D. Jain. Protosound: A Personalized And Scalable Sound Recognition System For Deaf And Hard-Of-Hearing Users (2021).
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, N. Harada, K. Kashino. Byol For Audio: Self-Supervised Learning For General-Purpose Audio Representation. 2021 International Joint Conference on Neural Networks (IJCNN) (2021).
- Darius Petermann, G. Wichern, Zhong-Qiu Wang, Jonathan Le Roux. The Cocktail Fork Problem: Three-Stem Audio Separation For Real-World Soundtracks. ICASSP (2021).
- Darius Petermann, G. Wichern, Zhong-Qiu Wang, Jonathan Le Roux. The Cocktail Fork Problem: Three-Stem Audio Separation For Real-World Soundtracks. ArXiv (2021).
- Diego De Benito-Gorrón, Daniel Ramos, D. Toledano. A Multi-Resolution Crnn-Based Approach For Semi-Supervised Sound Event Detection In Dcase 2020 Challenge. IEEE Access (2021).
- Diego de Benito-Gorrón, Daniel Ramos, D. Toledano. An Analysis Of Sound Event Detection Under Acoustic Degradation Using Multi-Resolution Systems. IberSPEECH (2021).
- E. Guizzo, C. Marinoni, Marco Pennese, Xinlei Ren, Xiguang Zheng, Chen Zhang, B. Masiero, D. Comminiello. L3Das22 Challenge: Machine Learning For 3D Audio Signal Processing (2021).
- E. Guizzo, Riccardo F. Gramaccioni, Saeid Jamili, C. Marinoni, Edoardo Massaro, Claudia Medaglia, Giuseppe Nachira, Leonardo Nucciarelli, Ludovica Paglialunga, M. Pennese, Sveva Pepe, Enrico Rocchi, A. Uncini, D. Comminiello. L3Das21 Challenge: Machine Learning For 3D Audio Signal Processing. 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) (2021).
- E. Guizzo, Riccardo F. Gramaccioni, Saeid Jamili, C. Marinoni, Edoardo Massaro, Claudia Medaglia, Giuseppe Nachira, Leonardo Nucciarelli, Ludovica Paglialunga, Marco Pennese, Sveva Pepe, Enrico Rocchi, A. Uncini, D. Comminiello. L3Das21 Challenge: Machine Learning For 3D Audio Signal Processing. 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) (2021).
- E. Gómez. Deep Noise Suppression For Real Time Speech Enhancement In A Single Channel Wide Band Scenario (2021).
- Eduardo Fonseca, Andrés Ferraro, Xavier Serra. Improving Sound Event Classification By Increasing Shift Invariance In Convolutional Neural Networks (2021).
- Eduardo Fonseca, Andrés Ferraro, Xavier Serra. Improving Sound Event Classification By Increasing Shift Invariance In Convolutional Neural Networks. ArXiv (2021).
- Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, P. Smaragdis. Separate But Together: Unsupervised Federated Learning For Speech Enhancement From Non-Iid Data. ArXiv (2021).
- Efthymios Tzinis, Yossi Adi, V. Ithapu, Buye Xu, Anurag Kumar. Continual Self-Training With Bootstrapped Remixing For Speech Enhancement. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021).
- Efthymios Tzinis, Yossi Adi, V. Ithapu, Buye Xu, Anurag Kumar. Continual Self-Training With Bootstrapped Remixing For Speech Enhancement. ArXiv (2021).
- Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar. Continual Self-Training With Bootstrapped Remixing For Speech Enhancement. ArXiv (2021).
- F. Font. Source: A Freesound Community Music Sampler. Audio Mostly Conference (2021).
- Francesc Lluís, V. Chatziioannou, A. Hofmann. Music Source Separation Conditioned On 3D Point Clouds. ArXiv (2021).
- Francesca Ronchini, R. Serizel, Nicolas Turpault, Samuele Cornell. The Impact Of Non-Target Events In Synthetic Soundscapes For Sound Event Detection. ArXiv (2021).
- Félix Gontier, Vincent Lostanlen, M. Lagrange, N. Fortin, C. Lavandier, J. Petiot. Polyphonic Training Set Synthesis Improves Self-Supervised Urban Sound Classification. The Journal of the Acoustical Society of America (2021).
- Gonzalo Montero, F. Corbera. Generating Sound Palettes For A Freesound Concatenative Synthesizer To Support Creativity (2021).
- Haron Anis, Chee Onn Wong, Soon Hin Hew. Algorithmic Identification Of Tone Color: A Comparison Of Algorithmic Identification And Identification By Survey Respondents. 10th International Conference on Digital and Interactive Arts (2021).
- Hassan Taherian, S. Eskimez, T. Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang. One Model To Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement. ArXiv (2021).
- Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, J. Bello. Wav2Clip: Learning Robust Audio Representations From Clip. ArXiv (2021).
- J. Abeßer. Usm-Sed - A Dataset For Polyphonic Sound Event Detection In Urban Sound Monitoring Scenarios. ArXiv (2021).
- J. Abeßer, Saichand Gourishetti, András Kátai, Tobias Clauss, Prachi Sharma, Judith Liebetrau. Idmt-Traffic: An Open Benchmark Dataset For Acoustic Traffic Monitoring Research. ArXiv (2021).
- Jialu Li, M. Hasegawa-Johnson, Nancy L. McElwain. Analysis Of Acoustic And Voice Quality Features For The Classification Of Infant And Mother Vocalizations. Speech Commun. (2021).
- Joseph P. Turian, Jordie Shier, G. Tzanetakis, K. McNally, Max Henry. One Billion Audio Sounds From Gpu-Enabled Modular Synthesis. ArXiv (2021).
- Juliette Millet, J. King. Inductive Biases, Pretraining And Fine-Tuning Jointly Account For Brain Responses To Speech. ArXiv (2021).
- Jun Deng, Chunhui Gao, Qian Feng, Xinzhou Xu, Zhaopeng Chen. Adaptive Generalized Cross-Entropy Loss For Sound Event Classification With Noisy Labels. 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021).
- Jurgen Vandendriessche, Nick Wouters, Bruno da Silva, Mimoun Lamrini, Mohamed Yassin Chkouri, Abdellah Touhafi. Environmental Sound Recognition On Embedded Systems: From Fpgas To Tpus. Electronics (2021).
- Karn Nichakarn Watcharasupat, Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Zhen Jian Lee, Douglas L. Jones, W. Gan. Improving Polyphonic Sound Event Detection On Multichannel Recordings With The Sørensen-Dice Coefficient Loss And Transfer Learning. ArXiv (2021).
- Kenneth Ooi, Karn N. Watcharasupat, Santi Peksi, Furi Andi Karnapi, Zhen-Ting Ong, Danny Chua, Hui-Wen Leow, Li-Long Kwok, Xin-Lei Ng, Zhen-Ann Loh, W. Gan. A Strongly-Labelled Polyphonic Dataset Of Urban Sounds With Spatiotemporal Context. ArXiv (2021).
- Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, G. Widmer. Efficient Training Of Audio Transformers With Patchout. INTERSPEECH (2021).
- Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, G. Widmer. Efficient Training Of Audio Transformers With Patchout. ArXiv (2021).
- Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang. Temporal Knowledge Distillation For On-Device Audio Classification. ArXiv (2021).
- Lijian Gao, Qirong Mao, Jingjing Chen, Ming Dong, R. Chinnam, L. Sassatelli, Miguel Fabian Romero-Rondón, Ujjwal Sharma. Reproducibility Companion Paper: On Learning Disentangled Representation For Acoustic Event Detection. ACM Multimedia (2021).
- Léo Cances, E. Labbé, T. Pellegrini. Improving Deep-Learning-Based Semi-Supervised Audio Tagging With Mixup. ArXiv (2021).
- Léo Cances, E. Labbé, Thomas Pellegrini. Comparison Of Semi-Supervised Deep Learning Algorithms For Audio Classification. EURASIP Journal on Audio, Speech, and Music Processing (2021).
- M. Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, K. Kinoshita, S. Araki. Few-Shot Learning Of New Sound Classes For Target Sound Extraction. Interspeech 2021 (2021).
- M. Geravanchizadeh, Sepideh Akhtari Khosroshahi, S. Zakeri. Extraction Of Weighted Saliency Maps In Modelling Bottom-Up Auditory Attention (2021).
- M. Neumann, Ngoc Thang Vu. Investigations On Audiovisual Emotion Recognition In Noisy Conditions. 2021 IEEE Spoken Language Technology Workshop (SLT) (2021).
- Madhurananda Pahar, M. Klopper, Robin Warren, T. Niesler. Covid-19 Detection In Cough, Breath And Speech Using Deep Transfer Learning And Bottleneck Features (2021).
- Madhurananda Pahar, T. Niesler. Deep Transfer Learning Based Covid-19 Detection In Cough, Breath And Speech Using Bottleneck Features (2021).
- Marc C. Green, Mark D. Plumbley. Federated Learning With Highly Imbalanced Audio Data. ArXiv (2021).
- Michael Taenzer, S. Mimilakis, J. Abeßer. Deep Learning-Based Music Instrument Recognition: Exploring Learned Feature Representations (2021).
- Mohammad MohammadAmini, D. Matrouf, J. Bonastre, R. Serizel, Sandipana Dowerah, Denis Jouvet. Compensate Multiple Distortions For Speaker Recognition Systems (2021).
- Motohiro Sunouchi, Masaharu Yoshioka. Proposal Of The Aesthetic Experience-Oriented Evaluation Framework For Field-Recording Sound Retrieval System: Experiments Using Acoustic Feature Signatures Based On Multiscale Fractal Dimension. IVSP (2021).
- Motohiro Sunouchi, Masaharu Yoshioka. Diversity-Robust Acoustic Feature Signatures Based On Multiscale Fractal Dimension For Similarity Search Of Environmental Sounds. IEICE Transactions on Information and Systems (2021).
- Motohiro Sunouchi, Masaharu Yoshioka. Diversity-Robust Acoustic Feature Signatures Based On Multiscale Fractal Dimension For Similarity Search Of Environmental Sounds. ArXiv (2021).
- Muddsair Sharif, Mayur Hotwani, Huseyin Seker, Gero Lückemeyer. Imobilakou: The Role Of Machine Listening To Detect Vehicle Using Sound Acoustics. ICAAI (2021).
- N. Orio, B. D. Carolis, Francesco Liotard. Locate Your Soundscape: Interacting With The Acoustic Environment. Multim. Tools Appl. (2021).
- N. Orio, B. De Carolis, Francesco Liotard. Locate Your Soundscape: Interacting With The Acoustic Environment. Multimedia tools and applications (2021).
- N. Siminski, S. Böhme, M. Herrmann. Bnst And Amygdala Activation To Threat: Effects Of Temporal Predictability And Threat Mode. Behavioural Brain Research (2021).
- N. Singh. The Sound Sketchpad: Expressively Combining Large And Diverse Audio Collections. IUI (2021).
- Nicolas Furnon, R. Serizel, S. Essid, I. Illina. Attention-Based Distributed Speech Enhancement For Unconstrained Microphone Arrays With Varying Number Of Nodes. ArXiv (2021).
- Pablo Zinemanas, Martín Rocamora, M. Miron, F. Font, X. Serra. An Interpretable Deep Learning Model For Automatic Sound Classification (2021).
- Pranay Manocha, Buye Xu, Anurag Kumar. Noresqa - A Framework For Speech Quality Assessment Using Non-Matching References. ArXiv (2021).
- Prateek Verma. Attention Is All You Need? Good Embeddings With Statistics Are Enough Audio Understanding Without Convolutions/Transformers/Berts/Mixers/Attention/Rnns (2021).
- Prateek Verma. Large Scale Audio Understanding Without Transformers/ Convolutions/ Berts/ Mixers/ Attention/ Rnns Or. ArXiv (2021).
- Prateek Verma, J. Berger. Audio Transformers: Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions. ArXiv (2021).
- Qichen Han, Weiqiang Yuan, Dong Liu, X. Li, Zhen Yang. Automated Audio Captioning With Weakly Supervised Pre-Training And Word Selection Methods. DCASE (2021).
- Qiuying Shi, Jiqing Han. Semantic Feature Extraction Based On Subspace Learning With Temporal Constraints For Acoustic Event Recognition. Digit. Signal Process. (2021).
- Renbo Tu, M. Khodak, Nicholas Roberts, Ameet S. Talwalkar. Nas-Bench-360: Benchmarking Diverse Tasks For Neural Architecture Search. ArXiv (2021).
- Renbo Tu, Nicholas Roberts, M. Khodak, Jun Shen, Frederic Sala, Ameet S. Talwalkar. Nas-Bench-360: Benchmarking Neural Architecture Search On Diverse Tasks (2021).
- Ria Sinha. Digital Assistant For Sound Classification Using Spectral Fingerprinting. International Journal for Research in Applied Science and Engineering Technology (2021).
- Rishabh Garg, Ruohan Gao, Kristen Grauman. Geometry-Aware Multi-Task Learning For Binaural Audio Generation From Video (2021).
- Robert Müller, Steffen Illium, C. Linnhoff-Popien. A Deep And Recurrent Architecture For Primate Vocalization Classification. Interspeech (2021).
- S. Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang. Personalized Speech Enhancement: New Models And Comprehensive Evaluation. ArXiv (2021).
- S. Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, T. Yoshioka. Human Listening And Live Captioning: Multi-Task Training For Speech Enhancement. Interspeech 2021 (2021).
- S. Graetzer, Jon Barker, T. Cox, M. Akeroyd, J. Culling, G. Naylor, Eszter Porter, Rhoddy Viveros Muñoz. Clarity-2021 Challenges: Machine Learning Challenges For Advancing Hearing Aid Processing. Interspeech 2021 (2021).
- Sangwoo Park, David K. Han, Mounya Elhilali. Cross-Referencing Self-Training Network For Sound Event Detection In Audio Mixtures. ArXiv (2021).
- Sarthak Yadav, M. Foster. Gise-51: A Scalable Isolated Sound Events Dataset. ArXiv (2021).
- Sean Perry, Vaibhav Tiwari, Nishant Balaji, Erika Joun, Jacob Ayers, M. Tobler, Ian Ingram, Ryan Kastner, C. Schurgers. Pyrenote: A Web-Based, Manual Annotation Tool For Passive Acoustic Monitoring. 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS) (2021).
- Seokjin Lee, Minhan Kim, S. Shin, Sooyoung Park, Youngho Jeong. Data-Dependent Feature Extraction Method Based On Non-Negative Matrix Factorization For Weakly Supervised Domestic Sound Event Detection. Applied Sciences (2021).
- Siddharth Gururani, Alexander Lerch. Semi-Supervised Audio Classification With Partially Labeled Data. 2021 IEEE International Symposium on Multimedia (ISM) (2021).
- Sreyan Ghosh, Ashish Seth, S. Umesh. Decorrelating Feature Spaces For Learning General-Purpose Audio Representations. IEEE Journal of Selected Topics in Signal Processing (2021).
- Steven M. Goodman, Ping Liu, Dhruv Jain, Emma J. McDonnell, Jon Froehlich. Toward User-Driven Sound Recognizer Personalization With People Who Are D/Deaf Or Hard Of Hearing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2021).
- Tiago B. Lacerda, Péricles B. C. Miranda, André Câmara, Ana Paula C. Furtado. Deep Learning And Mel-Spectrograms For Physical Violence Detection In Audio. Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2021) (2021).
- Tony Liu, A. Amirsoleimani, Jianxiong Xu, F. Alibart, Y. Beilliard, S. Ecoffey, Dominique Drouin, R. Genov. Codex: Stochastic Encoding Method To Relax Resistive Crossbar Accelerator Design Requirements. IEEE Transactions on Circuits and Systems II: Express Briefs (2021).
- Turab Iqbal, Yin Cao, A. Bailey, Mark D. Plumbley, Wenwu Wang. Arca23K: An Audio Dataset For Investigating Open-Set Label Noise. DCASE (2021).
- Turab Iqbal, Yin Cao, Andrew Bailey, Mark D. Plumbley, Wenwu Wang. Arca23K: An Audio Dataset For Investigating Open-Set Label Noise. ArXiv (2021).
- Valeria Mordoh, Y. Zigel. Audio Source Separation To Reduce Sleeping Partner Sounds: A Simulation Study. Physiological measurement (2021).
- Vasileios Tsouvalas, Aaqib Saeed, T. Ozcelebi. Federated Self-Training For Semi-Supervised Audio Recognition. ArXiv (2021).
- Vasileios Tsouvalas, Aaqib Saeed, T. Ozcelebi. Federated Self-Training For Semi-Supervised Audio Recognition. ACM Transactions on Embedded Computing Systems (2021).
- W. Kleijn, Andrew Storus, M. Chinen, T. Denton, Felicia S. C. Lim, Alejandro Luebs, J. Skoglund, Hengchin Yeh. Generative Speech Coding With Predictive Variance Regularization. ArXiv (2021).
- Wookey Lee, Jessica Jiwon Seong, Busra Ozlu, B. Shim, Azizbek Marakhimov, Suan Lee. Biosignal Sensors And Deep Learning-Based Speech Recognition: A Review. Sensors (2021).
- Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang. Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning. 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) (2021).
- Y. Campos-Roca. Multidisciplinary Project-Based Learning: Improving Student Motivation For Learning Signal Processing. IEEE Signal Processing Magazine (2021).
- Yanling Li, Jun-yi Cai, Qidi Dong, Linjia Wu, Qibing Chen. Psychophysiological Responses Of Young People To Soundscapes In Actual Rural And City Environments. Journal of the Audio Engineering Society (2021).
- Yanpeng Zhao, Jack Hessel, Youngjae Yu, Ximing Lu, Rowan Zellers, Yejin Choi. Connecting The Dots Between Audio And Text Without Parallel Data Through Visual Knowledge Transfer. ArXiv (2021).
- Yasha Iravantchi, Karan Ahuja, Mayank Goel, Chris Harrison, A. Sample. Privacymic: Utilizing Inaudible Frequencies For Privacy Preserving Daily Activity Recognition. CHI (2021).
- Yu Wang, Nicholas J. Bryan, J. Salamon, M. Cartwright, J. Bello. Who Calls The Shots? Rethinking Few-Shot Learning For Audio. ArXiv (2021).
- Yuan Gong, Yu-An Chung, James R. Glass. Psla: Improving Audio Tagging With Pretraining, Sampling, Labeling, And Aggregation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021).
- Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, K. Nakadai. Multichannel Environmental Sound Segmentation. Appl. Intell. (2021).
- Z. Mnasri, S. Rovetta, F. Masulli. Anomalous Sound Event Detection: A Survey Of Machine Learning Based Methods And Applications. Multimedia Tools and Applications (2021).
- Zhong-Qiu Wang, G. Wichern, Jonathan Le Roux. Leveraging Low-Distortion Target Estimates For Improved Speech Enhancement. ArXiv (2021).
- Ziqiang Shi, Liu Liu, Huibin Lin, R. Liu. Hodge And Podge: Hybrid Supervised Sound Event Detection With Multi-Hot Mixmatch And Composition Consistence Training. 2020 28th European Signal Processing Conference (EUSIPCO) (2021).
- Ziyang Chen, Xixi Hu, Andrew Owens. Structure From Silence: Learning Scene Structure From Ambient Sound. ArXiv (2021).
2020 (101)
- A. Correya, D. Bogdanov, Luis Joglar-Ongay, X. Serra. Essentia.Js: A Javascript Library For Music And Audio Analysis On The Web. ISMIR (2020).
- Abdulaziz Saleh Ba Wazir, H. A. Karim, Mohd Haris Lye Abdullah, Sarina Mansor, Nouar AlDahoul, M. Fauzi, John See. Spectrogram-Based Classification Of Spoken Foul Language Using Deep Cnn. 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) (2020).
- Alessandro Ragano, Emmanouil Benetos, A. Hines. Audio Impairment Recognition Using A Correlation-Based Feature Representation. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX) (2020).
- Alessandro Ragano, Emmanouil Benetos, Andrew Hines. Audio Impairment Recognition Using A Correlation-Based Feature Representation. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX) (2020).
- Ambika P. Mishra, N. S. Harper, Jan W. H. Schnupp. Exploring The Distribution Of Statistical Feature Parameters For Natural Sound Textures (2020).
- Andreas Hüwel, K. Adiloglu, Jörg-Hendrik Bach. Hearing Aid Research Data Set For Acoustic Environment Recognition. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Andrey Guzhov, Federico Raue, J. Hees, Andreas Dengel. Esresnet: Environmental Sound Classification Based On Visual Domain Models. ArXiv (2020).
- António Ramires, F. Font, D. Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, B. Chen, Yueh-Kao Wu, Hsu Wei-Han, X. Serra. The Freesound Loop Dataset And Annotation Tool. ArXiv (2020).
- António Ramires, Gilberto Bernardes, M. Davies, X. Serra. Tiv.Lib: An Open-Source Library For The Tonal Description Of Musical Audio. ArXiv (2020).
- António Ramires, Pritish Chandna, Xavier Favory, E. Gómez, X. Serra. Neural Percussive Synthesis Parameterised By High-Level Timbral Features. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- António Ramires, F. Font, D. Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, X. Serra. The Freesound Loop Dataset And Annotation Tool. ISMIR (2020).
- Beat Gfeller, Dominik Roblek, M. Tagliasacchi. One-Shot Conditional Audio Filtering Of Arbitrary Sounds (2020).
- Beat Gfeller, Dominik Roblek, M. Tagliasacchi. One-Shot Conditional Audio Filtering Of Arbitrary Sounds. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Bowei Hou, Kacper Radzikowski, A. Farid. Fine-Tuning Using Grid Search & Gradient Visualization Technical Report (2020).
- C. Asplund, Takashi Obana, P. Bhatnagar, Xun Quan Koh, Simon T. Perrault. It's All In The Timing. ACM Trans. Comput. Hum. Interact. (2020).
- Charles Bales, C. John, Hasan Farooq, Usama Masood, Muhammad Nabeel, A. Imran. Can Machine Learning Be Used To Recognize And Diagnose Coughs?. 2020 International Conference on e-Health and Bioengineering (EHB) (2020).
- Charles Bales, Charles N. John, H. Farooq, Usama Masood, M. Nabeel, A. Imran. Can Machine Learning Be Used To Recognize And Diagnose Coughs?. 2020 International Conference on e-Health and Bioengineering (EHB) (2020).
- Chung-il Kim, Yongjang Cho, Seung-Won Jung, Jehyeok Rew, Eenjun Hwang. Animal Sounds Classification Scheme Based On Multi-Feature Network With Mixed Datasets. KSII Transactions on Internet and Information Systems (2020).
- D. Elliott, Evan Martino, C. Otero, Anthony O. Smith, A. Peter, Benjamin Luchterhand, Eric Lam, S. Leung. Cyber-Physical Analytics: Environmental Sound Classification At The Edge. 2020 IEEE 6th World Forum on Internet of Things (WF-IoT) (2020).
- D. Liang, Wenting Song, E. Thomaz. Characterizing The Effect Of Audio Degradation On Privacy Perception And Inference Performance In Audio-Based Human Activity Recognition. MobileHCI (2020).
- Daiki Takeuchi, Y. Koizumi, Y. Ohishi, N. Harada, Kunio Kashino. Effects Of Word-Frequency Based Pre- And Post- Processings For Audio Captioning. ArXiv (2020).
- Danula Hettiachchi, Zhanna Sarsenbayeva, F. Allison, N. V. Berkel, Tilman Dingler, Gabriele Marini, V. Kostakos, J. Gonçalves. 'Hi! I Am The Crowd Tasker' Crowdsourcing Through Digital Voice Assistants. CHI (2020).
- Dhruv Jain, Hung Q. Ngo, P. Patel, Steven Goodman, Leah Findlater, Jon Froehlich. Soundwatch: Exploring Smartwatch-Based Deep Learning Approaches To Support Sound Awareness For Deaf And Hard Of Hearing Users. ASSETS (2020).
- Dhruv Jain, Kelly Mack, Akli Amrous, Matt Wright, S. Goodman, Leah Findlater, Jon Froehlich. Homesound: An Iterative Field Deployment Of An In-Home Sound Awareness System For Deaf Or Hard Of Hearing Users. CHI (2020).
- E. Fonseca, Diego Ortego, K. McGuinness, N. O'Connor, X. Serra. Unsupervised Contrastive Learning Of Sound Event Representations. ArXiv (2020).
- E. Fonseca, Shawn Hershey, M. Plakal, D. Ellis, A. Jansen, R. C. Moore. Addressing Missing Labels In Large-Scale Sound Event Recognition Using A Teacher-Student Framework With Loss Masking. IEEE Signal Processing Letters (2020).
- E. Fonseca, Xavier Favory, J. Pons, F. Font, X. Serra. Fsd50K: An Open Dataset Of Human-Labeled Sound Events. ArXiv (2020).
- Eduardo Fonseca, Shawn Hershey, M. Plakal, D. Ellis, A. Jansen, R. C. Moore. Addressing Missing Labels In Large-Scale Sound Event Recognition Using A Teacher-Student Framework With Loss Masking. IEEE Signal Processing Letters (2020).
- Eduardo Fonseca, Xavier Favory, Jordi Pons, F. Font, X. Serra. Fsd50K: An Open Dataset Of Human-Labeled Sound Events. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020).
- Etienne Richan, J. Rouat. A Proposal And Evaluation Of New Timbre Visualization Methods For Audio Sample Browsers. Personal and Ubiquitous Computing (2020).
- Etienne Richan, Jean Rouat. A Proposal And Evaluation Of New Timbre Visualization Methods For Audio Sample Browsers. Personal and Ubiquitous Computing (2020).
- F. Naccari, I. Guarneri, S. Curti, A. Savi. Embedded Acoustic Scene Classification For Low Power Microcontroller Devices. DCASE (2020).
- Fei Jia, Somshubra Majumdar, B. Ginsburg. Marblenet: Deep 1D Time-Channel Separable Convolutional Neural Network For Voice Activity Detection. ArXiv (2020).
- Felicia Lim, W. Kleijn, M. Chinen, J. Skoglund. Robust Low Rate Speech Coding Based On Cloned Networks And Wavenet. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Francisco Bernardo. Interactive Machine Learning For User-Innovation Toolkits : An Action Design Research Approach (2020).
- G. Lavrentyeva, M. Volkova, A. Avdeeva, S. Novoselov, Artem Gorlanov, Tseren Andzhukaev, A. Ivanov, A. Kozlov. Blind Speech Signal Quality Estimation For Speaker Verification Systems. INTERSPEECH (2020).
- Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters. Creating Dali, A Large Dataset Of Synchronized Audio, Lyrics, And Notes. Trans. Int. Soc. Music. Inf. Retr. (2020).
- H. Xie, T. Virtanen. Zero-Shot Audio Classification Via Semantic Embeddings (2020).
- Hitham Jleed, M. Bouchard. Open Set Audio Recognition For Multi-Class Classification With Rejection. IEEE Access (2020).
- Honglie Chen, Weidi Xie, A. Vedaldi, Andrew Zisserman. Vggsound: A Large-Scale Audio-Visual Dataset. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Huang Xie, Tuomas Virtanen. Zero-Shot Audio Classification Via Semantic Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020).
- Hyeong-Seok Choi, Hye-Seong Heo, J. H. Lee, K. Lee. Phase-Aware Single-Stage Speech Denoising And Dereverberation With U-Net. ArXiv (2020).
- Ivo Trowitzsch. Robust Sound Event Detection In Binaural Computational Auditory Scene Analysis (2020).
- J. Balam, Jocelyn Huang, V. Lavrukhin, Slyne Deng, Somshubra Majumdar, B. Ginsburg. Improving Noise Robustness Of An End-To-End Neural Model For Automatic Speech Recognition (2020).
- Jae-Bin Kim, Seongkyu Mun, Myungwoo Oh, Soyeon Choe, Yong-Hyeok Lee, Hyung-Min Park. Overcoming Label Noise In Audio Event Detection Using Sequential Labeling. ArXiv (2020).
- Jiale Yang, Ying Zhang, Yang Hai. Retrieval And Management System For Layer Sound Effect Library (2020).
- Jin Sean Lim. Ensemble Learning Of High Dimension Datasets (2020).
- Jinta Zheng, Shih-Hsuan Hung, Kyle Hiebel, Y. Zhang. Real-Time Rendering Of Decorative Sound Textures For Soundscapes. ACM Trans. Graph. (2020).
- Joann Ching, António Ramires, Y. Yang. Instrument Role Classification: Auto-Tagging For Loop Based Music (2020).
- Joseph P. Turian, M. Henry. I'm Sorry For Your Loss: Spectrally-Based Audio Distances Are Bad At Pitch. ArXiv (2020).
- João Pedro Duarte Galileu. Urban Sound Event Classification For Audio-Based Surveillance Systems (2020).
- K. He, Yu-Han Shen, W. Zhang, J. Liu. Staged Training Strategy And Multi-Activation For Audio Tagging With Noisy And Sparse Multi-Label Data. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- K. Miyazaki, Tatsuya Komatsu, T. Hayashi, Shinji Watanabe, T. Toda, K. Takeda. Weakly-Supervised Sound Event Detection With Self-Attention. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- K. Prinz, A. Flexer. End-To-End Adversarial White Box Attacks On Music Instrument Classification. ArXiv (2020).
- K. Prinz, A. Flexer, G. Widmer. The Impact Of Label Noise On A Music Tagger. ArXiv (2020).
- Kohki Mametani, Xavier Favory, Frederic Font. Learning Sound Representations Using Triplet-Loss (2020).
- Konstantinos Drossos, Samuel Lipping, T. Virtanen. Clotho: An Audio Captioning Dataset. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- L. Delphin-Poulat, R. Nicol, Cyril Plapous, Katell Peron. Comparative Assessment Of Data Augmentation For Semi-Supervised Polyphonic Sound Event Detection. 2020 27th Conference of Open Innovations Association (FRUCT) (2020).
- L. Gao, Kele Xu, H. Wang, Yu-xing Peng. Multi-Representation Knowledge Distillation For Audio Classification. ArXiv (2020).
- L. Turchet. Cloud-Smart Musical Instrument Interactions: Querying A Large Music Collection With A Smart Guitar (2020).
- L. Turchet, G. Fazekas, M. Lagrange, H. S. Ghadikolaei, C. Fischione. The Internet Of Audio Things: State Of The Art, Vision, And Challenges. IEEE Internet of Things Journal (2020).
- L. Turchet, Jhonny Hueller. Promoting Awareness On Sustainable Behavior Through An Ar-Based Art Gallery. AVR (2020).
- L. Wijayasingha, J. Stankovic. Robustness To Noise For Speech Emotion Classification Using Cnns And Attention Mechanisms (2020).
- L. Zhang, Ziqiang Shi, Jiqing Han. Pyramidal Temporal Pooling With Discriminative Mapping For Audio Classification. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020).
- Lu Cao, Yu-long Chen, Dandan Huang, Y. Zhang. Investigating Rich Feature Sources For Conceptual Representation Encoding. COGALEX (2020).
- Luca Turchet, Alex Zanetti. Voice-Based Interface For Accessible Soundscape Composition: Composing Soundscapes By Vocally Querying Online Sounds Repositories. Audio Mostly Conference (2020).
- Luca Turchet, J. Pauwels, C. Fischione, György Fazekas. Cloud-Smart Musical Instrument Interactions. ACM Trans. Internet Things (2020).
- M. Tagliasacchi, Y. Li, Karolis Misiunas, Dominik Roblek. Seanet: A Multi-Modal Speech Enhancement Network. INTERSPEECH (2020).
- M. Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek. Seanet: A Multi-Modal Speech Enhancement Network. INTERSPEECH (2020).
- Michael Wand, Jürgen Schmidhuber. Fusion Architectures For Word-Based Audiovisual Speech Recognition. INTERSPEECH (2020).
- Michela Cantarini, L. Serafini, L. Gabrielli, E. Principi, S. Squartini. Emergency Siren Recognition In Urban Scenarios: Synthetic Dataset And Deep Learning Models. ICIC (2020).
- Nicolas Furnon, Romain Serizel, I. Illina, S. Essid. Dnn-Based Mask Estimation For Distributed Speech Enhancement In Spatially Unconstrained Microphone Arrays (2020).
- Nicolas Turpault, Romain Serizel. Training Sound Event Detection On A Heterogeneous Dataset. ArXiv (2020).
- Nicolas Turpault, Romain Serizel, E. Vincent. Limitations Of Weak Labels For Embedding And Tagging. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Nicolas Turpault, Romain Serizel, Scott T. Wisdom, H. Erdogan, J. Hershey, E. Fonseca, P. Seetharaman, Justin Salamon. Sound Event Detection And Separation: A Benchmark On Desed Synthetic Soundscapes. ArXiv (2020).
- Nicolas Turpault, S. Wisdom, H. Erdogan, J. Hershey, Romain Serizel, E. Fonseca, P. Seetharaman, Justin Salamon. Improving Sound Event Detection In Domestic Environments Using Sound Separation. ArXiv (2020).
- R. Guo, Y. Yang, Johnson Kuang, X. Bin, Dhruv Jain, Steven Goodman, Leah Findlater, Jon Froehlich. Holosound: Combining Speech And Sound Identification For Deaf Or Hard Of Hearing Users On A Head-Mounted Display. ASSETS (2020).
- Romain Serizel, Nicolas Turpault, Ankit Shah, Justin Salamon. Sound Event Detection In Synthetic Domestic Environments. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- S. Barbosa, P. Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu, K. Sivalingam, D. Slezak, T. Washio, Xiaokang Yang, J. Yuan, R. Prates, S. Bernardi, V. Vittorini, Francesco Flammini, R. Nardone, S. Marrone, R. Adler, Daniel Schneider, P. Schleiss, Nicola Nostro, R. Olsen, Amleto Di Salle, P. Masci. Dependable Computing - Edcc 2020 Workshops: Ai4Rails, Dreams, Dsogri, Serene 2020, Munich, Germany, September 7, 2020, Proceedings. EDCC Workshops (2020).
- S. Deshmukh, B. Raj, R. Singh. Multi-Task Learning For Interpretable Weakly Labelled Sound Event Detection. ArXiv (2020).
- S. Veena, M. Nerisai, J. Remya, S. SaiTejah. Challenges And Issues Of Sound Archives For Environmental Sound Classification (2020).
- S. Wisdom, Efthymios Tzinis, H. Erdogan, Ron J. Weiss, K. Wilson, J. Hershey. Unsupervised Sound Separation Using Mixtures Of Mixtures. ArXiv (2020).
- S. Wisdom, Efthymios Tzinis, H. Erdogan, Ron J. Weiss, K. Wilson, J. Hershey. Unsupervised Sound Separation Using Mixture Invariant Training. NeurIPS (2020).
- S. Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, K. Wilson, J. Hershey. Unsupervised Sound Separation Using Mixture Invariant Training. NeurIPS (2020).
- S. Yoon, Min-Sung Koh, Ha-Jin Yu. Fuzzy Restricted Boltzmann Machine Based Probabilistic Linear Discriminant Analysis For Noise-Robust Text-Dependent Speaker Verification On Short Utterances (2020).
- Sangwook Park, Ashwin Bellur, Sandeep Reddy Kothinti, Masoumeh Heidari Kapourchali, M. Elhilali. Joint Acoustic And Supervised Inference For Sound Event Detection Technical Report (2020).
- Scott T. Wisdom, H. Erdogan, D. Ellis, Romain Serizel, Nicolas Turpault, E. Fonseca, Justin Salamon, P. Seetharaman, J. Hershey. What's All The Fuss About Free Universal Sound Separation Data?. ArXiv (2020).
- Somshubra Majumdar, B. Ginsburg. Matchboxnet: 1D Time-Channel Separable Convolutional Neural Network Architecture For Speech Commands Recognition. INTERSPEECH (2020).
- Somshubra Majumdar, Boris Ginsburg. Matchboxnet: 1D Time-Channel Separable Convolutional Neural Network Architecture For Speech Commands Recognition. INTERSPEECH (2020).
- T. Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang. Learning With Out-Of-Distribution Data For Audio Classification. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).
- Theodoros Psallidas, Alexander Mitsou, George Pikramenos, E. Spyrou, Theodore Giannakopoulos. Archeo: A Dataset For Sound Event Detection In Areas Of Touristic Interest. 2020 15th International Workshop on Semantic and Social Media Adaptation and Personalization (SMA (2020).
- Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Kleijn, J. Skoglund. Handling Background Noise In Neural Speech Generation. 2020 54th Asilomar Conference on Signals, Systems, and Computers (2020).
- Tom Mudd, Katie Wilkie-McKenna, A. McPherson, M. Wanderley. Embodied Musical Interaction - Body Physiology, Cross Modality, And Sonic Experience (2020).
- Tony Marteau, Sitou Afanou, D. Sodoyer, Sébastien Ambellouis, F. Elbahhar. Audio Events Detection In Noisy Embedded Railway Environments. EDCC Workshops (2020).
- Xavier Favory, F. Font, X. Serra. Search Result Clustering In Collaborative Sound Collections. ICMR (2020).
- Xavier Favory, Konstantinos Drossos, T. Virtanen, X. Serra. Learning Contextual Tag Embeddings For Cross-Modal Alignment Of Audio And Tags. ArXiv (2020).
- Xavier Favory, Konstantinos Drossos, T. Virtanen, X. Serra. Coala: Co-Aligned Autoencoders For Learning Semantically Enriched Audio Representations. ArXiv (2020).
- Y. Koizumi, Ryo Masumura, Kyosuke Nishida, M. Yasuda, S. Saito. A Transformer-Based Audio Captioning Model With Keyword Estimation. INTERSPEECH (2020).
- You-Siang Chen, Zi Jie Lin, Shang-En Li, Chih-Yuan Koh, M. R. Bai, Jen-Tzung Chien, Yi-Wen Liu. Combined Sound Event Detection And Sound Event Separation Networks For Dcase 2020 Task 4 Technical Report (2020).
- Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, K. Nakadai. Multichannel Environmental Sound Segmentation. Applied Intelligence (2020).
- Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, S. Saito. A Transformer-Based Audio Captioning Model With Keyword Estimation. INTERSPEECH (2020).
2019 (71)
- . Development Of Algorithms For Gunshot Detection (2019).
- A. Kumar, Ankit Shah, A. Hauptmann, B. Raj. Learning Sound Events From Webly Labeled Data. IJCAI (2019).
- A. Salekin, Shabnam Ghaffarzadegan, Zhe Feng, J. Stankovic. A Real-Time Audio Monitoring Framework With Limited Data For Constrained Devices. 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS) (2019).
- A. Tanaka. Embodied Musical Interaction - Body Physiology, Cross Modality, And Sonic Experience. New Directions in Music and Human-Computer Interaction (2019).
- António Ramires, X. Serra. Data Augmentation For Instrument Classification Robust To Audio Effects. ArXiv (2019).
- António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, X. Serra. Neural Percussive Synthesis Parameterised By High-Level Timbral Features. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- Ariane Stolfi, A. Milo, M. Barthet. Playsound.Space: Improvising In The Browser With Semantic Sound Objects (2019).
- B. Elizalde, Shuayb Zarar, B. Raj. Cross Modal Audio Search And Retrieval With Joint Embeddings Based On Text And Audio. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- B. H. D. Koh, W. L. Woo. Multi-View Temporal Ensemble For Classification Of Non-Stationary Signals. IEEE Access (2019).
- B. McFee, J. Kim, M. Cartwright, Justin Salamon, Rachel M. Bittner, J. Bello. Open-Source Practices For Music Signal Processing Research: Recommendations For Transparent, Sustainable, And Reproducible Audio Research. IEEE Signal Processing Magazine (2019).
- B. Silva, Axel W. Happi, An Braeken, A. Touhafi. Evaluation Of Classical Machine Learning Techniques Towards Urban Sound Recognition On Embedded Systems. Applied Sciences (2019).
- B. Zhu, Kele Xu, D. Wang, Mathurin Aché. Detection And Classification Of Acoustic Scenes And Events 2019 Challenge Multi-Label Audio Tagging With Noisy Labels And Variable Length Technical Report (2019).
- Boyang Zhang, Jared Leitner, Samuel Thornton. Audio Recognition Using Mel Spectrograms And Convolution Neural Networks (2019).
- C. Kim, Byeongchang Kim, Hyunmin Lee, Gunhee Kim. Audiocaps: Generating Captions For Audios In The Wild. NAACL (2019).
- Ceren Can. Automatic Discrimination Of Domestic Cat Sounds And Imitations (2019).
- Chenliang Xu. Preprint-Work In Progress (2019).
- D. Liang, E. Thomaz. Audio-Based Activities Of Daily Living (Adl) Recognition With Large-Scale Acoustic Embeddings From Online Videos. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2019).
- Dimitra Emmanouilidou, H. Gamper. The Effect Of Room Acoustics On Audio Event Classification (2019).
- E. Fonseca, F. Font, Xavier Serra. Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2019).
- E. Fonseca, M. Plakal, D. Ellis, F. Font, Xavier Favory, X. Serra. Learning Sound Event Classifiers From Web Audio With Noisy Labels. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- E. Fonseca, M. Plakal, F. Font, D. Ellis, X. Serra. Audio Tagging With Noisy Labels And Minimal Supervision. ArXiv (2019).
- Eero-Pekka Damskägg, Lauri Juvela, Etienne Thuillier, V. Välimäki. Deep Learning For Tube Amplifier Emulation. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- Etienne Richan, J. Rouat. A Study Comparing Shape, Colour And Texture As Visual Labels In Audio Sample Browsers. Audio Mostly Conference (2019).
- Evren Kanalici, Gokhan Bilgin. Scattering Wavelet Hash Fingerprints For Musical Audio Recognition (2019).
- F. J. M. Ortega, Sergio I. Giraldo, A. Pérez, R. Ramírez. Phrase-Level Modeling Of Expression In Violin Performances. Front. Psychol. (2019).
- H. Koh, W. L. Woo. Multi-View Temporal Ensemble For Classification Of Non-Stationary Signals (2019).
- H. Xie, T. Virtanen. Zero-Shot Audio Classification Based On Class Label Embeddings. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2019).
- Haikun Huang, M. Solah, Dingzeyu Li, Lap-Fai Yu. Audible Panorama: Automatic Spatial Audio Generation For Panorama Imagery. CHI (2019).
- Harishchandra Dubey, Dimitra Emmanouilidou, I. Tashev. Cure Dataset: Ladder Networks For Audio Event Classification. 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2019).
- Harsh Purohit, R. Tanabe, K. Ichige, T. Endo, Y. Nikaido, Kaori Suefusa, Y. Kawaguchi. Mimii Dataset: Sound Dataset For Malfunctioning Industrial Machine Investigation And Inspection. ArXiv (2019).
- Ivo Trowitzsch, Jalil Taghia, Youssef Kashef, K. Obermayer. The Nigens General Sound Events Database. ArXiv (2019).
- J. He, Penghao Rao, B. Sun, Lejun Yu. Audio Tagging With Minimal Supervision Based On Mean Teacher For Dcase 2019 Challenge Task 2 Technical Report (2019).
- J. Pons, J. Serrà, X. Serra. Training Neural Audio Classifiers With Few Data. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- J. Ramírez, M. Flores. Machine Learning For Music Genre: Multifaceted Review And Experimentation With Audioset. Journal of Intelligent Information Systems (2019).
- Jonas Margraf. Master's Thesis: Self-Organizing Maps For Sound Corpus Organization (2019).
- K. Ahmad, N. Conci. How Deep Features Have Improved Event Recognition In Multimedia. ACM Trans. Multim. Comput. Commun. Appl. (2019).
- K. He, Yu-Han Shen, W. Zhang. Multiple Neural Networks With Ensemble Method For Audio Tagging With Noisy Labels And Minimal Supervision (2019).
- K. Prinz, A. Flexer. Weak Multi-Label Audio-Tagging With Class Noise (2019).
- K. Salo. Modular Audio Platform For Youth Engagement In A Museum Context (2019).
- Kele Xu, B. Zhu, Qiuqiang Kong, Haibo Mi, B. Ding, D. Wang, H. Wang. General Audio Tagging With Ensembling Convolutional Neural Network And Statistical Features. The Journal of the Acoustical Society of America (2019).
- Kexin He, Yuhan Shen, W. Zhang. Thuee System For Dcase 2019 Challenge Task 2 Technical Report (2019).
- L. Gao, Haibo Mi, B. Zhu, Da-wei Feng, Yicong Li, Y. Peng. An Adversarial Feature Distillation Method For Audio Classification. IEEE Access (2019).
- L. Gao, Qirong Mao, M. Dong, Y. Jing, R. Chinnam. On Learning Disentangled Representation For Acoustic Event Detection. ACM Multimedia (2019).
- L. Lin, X. Wang, Hong Liu, Yueliang Qian. Guided Learning Convolution System For Dcase 2019 Task 4. ArXiv (2019).
- Lluis Suros. Clustering Of Multiple-Event Online Sound Collections With The Codebook Approach (2019).
- Luca Turchet, M. Barthet. An Ubiquitous Smart Guitar System For Collaborative Musical Practice (2019).
- Léo Cances, T. Pellegrini, Patrice Guyot. Multi-Task Learning And Post Processing Optimization For Sound Event Detection Technical Report (2019).
- M. Cartwright, Ana Elisa Méndez Méndez, J. Cramer, Vincent Lostanlen, G. Dove, Ho-Hsiang Wu, Justin Salamon, Oded Nov, J. Bello. Sonyc Urban Sound Tagging (Sonyc-Ust): A Multilabel Dataset From An Urban Acoustic Sensor Network (2019).
- Masayuki Karasuyama, Masashi Sugiyama. Canonical Dependency Analysis Based On Squared-Loss Mutual Information (2019).
- Md. Rahat-uz-Zaman, Shadmaan Hye, M. Hasan. Audio Future Block Prediction With Conditional Generative Adversarial Network. 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE) (2019).
- Miles Thorogood. Soundscape Generation Systems (2019).
- Miles Thorogood, Jianyu Fan, P. Pasquier. A Framework For Computer-Assisted Sound Design Systems Supported By Modelling Affective And Perceptual Properties Of Soundscape (2019).
- Nicolas Turpault, R. Serizel, Ankit Shah, Justin Salamon. Sound Event Detection In Domestic Environments With Weakly Labeled Data And Soundscape Synthesis (2019).
- Nicolas Turpault, R. Serizel, E. Vincent. Semi-Supervised Triplet Loss Based Learning Of Ambient Audio Embeddings. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- O. Akiyama, J. Sato. Dcase 2019 Task 2: Multitask Learning, Semi-Supervised Learning And Model Ensemble With Noisy Data For Audio Tagging (2019).
- Qiuqiang Kong, Yin Cao, T. Iqbal, Y. Xu, W. Wang, Mark D. Plumbley. Cross-Task Learning For Audio Tagging, Sound Event Detection And Spatial Localization: Dcase 2019 Baseline Systems. ArXiv (2019).
- S. A. Shahriyar, M. Akhand, N. Siddique, T. Shimamura. Speech Enhancement Using Convolutional Denoising Autoencoder. 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE) (2019).
- S. Astapov, G. Svirskiy, A. Lavrentyev, Tatyana Prisyach, D. Popov, Dmitriy Ubskiy, Vladimir Kabarov. Acoustic Event Mixing To Multichannel Ami Data For Distant Speech Recognition And Acoustic Event Classification Benchmarking. SPECOM (2019).
- S. Singh, A. Pankajakshan, Emmanouil Benetos. Audio Tagging Using A Linear Noise Modelling Layer (2019).
- Shota Ikawa, Kunio Kashino. Neural Audio Captioning Based On Conditional Sequence-To-Sequence Model (2019).
- Szu-Yu Chou, Kai-Hsiang Cheng, J. Jang, Y. Yang. Learning To Match Transient Sound Events Using Attentional Similarity For Few-Shot Sound Recognition. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
- Tobias Goehring, M. Keshavarzi, R. Carlyon, B. Moore. Using Recurrent Neural Networks To Improve The Perception Of Speech In Non-Stationary Noise By People With Cochlear Implants. The Journal of the Acoustical Society of America (2019).
- W. Wang, F. Seraj, N. Meratnia, P. Havinga. Privacy-Aware Environmental Sound Classification For Indoor Human Activity Recognition. PETRA (2019).
- Wootaek Lim. Specaugment For Sound Event Detection In Domestic Environments Using Ensemble Of Convolutional Recurrent Neural Networks (2019).
- Wootaek Lim, S. Suh, Sooyoung Park, Youngho Jeong. Sound Event Detection In Domestic Environments Using Ensemble Of Convolutional Recurrent Neural Networks Technical Report (2019).
- Xavier Favory, X. Serra. Multi Web Audio Sequencer: Collaborative Music Making. ArXiv (2019).
- Yapeng Tian, Chenliang Xu, Dingzeyu Li. Deep Audio Prior. ArXiv (2019).
- Yuma Koizumi, S. Saito, H. Uematsu, N. Harada, Keisuke Imoto. Toyadmos: A Dataset Of Miniature-Machine Operating Sounds For Anomalous Sound Detection. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2019).
- Z. Podwinska, B. Fazenda, W. Davies. Testing Spatial Aspects Of Auditory Salience (2019).
- Ziqiang Shi, L. Liu, Huibin Lin, R. Liu, Anyan Shi. Hodgepodge: Sound Event Detection Based On Ensemble Of Semi-Supervised Learning Methods. ArXiv (2019).
2018 (37)
- Andreu Boadas Rabassedas. Study Of The Signal Properties Of Music Genres (2018).
- Aniel Rossi. Event Recognition Of Domestic Sounds Using Semi-Supervised Learning (2018).
- Anna Xambó, G. Roma, Alexander Lerch, M. Barthet, György Fazekas. Live Repurposing Of Sounds: Mir Explorations With Personal And Crowdsourced Databases. NIME (2018).
- Ariane de Souza Stolfi, Miguel Ceriani, Luca Turchet, M. Barthet. Playsound.Space: Inclusive Free Music Improvisations Using Audio Commons. NIME (2018).
- Chris Baume. Semantic Audio Tools For Radio Production (2018).
- E. Fonseca, M. Plakal, F. Font, D. Ellis, Xavier Favory, J. Pons, X. Serra. General-Purpose Tagging Of Freesound Audio With Audioset Labels: Task Description, Dataset, And Baseline. ArXiv (2018).
- F. Viola, A. Stolfi, A. Milo, Miguel Ceriani, M. Barthet, György Fazekas. Playsound.Space: Enhancing A Live Music Performance Tool With Semantic Recommendations. SAAM@ISWC (2018).
- F. Viola, Ariane Stolfi, A. Milo, Miguel Ceriani, M. Barthet, György Fazekas. Playsound.Space: Enhancing A Live Performance Tool With Semantic Recommendations (2018).
- G. Roma, Owen Green, Anna Xambó, P. Tremblay. A Javascript Library For Flexible Visualization Of Audio Descriptors (2018).
- Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters. Dali: A Large Dataset Of Synchronized Audio, Lyrics And Notes, Automatically Created Using Teacher-Student Machine Learning Paradigm. ISMIR (2018).
- Gerard Llorach, G. Grimm, Maartje M. E. Hendrikse, V. Hohmann. Towards Realistic Immersive Audiovisual Simulations For Hearing Research: Capture, Virtual Scenes And Reproduction. AVSU@MM (2018).
- Gierad Laput, K. Ahuja, Mayank Goel, C. Harrison. Ubicoustics: Plug-And-Play Acoustic Activity Recognition. UIST (2018).
- Gierad Laput, Karan Ahuja, Mayank Goel, Chris Harrison. Ubicoustics. Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (2018).
- Henry Kvinge, Elin Farnell, M. Kirby, C. Peterson. Monitoring The Shape Of Weather, Soundscapes, And Dynamical Systems: A New Statistic For Dimension-Driven Data Analysis On Large Datasets. 2018 IEEE International Conference on Big Data (Big Data) (2018).
- J. Palomaki, Olivia Rhinehart, Michael Tseng. A Case For A Range Of Acceptable Annotations. SAD/CrowdBias@HCOMP (2018).
- Kele Xu, B. Zhu, D. Wang, Yu-xing Peng, H. Wang, Lilun Zhang, B. Li. Meta Learning Based Audio Tagging (2018).
- Kevin Wilkinghoff. General-Purpose Audio Tagging By Ensembling Convolutional Neural Networks Based On Multiple Features (2018).
- L. Turchet, M. Barthet. Jamming With A Smart Mandolin And Freesound-Based Accompaniment. 2018 23rd Conference of Open Innovations Association (FRUCT) (2018).
- Linus Lexfors, Malte Johansson. Audio Representation For Environmental Sound Classification Using Convolutional Neural Networks (2018).
- M. Dorfer, G. Widmer. Training General-Purpose Audio Tagging Networks With Noisy Labels And Iterative Self-Verification (2018).
- M. Mancas, Christian Frisson, E. al., Noé Tits. Proceedings Of Enterface 2015 Workshop On Intelligent Interfaces. ArXiv (2018).
- MeMAD Deliverable. MeMAD Deliverable D2.1 Libraries And Tools For Multimodal Content Analysis (2018).
- Michael Wand, Ngoc Thang Vu, J. Schmidhuber. Investigations On End-To-End Audiovisual Fusion. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018).
- Naoya Takahashi, Michael Gygli, L. V. Van Gool. Aenet: Learning Deep Audio Features For Video Analysis. IEEE Transactions on Multimedia (2018).
- Philip Tovstogan. Exploring Music Similarity With Acousticbrainz (2018).
- Shota Ikawa, Kunio Kashino. Acoustic Event Search With An Onomatopoeic Query: Measuring Distance Between Onomatopoeic Words And Sounds (2018).
- Sophie Skach, Anna Xambó, L. Turchet, A. Stolfi, R. Stewart, M. Barthet. Embodied Interactions With E-Textiles And The Internet Of Sounds For Performing Arts. Tangible and Embedded Interaction (2018).
- T. Iqbal, Qiuqiang Kong, Mark D. Plumbley, W. Wang. General-Purpose Audio Tagging From Noisy Labels Using Convolutional Neural Networks (2018).
- T. Malon, G. Roman-Jimenez, Patrice Guyot, S. Chambon, V. Charvillat, A. Crouzil, A. Péninou, J. Pinquier, F. Sèdes, C. Sénac. Toulouse Campus Surveillance Dataset: Scenarios, Soundtracks, Synchronized Videos With Overlapping And Disjoint Views. MMSys (2018).
- Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Douglas L. Jones, W. Gan. Dcase 2018 Task 2: Iterative Training, Label Smoothing, And Background Noise Normalization For Audio Event Tagging. DCASE (2018).
- Tian-Xiang Chen, Udit Gupta. Attention-Based Convolutional Neural Network For Audio Event Classification With Feature Transfer Learning (2018).
- Turab Iqbal, Qiuqiang Kong, Mark D. Plumbley. Stacked Convolutional Neural Networks For General-Purpose Audio Tagging Technical Report (2018).
- V. Subramanian, Alexander Lerch. Concert Stitch: Organization And Synchronization Of Crowd Sourced Recordings. ISMIR (2018).
- Venkatesh S. Kadandale. Musical Instrument Recognition In Multi-Instrument Audio Contexts (2018).
- Xavier Favory, E. Fonseca, F. Font, X. Serra. Facilitating The Manual Annotation Of Sounds When Using Large Taxonomies. ArXiv (2018).
- Zhicun Xu. Audio Event Classification Using Deep Learning Methods (2018).
- Zhicun Xu, P. Smit, M. Kurimo. The Aalto System Based On Fine-Tuned Audioset Features For Dcase2018 Task2 - General Purpose Audio Tagging (2018).
2017 (17)
- A. C. D. C. Junior. Mobile Technologies For Music Interaction (2017).
- A. Correya. Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes In Audio Production Environments (2017).
- A. Stolfi, M. Barthet, Fábio Goródscy, A. C. D. C. Junior. Open Band: A Platform For Collective Sound Dialogues. Audio Mostly Conference (2017).
- Akito van Troyer. Score Instruments: A New Paradigm Of Musical Instruments To Guide Musical Wonderers (2017).
- Aleksandr Diment, T. Virtanen. Transfer Learning Of Weakly Labelled Audio. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2017).
- Ashwin K. Vijayakumar, Ramakrishna Vedantam, D. Parikh. Sound-Word2Vec: Learning Word Representations Grounded In Sounds. EMNLP (2017).
- D. Hernández-Leo, Kostantinos Michos, B. Cabrero, Daniel, A. Martínez-Rodríguez, M. Muñoz, Carla Ten Ventura, K. Sharma, Manaswi Mishra, S. Bhardwaj, Adrian A Perez, Giorgos Neokleous, Pantelis Stylianides, Vibhor Bajpai, N. Delgado, Tessy Troes, Meghana Sudhindra, H. Cuesta. Phd Selection: Factors To Take Into Account (2017).
- Douwe Kiela. Deep Embodiment: Grounding Semantics In Perceptual Modalities (2017).
- Douwe Kiela, Stephen Clark. Learning Neural Audio Embeddings For Grounding Semantics In Auditory Perception. J. Artif. Intell. Res. (2017).
- E. Cherny. A Method For Automatic Whoosh Sound Description (2017).
- E. Fonseca, J. Pons, Xavier Favory, F. Font, D. Bogdanov, Andrés Ferraro, S. Oramas, A. Porter, X. Serra. Freesound Datasets: A Platform For The Creation Of Open Audio Datasets. ISMIR (2017).
- Emiel van Miltenburg. Pragmatic Descriptions Of Perceptual Stimuli. EACL (2017).
- Georgios Paraskevopoulos, Giannis Karamanolakis, E. Iosif, A. Pikrakis, A. Potamianos. Sensory-Aware Multimodal Fusion For Word Semantic Similarity Estimation (2017).
- Hernán Ordiales, Matías Lennie Bruno. Sound Recycling From Public Databases: Another Bigdata Approach To Sound Collections. Audio Mostly Conference (2017).
- M. Briani, A. Cuyt, W. Lee. Validated Exponential Analysis For Harmonic Sounds (2017).
- S. R. Park, J. Lee. A Fully Convolutional Neural Network For Speech Enhancement. INTERSPEECH (2017).
- Vincent Lostanlen. Convolutional Operators In The Time-Frequency Domain (2017).
2016 (20)
- Chris Donahue. Extensions To Convolution For Generalized Cross-Synthesis (2016).
- Chris Donahue, T. Erbe, M. Puckette. Extended Convolution Techniques For Cross-Synthesis. ICMC (2016).
- Douwe Kiela. Mmfeat: A Toolkit For Extracting Multi-Modal Features. ACL (2016).
- Elliot Creager. Musical Source Separation By Coherent Frequency Modulation Cues (2016).
- Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo. The Vu Sound Corpus: Adding More Fine-Grained Annotations To The Freesound Database. LREC (2016).
- Etto L. Salomons, P. Havinga, H. V. Leeuwen. Inferring Human Activity Recognition With Ambient Sound On Wireless Sensor Nodes. Sensors (2016).
- F. Font, T. Brookes, G. Fazekas, M. Guerber, Amaury La Burthe, David Plans, Mark D. Plumbley, Meir Shaashua, W. Wang, X. Serra. Audio Commons: Bringing Creative Commons Audio Content To The Creative Industries (2016).
- F. Font, X. Serra. Tempo Estimation For Music Loops And A Simple Confidence Measure. ISMIR (2016).
- Giannis Karamanolakis, E. Iosif, A. Zlatintsi, A. Pikrakis, A. Potamianos. Audio-Based Distributional Representations Of Meaning Using A Fusion Of Feature Encodings. INTERSPEECH (2016).
- Giuseppe Bandiera, O. Picas, Hiroshi Tokuda, Wataru Hariya, K. Oishi, X. Serra. Good-Sounds.Org: A Framework To Explore Goodness In Instrumental Sounds. ISMIR (2016).
- H. Meutzner, D. Kolossa. A Non-Speech Audio Captcha Based On Acoustic Event Detection And Classification. 2016 24th European Signal Processing Conference (EUSIPCO) (2016).
- J. R. Delgado-Contreras, J. García-Vázquez, R. Brena. Optimizing The Length Of An Environmental Audio Fingerprint For Place Classification. 2016 International Conference on Electronics, Communications and Computers (CONIELECOMP) (2016).
- J. Serrà, Josep Lluís Arcos. Particle Swarm Optimization For Time Series Motif Discovery. Knowl. Based Syst. (2016).
- Long-Van Nguyen-Dinh. Wearable Activity Recognition With Crowdsourced Annotation (2016).
- M. F. Assaneo, J. Sitt, G. Varoquaux, M. Sigman, L. Cohen, M. Trevisan. Exploring The Anatomical Encoding Of Voice With A Mathematical Model Of The Vocal System. NeuroImage (2016).
- Mark D. Plumbley, C. Kroos, J. Bello, G. Richard, D. Ellis, A. Mesaros. Proceedings Of The Detection And Classification Of Acoustic Scenes And Events 2018 Workshop (Dcase2018) (2016).
- Naoya Takahashi, Michael Gygli, B. Pfister, L. Gool. Deep Convolutional Neural Networks And Data Augmentation For Acoustic Event Recognition. INTERSPEECH (2016).
- Naoya Takahashi, Michael Gygli, B. Pfister, L. Gool. Deep Convolutional Neural Networks And Data Augmentation For Acoustic Event Detection (2016).
- S. Parekh, F. Font, X. Serra. Improving Audio Retrieval Through Loudness Profile Categorization. 2016 IEEE International Symposium on Multimedia (ISM) (2016).
- V. Goudarzi, A. Gioti. Engagement And Interaction In Participatory Sound Art (2016).
2015 (20)
- A. Lopopolo, Emiel van Miltenburg. Sound-Based Distributional Models. IWCS (2015).
- Anna Xambó. Tabletop Tangible Interfaces For Music Performance: Design And Evaluation (2015).
- C. Roberts, Matthew Wright, J. Kuchera-Morin. Music Programming In Gibber. ICMC (2015).
- Diego Castán, David Tavarez, Paula Lopez-Otero, J. Franco-Pedroso, H. Delgado, E. Navas, L. Fernández, D. Ramos-Castro, J. Serrano, A. Ortega, E. Lleida. Albayzín-2014 Evaluation: Audio Segmentation And Classification In Broadcast News Domains. EURASIP J. Audio Speech Music. Process. (2015).
- Diego Castán, David Tavarez, Paula Lopez-Otero, J. Franco-Pedroso, H. Delgado, E. Navas, Laura Docío Fernández, Daniel Ramos, J. Serrano, A. Ortega, Eduardo Lleida Solano. Albayzín-2014 Evaluation: Audio Segmentation And Classification In Broadcast News Domains. EURASIP J. Audio Speech Music. Process. (2015).
- Douwe Kiela, Stephen Clark. Multi- And Cross-Modal Semantics Beyond Vision: Grounding In Auditory Perception. EMNLP (2015).
- F. Font. Tag Recommendation Using Folksonomy Information For Online Sound Sharing Platforms (2015).
- F. Font, J. Serrà, X. Serra. Analysis Of The Impact Of A Tag Recommendation System In A Real-World Folksonomy. TIST (2015).
- G. Roma, X. Serra. Music Performance By Discovering Community Loops (2015).
- G. Roma, X. Serra. Querying Freesound With A Microphone (2015).
- H. Nishino, R. Nakatsu. Computer Music Languages And Systems: The Synergy Between Technology And Creativity (2015).
- Jainesh Doshi, Vishrant Tripathi, O. Desai, Shreyas Mangalgi. Instrument Classification Using Spiking Neural Networks (2015).
- Karol J. Piczak. Esc: Dataset For Environmental Sound Classification. ACM Multimedia (2015).
- Niklas Klügel. Collaborative Music-Making With Interactive Tabletops (2015).
- O. Picas, H. P. Rodriguez, Dara Dabiri, Hiroshi Tokuda, Wataru Hariya, K. Oishi, X. Serra. A Real-Time System For Measuring Sound Goodness In Instrumental Sounds (2015).
- Pablo Villegas. Content-Preserving Reconstruction Of Electronic Music Sessions Using Freely Available Musical Building-Blocks (2015).
- Qingchang Zhu, Z. Chen, Y. Soh. Using Unlabeled Acoustic Data With Locality-Constrained Linear Coding For Energy-Related Activity Recognition In Buildings. 2015 IEEE International Conference on Automation Science and Engineering (CASE) (2015).
- T. Kelkar, Anon Ray, Venkatesh Choppella. Sangeetkosh: An Open Web Platform For Music Education. 2015 IEEE 15th International Conference on Advanced Learning Technologies (2015).
- V. Apopei. Detection Dangerous Events In Environmental Sounds - A Preliminary Evaluation. 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) (2015).
- Vito Claudio Ostuni, T. D. Noia, E. D. Sciascio, S. Oramas, X. Serra. A Semantic Hybrid Approach For Sound Recommendation. WWW (2015).
2014 (10)
- C. Jacoby. Automatic Urban Sound Classification Using Feature Learning Techniques (2014).
- D. Wolff. Spot The Odd Song Out: Similarity Model Adaptation And Analysis Using Relative Human Ratings (2014).
- F. Font, J. Serrà, X. Serra. Audio Clip Classification Using Social Tags And The Effect Of Tag Expansion. Semantic Audio (2014).
- F. Font, J. Serrà, X. Serra. Class-Based Tag Recommendation And User-Based Evaluation In Online Audio Clip Sharing. Knowl. Based Syst. (2014).
- F. Font, S. Oramas, György Fazekas, X. Serra. Extending Tagging Ontologies With Domain Specific Knowledge. International Semantic Web Conference (2014).
- J. R. Delgado-Contreras, J. García-Vázquez, R. Brena, C. E. Galván-Tejada, J. I. Galván-Tejada. Feature Selection For Place Classification Through Environmental Sounds. EUSPN/ICTH (2014).
- João Paulo Cordeiro. Sound Based Social Networks (2014).
- L. Wyse. Interactive Audio Web Development Workflow. ACM Multimedia (2014).
- Ohad Fried, Zeyu Jin, Reid Oda, A. Finkelstein. Audioquilt: 2D Arrangements Of Audio Samples Using Metric Learning And Kernelized Sorting. NIME (2014).
- Patrice Guyot. Caractérisation Et Reconnaissance De Sons D'Eau Pour Le Suivi Des Activités De La Vie Quotidienne : Une Approche Fondée Sur Le Signal, L'Acoustique Et La Perception (2014).
2013 (7)
- D. Wolff, Tillman Weyde. Learning Music Similarity From Relative User Ratings. Information Retrieval (2013).
- F. Font, J. Serrà, X. Serra. Folksonomy-Based Tag Recommendation For Collaborative Tagging Systems. Int. J. Semantic Web Inf. Syst. (2013).
- Long-Van Nguyen-Dinh, U. Blanke, G. Tröster. Towards Scalable Activity Recognition: Adapting Zero-Effort Crowdsourced Acoustic Models. MUM (2013).
- Miles Thorogood, P. Pasquier. Computationally Created Soundscapes With Audio Metaphor. ICCC (2013).
- Motohiro Sunouchi, Yuzuru Tanaka. Similarity Search Of Freesound Environmental Sound Based On Their Enhanced Multiscale Fractal Dimension (2013).
- Niklas Klügel, G. Groh. Towards Mapping Timbre To Emotional Affect. NIME (2013).
- Patrice Guyot, J. Pinquier, R. André-Obrecht. Water Sound Recognition Based On Physical Models. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2013).
2012 (10)
- Brandon Mechtley, Andreas Spanias, P. Cook. Shortest Path Techniques For Annotation And Retrieval Of Environmental Sounds. ISMIR (2012).
- F. Font, G. Roma, P. Herrera, X. Serra. Characterization Of The Freesound Online Community. 2012 3rd International Workshop on Cognitive Information Processing (CIP) (2012).
- F. Font, J. Serrà, X. Serra. Folksonomy-Based Tag Recommendation For Online Audio Clip Sharing. ISMIR (2012).
- F. Font, X. Serra. Analysis Of The Folksonomy Of Freesound (2012).
- G. Roma, Anna Xambó, P. Herrera, Robin C. Laney. Factors In Human Recognition Of Timbre Lexicons Generated By Data Clustering (2012).
- G. Roma, P. Herrera, M. Zanin, S. Marín, F. Font, X. Serra. Small World Networks And Creativity In Audio Clip Sharing. Int. J. Soc. Netw. Min. (2012).
- M. Rossi, G. Tröster, O. Amft. Recognizing Daily Life Context Using Web-Collected Audio Data. 2012 16th International Symposium on Wearable Computers (2012).
- M. Sordo, Gopala K. Koduri, Sankalp Gulati, X. Serra. A Musically Aware System For Browsing And Interacting With Audio Music Collections (2012).
- Masayuki Karasuyama, Masashi Sugiyama. Canonical Dependency Analysis Based On Squared-Loss Mutual Information. Neural Networks (2012).
- Miles Thorogood, P. Pasquier, Arne Eigenfeldt. Audio Metaphor: Audio Information Retrieval For Soundscape Composition (2012).
2011 (3)
- J. Janer, G. Roma, S. Kersten. Authoring Augmented Soundscapes With User-Contributed Content (2011).
- J. Janer, S. Kersten, Mattia Schirosa, G. Roma. An Online Platform For Interactive Soundscapes With User-Contributed Audio Content (2011).
- Nuno N. Correia. Av Clash, Online Audiovisual Project: A Case Study Of Evaluation In New Media Art. Advances in Computer Entertainment Technology (2011).
2010 (3)
- G. Roma, J. Janer, S. Kersten, Mattia Schirosa, P. Herrera, X. Serra. Ecological Acoustics Perspective For Content-Based Retrieval Of Environmental Sounds. EURASIP J. Audio Speech Music. Process. (2010).
- G. Roma, P. Herrera. Graph Grammar Representation For Collaborative Sample-Based Music Creation. Audio Mostly Conference (2010).
- G. Roma, P. Herrera. Community Structure In Audio Clip Sharing. 2010 International Conference on Intelligent Networking and Collaborative Systems (2010).
2009 (2)
- Gerard Roma Trepat, Perfecto Herrera-Boyer, X. Serra. Freesound Radio: Supporting Music Creation By Exploration Of A Sound Database (2009).
- M. Magas, Polina Proutskova. A Location-Tracking Interface For Ethnomusicological Collections (2009).