Rohan Sheelvant*, Bidisha Sharma, Maulik Madhavi, Rohan Kumar Das, S. R. M. Prasanna*, Haizhou Li
Department of Electrical and Computer Engineering, National University of Singapore, Singapore
*Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, India
We present the development of a new database for speech localization, which we refer to as the Realistic Speech Localization 2019 (RSL2019) corpus. The corpus is designed for the study of sound source localization in real-world applications. The RSL2019 corpus is a continuing effort and presently contains 22.60 hours of speech data, played over a loudspeaker from different directions of arrival (DOA) and recorded using a four-channel microphone array. We consider 180 speech utterances spoken by 6 speakers, selected from the RSR2015 database, which are played over the loudspeaker positioned at different angles and distances from the microphone array. We vary the DOA from 0 to 360 degrees at intervals of 5 degrees, at distances of 1 metre and 1.5 metres. From each position and DOA, we also record white noise to study robustness, and a time-stretched pulse to generate the transfer function for the speech localization algorithm. Furthermore, we present experimental results and analysis of a state-of-the-art sound source localization algorithm, using the open-source HARK toolkit, on the created RSL2019 database. This database will be provided for research purposes upon request to the authors.
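The recording grid described above can be sketched in a few lines of Python. This is only an illustration, not part of the corpus tooling: the function name `recording_positions` and its parameters are hypothetical, and the sketch assumes that 0 and 360 degrees denote the same direction, so the 360 degree endpoint is not counted separately.

```python
from itertools import product


def recording_positions(step_deg=5, distances_m=(1.0, 1.5)):
    """Enumerate (DOA in degrees, distance in metres) pairs.

    Assumption: 360 degrees coincides with 0 degrees, so the
    angle range is half-open, [0, 360).
    """
    angles = range(0, 360, step_deg)
    # One entry per loudspeaker placement: every angle at every distance.
    return [(angle, dist) for dist, angle in product(distances_m, angles)]


positions = recording_positions()
print(len(positions))
```

Under these assumptions the grid has 72 angles at each of the 2 distances. If the corpus treats 360 degrees as a separate recording, the count per distance would be 73 instead.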
If you use this database, please cite the following paper:
Rohan Sheelvant, Bidisha Sharma, Maulik Madhavi, Rohan Kumar Das, S. R. M. Prasanna and Haizhou Li, “RSL2019: A Realistic Speech Localization Corpus,” in Proc. International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (Oriental COCOSDA), Cebu City, Philippines, October 2019.