An audio classifier that can distinguish between speech, music, silence and garbage has been developed. The classifier was trained and tested on broadcast news material provided by VRT (Flemish Radio and Television Network). Several feature sets and machine learning algorithms have been tested, providing choices of speed and performance for a target system. The audio classifier is part of a larger system that, together with visual data, can retrieve information from news broadcasts: speech can be converted to text and the speaker can be recognized. Music can be further used for genre classification, jingle recognition or copyright infringement detection. Silence is recognized and used to provide cues on topic changes or speaker turns. At this point, everything that is not classified as speech, music or silence is labeled garbage. Garbage classes can be further used for background categorization, giving information on the environment where someone speaks (an anchor in the studio or a reporter in the street).
Patsis, G & Verhelst, W 2008, A Speech/Music/Silence/Garbage/ classifier for searching and indexing broadcast news material. in AM Tjoa & RR Wagner (eds), 19th International Workshop on Database and Expert Systems Applications. IEEE Computer Society Press, pp. 585-589. <http://www.dexa.org>
Patsis, G., & Verhelst, W. (2008). A Speech/Music/Silence/Garbage/ classifier for searching and indexing broadcast news material. In A. M. Tjoa & R. R. Wagner (Eds.), 19th International Workshop on Database and Expert Systems Applications (pp. 585-589). IEEE Computer Society Press. http://www.dexa.org
@inproceedings{3dd00a41a1934d14a2c9dbcfd54867a3,
title = "A Speech/Music/Silence/Garbage/ classifier for searching and indexing broadcast news material",
abstract = "An audio classifier that can distinguish between speech, music, silence and garbage has been developed. The classifier was trained and tested on broadcast news material provided by VRT (Flemish Radio and Television Network). Several feature sets and machine learning algorithms have been tested, providing choices of speed and performance for a target system. The audio classifier is part of a larger system that, together with visual data, can retrieve information from news broadcasts: speech can be converted to text and the speaker can be recognized. Music can be further used for genre classification, jingle recognition or copyright infringement detection. Silence is recognized and used to provide cues on topic changes or speaker turns. At this point, everything that is not classified as speech, music or silence is labeled garbage. Garbage classes can be further used for background categorization, giving information on the environment where someone speaks (an anchor in the studio or a reporter in the street).",
keywords = "Audio Classification, Broadcast News, speech/music/silence/garbage, indexing",
author = "Georgios Patsis and Werner Verhelst",
note = "A. M. Tjoa and R. R. Wagner (eds.)",
year = "2008",
month = sep,
day = "1",
language = "English",
isbn = "978-0-7695-3299-8",
series = "19th International Workshop on Database and Expert Systems Applications",
publisher = "IEEE Computer Society Press",
pages = "585--589",
editor = "A. M. Tjoa and R. R. Wagner",
booktitle = "19th International Workshop on Database and Expert Systems Applications",
}