
Master theses

Current and past ideas and concepts for Master Theses.

Vocal Emotion Imitation


Affective computing is the research field concerned with developing systems, devices and applications that can recognize, interpret, process, and simulate human emotions. In speech, human emotions are conveyed through two information channels: semantic information (‘what is said’) and prosodic information (‘how it is said’). The prosodic information channel is mainly utilized in speech analysis, including recognition of the speaker’s emotional state.
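
To make the idea of prosodic information more concrete, the following is a minimal sketch of two simple frame-level descriptors: RMS energy (loudness) and zero-crossing rate (a crude proxy for pitch and voicing). The function name `frame_features`, the 25 ms frame length and the synthetic test tone are assumptions made for illustration only. A real emotion recognizer would use a much richer feature set.

```python
import math

def frame_features(samples, sample_rate, frame_ms=25):
    """Return one (rms_energy, zero_crossing_rate) pair per non-overlapping frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # RMS energy: how loud the frame is
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings / (frame_len - 1)))
    return features

# Illustrative input: one second of a 200 Hz sine tone sampled at 8 kHz
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
feats = frame_features(tone, sample_rate)
print(len(feats), feats[0])
```

Tracking how such descriptors change over an utterance gives a rough picture of ‘how it is said’, independent of the words themselves.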

On the emotional speech synthesis side, an alternative form of vocal communication called Semantic-Free Utterances (SFUs) has recently been receiving attention in the Human-Robot Interaction (HRI) domain. SFUs are composed of vocalizations (nonsense vocal utterances, gibberish speech) and sounds (beeps, squeaks) without semantic content. Concrete examples of SFUs are the vocalizations of R2-D2 and BB-8 from the Star Wars films, and of the robots WALL-E and Eve from the Disney–Pixar film WALL-E. SFUs have the advantage of not being bound to a specific language while still facilitating rich communication and expression.

As efforts in affective computing concentrate on enabling computers and robots to recognize human emotions and to react accordingly, this thesis will focus, as a first step, on combining these two capabilities. The work will therefore include recognizing a user’s emotion from speech and producing a vocal response in the same emotion.

Kind of work

The proposed work will consist of the following general steps:

  • Vocal emotion classification for universal big six emotions [1][2]
  • Semantic-free emotional speech generation based on the recognized emotion [3]
  • Integrated emotion imitation system that can respond to a user in the same emotion
  • Depending on the time available, the developed system can be implemented on the Nao robot (or on a smart speaker) with a contextual scenario
  • Thesis report and presentation
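
The first three steps above can be pictured with a toy sketch of the imitation loop: classify the user's emotion from prosodic features, then produce a semantic-free response in the same emotion. Everything in it is an assumption made for illustration: the function names, the three-value feature vector, and the centroid numbers, which are invented and not measured. The nearest-centroid rule is only a stand-in for a trained classifier such as the relevance vector machine of [2], and the random syllable strings stand in for real gibberish speech synthesis as in [3].

```python
import random

# Placeholder prosodic profiles for the big six emotions:
# (mean pitch in Hz, relative energy 0-1, syllables per second).
# These values are invented for illustration, not measured.
CENTROIDS = {
    "anger":     (220.0, 0.90, 5.5),
    "disgust":   (150.0, 0.50, 3.5),
    "fear":      (260.0, 0.60, 6.0),
    "happiness": (240.0, 0.80, 5.0),
    "sadness":   (130.0, 0.30, 3.0),
    "surprise":  (280.0, 0.85, 4.5),
}
# Rough per-feature scales so that no single feature dominates the distance
SCALES = (100.0, 1.0, 5.0)

def classify_emotion(features):
    """Nearest-centroid stand-in for a trained emotion classifier."""
    def distance(centroid):
        return sum(((f - c) / s) ** 2 for f, c, s in zip(features, centroid, SCALES))
    return min(CENTROIDS, key=lambda label: distance(CENTROIDS[label]))

def generate_sfu(emotion, n_syllables=4, seed=None):
    """Build a gibberish utterance plus the prosody targets for a synthesizer."""
    rng = random.Random(seed)
    pitch, energy, rate = CENTROIDS[emotion]
    syllables = [rng.choice("bdgklmnpt") + rng.choice("aeiou") for _ in range(n_syllables)]
    return {
        "emotion": emotion,
        "text": "-".join(syllables),
        "pitch_hz": pitch,
        "energy": energy,
        "rate": rate,
    }

def imitate(features, seed=None):
    """Respond to the user in the same emotion that was recognized."""
    return generate_sfu(classify_emotion(features), seed=seed)

# Low pitch, low energy, slow speech: closest to the "sadness" placeholder profile
response = imitate((135.0, 0.35, 3.1), seed=0)
print(response["emotion"], response["text"])
```

Keeping recognition and generation behind separate functions means each can later be replaced by a real model without changing the integration step.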

Framework of the Thesis

This thesis is related to the emotional speech synthesis and recognition research track at the audiovisual signal processing group at ETRO.

[1] Ekman, P. (1971). Universals and cultural differences in facial expressions of emotion. In Nebraska Symposium on Motivation. University of Nebraska Press.
[2] Wang, F., Verhelst, W., & Sahli, H. (2011). Relevance vector machine based speech emotion recognition. In Affective Computing and Intelligent Interaction (pp. 111-120). Springer, Berlin, Heidelberg.
[3] Yilmazyildiz, S., Verhelst, W., & Sahli, H. (2015). Gibberish speech as a tool for the study of affective expressiveness for robotic agents. Multimedia Tools and Applications, 74(22), 9959-9982.

Number of Students

1 or 2

Expected Student Profile

  • Enrolled in an MSc in a field related to one or more of the following: applied computer science, computer science, electrical engineering
  • Interest in voice controlled devices and speech signal processing
  • Interest in machine learning
  • Good programming skills
  • Interest in designing and performing experiments with humans/children

Prof. Hichem Sahli

+32 (0)2 629 2916



Dr. Selma Yilmazyildiz

+32 (0)2 629 2980


Ir. Henk Brouckxon

+32 (0)2 629 3955



©2019 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883