The ALIZ-E project aims to design and develop long-term, adaptive social interaction between robots and child users (8-11 years old) in real-world settings; a conversational human-robot interaction system has been developed for this purpose. In this context, we present the auditory and visual perception components that have been specifically built to support verbal and non-verbal human-robot interaction.