This paper proposes a spatio-temporal attentive mechanism to detect events in video sequences of natural scenes in dynamic environments. More specifically, we wish to detect a visual event within a cluttered scene without intensive training of the algorithm. In contrast to the event detection methods in the literature, which drive attention based on motion and spatial location hypotheses, in our approach visual attention is region-driven as well as feature-driven. For this purpose, a two-stage attention mechanism is proposed. In the first stage, spatio-temporal activity analysis extracts key frames from the image sequence and selects salient areas within these frames. To this end, in addition to a peak detection method, we employ a change-point detection method, which exists in both a batch and an incremental version. In the second stage, these areas are further processed to determine the most interesting active region, based on a newly defined region saliency measure. Results of the proposed approach are reported on a natural image sequence of a crowded train station.
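To make the key-frame extraction step more concrete, the following is a minimal sketch and not the method used in this paper. It assumes that activity is measured as the mean absolute intensity difference between consecutive grayscale frames, that peaks are thresholded local maxima, and that the batch change-point detector looks for a single shift in the mean of the activity signal. The function names `activity_signal`, `find_peaks`, and `batch_change_point`, as well as the `threshold` parameter, are hypothetical and chosen for illustration only. The region saliency measure is not sketched because it is not defined here.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def activity_signal(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute intensity difference between consecutive frames."""
    signal = []
    for prev, curr in zip(frames, frames[1:]):
        total, count = 0.0, 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(b - a)
                count += 1
        signal.append(total / count if count else 0.0)
    return signal


def find_peaks(signal: Sequence[float], threshold: float) -> List[int]:
    """Indices above `threshold` that exceed the left neighbour
    and are not smaller than the right neighbour."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


def batch_change_point(signal: Sequence[float]) -> int:
    """Split index k (1 <= k < len(signal)) that minimises the summed
    squared error of two constant-mean segments, signal[:k] and signal[k:]."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")

    def sse(segment: Sequence[float]) -> float:
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    return min(
        range(1, len(signal)),
        key=lambda k: sse(signal[:k]) + sse(signal[k:]),
    )
```

In this sketch, frames at the returned peak or change-point indices would be the candidate key frames. An incremental variant would update the segment statistics as each new frame arrives instead of rescanning the whole signal.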