imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A Multilabel BERT-based system for predicting COVID-19 events
Host Publication: Conference on Empirical Methods in Natural Language Processing (and forerunners) (2020)
Authors: X. Yang, G. Bekoulis and N. Deligiannis
Publisher: Association for Computational Linguistics
Publication Date: Nov. 2020
Number of Pages: 9
In this paper, we present our system designed to address the W-NUT 2020 shared task for COVID-19 Event Extraction from Twitter. To mitigate the noisy nature of the Twitter stream, our system makes use of COVID-Twitter-BERT (CT-BERT), a language model pre-trained on a large corpus of COVID-19 related Twitter messages. Our system is trained on the COVID-19 Twitter Event Corpus and is able to identify relevant text spans that answer pre-defined questions (i.e., slot types) for five COVID-19 related events (i.e., TESTED POSITIVE, TESTED NEGATIVE, CAN-NOT-TEST, DEATH and CURE & PREVENTION). We have experimented with different architectures; our best-performing model relies on a multilabel classifier on top of the CT-BERT model that jointly trains all the slot types for a single event. Our experimental results indicate that our Multilabel-CT-BERT system outperforms the baseline methods by 7 percentage points in terms of micro-average F1 score. Our model ranked 4th on the shared task leaderboard.
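
As a rough illustration of the two ideas the abstract names, a multilabel decision over slot types and the micro-average F1 score, the following minimal sketch may help. It is not the authors' implementation: it assumes that each candidate text span receives one logit per slot type, that each label is switched on independently when its sigmoid probability reaches a threshold of 0.5, and that micro-average F1 pools true positives, false positives and false negatives over all spans and slot types. The slot names, logits and gold labels are hypothetical values chosen only for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_slots(logits, threshold=0.5):
    """Multilabel decision: each slot type is accepted independently
    when its sigmoid probability reaches the threshold (assumed 0.5)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(gold, pred):
    """Micro-average F1: pool counts over all spans and slot types,
    then compute a single F1 score from the pooled counts."""
    tp = fp = fn = 0
    for gold_row, pred_row in zip(gold, pred):
        for g, p in zip(gold_row, pred_row):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical slot types for a single event (illustrative names only).
SLOTS = ["who", "where", "when"]

# Hypothetical logits for two candidate spans, one value per slot type.
logits = [[2.0, -1.0, 0.3], [-0.5, 1.5, -2.0]]
# Hypothetical gold annotations in the same layout.
gold = [[1, 0, 0], [0, 1, 1]]

pred = [predict_slots(row) for row in logits]
score = micro_f1(gold, pred)
print(pred, round(score, 4))
```

Because every slot type is decided independently, a single span can answer several slot questions or none, which is what distinguishes a multilabel head from a softmax over mutually exclusive classes.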