In this paper, we present our system designed to address the W-NUT 2020 shared task for COVID-19 Event Extraction from Twitter. To mitigate the noisy nature of the Twitter stream, our system makes use of COVID-Twitter-BERT (CT-BERT), a language model pre-trained on a large corpus of COVID-19-related Twitter messages. Our system is trained on the COVID-19 Twitter Event Corpus and is able to identify relevant text spans that answer pre-defined questions (i.e., slot types) for five COVID-19-related events (i.e., TESTED POSITIVE, TESTED NEGATIVE, CAN-NOT-TEST, DEATH and CURE & PREVENTION). We experimented with different architectures; our best-performing model relies on a multilabel classifier on top of the CT-BERT model that jointly trains all the slot types for a single event. Our experimental results indicate that our Multilabel-CT-BERT system outperforms the baseline methods by 7 percentage points in terms of micro-averaged F1 score. Our model ranked 4th on the shared task leaderboard.
Yang, X, Bekoulis, I & Deligiannis, N 2020, 'imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A Multilabel BERT-based system for predicting COVID-19 events', in Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Association for Computational Linguistics, pp. 505-513, The 6th Workshop on Noisy User-generated Text (W-NUT), 19/11/20. <https://www.aclweb.org/anthology/2020.wnut-1.77.pdf>
Yang, X., Bekoulis, I., & Deligiannis, N. (2020). imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A Multilabel BERT-based system for predicting COVID-19 events. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020) (pp. 505-513). Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.wnut-1.77.pdf
@inproceedings{e386ab703de74d9f977a28ac845715ce,
title = "imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A Multilabel BERT-based system for predicting COVID-19 events",
author = "Xiangyu Yang and Ioannis Bekoulis and Nikolaos Deligiannis",
year = "2020",
month = nov,
day = "16",
language = "English",
pages = "505--513",
booktitle = "Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)",
publisher = "Association for Computational Linguistics",
note = "2020 The 6th Workshop on Noisy User-generated Text (W-NUT); Conference date: 19-11-2020",
url = "http://noisy-text.github.io/2020/",
}