ETRO VUB

Master theses

Current and past ideas and concepts for Master Theses.

Neural architectures for extreme multi-label text classification

Subject

Extreme multi-label text classification (XMC) is the task of finding the most relevant subset of labels for each document from an extremely large label collection, where the number of labels can reach hundreds of thousands or even millions. This problem is becoming more important with the rapid growth of big data applications and the need to extract useful information from them. For example, an automated clinical coding system aims to assign a given clinical text to the most relevant labels from the ICD-10 label collection (over 100,000 labels), which can help doctors process patients' electronic health record (EHR) data quickly. Another example is Amazon's product catalogue, where each item is usually assigned to more than one relevant category and the number of categories can reach the millions. A central challenge in XMC is that the output space is very large and sparse: most labels are relevant to only a few documents, so little training data is available for them. In this thesis, we would like to investigate neural architectures for solving this problem.
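To make the label sparsity issue concrete, the following minimal Python sketch counts how often each label occurs in a tiny corpus and reports the share of labels seen only once. The corpus, the label names, and the helper functions `label_frequencies` and `tail_fraction` are hypothetical and chosen only for illustration; real XMC benchmarks are several orders of magnitude larger.

```python
from collections import Counter

# Hypothetical toy corpus: one set of relevant labels per document.
doc_labels = [
    {"electronics", "cables"},
    {"electronics", "audio"},
    {"electronics"},
    {"books", "fiction"},
    {"electronics", "chargers"},
]


def label_frequencies(label_sets):
    """Count how many documents each label is attached to."""
    counts = Counter()
    for labels in label_sets:
        counts.update(labels)
    return counts


def tail_fraction(counts, max_count=1):
    """Fraction of labels that occur in at most `max_count` documents."""
    tail = sum(1 for c in counts.values() if c <= max_count)
    return tail / len(counts)


freqs = label_frequencies(doc_labels)
print(freqs.most_common(1))   # the most frequent ("head") label
print(tail_fraction(freqs))   # share of labels that appear only once
```

In this toy corpus one label covers most documents while five of the six labels appear exactly once, which is the kind of long-tailed distribution that makes rare labels hard to learn.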

Kind of work

In this master thesis, we will develop state-of-the-art neural models on existing XMC benchmark datasets and analyze their results. The work requires knowledge of text processing techniques and deep learning models such as LSTMs and Transformers. More specifically, it consists of three steps: (i) investigate existing XMC benchmark datasets, (ii) reproduce baseline models on those datasets, and (iii) develop new models that outperform the baselines.
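Comparing new models against baselines in steps (ii) and (iii) requires a ranking metric. The proposal does not name one, so the sketch below assumes precision@k, a metric commonly reported in the XMC literature. The function names, the score dictionaries, and the label names are hypothetical.

```python
def precision_at_k(scores, relevant, k):
    """Precision@k for a single document.

    scores:   dict mapping each label to the model's score for it
    relevant: set of ground-truth labels for the document
    k:        number of top-ranked labels to inspect
    """
    # Rank labels by descending score and keep the k best.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for label in top_k if label in relevant)
    return hits / k


def mean_precision_at_k(all_scores, all_relevant, k):
    """Average precision@k over a collection of documents."""
    values = [precision_at_k(s, r, k) for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values)


# Hypothetical model outputs for two documents over four labels.
all_scores = [
    {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05},
    {"a": 0.2, "b": 0.7, "c": 0.6, "d": 0.1},
]
all_relevant = [{"a", "c"}, {"b", "c"}]

for k in (1, 3):
    print(k, mean_precision_at_k(all_scores, all_relevant, k))
```

Because only the top k labels are inspected, the metric stays cheap to compute even when the label collection is very large, as long as the model can return its highest-scoring labels efficiently.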

Framework of the Thesis

Chang, Wei-Cheng, et al. "Taming pretrained transformers for extreme multi-label text classification." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
Liu, Jingzhou, et al. "Deep learning for extreme multi-label text classification." Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017.
You, Ronghui, et al. "AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification." arXiv preprint arXiv:1811.01727 (2018).

Number of Students

1

Expected Student Profile

Background in machine learning, especially deep learning.
Proven programming experience (e.g., Python).
Prior experience with machine learning frameworks (e.g., PyTorch, TensorFlow).

Promotor

Prof. Dr. Ir. Nikolaos Deligiannis

+32 (0)2 629 1683

ndeligia@etrovub.be


Supervisor

Mr. Xiangyu Yang

+32 (0)2 629 2930

xyanga@etrovub.be


©2022 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883