United Nations (UN) member states adopted the 2030 Agenda for Sustainable Development in 2015, introducing the 17 Sustainable Development Goals (SDGs). Since then, many research articles related to these goals have been published. Universities and research institutes often track their SDG-related publications by manually tagging each one with one or more goals, which is a time-consuming and costly process. This work proposes an automated approach to tagging publications with a list of related SDGs.
The authors evaluated their approach in a case study at Swinburne University of Technology (Melbourne, Australia), using publications already tagged by a domain expert. They used each publication's abstract or first paragraph to train two multilabel classifiers: a multilayer perceptron (MLP) and Dropouts meet Multiple Additive Regression Trees (DART). A pre-trained RoBERTa model transformed the texts into numerical vectors, and the embedding for each paper is the average of all its word embeddings. The authors achieved average sensitivity scores across all SDGs of 0.71 for DART and 0.75 for MLP. They intend this preliminary work to assist in studying trends in SDG-related research and in monitoring progress toward achieving the SDGs.
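To make the two technical steps above concrete, the following is a minimal sketch, not the authors' implementation. It shows how word embeddings can be averaged into a single document vector and how sensitivity (recall) can be computed per SDG label and then averaged. The function names (`mean_pool`, `per_label_sensitivity`, `macro_average`) and the toy data are hypothetical. It is also an assumption that the reported figures are an unweighted mean of per-SDG sensitivities, since the text does not say how the average was taken.

```python
from typing import List, Optional, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average word embeddings element-wise into one document vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def per_label_sensitivity(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> List[Optional[float]]:
    """Sensitivity = TP / (TP + FN), computed separately for each label.

    Rows are publications, columns are labels (one column per SDG).
    A label with no positive examples yields None.
    """
    n_labels = len(y_true[0])
    scores: List[Optional[float]] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        scores.append(tp / (tp + fn) if (tp + fn) > 0 else None)
    return scores


def macro_average(scores: Sequence[Optional[float]]) -> float:
    """Unweighted mean over labels that have a defined score (assumption)."""
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no label has a defined score")
    return sum(valid) / len(valid)


# Toy data: two 2-dimensional word vectors, and 4 publications x 3 labels.
doc_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
scores = per_label_sensitivity(y_true, y_pred)
average_sensitivity = macro_average(scores)
```

In practice the word vectors would come from a pre-trained language model such as RoBERTa, and the predictions from a trained multilabel classifier; both are replaced by hand-written lists here so the sketch stays self-contained.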