TY - GEN
T1 - MUTUAL
T2 - Annual ACM Symposium on Applied Computing
AU - Katsarou, Katerina
AU - Jeney, Roxana
AU - Stefanidis, Kostas
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/3/27
Y1 - 2023/3/27
AB - Multi-domain sentiment classification trains a classifier on multiple domains and then tests it on one of them. Importantly, no domain is assumed to have sufficient labeled data; instead, the goal is to leverage information across domains, which makes multi-domain sentiment classification a realistic scenario. Labeled data is typically costly because humans must label it manually. In this context, we propose MUTUAL, an approach that learns general and domain-specific sentence embeddings that are also context-aware thanks to an attention mechanism. A stacked BiLSTM-based autoencoder with an attention mechanism generates both types of sentence embeddings. Then, using the Jensen-Shannon (JS) distance, the general sentence embeddings of the four domains most similar to the target domain are selected. The selected general sentence embeddings and the domain-specific embeddings are concatenated and fed into a dense layer for training. Evaluation results on public datasets covering 16 domains demonstrate the efficiency of our model. In addition, we propose an active learning algorithm that first applies the elliptic envelope for outlier removal to a pool of unlabeled data, which the MUTUAL model then classifies. Next, the most uncertain data points are selected for labeling based on the least confidence metric. The experiments show that querying 38% of the original data in this way yields higher accuracy than random sampling.
KW - active learning
KW - BiLSTM
KW - Jensen-Shannon distance
KW - multi-domain sentiment classification
KW - self-attention
KW - sentence embeddings
KW - uncertainty sampling
U2 - 10.1145/3555776.3577765
DO - 10.1145/3555776.3577765
M3 - Conference contribution
AN - SCOPUS:85162871966
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 331
EP - 339
BT - Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023
PB - ACM
Y2 - 27 March 2023 through 31 March 2023
ER -