arxiv:2305.13303

Towards Unsupervised Recognition of Semantic Differences in Related Documents

Published on May 22, 2023
Abstract

Recognizing semantic differences is approached as a token-level regression task using unsupervised methods with masked language models, showing correlation with gold labels but room for improvement.

Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex, cross-lingual document pairs. Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels. However, all unsupervised approaches still leave a large margin of improvement. Code to reproduce our experiments is available at https://github.com/ZurichNLP/recognizing-semantic-differences
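The token-level regression framing can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository). It assumes each token already has an embedding vector and scores a token as one minus its best cosine similarity to any token in the other document, which is one simple alignment-style heuristic. The toy vectors stand in for masked language model embeddings, and the names `cosine`, `difference_scores`, and `toy_embeddings` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def difference_scores(emb_a, emb_b):
    """Score each token of document A by how poorly it aligns with document B.

    A score near 0 means some token in B is very similar; a higher score
    means no close counterpart was found. If B is empty, every token in A
    gets the maximum score of 1.0.
    """
    if not emb_b:
        return [1.0 for _ in emb_a]
    return [1.0 - max(cosine(u, v) for v in emb_b) for u in emb_a]


# Hand-made toy embeddings (an assumption for illustration only);
# a real setup would take these from a masked language model.
toy_embeddings = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "dog": [0.0, 0.6, 0.8],
    "sleeps": [0.0, 0.0, 1.0],
}

tokens_a = ["the", "cat", "sleeps"]
tokens_b = ["the", "dog", "sleeps"]

scores_a = difference_scores(
    [toy_embeddings[t] for t in tokens_a],
    [toy_embeddings[t] for t in tokens_b],
)

for token, score in zip(tokens_a, scores_a):
    print(f"{token}: {score:.2f}")
```

In this toy pair only "cat" receives a clearly nonzero score, because "the" and "sleeps" have exact counterparts in the second sentence. Scores like these could then be compared against gold token labels with a rank correlation, which is the kind of evaluation the abstract describes.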

