Lightweight Cross-Lingual Sentence Representation Learning

Cross-lingual sentence representation models support tasks such as cross-lingual sentence retrieval and cross-lingual knowledge transfer without the need to train a new monolingual representation model from scratch. However, lightweight models of this kind have received little exploration.

Writing software code. Image credit: pxhere.com, CC0 Public Domain

A recent paper on arXiv.org introduces a lightweight dual-transformer architecture with just two layers. It significantly decreases memory consumption and accelerates training, further improving efficiency. Two sentence-level contrastive learning tasks are proposed to compensate for the learning bottleneck of the lightweight transformer on generative tasks. Experiments on cross-lingual tasks such as cross-lingual sentence retrieval and multilingual document classification confirm that the proposed model yields robust sentence representations.
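
For illustration only, below is a minimal PyTorch sketch of what a shallow, memory-efficient dual-encoder of this kind could look like: a two-layer Transformer that maps token ids to a fixed-dimensional sentence embedding. The vocabulary size, hidden size, mean pooling, and class name are assumptions for the example, not the paper's exact configuration.

# Hedged sketch of a lightweight (2-layer) Transformer sentence encoder.
# All hyperparameters and the pooling choice are illustrative assumptions.
import torch
import torch.nn as nn


class LightweightSentenceEncoder(nn.Module):
    """Maps token ids to one fixed-dimensional vector per sentence
    via mean pooling over non-padded positions."""

    def __init__(self, vocab_size=50000, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=2048, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids, pad_mask):
        # pad_mask: True at padding positions, shape (batch, seq_len)
        hidden = self.encoder(self.embed(token_ids), src_key_padding_mask=pad_mask)
        keep = (~pad_mask).unsqueeze(-1).float()
        # Mean-pool over real tokens to obtain the sentence representation.
        return (hidden * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)


# Usage: encode a small batch of (already tokenized) sentences.
if __name__ == "__main__":
    enc = LightweightSentenceEncoder()
    ids = torch.randint(1, 50000, (4, 16))
    mask = torch.zeros(4, 16, dtype=torch.bool)
    print(enc(ids, mask).shape)  # torch.Size([4, 512])

In a dual-transformer setup, source and target sentences would each be passed through such an encoder (possibly with shared weights) to obtain comparable embeddings.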

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer for generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.
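
To make the sentence-level contrastive idea concrete, here is a hedged sketch of an InfoNCE-style alignment loss over a batch of parallel sentence pairs; the paper's two contrastive tasks may be formulated differently, and the function name and temperature value are assumptions made for this example.

# Hedged sketch of a sentence-level contrastive alignment objective:
# translations in a batch are pulled together, non-translations pushed apart.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: (batch, dim) embeddings of parallel sentences,
    where src_emb[i] and tgt_emb[i] are translations of each other."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    # Cosine-similarity matrix between every source and every target in the batch.
    logits = src @ tgt.t() / temperature
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric cross-entropy: each sentence should retrieve its own translation.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

A loss of this form only requires the sentence embeddings already produced by the encoder, which is why such sentence-level objectives stay computationally light compared with generative training tasks.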

Research paper: Mao, Z., Gupta, P., Chu, C., Jaggi, M., and Kurohashi, S., "Lightweight Cross-Lingual Sentence Representation Learning", 2021. Link: https://arxiv.org/abs/2105.13856