Abstract
Cross-lingual word alignment is the task of translating words between the monolingual word embedding spaces of two languages. Most recent work relies on supervised approaches, whose success depends on bilingual seed dictionaries derived from aligned data. Unsupervised adversarial approaches, which use generative adversarial networks (GANs) to map one monolingual space globally onto another, eliminate the need for aligned data. However, most GAN-based unsupervised approaches ignore mode collapse and vanishing gradients in GANs, which can cause training to fail to converge. In addition, these approaches often fail to account for the low isomorphism between some language pairs, which prevents them from capturing the non-linear relationship between cross-lingual embedding spaces. To address these issues, we propose a novel unified unsupervised framework that combines a GAN with an adaptive training objective (ATOGAN) and a local mapping (LM) strategy for exploring the non-linear relationship. ATOGAN learns a bi-directional global mapping from unaligned word embeddings and integrates particle swarm optimization (PSO) to adaptively select the training objective in order to prevent mode collapse and vanishing gradients. The LM strategy is then guided by dictionaries generated by the trained ATOGAN, which reduces the reliance of purely linear mapping on the isomorphism assumption. Experimental results demonstrate the effectiveness of the proposed method for cross-lingual word alignment in low-isomorphism embedding spaces (distant language pairs).