Title
Cross-Lingual Alignment of Word & Sentence Embeddings

Creator
Aldarmaki, Hanan

Subject
Alignment, Arabic language, Bilingual dictionaries, Computer science, Dialects, English, French language, Language, Language acquisition, Mathematics, Natural language processing, Parallel corpora, Sentences, Space

Classification

Library
Center and Library of Islamic Studies in European Languages

Location
Province: Qom - City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number
TL50265

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.
English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper
Cross-Lingual Alignment of Word & Sentence Embeddings
General Material Designation
[Thesis]
First Statement of Responsibility
Aldarmaki, Hanan
Subsequent Statement of Responsibility
Diab, Mona T.

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.
The George Washington University
Date of Publication, Distribution, etc.
2019

GENERAL NOTES

Text of Note
113 p.

DISSERTATION (THESIS) NOTE

Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
The George Washington University
Text preceding or following the note
2019

SUMMARY OR ABSTRACT

Text of Note
One of the notable developments in current natural language processing is the practical efficacy of probabilistic word representations, where words are embedded in high-dimensional continuous vector spaces that are optimized to reflect their distributional relationships. For sequences of words, such as phrases and sentences, distributional representations can be estimated by combining word embeddings using arithmetic operations like vector averaging or by estimating composition parameters from data using various objective functions. The quality of these compositional representations is typically estimated by their performance as features in extrinsic supervised classification benchmarks. Word and compositional embeddings for a single language can be induced without supervision using a large training corpus of raw text. To handle multiple languages and dialects, bilingual dictionaries and parallel corpora are often used for learning cross-lingual embeddings directly or to align pre-trained monolingual embeddings. In this work, we explore and develop various cross-lingual alignment techniques, compare the performance of the resulting cross-lingual embeddings, and study their characteristics. We pay particular attention to the bilingual data requirements of each approach since lower requirements facilitate wider language expansion. To begin with, we analyze various monolingual general-purpose sentence embedding models to better understand their qualities. By comparing their performance on extrinsic evaluation benchmarks and unsupervised clustering, we infer the characteristics of the most dominant features in their respective vector spaces. We then look into various cross-lingual alignment frameworks with different degrees of supervision. We begin with unsupervised word alignment, for which we propose an approach for inducing cross-lingual word mappings with no prior bilingual resources. We rely on assumptions about the consistency and structural similarities between the monolingual vector spaces of different languages. Using comparable monolingual news corpora, our approach resulted in highly accurate word mappings for two language pairs: French to English, and Arabic to English. With various refinement heuristics, the performance of the unsupervised alignment methods approached the performance of supervised dictionary mapping. Finally, we develop and evaluate different alignment approaches based on parallel text. We show that incorporating context in the alignment process often leads to significant improvements in performance. At the word level, we explore the alignment of contextualized word embeddings that are dynamically generated for each sentence. At the sentence level, we develop and investigate three alignment frameworks: joint modeling, representation transfer, and sentence mapping, applied to different sentence embedding models. We experiment with a matrix factorization model based on word-sentence co-occurrence statistics, and two general-purpose neural sentence embedding models. We report the performance of the various cross-lingual models with different sizes of parallel corpora to assess the minimal degree of supervision required by each alignment framework.
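
The dictionary-based ("supervised") mapping that the abstract uses as a reference point is commonly formulated as an orthogonal Procrustes problem: learn a rotation that carries pre-trained source-language embeddings onto their dictionary translations in the target space, then retrieve translations by nearest-neighbor search in that shared space. The sketch below is a minimal illustration of that general idea, together with the vector-averaging composition mentioned above; it is not code from the thesis, and the function names, toy data, and NumPy-based setup are assumptions.

# Illustrative sketch only (not from the thesis): supervised cross-lingual
# alignment of pre-trained word embeddings via orthogonal Procrustes, plus
# the vector-averaging sentence composition mentioned in the abstract.
import numpy as np


def average_sentence_embedding(tokens, word_vectors):
    """Compose a sentence embedding by averaging its word vectors."""
    dim = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def align_with_dictionary(X_src, Y_tgt):
    """Learn an orthogonal map W minimizing ||X_src @ W - Y_tgt||_F.

    X_src, Y_tgt: (n, d) arrays of source/target vectors for n
    dictionary-paired words. Closed-form solution via SVD.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt


def translate(query_vec, W, tgt_matrix, k=5):
    """Map a source vector into the target space and return the indices
    of its k nearest target vectors by cosine similarity."""
    q = query_vec @ W
    q = q / np.linalg.norm(q)
    C = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return np.argsort(-(C @ q))[:k]


# Toy usage with random vectors standing in for real embeddings
# (hypothetical data; a real run would load monolingual embeddings
# and a seed bilingual dictionary).
rng = np.random.default_rng(0)
src = rng.normal(size=(1000, 50))   # source-language embedding matrix
tgt = rng.normal(size=(1000, 50))   # target-language embedding matrix
seed = np.arange(100)               # pretend rows 0..99 are dictionary pairs
W = align_with_dictionary(src[seed], tgt[seed])
print(translate(src[0], W, tgt))    # indices of candidate translations

In practice the embeddings are usually length-normalized (and often mean-centered) before the mapping is learned, and retrieval criteria that correct for nearest-neighbor hubness are often preferred over plain cosine search; the unsupervised variants discussed in the abstract replace the seed dictionary with mappings induced from the structure of the two monolingual spaces.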

UNCONTROLLED SUBJECT TERMS

Subject Term
Alignment
Subject Term
Arabic language
Subject Term
Bilingual dictionaries
Subject Term
Computer science
Subject Term
Dialects
Subject Term
English
Subject Term
French language
Subject Term
Language
Subject Term
Language acquisition
Subject Term
Mathematics
Subject Term
Natural language processing
Subject Term
Parallel corpora
Subject Term
Sentences
Subject Term
Space

PERSONAL NAME - PRIMARY RESPONSIBILITY

Aldarmaki, Hanan

PERSONAL NAME - SECONDARY RESPONSIBILITY

Diab, Mona T.

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

The George Washington University

ELECTRONIC LOCATION AND ACCESS

Electronic name
Read the full text of the book
