Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations"

Anh Khoa Ngo Ho (1)
(1) TLP - Traitement du Langage Parlé; LISN - Laboratoire Interdisciplinaire des Sciences du Numérique, STL - Sciences et Technologies des Langues
Abstract : This is a companion document to the Ph.D. dissertation "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations" [Ngo Ho, 2021]. It contains an exhaustive collection of graphs and tables related to the analysis of various aspects of automatic word alignment, such as aligned/unaligned words, rare/unknown words, function/content words, and word order divergences, for six language pairs: English with French, German, Romanian, Czech, Japanese and Vietnamese. We mostly analyze statistical word alignment models (GIZA++ and fast_align), as well as several neural variants: IBM-style word alignment models (context-independent, contextual, and character-based models) and variants of a fully generative neural model based on variational autoencoders. We also provide an in-depth analysis of Byte-Pair Encoding (BPE), a subword tokenization algorithm. For details on these methods, please refer to the thesis.
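The abstract names Byte-Pair Encoding without describing it. As a reminder of how BPE builds its subword vocabulary, the following is a minimal sketch of the standard merge-learning loop: count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. It is an illustration only, not the implementation used in the thesis; the function name `learn_bpe`, the `</w>` end-of-word marker, and first-seen tie-breaking are assumptions made for this example.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge operations from a word-frequency dictionary (illustrative sketch).

    word_counts: dict mapping a word (str) to its corpus frequency.
    Returns the list of merged symbol pairs, in the order they were learned.
    """
    # Each word starts as a sequence of characters plus an assumed "</w>"
    # end-of-word marker, so merges can distinguish word-final subwords.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; on ties, max() keeps the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair
        # with a single merged symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges
```

For the toy corpus {"low": 5, "lower": 2, "newest": 6, "widest": 3}, the first three merges learned are ("e", "s"), ("es", "t") and ("est", "</w>"), so "newest" and "widest" end up sharing the word-final subword "est</w>". Frequent character sequences become single units, while rare words remain split into smaller pieces.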
Document type :
Reports

https://hal.archives-ouvertes.fr/hal-03153752
Contributor : Anh Khoa NGO HO
Submitted on : Thursday, April 1, 2021 - 6:04:57 PM
Last modification on : Sunday, June 26, 2022 - 3:05:34 AM
Long-term archiving on : Friday, July 2, 2021 - 6:02:52 PM

File

Files produced by the author(s)

Identifiers

  • HAL Id : hal-03153752, version 1

Citation

Anh Khoa Ngo Ho. Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations". [Technical Report] Université Paris Saclay; Laboratoire Interdisciplinaire des Sciences du Numérique. 2021. ⟨hal-03153752⟩
