Publications

Replicable Benchmarking of Neural Machine Translation for Low-Resource Local Languages in Indonesia

Published in the SEALP 2023 Workshop at IJCNLP-AACL 2023, 2023

Benchmarking potential breakthroughs has become a challenge because many models rely on high-resource settings. However, using weak baselines is not a convincing way to show the impact of a breakthrough. We therefore created a replicable benchmark of NMT for low-resource local languages in Indonesia, built using only one GPU and only 96 hours per model.

"Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia". Lucky Susanto, Ryandito Diandaru, Adila Krisnadhi, Ayu Purwarianti, Derry Wijaya. In the First Workshop in South East Asian Language Processing (SEALP), 2023. https://arxiv.org/abs/2311.00998

Could We Have Had Better Multilingual LLMs If English Was Not the Central Language?

Published in LREC-Coling 2024, 2024

Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on. However, the impact of factors beyond training data size on translation performance remains a topic of debate, especially for languages not directly encountered during training. Our study examines Llama2’s translation capabilities. By modeling a linear relationship between linguistic feature distances and machine translation scores, we ask whether languages other than English might serve as better central languages for LLMs.
