I am currently working as a Researcher at Huawei Singapore Research Center. I obtained my
Ph.D. from Nanyang
Technological University (NTU) in 2022, where I was a founding member of the NTU-NLP Group and was supervised
by Professor Shafiq Joty.
Notably, my doctoral dissertation was met with significant acclaim, leading to the conferral
of the distinguished Outstanding Ph.D. Thesis Award. I worked as a research intern at Meta AI hosted by Professor Philipp Koehn.
My research interest falls broadly in the
application of Machine Learning, especially Natural Language Processing. I am also
interested in Multimodal Representation Learning. I invest considerable time and effort in
delving into the intricacies of large-scale models, aiming to unravel their nuanced
functionalities and underlying mechanisms.
Tasnim Mohiuddin , Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale,
and Shafiq Joty, "Data Selection Curriculum for Neural Machine Translation".
In Findings of
2022 Conference on Empirical Methods in Natural Language Processing
(EMNLP 2022),
Abu Dhabi, UAE.
[PDF]
[Code]
M Saiful Bari, Tasnim Mohiuddin (Equal Contributions), and Shafiq Joty,
"UXLA: : A Robust Unsupervised Data Augmentation Framework for Cross-Lingual
NLP".
In Proceedings of
The Joint Conference of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language
Processing
(ACL-IJCNLP 2021).
[PDF]
[Code]
Tasnim Mohiuddin , M Saiful Bari, and Shafiq Joty,
"AugVic: Exploiting BiText Vicinity for Low-Resource NMT".
In Findings of
The Joint Conference of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural Language
Processing
(ACL-IJCNLP 2021).
[PDF]
[Code]
Tasnim Mohiuddin , Prathyusha Jwalapuram, Xiang Lin, and Shafiq Joty,
"Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks".
In Proceedings of the
16th conference of the European Chapter of the Association for Computational
Linguistics
(EACL 2021), Kyiv, Ukraine.
[PDF]
[Resource]
Tasnim Mohiuddin , M Saiful Bari, and Shafiq Joty,
"LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through
Non-Linear Mapping in Latent Space".
In Proceedings of the
Conference on Empirical Methods in Natural Language Processing
(EMNLP 2020).
[PDF]
[Presentation]
[Code]
Han-Cheol Moon,
Tasnim Mohiuddin (Equal Contributions), Shafiq Joty, and Chi Xu,
"A Unified Neural Coherence Model". In Proceedings of the
Conference on Empirical Methods in Natural Language Processing and International
Joint Conference on Natural Language Processing
(EMNLP-IJCNLP 2019), Hong Kong, China.
[PDF]
[Presentation]
[Code]
Tasnim Mohiuddin and Shafiq Joty,
"Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle
Consistency
and Improved Training". In Proceedings of the
North American Chapter of the Association for Computational Linguistics: Human
Language Technologies
(NAACL-HLT 2019), Minneapolis, USA.
[PDF]
[Presentation]
[Poster]
[Code]
Tasnim Mohiuddin , Thanh-Tung Nguyen, and Shafiq Joty,
"Adaptation of Hierarchical Structured Models for Speech Act Recognition in
Asynchronous
Conversation"
. In Proceedings of the
North American Chapter of the Association for Computational Linguistics: Human
Language Technologies
(NAACL-HLT 2019), Minneapolis, USA.
[PDF]
[Presentation]
[Poster]
[Code]
Tasnim Mohiuddin , Shafiq Joty, and Dat Nguyen,
"Coherence Modeling of Asynchronous Conversations: A Neural Entity Grid
Approach". In
Proceedings of the
56th Annual Meeting of the Association
for Computational Linguistics
(ACL 2018), Melbourne, Australia.
[PDF]
[Poster]
[Code]
Tasnim Mohiuddin and
Shafiq Joty,
"Unsupervised Word Translation with Adversarial Autoencoder".
Computational Linguistics
(Special Issue of Computational Linguistics on Multilingual and Interlingual
Semantic Representations for Natural Language Processing) : pages XXX - XXX (2020),
MIT press (June 2020).
[PDF]
[Presentation]
Shafiq Joty and
Tasnim Mohiuddin,
"Modeling Speech Acts in Asynchronous Conversations: A Neural-CRF Approach".
Computational Linguistics
(Special Issue on Language in Social Media)
44:4 , pages 859 - 894 (2018),
MIT press (2018).
[PDF]
[Code]
Neural Machine Translation (NMT) models are typically trained on heterogeneous
data that are concatenated and randomly shuffled. However, not all of the
training data are equally useful to the model. Curriculum training aims to
present the data to the NMT models in a meaningful order.
In this project, we introduce a two-stage training framework for NMT where we
fine-tune a base NMT model on subsets of data, selected by both deterministic
scoring using pre-trained methods and online scoring that considers prediction
scores of the emerging NMT model.
Through comprehensive experiments on six language pairs comprising low- and
high-resource languages, we have shown that our curriculum strategies
consistently demonstrate better quality and faster convergence.
Publication: EMNLP-2022
The ability to edit images in a realistic and visually appealing manner is a
fundamental requirement in various computer vision applications. In this paper,
we present ImEW, a unified framework designed for solving image editing
tasks.
ImEW utilizes off-the-shelf foundation models to address four essential
editing tasks: object removal, object translation, object replacement, and
generative fill beyond the image frame. These tasks are accomplished by
leveraging the capabilities of state-of-the-art foundation models, namely the
Segment Anything Model, Grounding DINO, LaMa, and Stable Diffusion. These models
have undergone extensive training on large-scale datasets and have exhibited
exceptional performance in understanding image context, object manipulation, and
texture synthesis.
Through extensive experimentation, we demonstrate the effectiveness and
versatility of ImEW in accomplishing image editing tasks across a wide
range of real-world scenarios. The proposed framework opens up new possibilities
for realistic and visually appealing image editing and enables diverse
applications requiring sophisticated image modifications. Additionally, we
discuss the limitations and outline potential directions for future research in
the field of image editing using off-the-shelf foundation models, enabling
continued advancements in this domain.
The success of Neural Machine Translation (NMT) largely depends on the
availability of large bitext training corpora. Due to the lack of such large
corpora in low-resource language pairs, NMT systems often exhibit poor
performance. Extra relevant monolingual data often helps, but acquiring it could
be quite expensive, especially for low-resource languages. Moreover, domain
mismatch between bitext (train/test) and monolingual data might degrade the
performance. To alleviate such issues, we propose AugVic, a novel data
augmentation framework for low-resource NMT which exploits the vicinal samples
of the given bitext without using any extra monolingual data explicitly.
It can diversify the in-domain bitext data with finer-level control. Through
extensive experiments on four low-resource language pairs comprising data from
different domains, we have shown that our method is comparable to the
traditional back-translation that uses extra in-domain monolingual data. When we
combine the synthetic parallel data generated from AugVic with the ones
from the extra monolingual data, we achieve further improvements. We show that
AugVic helps to attenuate the discrepancies between relevant and
distant-domain monolingual data in traditional back-translation.
Publication: ACL-2021
Transfer learning has yielded state-of-the-art (SoTA) results in many supervised
natural language processing tasks. However, annotated data for every target task
in every target language is rare, especially for low-resource languages. We
target to solve cross-lingual adaptation problems from a source language
distribution to an unknown target language distribution, assuming no training
labels are available for the target language task.
In this project, we propose UXLA, a generic data augmentation
framework for
self-supervised learning in zero-resource transfer learning scenarios. At its
core, UXLA performs simultaneous self-training with data augmentation and
unsupervised sample selection. We augment data from the unlabeled
training examples in the target language as well as from the virtual input
samples (eg sentences) generated from the vicinity distribution of the source
and target language sentences. With the augmented data, UXLA performs
simultaneous self-learning with an effective distillation strategy to learn a
strongly adapted cross-lingual model from noisy (pseudo) labels for the target
language task. We propose novel ways to generate virtual input samples using
XLMR - a multilingual masked language model, and get reliable task labels by
simultaneous multilingual co-training.
To show our proposed methods' effectiveness, we conduct
extensive experiments on zero-resource cross-lingual transfer tasks for Named
Entity Recognition (XNER) and Natural Language Inference (XNLI).
UXLA achieves SoTA results in both tasks, outperforming the baselines by a
good margin. With an in-depth model dissection, we demonstrate the cumulative
contributions of different components to UXLA's success.
Publication: ACL-2021
Most of the successful and predominant methods for Bilingual Lexicon Induction
(BLI) are mapping-based, where a linear mapping function is learned with the
assumption that the word embedding spaces of different languages exhibit similar
geometric structures i.e. approximately isomorphic}). However, several
recent studies have criticized this simplified assumption showing that it does
not hold in general even for closely related languages. In this work, we propose
a novel semi-supervised method to learn cross-lingual word embeddings for
BLI. Our model is independent of the isomorphic assumption and uses
non-linear mapping in the latent space of two independently pre-trained}
autoencoders. Through extensive experiments on fifteen (15) different language
pairs (in both directions) comprising resource-rich and low-resource languages
from two different datasets, we demonstrate that our method outperforms existing
models by a good margin. Ablation studies show the importance of different model
components and the necessity of non-linear mapping.
Publication: EMNLP-2020
Suppose we are given
monolingual word embeddings for source and target languages. We do not
have any
initial dictionary or external cross-lingual signal. Our
goal is to learn word translation (a.k.a. bilingual lexicon induction)
i.e. for
a given source word, we want its translation in the target domain.
Cross-lingual word embeddings learned from monolingual embeddings have a
crucial
role in many downstream tasks, ranging from machine translation to transfer
learning.
Adversarial training has shown impressive success in learning
cross-lingual embeddings
and the associated word translation task without any parallel data by mapping
monolingual
embeddings to a shared space. In this project, we investigate
adversarial autoencoder for unsupervised word translation and propose
two novel extensions to it that yield more stable training and improved
results.
Our method includes
regularization terms to enforce
cycle consistency and
input reconstruction, and puts the target encoders as an adversary
against the
corresponding discriminator.
We use two types of
refinement procedures sequentially after obtaining the trained encoders
and mappings
from the adversarial training, namely,
refinement with Procrustes solution and
refinement with symmetric re-weighting.
Extensive experimentations with European, non-European and low-resource
languages
show that our method achieves better performance than existing adversarial and
non-adversarial
approaches and is also competitive with the supervised system. Along with
performing
comprehensive ablation studies to understand the contribution of different
components
of our adversarial model, we also conduct a thorough analysis of the refinement
procedures
to understand their effects.
Publication: NAACL-HLT 2019 , CL Journal 2020
Sentences in a text or a conversation do not occur independently; rather they
are connected to form a coherent discourse
that is easy to comprehend.
Coherence models are computational models that can distinguish a coherent
discourse
from incoherent ones. It has ranges of applications in text generation,
summarization,
and coherence scoring.
In this project, we conduct our research in
two steps.
First, we propose improvements to the recently proposed
neural entity grid model by
lexicalizing its entity transitions. We propose methods based on word
embeddings
to achieve better generalization with the lexicalized model.
Second, we extend the model to asynchronous conversations by
incorporating the
underlying conversational structure in the entity grid representation and
feature
computation. For this, we propose a novel grid representation for asynchronous
conversations
and adapt the convolution layer of the neural model accordingly.
Our model achieves state of the art results on standard coherence
assessment tasks
in monologue and conversations outperforming existing models. We also
demonstrate
its effectiveness in reconstructing thread structures.
Publication: ACL 2018
In this project, we work on the limitations of existing models which
underperform on tasks that require the model to be sensitive
to local contexts such as candidate ranking in conversational dialogue and in
machine
translation.
We propose a
unified coherence model that incorporates
sentence grammar,
inter-sentence coherence relations, and
global coherence patterns in a single framework. We use an
LSTM sentence encoder with explicit language model loss to capture the
syntax.
Inter-sentence discourse relations are modeled with a
bilinear layer, and a
lightweight convolution-pooling is used to capture the attention and
topic structures
(global coherence patterns ).
We evaluate our models on both local and global discrimination tasks on the
benchmark
dataset. Our results show that our approach outperforms existing methods by a
wide
margin in both tasks.
Publication: EMNLP-IJCNLP 2019
With the advent of Internet technologies, communication media like emails and
discussion forums have become common-place
for discussing work, issues, events, and experiences. Participants in these
media
interact with each other
asynchronously by writing at different times. Participants in an
asynchronous
conversation interact with each other in complex ways, performing certain
communicative
acts like asking questions, requesting information or suggesting something.
These
are called
speech acts.
Unlike synchronous conversations (e.g. meeting, phone), modeling
conversational
dependencies between sentences in an asynchronous conversation are challenging.
The
conversational flow often lacks sequential dependencies in its
temporal/chronological
order.
For example, if we arrange the sentences as they arrive in the
conversation,
it becomes hard to capture any dependency between the acts types because the two
components of the adjacency pairs can be far apart in the sequence. This leaves
us
with
one open research question:
how do we model the dependencies between sentences in a single comment and
between
sentences across different comments? Another major problem is
insufficient training data in the asynchronous domains.
In this project, we propose methods to effectively leverage abundant
unlabeled conversational
data and the available labeled data from synchronous domains. We carry out our
research
in
three main steps.
First, we introduce an end-to-end neural architecture based on a
hierarchical
LSTM encoder with a Softmax or conditional random fields (CRF) output layer, and
show that our method outperforms existing methods when trained on in-domain data
only.
Second, we improve our initial SAR models by semi-supervised learning in
the
form of pretrained word embeddings learned from a large unlabeled conversational
corpus.
Finally, we adapt our hierarchical LSTM encoder using domain adversarial
training
to leverage the labeled data from synchronous domains by explicitly modeling the
shift in the two domains.
We evaluate our models on three different asynchronous datasets containing
forum
and email conversations, and on the MRDA meeting corpus. Our main findings are:
Publications: CL Journal 2018, NAACL-HLT 2019