Atharva Kulkarni

Hello! I am a second-year Master of Language Technologies (MLT) student in the Language Technologies Institute, School of Computer Science at Carnegie Mellon University, advised by Barnabás Póczos. My research interests lie at the intersection of machine learning and natural language processing, with a particular focus on:

Analyzing and enhancing generalization, robustness, and fairness of present-day neural networks (particularly LLMs).
Learning with limited data, active learning, data valuation algorithms, and data-centric machine learning.
Parameter-efficient machine learning.

Currently, I am working on information-theoretic measures for improving the fairness and robustness of ML/NLP systems. I’ve also been fortunate to collaborate with several esteemed faculty at CMU, such as Graham Neubig, Aditi Raghunathan, Ameet Talwalkar, Emma Strubell, and David R. Mortensen, on various research projects that analyze and improve the generalization and efficiency of neural networks. In the summer of 2023, I interned with Apple Research (Siri), investigating the emergent capabilities of LLMs for dialog applications.

Before coming to CMU, I was a Predoctoral Researcher / Research Associate with Prof. Tanmoy Chakraborty at the Laboratory for Computational Social Systems (LCS2), IIIT Delhi, working on various research projects in NLP, multimodal machine learning, social computing, and conversational systems. My research has been published at top NLP/ML conferences such as ACL, EMNLP, EACL, SIGKDD, and IJCAI.

You can learn more about my publications here. You can find my detailed CV here.

I’m eager to connect with my academic peers! If our research interests align (or diverge) in intriguing ways, I’d be delighted to explore potential collaborations or simply exchange ideas! I’m also looking for research internship opportunities for Summer 2024 to work on data-centric machine learning or generalization / fairness / efficiency of LLMs. Please feel free to reach out via email if there is a good fit!

News

Feb 2024	Long-standing work on Multitask Learning for Worst-Group Generalization accepted to TMLR 2024!
Jan 2024	Summer 2023 internship work at Apple ML Research (Siri) on synthetic data generation for few-shot DST accepted to the EACL 2024 main conference. See you in Malta!
Oct 2023	Work on Wuggpt and Factual Error Correction accepted to the EMNLP main conference and Findings, respectively.

Selected Publications

TMLR

Multitask Learning Can Improve Worst-Group Outcomes

Atharva Kulkarni, Lucio M. Dery, Amrith Setlur, Aditi Raghunathan, Ameet Talwalkar, and Graham Neubig

Transactions on Machine Learning Research, 2024

In order to create machine learning systems that serve a variety of users well, it is vital to not only achieve high average performance but also ensure equitable outcomes across diverse groups. However, most machine learning methods are designed to improve a model’s average performance on a chosen end task without consideration for their impact on worst-group error. Multitask learning (MTL) is one such widely used technique. In this paper, we seek not only to understand the impact of MTL on worst-group accuracy but also to explore its potential as a tool to address the challenge of group-wise fairness. We primarily consider the standard setting of fine-tuning a pre-trained model, where, following recent work (Gururangan et al., 2020; Dery et al., 2023), we multitask the end task with the pre-training objective constructed from the end task data itself. In settings with few or no group annotations, we find that multitasking often, but not consistently, achieves better worst-group accuracy than Just-Train-Twice (JTT; Liu et al. (2021)) – a representative distributionally robust optimization (DRO) method. Leveraging insights from synthetic data experiments, we propose to modify standard MTL by regularizing the joint multitask representation space. We run a large number of fine-tuning experiments across computer vision and natural language processing datasets and find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes. Our official code can be found here: https://github.com/atharvajk98/MTL-group-robustness.

EACL

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

Atharva Kulkarni, Bo-Hsiang Tseng, Joel Moniz, Dhivya Piraviperumal, Hong Yu, and Shruti Bhargava

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Mar 2024

In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, ‘Can we efficiently generate synthetic data for any dialogue schema to enable few-shot prompting?’ Addressing this question, we propose SynthDST, a data generation framework tailored for DST, utilizing LLMs. Our approach only requires the dialogue schema and a few hand-crafted dialogue templates to synthesize natural, coherent, and free-flowing dialogues with DST annotations. Few-shot learning using data from SynthDST results in 4-5% improvement in Joint Goal Accuracy over the zero-shot baseline on MultiWOZ 2.1 and 2.4. Remarkably, our few-shot learning approach recovers nearly 98% of the performance compared to the few-shot setup using human-annotated training data.

SIGKDD

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Atharva Kulkarni*, Sarah Masud*, Vikram Goyal, and Tanmoy Chakraborty

In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Mar 2023

Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale code-mixed crowdsourced dataset of around 51k posts for hate speech detection from Twitter. GOTHate is neutrally seeded, encompassing different languages and topics. We conduct detailed comparisons of GOTHate with the existing hate speech datasets, highlighting its novelty. We benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. Thus, we investigate how adding endogenous signals enhances the hate speech detection task. We augment GOTHate with the user’s timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution HEN-mBERT is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT transcends the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as a part of their mission to tackle online harm.

IJCAI

Learning and Reasoning Multifaceted and Longitudinal Data for Poverty Estimates and Livelihood Capabilities of Lagged Regions in Rural India

Atharva Kulkarni, Raya Das, Ravi S. Srivastava, and Tanmoy Chakraborty

In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, Aug 2023

AI for Good - Projects

Poverty is a multifaceted phenomenon linked to the lack of capabilities of households to earn a sustainable livelihood, increasingly being assessed using multidimensional indicators. Its spatial pattern depends on social, economic, political, and regional variables. Artificial intelligence has shown immense scope in analyzing the complexities and nuances of poverty. The proposed project aims to examine the poverty situation of rural India for the period of 1990-2022 based on the quality of life and livelihood indicators. The districts will be classified into ‘advanced’, ‘catching up’, ‘falling behind’, and ‘lagged’ regions. The project proposes to integrate multiple data sources, including conventional national-level large sample household surveys, census surveys, and proxy variables such as daytime and nighttime data from satellite images and communication networks, to provide a comprehensive view of poverty at the district level. The project also intends to conduct causal and longitudinal analyses to examine the reasons for poverty. Poverty and inequality could be widening in developing countries due to demographic and growth-agglomerating policies. Therefore, targeting the lagging regions and the vulnerable population is essential to eradicate poverty and improve the quality of life to achieve the goal of ‘zero poverty’. Thus, the study also focuses on the districts with a higher share of the marginal section of the population compared to the national average to trace the performance of development indicators and their association with poverty in these regions.

EMNLP

Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter

Atharva Kulkarni*, Megha Sundriyal*, Vaibhav Pulastya, Md. Shad Akhtar, and Tanmoy Chakraborty

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022

The widespread diffusion of medical and political claims in the wake of COVID-19 has led to a voluminous rise in misinformation and fake news. The current vogue is to employ manual fact-checkers to efficiently classify and verify such data to combat this avalanche of claim-ridden misinformation. However, the rate of information dissemination is such that it vastly outpaces the fact-checkers’ strength. Therefore, to aid manual fact-checkers in eliminating the superfluous content, it becomes imperative to automatically identify and extract the snippets of claim-worthy (mis)information present in a post. In this work, we introduce the novel task of Claim Span Identification (CSI). We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets. Furthermore, along with the standard token classification baselines, we benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa. The experimental results attest that DABERTa outperforms the baseline systems across several evaluation metrics, improving by about 1.5 points. We also report detailed error analysis to validate the model’s performance along with the ablation studies. Lastly, we release our comprehensive span annotation guidelines for public use.

ACL

When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues

Atharva Kulkarni*, Shivani Kumar*, Md. Shad Akhtar, and Tanmoy Chakraborty

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022

Indirect speech such as sarcasm achieves a constellation of discourse goals in human communication. While the indirectness of figurative language warrants speakers to achieve certain pragmatic goals, it is challenging for AI agents to comprehend such idiosyncrasies of human communication. Though sarcasm identification has been a well-explored topic in dialogue analysis, for conversational systems to truly grasp a conversation’s innate meaning and generate appropriate responses, simply detecting sarcasm is not enough; it is vital to explain its underlying sarcastic connotation to capture its true essence. In this work, we study the discourse structure of sarcastic conversations and propose a novel task – Sarcasm Explanation in Dialogue (SED). Set in a multimodal and code-mixed setting, the task aims to generate natural language explanations of satirical conversations. To this end, we curate WITS, a new dataset to support our task. We propose MAF (Modality Aware Fusion), a multimodal context-aware attention and global information fusion module to capture multimodality and use it to benchmark WITS. The proposed attention module surpasses the traditional multimodal fusion baselines and reports the best performance on almost all metrics. Lastly, we carry out detailed analysis both quantitatively and qualitatively.