Harsit Upadhya

M.S. Computer Science · Emory University · Class of 2026

I build AI systems that solve real problems, from sub-second query engines over 385M events to medical image enhancement with Pix2Pix and retrieval-augmented chatbots for biomedical QA. My work spans the full stack: data engineering, deep learning, and production ML. I am currently researching digital health at the Emory FIT Lab.

01

Research Interests

My research explores how large language models and deep learning can be applied to real-world problems in healthcare and science. Current focus areas: retrieval-augmented generation for domain-specific QA, medical image enhancement, graph neural networks, and digital health monitoring using passive technology interaction data.

Retrieval-Augmented Generation · Large Language Models · Medical Imaging · Digital Health · Graph Neural Networks · Data Engineering · Deep Learning

02

Projects

Medical Image Enhancement (Pix2Pix)

Extended Pix2Pix with self-attention to improve chest X-ray quality on NIH ChestX-ray14. Built a synthetic degradation pipeline to generate paired training data.

PSNR 39.97 dB · SSIM 0.9755
PyTorch · Pix2Pix / cGAN · U-Net · PatchGAN · Self-Attention
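
As a rough illustration of how paired training data and a PSNR score can be produced, the sketch below degrades a clean image with additive Gaussian noise and scores the result against the original. The function names `degrade` and `psnr`, the noise model, and the 8-bit pixel range are assumptions made for this example, not the project's actual pipeline.

```python
import math
import random

def degrade(image, noise_std=10.0, seed=0):
    """Return a noisy copy of an 8-bit grayscale image (a list of rows).

    Hypothetical degradation: additive Gaussian noise, clipped to [0, 255].
    """
    rng = random.Random(seed)
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, noise_std))) for px in row]
        for row in image
    ]

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    n = sum(len(row) for row in reference)
    mse = sum(
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    ) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Build one (degraded, clean) training pair from a synthetic 32x32 image.
clean = [[float((x * y) % 256) for x in range(32)] for y in range(32)]
noisy = degrade(clean)
pair = (noisy, clean)
```

A higher PSNR means the test image is closer to the reference, so an enhancement model would be expected to raise the PSNR of its output relative to the degraded input.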

nlTGCR: Second-Order Optimizer

A second-order optimizer using the Fisher Information Matrix that beats Adam and RMSProp on MLPs. Applied Nyström approximation and Kronecker-factored preconditioning with JAX-style JIT compilation.

17× faster per epoch · +3.2% accuracy vs Adam
PyTorch · Fisher Information Matrix · Nyström Approx. · K-FAC · JIT Compilation · CIFAR-10

PEGASUS Paper Summarizer

Abstractive summarization pipeline for arXiv papers using google/pegasus-pubmed. Trained on 1,000 papers with beam search decoding. Published the model on Hugging Face.

ROUGE-1: 0.377 · ROUGE-2: 0.126 · ROUGE-L: 0.219
PEGASUS · PyTorch Lightning · Hugging Face · A100 / CUDA · ROUGE
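
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below is a simplified version that assumes lowercase whitespace tokenization and no stemming; the function name `rouge1_f1` is hypothetical, and standard ROUGE packages apply extra preprocessing, so this will not reproduce the scores listed for this project.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat")
```

In this example the candidate has perfect precision but covers only a third of the reference tokens, which is why the F1 score sits well below 1.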

TasteMatch: AI Dietitian Chatbot

LLM-powered personal dietitian for diabetes management. Generates personalized meal recommendations with glycemic index verification and portion size calculations against clinical guidelines.

Ollama · FastAPI · LLMs · RAG · Diabetes Care

GNN Document Classification

Document relationship modeling on CORA using Graph Neural Networks. Compared GCN, GAT, and GraphSAGE architectures for citation prediction and clustering.

PyTorch Geometric · GCN · GAT · GraphSAGE · CORA Dataset

RAG-BioQA

Retrieval-augmented biomedical QA on PubMedQA. Dense retrieval with BioBERT + FAISS, re-ranking with BM25/ColBERT/MonoT5, and a LoRA-fine-tuned T5 generator.

BioBERT · FAISS · T5 + LoRA · ColBERT · MonoT5 · PubMedQA


03

Publications

Harsit Upadhya, Upadhyay, A.

XNLI 2.0: Improving XNLI Dataset and Performance on Cross-Lingual Understanding

IEEE 8th I2CT Conference · 2023


04

Experience

Jan 2025 – Present

Graduate Research Assistant

Emory FIT Lab · Atlanta, GA

  • Built automated pipeline to extract and analyze Amazon Alexa voice interaction logs using Python/Selenium
  • Identified technology engagement patterns indicating functional decline in older adults for digital health monitoring

Jan 2026 – Present

Graduate Teaching Assistant

Emory University · Atlanta, GA

  • Instruct 40 undergraduates in Data Science 100, covering R tidyverse, data cleaning, visualization, and EDA

May 2025 – Oct 2025

VP, International Student Affairs

Graduate Student Government Association (GSGA)

  • Selected executive board member; served as primary liaison for 500+ international graduate students

05

Get in Touch

Open to collaborations in AI/ML, digital health, or NLP. Feel free to reach out.

Or via GitHub or LinkedIn.