Research Projects
Projects spanning retrieval-augmented generation, medical imaging, second-order optimization, NLP, and graph neural networks — all with detailed technical reports.
Systems & Data Engineering
E-Commerce Behavior Analytics Platform
Featured
An analytics platform that turns 385M+ raw e-commerce events into business intelligence served by sub-second queries. The three-tier cloud-native stack pairs a React 18 + Material-UI frontend on Netlify with a FastAPI backend on Google Cloud Run and a PostgreSQL 14 star schema on Cloud SQL.
300–600× faster
<1s queries
385M events
52GB dataset
Monthly partitioning (7 partitions) reduced scan size by 85% · 5 materialized views for pre-computed aggregations · B-tree indexing on product_id, user_session, event_type. Storage overhead: ~35% (~$3/month on Google Cloud SQL).
PostgreSQL 14
FastAPI
React 18
Star Schema
Material-UI
Recharts
Google Cloud SQL
Cloud Run
Docker
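The monthly partitioning described above can be sketched as a small DDL generator. This is a minimal illustration rather than the platform's actual migration code: the parent table name `events` and the starting month are hypothetical, and the parent table is assumed to be declared with `PARTITION BY RANGE` on a timestamp column.

```python
from datetime import date


def month_starts(first: date, count: int) -> list[date]:
    """Return `count + 1` consecutive month boundaries starting at `first`."""
    bounds = []
    year, month = first.year, first.month
    for _ in range(count + 1):
        bounds.append(date(year, month, 1))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return bounds


def partition_ddl(parent: str, first: date, count: int) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    bounds = month_starts(first, count)
    statements = []
    for lo, hi in zip(bounds, bounds[1:]):
        name = f"{parent}_{lo:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return statements


# Hypothetical example: seven monthly partitions of a table named "events".
ddl = partition_ddl("events", date(2019, 10, 1), 7)
```

With range partitions like these, a query filtered to a single month only needs to scan that month's partition, which is the mechanism behind the reduced scan size.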
Generative & Vision
Medical Image Enhancement (Pix2Pix)
Implemented and extended the Pix2Pix conditional GAN for automated chest X-ray enhancement. Built a synthetic degradation pipeline (Gaussian noise σ=15, 3×3 blur, JPEG quality 50) to create paired training data from NIH ChestX-ray14 (4,999 frontal radiographs). Extended the generator with SAGAN-style self-attention modules at the U-Net bottleneck.
PSNR 39.97 dB
SSIM 0.9755
200 epochs
Tesla T4 (16GB)
Key finding: self-attention added 2.5M parameters and 50% training overhead but did not improve PSNR or SSIM, which suggests that X-ray enhancement is a largely local operation already well served by U-Net skip connections.
PyTorch
Pix2Pix / cGAN
U-Net
PatchGAN
Self-Attention
NIH ChestX-ray14
PSNR / SSIM
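The degradation steps and the PSNR metric above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's pipeline: the blur is assumed to be a 3×3 box (mean) filter, noise is applied before blur, the JPEG step is omitted because it needs an image codec, and the input is a toy gradient rather than a radiograph.

```python
import math
import random


def add_gaussian_noise(img, sigma=15.0, seed=0):
    """Add zero-mean Gaussian noise and clip to the 8-bit range."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def box_blur_3x3(img):
    """Average each pixel with its 3x3 neighbourhood (edges are clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for r_row, t_row in zip(reference, test)
              for a, b in zip(r_row, t_row)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8x8 gradient standing in for a radiograph (not real data).
clean = [[float(16 * x + 2 * y) for x in range(8)] for y in range(8)]
degraded = box_blur_3x3(add_gaussian_noise(clean))
score = psnr(clean, degraded)
```

Pairs of `(degraded, clean)` images produced this way are what a paired image-to-image model such as Pix2Pix trains on; PSNR against the clean image then measures how much of the degradation the model removes.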
Optimization & Theory
nlTGCR: Second-Order Optimizer
Designed a scalable second-order optimization algorithm that uses the Fisher Information Matrix (FIM) as a symmetric positive-definite approximation of the Hessian. Applied a Nyström approximation (rank-k subsampling) for cheap FIM inversion and Kronecker-factored preconditioning (K-FAC) for linear layers. Used JAX-style JIT compilation to bring matrix operations close to compiled C speed.
17× faster per epoch
+3.2 pp accuracy vs Adam (MLP)
0.42s/epoch (5-layer MLP)
CIFAR-10 results: nlTGCR outperformed Adam/RMSProp on MLPs (54.52% vs 51.3% accuracy) with 17× faster epochs. On CNNs, accuracy was comparable, since convolutional structure breaks the dense-Hessian assumption. Submitted to ICMLC '25.
PyTorch
Fisher Information Matrix
Nyström Approx.
K-FAC
JIT Compilation
CIFAR-10
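The idea behind cheap inversion with a Nyström approximation can be sketched with NumPy. This is not the nlTGCR implementation: it only illustrates how a rank-k approximation built from k sampled columns lets a damped linear system be solved through a k×k system via the Woodbury identity. The matrix `F` is a synthetic low-rank stand-in for a Fisher matrix, and the function names and the damping value are assumptions made for this example.

```python
import numpy as np


def nystrom_factors(F, idx):
    """Return (C, W) such that F is approximated by C @ inv(W) @ C.T."""
    C = F[:, idx]            # n x k block of sampled columns
    W = F[np.ix_(idx, idx)]  # k x k intersection block
    return C, W


def nystrom_damped_solve(C, W, v, damping):
    """Solve (C W^{-1} C^T + damping * I) x = v via the Woodbury identity.

    Only a k x k system is solved instead of an n x n one.
    """
    small = damping * W + C.T @ C                     # k x k
    correction = C @ np.linalg.solve(small, C.T @ v)
    return (v - correction) / damping


# Toy check on a synthetic rank-k matrix (not a real Fisher matrix).
rng = np.random.default_rng(0)
n, k, damping = 50, 5, 1e-2
A = rng.standard_normal((n, k))
F = A @ A.T                                  # rank k, so Nystrom is exact here
idx = rng.choice(n, size=k, replace=False)
C, W = nystrom_factors(F, idx)
g = rng.standard_normal(n)                   # stand-in for a gradient vector
step = nystrom_damped_solve(C, W, g, damping)
dense = np.linalg.solve(F + damping * np.eye(n), g)
```

Because the toy matrix has exact rank k, the low-rank solve should agree with the dense solve up to floating-point error; on a full-rank matrix the result is only an approximation whose quality depends on k and on which columns are sampled.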
NLP & Summarization
PEGASUS Scientific Paper Summarizer
Abstractive summarization pipeline for arXiv papers using google/pegasus-pubmed. Built a preprocessing pipeline (URL removal, LaTeX stripping, special-character handling) that preserves domain-specific vocabulary. Fine-tuned on 1,000 papers on an A100 (40GB) with 16-bit mixed precision; summaries were generated with beam search (width 4, length penalty 0.8).
ROUGE-1: 0.377
ROUGE-2: 0.126
ROUGE-L: 0.219
1,000 train / 100 val / 100 test
PEGASUS
PyTorch Lightning
Hugging Face Transformers
A100 / CUDA
AdamW (lr=2e-5)
16-bit Mixed Precision
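A preprocessing pass of the kind described above (URL removal and LaTeX stripping while keeping domain terms intact) might look like the following. The regular expressions and the function name `clean_text` are assumptions made for illustration; the project's actual rules are not shown in this page and are likely more extensive.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Citation-like commands are dropped together with their argument.
CITE_RE = re.compile(r"\\(?:cite|ref|label)[a-zA-Z]*\*?(?:\[[^\]]*\])*\{[^}]*\}")
# Display math ($$...$$) and inline math ($...$ on one line).
MATH_RE = re.compile(r"\$\$.*?\$\$|\$[^$\n]*\$", re.DOTALL)
# Remaining commands such as \textbf or \section*; their braced text is kept.
CMD_RE = re.compile(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])?")
BRACE_RE = re.compile(r"[{}]")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip URLs and LaTeX markup, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = CITE_RE.sub(" ", text)
    text = MATH_RE.sub(" ", text)
    text = CMD_RE.sub(" ", text)
    text = BRACE_RE.sub("", text)
    return WS_RE.sub(" ", text).strip()


sample = r"We use \textbf{BioBERT} at https://example.org/x and $x^2$ here."
cleaned = clean_text(sample)  # "We use BioBERT at and here."
```

Hyphenated and non-ASCII terms such as `IL-6` or `TNF-α` pass through unchanged, since none of the patterns touch them. One known limitation of this sketch is that two literal dollar signs on the same line would be treated as inline math.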
RAG-BioQA
Retrieval-augmented generation framework for long-form biomedical question answering on the PubMedQA dataset. Dense retrieval via BioBERT embeddings + FAISS indexing. Re-ranking pipeline comparing BM25, ColBERT, and MonoT5. Generator fine-tuned with LoRA for parameter-efficient T5 adaptation.
BioBERT
FAISS
T5 + LoRA
ColBERT
MonoT5
BM25
PubMedQA
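Of the re-rankers compared above, BM25 is simple enough to sketch in a few lines. The block below is a minimal Okapi BM25 scorer applied to a handful of made-up candidate passages, standing in for the output of a dense retriever. The parameter values `k1=1.5` and `b=0.75` are common defaults, not values taken from the project, and whitespace tokenisation is a simplification.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed, non-negative IDF variant.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def rerank(query, candidates):
    """Return candidate indices sorted by descending BM25 score."""
    tokenised = [c.lower().split() for c in candidates]
    scores = bm25_scores(query.lower().split(), tokenised)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Made-up candidate passages (not PubMedQA data).
candidates = [
    "aspirin reduces the risk of stroke in some patients",
    "graph neural networks classify documents",
    "low dose aspirin and stroke prevention",
]
order = rerank("aspirin stroke", candidates)  # [2, 0, 1]
```

The shorter passage containing both query terms ranks first because BM25 normalises term frequency by document length; neural re-rankers such as ColBERT and MonoT5 replace this lexical score with a learned relevance score.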
Graph ML
GNN Document Classification (CORA)
Document relationship modeling using Graph Neural Networks on the CORA dataset. Combined citation networks, co-authorship signals, and semantic similarity for graph construction. Implemented and compared GCN, GAT, and GraphSAGE architectures for document classification and clustering.
PyTorch Geometric
GCN
GAT
GraphSAGE
CORA Dataset
Citation Networks
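The propagation rule that a GCN layer applies to a document graph can be written out in plain Python. This sketch uses nested lists and a three-node toy graph instead of PyTorch Geometric and Cora, and the weight matrix is a hypothetical fixed value rather than a learned one. It computes the symmetrically normalised adjacency with self-loops and one layer of the form ReLU(Â H W).

```python
import math


def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph as nested lists."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, features, weight):
    """One GCN propagation step: ReLU(A_hat @ H @ W)."""
    hidden = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy graph: three documents in a citation chain 0 - 1 - 2 (not Cora data).
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, -1.0], [0.5, 0.5]]   # hypothetical 2x2 weight matrix
out = gcn_layer(a_hat, features, weight)
```

Each output row mixes a document's own features with those of its cited and citing neighbours. GAT replaces the fixed normalisation weights with learned attention coefficients, and GraphSAGE aggregates over sampled neighbours instead of the full neighbourhood.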
Applied AI
TasteMatch: AI Dietitian Chatbot
LLM-powered personal dietitian for users managing chronic conditions such as diabetes. Analyzes user preferences, kitchen inventory, and dietary restrictions to generate personalized meal recommendations. Checks nutritional facts against established diabetes care guidelines, including glycemic index checks and portion size calculations.
Ollama
FastAPI
LLMs
RAG
Diabetes Care
Conversational AI
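The portion size arithmetic mentioned above can be illustrated with the standard glycemic load formula (GI × available carbohydrate in grams / 100). The helper names and the numbers below are placeholders for illustration, not TasteMatch code or nutritional data, and the target load is an assumed parameter rather than a guideline value.

```python
def glycemic_load(glycemic_index: float, carbs_g: float) -> float:
    """Glycemic load of a serving: GI x available carbohydrate (g) / 100."""
    return glycemic_index * carbs_g / 100.0


def max_carbs_for_load(glycemic_index: float, target_load: float) -> float:
    """Largest carbohydrate amount (g) whose glycemic load stays at or below the target."""
    return target_load * 100.0 / glycemic_index


def portion_grams(carbs_g: float, carbs_per_100g: float) -> float:
    """Convert a carbohydrate budget (g) into a food weight (g)."""
    return carbs_g * 100.0 / carbs_per_100g


# Placeholder numbers for illustration only; not nutritional data.
gi, carbs_per_100g, target = 50.0, 25.0, 10.0
carbs = max_carbs_for_load(gi, target)          # 20.0 g of carbohydrate
grams = portion_grams(carbs, carbs_per_100g)    # 80.0 g of the food
```

A deterministic check of this kind can run alongside the LLM so that a suggested portion is verified by arithmetic rather than by the model's own estimate.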
Research Agenda
My research explores how large language models and deep learning can be applied to real-world problems in healthcare and science. I'm particularly interested in:
- Retrieval-Augmented Generation — improving factual accuracy and domain specificity in LLMs through dense retrieval and re-ranking pipelines
- Medical Image Analysis — using GANs and attention mechanisms to enhance diagnostic quality of medical imagery
- Digital Health — extracting meaningful signals from passive technology usage data to monitor functional decline in aging populations
- Graph Neural Networks — modeling complex relational data for classification and clustering tasks
Current: Digital Health Monitoring
Active
At the Emory FIT Lab, I'm extracting and analyzing Amazon Alexa voice interaction logs to identify technology engagement patterns that correlate with functional decline in older adults. I'm building automated data extraction pipelines with Python/Selenium and developing ML models to detect meaningful behavioral changes over time.
Python
Selenium
Digital Health
Time Series Analysis
Emory FIT Lab
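One simple baseline for detecting behavioral changes over time is a rolling z-score on daily interaction counts. The sketch below is a generic illustration under assumed parameters (a 7-day trailing window and a threshold of 2 standard deviations) on synthetic counts. It is not the lab's method, and the models under development may look quite different.

```python
from statistics import mean, stdev


def flag_changes(daily_counts, window=7, threshold=2.0):
    """Return indices of days that deviate from the trailing-window baseline.

    A day is flagged when its count lies more than `threshold` sample
    standard deviations from the mean of the previous `window` days.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu, sd = mean(baseline), stdev(baseline)
        if sd == 0:
            changed = daily_counts[i] != mu
        else:
            changed = abs(daily_counts[i] - mu) / sd > threshold
        if changed:
            flagged.append(i)
    return flagged


# Synthetic daily interaction counts (not study data): usage drops at index 10.
counts = [12, 11, 13, 12, 10, 12, 11, 12, 13, 11, 3, 2, 3]
flagged = flag_changes(counts)  # [10, 11]
```

The detector stops flagging once the low counts enter the trailing window and inflate its variance, which shows why a sustained shift usually calls for a change-point or trend model rather than a per-day outlier test.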