Research Themes
Our research focuses on building adaptable AI systems that learn from multiple modalities, perform well under resource constraints, and remain reliable in practice.
Multimodal Learning
We study methods that jointly represent and reason over images, text, and structured inputs. This enables richer task understanding, better generalization, and stronger cross-modal retrieval.
Selected publications:
- ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models (ECCV 2026)
- Organizing Unstructured Image Collections using Natural Language (CVPR Findings 2026)
- Democratizing Fine-grained Visual Recognition with Large Language Models (ICLR 2024)
Learning with Limited Resources
Our work targets models that are both data-efficient and compute-aware, allowing intelligent systems to work well in constrained environments and on edge devices.
Selected publications:
- Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs (IJCNN 2026)
- FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models (ICCV 2025)
- Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning (CoLLAs 2024)
Trustworthy AI
We investigate the robustness, privacy, and predictive uncertainty of vision-language systems, so that model predictions remain reliable under distribution shift and in safety-critical use cases.
Selected publications:
- Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers (ICLR 2026)
- How (Mis)calibrated is Your Federated CLIP and What To Do About It? (pre-print, 2026)
- Group-robust Machine Unlearning (TMLR 2025)