
Henry Hengyuan Zhao

Hi 👋, I'm Henry, a PhD student in the Show Lab at the National University of Singapore, advised by Prof. Mike Zheng Shou.
I train Large Multimodal Models (LMMs) and develop agents/benchmarks for GUI automation and chart-to-code.
Selected work: agent–computer interaction (WorldGUI), large multimodal model training (Genixer, LOVA3), and interactive alignment/evaluation (InterFeedback).

📢 News

🌺 Research Papers

Multimodal Model Alignment:
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao*, Wenqi Pei*, Yifei Tao*, Haiyang Mei, Mike Zheng Shou (*equal contribution)
EMNLP 2025 Findings
ICLR 2025 @ Bi-Align Workshop (Oral)
We study whether LMMs can improve through interactive human feedback and find that (1) accuracy alone may not fully capture interactive intelligence, (2) LMMs may cater to human feedback, and (3) low-quality feedback can hurt performance more than simple binary feedback.
Human-Agent-Computer Interaction:
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point
Henry Hengyuan Zhao, Kaiming Yang, Wendi Yu, Difei Gao, Mike Zheng Shou
ACL 2025 @ REALM Workshop
The first GUI benchmark targeting dynamic planning and action-event detection for desktop automation.
The Roles of MLLMs:
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
NeurIPS 2024
Beyond answering: we additionally train the model to ask and assess visual questions, obtaining consistent gains without extra annotation or hyper-parameter tuning.
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
ECCV 2024
The first work to examine MLLMs as data generators, demonstrating diverse data synthesis and measurable downstream gains.
Parameter-Efficient Tuning:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
IJCV 2023
Tuning only a small set of salient channels achieves up to 780× parameter reduction vs. full fine-tuning.
Low-level Vision:
Evaluating the Generalization Ability of Super-Resolution Networks
Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
TPAMI 2023
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Xiangtao Kong, Hengyuan Zhao, Yu Qiao, Chao Dong
CVPR 2021
Efficient Image Super-Resolution Using Pixel Attention
Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong
ECCVW 2020
Over 400 citations.