Henry Hengyuan Zhao

Hi 👋, I'm Henry, a PhD student in the Show Lab at the National University of Singapore, advised by Prof. Mike Zheng Shou.
I work on multimodal post-training and AI agents, with a focus on model roles and behavior and on reasoning faithfulness.
Selected work: chart-to-code (Chart2Code), agent–computer interaction (WorldGUI), large multimodal model training (Genixer, LOVA3), and interactive alignment/evaluation (InterFeedback).

📢 News

🌺 Selected Papers

Multimodal Reward Model:
SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification
Xiangyu Zhao, Henry Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu
KDD 2026
We construct SCIPRM70K, a large-scale dataset of Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building on this dataset, we train an efficient reward model, Sci-PRM, that provides fine-grained supervision on tool selection, execution accuracy, and result interpretation at every step in a single inference pass.
Multimodal Coding:
From Charts to Code: A Hierarchical Benchmark for Multimodal Models
Jiahao Tang*, Henry Hengyuan Zhao* (Project Lead), Lijian Wu*, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang
*Equal contribution
ACL 2026 Main
A challenging chart-to-code benchmark with three hierarchical tasks.
Multimodal Model Faithfulness:
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao*, Wenqi Pei*, Yifei Tao*, Haiyang Mei, Mike Zheng Shou
*Equal contribution
EMNLP 2025 Findings
ICLR 2025 @ Bi-Align Workshop (Oral)
We study whether LMMs can evolve via interactive human feedback and find: (1) Accuracy may not fully capture intelligence; (2) LMMs may cater to humans; (3) Low-quality feedback can hurt more than simple binary feedback.
Human-Agent-Computer Interaction:
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point
Henry Hengyuan Zhao, Kaiming Yang, Wendi Yu, Difei Gao, Mike Zheng Shou
ACL 2025 @ REALM Workshop
The first GUI benchmark targeting dynamic planning and action-event detection for desktop automation.
The Roles of MLLMs:
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
NeurIPS 2024
Beyond answering: we introduce asking and assessing during training and obtain consistent gains without extra annotation or hyper-parameter tuning.
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
ECCV 2024
The first work to examine MLLMs as data generators, showing diverse data synthesis and measurable downstream gains.
Parameter-Efficient Tuning:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
IJCV 2023
Tuning only a small set of salient channels achieves up to 780× parameter reduction vs. full fine-tuning.