Henry Hengyuan Zhao

Hi 👋, I'm Henry, a PhD student in the Show Lab at the National University of Singapore, advised by Prof. Mike Zheng Shou.

I work on multimodal post-training and AI agents, with a focus on model roles and behavior and on reasoning faithfulness.

Selected work: chart-to-code (Chart2Code), agent–computer interaction (WorldGUI), large multimodal model training (Genixer, LOVA3), and interactive alignment/evaluation (InterFeedback).

Email | GitHub | Google Scholar | @ZHHHYuan

📢 News

[05/2026] Sci-PRM has been accepted to KDD 2026.
[04/2026] Chart2Code has been accepted to the ACL 2026 Main Conference.
[08/2025] InterFeedback has been accepted to EMNLP 2025 Findings.
[03/2025] InterFeedback has been accepted to the ICLR 2025 Workshop on Bidirectional Human-AI Alignment.
[02/2025] Check out our new preprint, WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation.
[12/2024] I will present our NeurIPS paper LOVA3 in Vancouver.
[10/2024] I will present our ECCV paper Genixer in Milan.
[09/2024] One paper has been accepted to NeurIPS 2024.
[07/2024] One paper has been accepted to ECCV 2024.
[09/2023] One paper has been accepted to IJCV 2023.
[08/2023] One paper has been accepted to TPAMI 2023.

🌺 Selected Papers

Multimodal Reward Model:
	SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification
	Xiangyu Zhao, Henry Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu
	KDD 2026
	We construct SCIPRM70K, a large-scale dataset of Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building on it, we train Sci-PRM, an efficient reward model that provides fine-grained, step-level supervision on tool selection, execution accuracy, and result interpretation in a single inference pass.
	Paper
Multimodal Coding:
	From Charts to Code: A Hierarchical Benchmark for Multimodal Models
	Jiahao Tang, Henry Hengyuan Zhao (Project Lead)*, Lijian Wu, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang
	*Equal contribution
	ACL 2026 Main Conference
	A challenging chart-to-code benchmark with three hierarchical tasks.
	Paper | Code | 🔥 Dataset (3K downloads in one month) | Project Page
Multimodal Model Faithfulness:
	InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
	Henry Hengyuan Zhao, Wenqi Pei, Yifei Tao, Haiyang Mei, Mike Zheng Shou
	Equal contribution
	EMNLP 2025 Findings; ICLR 2025 @ Bi-Align Workshop (Oral)
	We study whether LMMs can evolve through interactive human feedback and find that (1) accuracy may not fully capture intelligence; (2) LMMs may cater to humans; and (3) low-quality feedback can hurt more than simple binary feedback.
	Paper | [新智元] | Dataset | AK's Retweet
GUI Interaction:
	WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point
	Henry Hengyuan Zhao, Kaiming Yang, Wendi Yu, Difei Gao, Mike Zheng Shou
	ACL 2025 @ REALM Workshop
	The first GUI benchmark targeting dynamic planning and action-event detection for desktop automation.
	Paper | Code | Project Page
The Roles of MLLMs:
	LOVA3: Learning to Visual Question Answering, Asking and Assessment
	Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
	NeurIPS 2024
	Beyond answering, we add question asking and answer assessment as training tasks and obtain consistent gains without extra annotation or hyperparameter tuning.
	Paper | Code
	Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
	Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
	ECCV 2024
	The first work to examine MLLMs as data generators, showing diverse data synthesis and measurable downstream gains.
	Paper | Code
Parameter-Efficient Tuning:
	SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
	Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
	IJCV 2023
	Tuning only a small set of salient channels achieves up to a 780× reduction in tuned parameters compared with full fine-tuning.
	Paper | Code