Henry Hengyuan Zhao

HiπŸ‘‹, this is Henry. I'm a PhD student in the Show Lab at National University of Singapore, advised by Prof. Mike Zheng Shou.
I am working on creating multimodal AI assistants that understand and collaborate with humans to solve real-world problems, with a broader interest in exploring the emerging roles of contemporary AI models in supporting human-AI and agent-computer interaction.
Research projects: I have been developing AI agents capable of interacting with computers (WorldGUI), training large multimodal models (Genixer, LOVA3) to uncover their potential roles in enhancing intelligence, and exploring their interactive intelligence for improved alignment (InterFeedback).

πŸ“’ News

🌺 Research Papers

Human-AI Interaction:
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao*, Wenqi Pei*, Yifei Tao*, Haiyang Mei, Mike Zheng Shou (*Equal contribution)
ICLR 2025@Bi-Align Workshop (Oral presentation)
Can Large Multimodal Models evolve through interactive human feedback? We found that (1) accuracy alone may not fully reflect a model's intelligence; (2) LMMs may cater to humans; (3) low-quality feedback can degrade performance more than simply providing binary (0/1) feedback.
Human-Agent-Computer Interaction:
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Henry Hengyuan Zhao, Difei Gao, Mike Zheng Shou
arXiv, 2025
Benchmark: An early work for testing GUI agents in a dynamic setting.
Agent: An effective and universal agent framework for GUI automation, built upon a critic-thinking philosophy.
The roles of MLLMs:
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
NeurIPS 2024
🌺 Why train MLLMs only to answer questions? Let's also teach them to ask and assess questions. Without hyperparameter tuning or additional data annotation, consistent performance improvements are achieved!
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
ECCV 2024
💡 How do MLLMs perform as data generators? (Ours is the first work to explore this.) Take a look at using MLLMs to generate diverse multimodal data and observe the resulting performance improvements.
Parameter-Efficient Tuning:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
IJCV 2023
We found that tuning only a small number of task-specific channels, referred to as salient channels, is sufficient. This approach achieves a remarkable 780x reduction in parameter costs compared to full fine-tuning.
Low-level Vision:
Evaluating the Generalization Ability of Super-Resolution Networks
Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
TPAMI 2023
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Xiangtao Kong, Hengyuan Zhao, Yu Qiao, Chao Dong
CVPR 2021
Efficient Image Super-Resolution Using Pixel Attention
Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong
ECCVW 2020
Over 400 citations