Henry Hengyuan Zhao

Hi👋, this is Henry. I'm a PhD student in the Show Lab at National University of Singapore, advised by Prof. Mike Zheng Shou.
I work on creating multimodal AI assistants that understand and collaborate with humans to solve real-world problems, with a broader interest in the emerging roles of contemporary AI models in human-AI and agent-computer interaction.
Research projects: I have been developing AI agents capable of interacting with computers (WorldGUI), training large multimodal models (Genixer, LOVA3) to uncover their potential roles for enhanced intelligence, and exploring their interactive intelligence for improved alignment (InterFeedback).

📢 News

🌺 Research Papers

Human-AI Interaction:
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao*, Wenqi Pei*, Yifei Tao*, Haiyang Mei, Mike Zheng Shou (*equal contribution)
ICLR 2025@Bi-Align Workshop (Oral presentation)
Can Large Multimodal Models evolve through Interactive Human Feedback?
We build a straightforward framework that bootstraps any LMM into an interactive problem-solving process. On top of this, we present InterFeedback-Bench, a benchmark for evaluating the interactive intelligence of current LMMs.
Human-Agent-Computer Interaction:
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Henry Hengyuan Zhao, Difei Gao, Mike Zheng Shou
arXiv, 2025
Benchmark: An early work for testing GUI agents in a dynamic setting.
Agent: An effective and universal agent framework for GUI automation, built on a critic-thinking philosophy.
The roles of MLLMs:
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
NeurIPS 2024
🌺 Only answering questions? Let's also teach MLLMs to ask and assess questions during training. Without hyperparameter tuning or additional data annotation, we achieve consistent performance improvements!
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
ECCV 2024
💡 How do MLLMs perform in data generation? (Ours is the first work to explore this.) Take a look at using MLLMs to generate diverse multimodal data and observe the resulting performance improvements.
Parameter-Efficient Tuning:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
IJCV 2023
We found that tuning only a small number of task-specific channels, referred to as salient channels, is sufficient. This achieves a 780× reduction in parameter cost compared to full fine-tuning.
Low-level Vision:
Evaluating the Generalization Ability of Super-Resolution Networks
Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
TPAMI 2023
Color2Style: Real-Time Exemplar-Based Image Colorization with Self-Reference Learning and Deep Feature Modulation
Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He
arXiv, 2021
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Xiangtao Kong, Hengyuan Zhao, Yu Qiao, Chao Dong
CVPR 2021
Efficient Image Super-Resolution Using Pixel Attention
Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong
ECCVW 2020
Over 400 citations
A Simple and Robust Deep Convolutional Approach to Blind Image Denoising
Hengyuan Zhao, Wenze Shao, Bingkun Bao, Haibo Li
ICCVW 2019
Very Lightweight Photo Retouching Network with Conditional Sequential Modulation
Yihao Liu, Jingwen He, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, Yu Qiao
TMM 2022