Henry Hengyuan Zhao

HiπŸ‘‹, this is Henry. I'm a PhD student in the Show Lab at National University of Singapore, advised by Prof. Mike Zheng Shou.
I am working on creating multimodal AI assistants that understand and collaborate with humans to solve real-world problems, with a broader interest in exploring the emerging roles of contemporary AI models in supporting human-AI and agent-computer interaction.
Research projects: I have been developing AI agents capable of interacting with computers (WorldGUI), training large multimodal models (Genixer, LOVA3) to uncover their potential roles in enhancing intelligence, and exploring their interactive intelligence for improved alignment (InterFeedback).

πŸ“’ News

🌺 Research Papers

Human-AI Interaction:
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao*, Wenqi Pei*, Yifei Tao*, Haiyang Mei, Mike Zheng Shou (*Equal contribution)
ICLR 2025@Bi-Align Workshop (Oral presentation)
Can Large Multimodal Models evolve through interactive human feedback? We found that (1) accuracy alone may not fully reflect a model's intelligence; (2) LMMs may cater to humans; (3) low-quality feedback can degrade performance more than simply providing binary (0/1) feedback.
Human-Agent-Computer Interaction:
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Henry Hengyuan Zhao, Difei Gao, Mike Zheng Shou
arXiv, 2025
Benchmark: An early work for testing GUI agents in a dynamic setting.
Agent: An effective and universal agent framework for GUI automation, built upon a critic-thinking philosophy.
The roles of MLLMs:
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
NeurIPS 2024
🌺 Why train MLLMs only to answer questions? Let's also teach them to ask and assess questions. Without hyperparameter tuning or additional data annotation, consistent performance improvements are achieved!
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
ECCV 2024
💡 How do MLLMs perform as data generators? (Ours is the first work to explore this.) Take a look at using MLLMs to generate diverse multimodal data and observe the resulting performance improvements.
Parameter-Efficient Tuning:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
IJCV 2023
We found that tuning only a small number of task-specific channels, referred to as salient channels, is sufficient. This approach achieves a remarkable 780x reduction in parameter costs compared to full fine-tuning.
Low-level Vision:
Evaluating the Generalization Ability of Super-Resolution Networks
Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong
TPAMI 2023
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
Xiangtao Kong, Hengyuan Zhao, Yu Qiao, Chao Dong
CVPR 2021
Efficient Image Super-Resolution Using Pixel Attention
Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong
ECCVW 2020
Over 400 citations