Henry Hengyuan Zhao

Hi👋, this is Henry. I'm a PhD student in the Show Lab at the National University of Singapore, advised by Prof. Mike Zheng Shou.
I work on building multimodal AI assistants that understand and collaborate with humans to solve real-world problems,
with a broader interest in the emerging roles of contemporary AI models in supporting human-AI and agent-computer interaction.
Research projects: I have been developing AI agents capable of interacting with computers
(WorldGUI), training large multimodal models (Genixer, LOVA3) to uncover new roles that enhance their intelligence,
and exploring their interactive intelligence for improved alignment (InterFeedback).
📢 News
- [03/2025] InterFeedback is accepted by the ICLR 2025 Bidirectional Human-AI Alignment Workshop.
- [02/2025] We released InterFeedback to explore the question "Can Large Multimodal Models evolve through Interactive Human Feedback?" Check it out!
- [02/2025] Check out our new preprint WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation.
- [12/2024] I will present our NeurIPS paper LOVA3 in Vancouver.
- [10/2024] I will present our ECCV paper Genixer in Milan.
- [09/2024] One paper is accepted by NeurIPS 2024.
- [07/2024] One paper is accepted by ECCV 2024.
- [09/2023] One paper is accepted by IJCV 2023.
- [08/2023] One paper is accepted by TPAMI 2023.
🌺 Research Papers
Human-AI Interaction:

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
ICLR 2025 Bi-Align Workshop (Oral presentation)
Can Large Multimodal Models evolve through Interactive Human Feedback? We build a straightforward interactive framework that can bootstrap any LMM into an interactive problem-solving process. On top of this, we present InterFeedback-Bench, a benchmark for evaluating the interactive intelligence of current LMMs.

Human-Agent-Computer Interaction:

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
arXiv, 2025
Benchmark: an early work on testing GUI agents in a dynamic setting.
Agent: an effective and universal agent framework for GUI automation built upon a critic-thinking philosophy.

The roles of MLLMs:

LOVA3: Learning to Visual Question Answering, Asking and Assessment
NeurIPS 2024
🌺 Why only answer questions? Let's also teach MLLMs to ask and assess questions during training. Without hyperparameter tuning or additional data annotation, consistent performance improvements are achieved!

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
ECCV 2024
💡 How do MLLMs perform at data generation? (Ours is the first work to explore this.) Take a look at how MLLMs can generate diverse multimodal data and the performance improvements this brings.

Parameter-Efficient Tuning:

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
IJCV 2023
We find that tuning only a small number of task-specific channels, referred to as salient channels, is sufficient, yielding a 780x reduction in parameter cost compared to full fine-tuning.

Low-level Vision:

Evaluating the Generalization Ability of Super-Resolution Networks
TPAMI 2023

Color2Style: Real-Time Exemplar-Based Image Colorization with Self-Reference Learning and Deep Feature Modulation
arXiv, 2021

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
CVPR 2021

Efficient Image Super-Resolution Using Pixel Attention
ECCVW 2020
Over 400 citations

A Simple and Robust Deep Convolutional Approach to Blind Image Denoising
ICCVW 2019

Very Lightweight Photo Retouching Network with Conditional Sequential Modulation
TMM 2022