GEMeX
A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis

Bo Liu 1, Ke Zou 2,3, Liming Zhan 1, Zexin Lu 1, Xiaoyu Dong 1, Yidi Chen 4, Chengqiang Xie 1, Jiannong Cao 1, Xiao-Ming Wu *1, Huazhu Fu *5
1 The Hong Kong Polytechnic University, Hong Kong, 2 National University of Singapore, Singapore,
3 Sichuan University, China, 4 West China Hospital of Sichuan University, China,
5 Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore.


*Corresponding authors

For any questions, please contact: bokelvin.liu@connect.polyu.hk

Overview

Illustration of the proposed pipeline for constructing GEMeX, which consists of two main stages.
Stage I cleans data from Chest ImaGenome, while Stage II designs prompts that enable GPT-4o to generate a large-scale, groundable, and explainable VQA dataset.

🌟Contributions

1. We introduce GEMeX, a large-scale Med-VQA dataset for chest X-rays, designed to support diverse question types and provide enhanced explainability for medical VQA systems. To our knowledge, it is the largest chest X-ray VQA dataset and the first Med-VQA dataset to embody the concept of multimodal explainability.
2. We systematically benchmark 10 representative LVLMs on GEMeX, introducing multiple evaluation metrics to comprehensively assess how well current popular LVLMs perform on the Med-VQA task.
3. We show that fine-tuning with our precise vision-text explainability notably enhances the visual reasoning ability of LVLMs, addressing a key deficiency observed across models. This highlights the importance of a large-scale, groundable, and explainable VQA benchmark for advancing the development and deployment of LVLMs in healthcare.

BibTeX

BibTeX code here