Enhanced CLIP-GPT Framework for Cross-Lingual Remote Sensing Image Captioning
TL;DR
The Enhanced CLIP-GPT framework offers a lightweight solution for cross-lingual remote sensing image captioning, overcoming the high costs and data dependency of traditional encoder-decoder methods.
Rui Song; Beigeng Zhao; Lizhi Yu
https://doi.org/10.1109/ACCESS.2024.3522585
Volume 13
Remote Sensing Image Captioning (RSIC) aims to generate precise and informative descriptive text for remote sensing images using computational algorithms. Traditional “encoder-decoder” approaches face limitations due to their high training costs and heavy reliance on large-scale annotated datasets, hindering their practical application. To address these challenges, we propose a lightweight solution...
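The truncated abstract does not spell out the framework's internals, but the title and TL;DR describe a lightweight CLIP-GPT pairing. As an illustration of what such a design typically looks like, the sketch below assumes a ClipCap-style setup: a frozen CLIP image encoder, a frozen GPT-2 decoder, and a small trainable mapping network that turns the image embedding into a "prefix" the language model continues from. All component choices (model checkpoints, prefix length, the `PrefixMapper` module) are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch of a lightweight CLIP-to-GPT captioning pipeline.
# Only the small PrefixMapper would need training; CLIP and GPT-2 stay frozen.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer


class PrefixMapper(nn.Module):
    """Maps a CLIP image embedding to a short sequence of GPT-2 prefix embeddings."""

    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len = prefix_len
        self.mlp = nn.Sequential(nn.Linear(clip_dim, gpt_dim * prefix_len), nn.Tanh())

    def forward(self, clip_features):                       # (B, clip_dim)
        b = clip_features.size(0)
        return self.mlp(clip_features).view(b, self.prefix_len, -1)  # (B, L, gpt_dim)


@torch.no_grad()
def caption(image, clip, processor, mapper, gpt2, tokenizer, max_new_tokens=30):
    # Encode the image with frozen CLIP, then map it into GPT-2's embedding space.
    pixel = processor(images=image, return_tensors="pt").pixel_values
    feats = clip.get_image_features(pixel_values=pixel)      # (1, clip_dim)
    embeds = mapper(feats)                                    # (1, prefix_len, gpt_dim)

    # Greedy decoding: feed the prefix (plus generated tokens) through GPT-2.
    generated = []
    for _ in range(max_new_tokens):
        logits = gpt2(inputs_embeds=embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated.append(next_id.item())
        next_emb = gpt2.transformer.wte(next_id).unsqueeze(1)
        embeds = torch.cat([embeds, next_emb], dim=1)
    return tokenizer.decode(generated, skip_special_tokens=True)


# Example wiring (checkpoints are illustrative, not from the paper):
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
mapper = PrefixMapper()  # the only trainable component in this sketch
```

In this pattern, the training cost is confined to the mapping network, which is what makes such CLIP-GPT pipelines attractive when large annotated remote sensing datasets are unavailable; how the published framework handles the cross-lingual aspect is not recoverable from the truncated abstract.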