GIT (GenerativeImage2Text), large-sized, fine-tuned on VQAv2
GIT (short for GenerativeImage2Text) model, large-sized version, fine-tuned on VQAv2. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
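Below is a minimal inference sketch for this checkpoint with the transformers library. The repo id (microsoft/git-large-vqav2), the sample image URL, and the question are assumptions added for illustration; the base-sized and TextVQA variants listed further down follow the same pattern.

```python
# Minimal sketch, assuming the checkpoint is published as microsoft/git-large-vqav2
# and that the example COCO image URL below is reachable.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-large-vqav2")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-vqav2")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image (assumption)
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# GIT answers generatively: the question is tokenized, prefixed with [CLS],
# and the model continues the sequence with the answer.
question = "how many cats are in the picture?"
input_ids = processor(text=question, add_special_tokens=False).input_ids
input_ids = torch.tensor([processor.tokenizer.cls_token_id] + input_ids).unsqueeze(0)

generated_ids = model.generate(pixel_values=pixel_values, input_ids=input_ids, max_length=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```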
tufa15nik/vilt-finetuned-vqasi · Visual Question Answering · Updated Dec 15, 2022 · 13
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Model card for BLIP trained on visual question answering - large architecture (with ViT large backbone).
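A minimal inference sketch for this model with transformers is shown below; the repo id (Salesforce/blip-vqa-capfilt-large) and the example image URL are assumptions, since the listing above does not include them.

```python
# Minimal sketch, assuming the ViT-L VQA checkpoint is Salesforce/blip-vqa-capfilt-large.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-capfilt-large")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-capfilt-large")

img_url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image (assumption)
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# BLIP's VQA head generates the answer text conditioned on the image and question.
inputs = processor(raw_image, "how many cats are in the picture?", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```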
GIT (GenerativeImage2Text), base-sized, fine-tuned on TextVQA
GIT (short for GenerativeImage2Text) model, base-sized version, fine-tuned on TextVQA. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
GIT (GenerativeImage2Text), base-sized, fine-tuned on VQAv2
GIT (short for GenerativeImage2Text) model, base-sized version, fine-tuned on VQAv2. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
Bingsu/temp_vilt_vqa · Visual Question Answering · Updated Nov 28, 2022 · 15
azwierzc/vilt-b32-finetuned-vqa-pl · Visual Question Answering · Updated Mar 21, 2022 · 25
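The three ViLT entries listed above (tufa15nik/vilt-finetuned-vqasi, Bingsu/temp_vilt_vqa, azwierzc/vilt-b32-finetuned-vqa-pl) appear to be fine-tunes of ViLT-B/32, which treats VQA as classification over a fixed answer vocabulary. The sketch below uses the upstream dandelin/vilt-b32-finetuned-vqa checkpoint for illustration, on the assumption that the fine-tuned repos expose the same ViltForQuestionAnswering interface.

```python
# Minimal sketch using the upstream dandelin/vilt-b32-finetuned-vqa checkpoint
# (assumption: the fine-tuned repos above can be swapped into the same classes).
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image (assumption)
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# ViLT scores a fixed answer vocabulary rather than generating free-form text.
encoding = processor(image, question, return_tensors="pt")
logits = model(**encoding).logits
print(model.config.id2label[logits.argmax(-1).item()])
```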
Model
llava-clip-internlm2-1_8b-pretrain-v1 is a LLaVA checkpoint fine-tuned from internlm2-chat-1_8b and CLIP-ViT-Large-patch14-336 on LLaVA-Pretrain and LLaVA-Instruct-150K using XTuner. The pretraining phase took 16 hours on a single NVIDIA A6000 Ada…
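As a rough illustration of the two building blocks named above, the sketch below loads the language model and the vision encoder with transformers. The Hub ids (internlm/internlm2-chat-1_8b, openai/clip-vit-large-patch14-336) are assumptions, and the projector that XTuner trains between the two during LLaVA pretraining is not shown.

```python
# Minimal sketch: load the two components this LLaVA checkpoint is built from.
# Hub ids are assumptions; the LLaVA projector trained by XTuner is omitted.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPVisionModel,
    CLIPImageProcessor,
)

# Language model (InternLM2 requires trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-1_8b", trust_remote_code=True)
llm = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-1_8b", trust_remote_code=True)

# Vision encoder: LLaVA-style models feed CLIP image features through a small
# projector into the LLM's embedding space.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
```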
