Model card for Pix2Struct - Finetuned on Doc-VQA (Visual Question Answering over scanned documents)
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
…
Model card for Pix2Struct - Finetuned on ChartQA (Visual Question Answering over charts)
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
Pix2Struct…
Model card for Pix2Struct - Finetuned on Widget Captioning (Captioning a UI component on a screen) - large version
Table of Contents
TL;DR
Using the model…
Sharded BLIP-2 Model Card - flan-t5-xl
This is a sharded version of blip2-flan-t5-xl, which leverages Flan T5-xl for image-to-text tasks such as image captioning and visual…
NhatDFO/sf_blip2 • Visual Question Answering • Updated Feb 23, 2023 • 9 • 1
Source link
This is an unofficial mirror of the model weights for use with https://github.com/OFA-Sys/OFA
The original link is too slow when downloading from outside of China...
GIT (GenerativeImage2Text), large-sized, fine-tuned on TextVQA
GIT (short for GenerativeImage2Text) model, large-sized version, fine-tuned on TextVQA. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
