icyheat23/blip-fine-tuned-2ep
Visual Question Answering • Updated Mar 21, 2023 • 1
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
This is the Hugging Face repo for storing pre-trained & fine-tuned checkpoints of our Video-LLaMA, which is a multi-modal conversational large…
Model card for DePlot
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
The abstract of the paper states that:
Visual language such as charts and…
Model card for Pix2Struct - Finetuned on Infographics-VQA (Visual Question Answering over high-res infographics) - large version
Table of Contents
TL;DR
Using the model
Contribution
Citation…
Model card for Pix2Struct - Finetuned on AI2D (scientific diagram VQA) - large version
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
Pix2Struct…
Model card for Pix2Struct - Finetuned on OCR-VQA (Visual Question Answering over book covers)
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
…
Model card for Pix2Struct - Finetuned on OCR-VQA (Visual Question Answering over book covers) - large version
Table of Contents
TL;DR
Using the model
Contribution
Citation…
Model card for Pix2Struct - Finetuned on Widget Captioning (Captioning a UI component on a screen)
Table of Contents
TL;DR
Using the model
Contribution
Citation
…
