
OCR-DocVQA-Donut

Donut (base-sized model, fine-tuned on DocVQA). Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook et al. and first released in this repository. Disclaimer: The team releasing Donut did not write a model card for this model, so this model card has been written by the…
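The excerpt is truncated, but the typical Donut document-VQA flow is short enough to sketch. The example below assumes this card mirrors the public naver-clova-ix/donut-base-finetuned-docvqa checkpoint (the excerpt does not state the repo id), and the image path and question are placeholders.

```python
# Minimal sketch of OCR-free document VQA with Donut.
# Assumption: the checkpoint behind this card matches
# "naver-clova-ix/donut-base-finetuned-docvqa".
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

ckpt = "naver-clova-ix/donut-base-finetuned-docvqa"
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)

image = Image.open("invoice.png").convert("RGB")   # placeholder document image
question = "What is the invoice number?"            # placeholder question

# Donut skips OCR: the question is injected as a decoder task prompt
# and the answer is generated directly from the image pixels.
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task start token
print(processor.token2json(sequence))  # e.g. {"question": "...", "answer": "..."}
```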

git-base-vqav2

GIT (GenerativeImage2Text), base-sized, fine-tuned on VQAv2. GIT (short for GenerativeImage2Text) model, base-sized version, fine-tuned on VQAv2. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Wang et al. and first released in this repository. Disclaimer: The team releasing GIT did not write a model card for this…
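As a quick orientation, visual question answering with a GIT checkpoint looks roughly like the sketch below. It assumes the public microsoft/git-base-vqav2 repo id (not stated in the excerpt) and uses placeholder image and question values.

```python
# Minimal sketch of VQA with GIT, assuming the "microsoft/git-base-vqav2"
# checkpoint; GIT generates the answer as a continuation of the question tokens.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

ckpt = "microsoft/git-base-vqav2"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

image = Image.open("photo.jpg").convert("RGB")   # placeholder image
question = "what color is the bus?"               # placeholder question

pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Prepend the [CLS] token to the tokenized question, as GIT expects.
input_ids = processor(text=question, add_special_tokens=False).input_ids
input_ids = torch.tensor([processor.tokenizer.cls_token_id] + input_ids).unsqueeze(0)

generated_ids = model.generate(
    pixel_values=pixel_values, input_ids=input_ids, max_length=50
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```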

swin-transformers

Image classification with Swin Transformers on the 🤗 Hub! Author: Kelvin Idanwekhai. Paper | Keras Tutorial. Excerpt from the tutorial: Swin Transformer (Shifted Window Transformer) can serve as a general-purpose backbone for computer vision. Swin Transformer is a hierarchical Transformer whose representations are computed with shifted windows. The shifted window scheme brings…
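For a concrete feel of the shifted-window backbone in an image-classification setting, here is an illustrative sketch using the PyTorch `transformers` API rather than the Keras model this card describes; the microsoft/swin-tiny-patch4-window7-224 checkpoint is a different, publicly available Swin model used purely for illustration.

```python
# Illustrative only: classify an image with a Swin backbone via transformers.
# Assumption: "microsoft/swin-tiny-patch4-window7-224" stands in for the
# Keras model behind this card.
from PIL import Image
from transformers import AutoImageProcessor, SwinForImageClassification

ckpt = "microsoft/swin-tiny-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = SwinForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")      # placeholder image
inputs = processor(images=image, return_tensors="pt")

logits = model(**inputs).logits                   # one logit per ImageNet class
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```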

rl_course_vizdoom_health_gathering_supreme

An APPO model trained on the doom_health_gathering_supreme environment. This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory. Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/. Downloading the model: after installing Sample-Factory, download the model with python -m sample_factory.huggingface.load_from_hub -r alperenunlu/rl_course_vizdoom_health_gathering_supreme. Using the model: to run the model after download,…
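The excerpt cuts off before the run command. A sketch of the usual Sample-Factory 2.0 evaluation ("enjoy") invocation for a VizDoom model is shown below; the sf_examples.vizdoom.enjoy_vizdoom module path and the ./train_dir location are assumptions based on the standard Sample-Factory 2.0 examples and the default used by load_from_hub, not statements from this card.

```
python -m sf_examples.vizdoom.enjoy_vizdoom --algo=APPO --env=doom_health_gathering_supreme --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme
```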