GIT (GenerativeImage2Text), large-sized, fine-tuned on TextVQA
GIT (short for GenerativeImage2Text) model, large-sized version, fine-tuned on TextVQA. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
LapDepth-release
This repository is a PyTorch implementation of the paper "Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals"
Minsoo Song, Seokjae Lim, and Wonjun Kim* IEEE Transactions…
V2.5 Model:
Fine-tune of my V2 model on the entire CommonVoice dataset (517k samples) for 2.5k steps (batch size 200). Voice cloning has improved a bit but…
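As a rough sanity check on those numbers, the sketch below estimates how much of the dataset the fine-tune covered. It assumes one batch per step with no gradient accumulation and that 200 is the global batch size; neither is stated in the model card, and the function names are illustrative only.

```python
def samples_seen(steps: int, batch_size: int) -> int:
    """Total samples processed, assuming exactly one batch per step."""
    return steps * batch_size


def epochs_covered(steps: int, batch_size: int, dataset_size: int) -> float:
    """Fraction of the dataset seen, expressed in epochs."""
    return samples_seen(steps, batch_size) / dataset_size


# Figures quoted in the model card (rounded there, so the result is approximate).
steps = 2_500
batch_size = 200
dataset_size = 517_000

print(samples_seen(steps, batch_size))                            # 500000
print(round(epochs_covered(steps, batch_size, dataset_size), 2))  # 0.97
```

Under these assumptions the fine-tune amounts to slightly less than one pass over the data.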
How to Get Started with the Model
To use the model through the Hosted Inference API, follow the code snippet below:
from transformers import BertTokenizer, BertForSequenceClassification
def …
UDOP model
The UDOP model was proposed in Unifying Vision, Text, and Layout for Universal Document Processing by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang…
GIT (GenerativeImage2Text), large-sized, fine-tuned on VQAv2
GIT (short for GenerativeImage2Text) model, large-sized version, fine-tuned on VQAv2. It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision…
persian-tts-female-vits
A Persian female-voice VITS model for text-to-speech.
Persian فارسی
Single-speaker female voice
Trained…
