
All Posts

Viewing 537-544 posts

blip-vqa-base

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. Model card for BLIP trained on visual question answering, base architecture (with a ViT-Base backbone). TL;DR — the authors write in the abstract: Vision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks. However,…
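A minimal usage sketch for this checkpoint via 🤗 Transformers, assuming the `Salesforce/blip-vqa-base` Hub repository; the solid-color image and the question are placeholders standing in for real inputs:

```python
# Hedged sketch: visual question answering with blip-vqa-base.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.new("RGB", (384, 384), color="white")  # placeholder image
question = "how many dogs are in the picture?"

inputs = processor(image, question, return_tensors="pt")
out = model.generate(**inputs)
answer = processor.decode(out[0], skip_special_tokens=True)
print(answer)
```

The first call downloads the checkpoint from the Hub, so a network connection is required.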

Marigold

Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. This model is the official checkpoint of the paper "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation" by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is…
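Marigold predicts affine-invariant depth, so comparing its output against metric ground truth typically starts by fitting a per-image scale and shift via least squares. A minimal sketch of that alignment step (the function name and toy data are illustrative, not from the model card):

```python
import numpy as np

def align_affine_invariant(pred, gt):
    # Fit scale s and shift t minimizing ||s*pred + t - gt||^2,
    # then map the affine-invariant prediction into the gt's units.
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s * pred + t

# Toy depth maps: ground truth differs by scale 2 and shift 1.
pred = np.array([[0.1, 0.2], [0.3, 0.4]])
gt = 2.0 * pred + 1.0
aligned = align_affine_invariant(pred, gt)
```

After alignment, standard metric-depth error measures can be applied as usual.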

nsfw_image_detection

Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification. Model Description: The Fine-Tuned Vision Transformer (ViT) is a variant of…
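At inference time the fine-tuned classifier reduces to a softmax over a two-way head. A toy sketch of that decision step (the logit values and the label map are illustrative assumptions, not taken from the card):

```python
import math

# Assumed label map for the two-way NSFW classification head.
ID2LABEL = {0: "normal", 1: "nsfw"}

def classify(logits):
    # Numerically stable softmax, then argmax over the two classes.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

label, p = classify([2.3, -1.1])  # hypothetical logits from the ViT head
```

In practice the logits would come from the ViT forward pass rather than being hand-written.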

PPO-LunarLander-v2

PPO Agent playing LunarLander-v2. This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library and the RL Zoo. The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with…
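PPO's core update maximizes a clipped surrogate objective over the probability ratio between the new and old policies. A per-sample sketch of that loss (a standalone illustration of the algorithm, not SB3 code):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate loss (negated, for minimization).

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A(s, a).
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Take the pessimistic (smaller) objective, negate to get a loss.
    return -min(unclipped, clipped)
```

Clipping removes the incentive to move the ratio outside [1 − eps, 1 + eps], which is what keeps PPO's updates "proximal".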

octo-base

Octo Base. See https://github.com/octo-models/octo for instructions for using this model. Octo Base is trained with a window size of 2, predicting 7-dimensional actions 4 steps into the future using a diffusion policy. The model is a Transformer with 93M parameters (equivalent to a ViT-B). Images are tokenized by…
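A diffusion policy does not regress actions directly: it samples a noisy action chunk, here 4 future steps of 7-dimensional actions as the card describes, and iteratively denoises it. A toy sketch with a stand-in denoiser (the real model's denoiser is learned and conditions on the 2-frame observation window):

```python
import random

ACTION_DIM = 7      # per-step action dimensionality (from the card)
PRED_HORIZON = 4    # steps predicted into the future (from the card)
DENOISE_STEPS = 20  # illustrative number of denoising iterations

def denoise_step(chunk, step):
    # Stand-in for the learned denoiser: pull every value toward zero.
    return [[a * 0.5 for a in row] for row in chunk]

def sample_action_chunk():
    # Start from Gaussian noise shaped (PRED_HORIZON, ACTION_DIM) ...
    chunk = [[random.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
             for _ in range(PRED_HORIZON)]
    # ... and refine it iteratively into an action chunk.
    for step in range(DENOISE_STEPS):
        chunk = denoise_step(chunk, step)
    return chunk
```

The point is the shape and the iterative refinement loop; the actual scheduler and network come from the Octo repository linked above.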

speecht5_tts

SpeechT5 (TTS task). Model card covering the model description and sources, usage with 🤗 Transformers, fine-tuning, direct and downstream use, out-of-scope use, bias, risks and limitations, training details (data, procedure, hyperparameters), and evaluation (testing data, factors, metrics, results). …