TinyLLaVA: A Framework of Small-scale Large Multimodal Models
🎉 News
[2024.02.25] Update evaluation scripts and docs!
[2024.02.25] Data descriptions out. Released TinyLLaVA-1.5B and TinyLLaVA-2.0B!
[2024.02.24] Example code on inference and model loading added!
[2024.02.23] Evaluation code and scripts released!
[2024.02.21] Creating the TinyLLaVABench…
layoutlmv2-base-uncased-finetuned-vi-infovqa
This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased on an unknown dataset.
It achieves the following results on the evaluation set:
Loss: 4.3332
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training…
YuRen-BaiChuan-7B
YuRen BaiChuan 7B is a multi-modal large language model based on baichuan-inc/baichuan-7B and trained with multi-task supervised fine-tuning. It is built on top of Pleisto's data-centric AI work. YuRen has excellent performance on multi-turn dialogue, open-domain question answering, role-playing, text generation, text understanding, image understanding and other…
Depth Anything model, small
The model card for our paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data.
You may also try our demo and visit our project page.
Installation
First, install the Depth Anything package:
git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt
Usage
Here's how to run the model:…
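The usage snippet above is truncated; as a hedged illustration of the step that typically follows inference, here is how a raw relative-depth prediction is usually normalized to an 8-bit image for visualization. The `depth` array below is a random stand-in for the model's output, not an actual prediction:

```python
import numpy as np

# Stand-in for the model's raw depth output (H x W float array).
depth = np.random.rand(518, 518).astype(np.float32)

# Min-max normalize to [0, 255], as is conventional for relative-depth
# models such as Depth Anything before saving or displaying the map.
depth_min, depth_max = depth.min(), depth.max()
depth_vis = ((depth - depth_min) / (depth_max - depth_min) * 255.0).astype(np.uint8)

print(depth_vis.dtype, depth_vis.min(), depth_vis.max())
```

Because the model predicts relative (not metric) depth, this per-image normalization is the standard way to make the output viewable.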
MediaPipe-Pose-Estimation: Optimized for Mobile Deployment
Detect and track human body poses in real-time images and video streams
Model Details
Installation
Configure Qualcomm® AI Hub to run this model on a cloud-hosted device
Demo off target
Run model on a cloud-hosted device
How does this work?
Deploying compiled model to Android
View on Qualcomm®…
Reinforce Agent playing CartPole-v1
This is a trained model of a Reinforce agent playing CartPole-v1.
Source link
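For context, Reinforce is a Monte-Carlo policy-gradient method: sample actions from the current policy, then increase the log-probability of each action in proportion to the reward that followed it. A minimal, self-contained sketch of that update on a toy two-armed bandit (illustrative numpy only; the actual CartPole agent uses a neural-network policy trained over full episodes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Softmax policy over two actions, parameterized by logits theta.
theta = np.zeros(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy bandit: action 1 pays reward 1, action 0 pays reward 0.
for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    reward = float(a == 1)
    # REINFORCE gradient of log pi(a | theta): one-hot(a) - probs.
    grad_log_pi = np.eye(2)[a] - probs
    theta += 0.1 * reward * grad_log_pi

print(softmax(theta))  # the policy now strongly prefers action 1
```

The same gradient, scaled by the discounted return of each episode step, is what trains the CartPole policy.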
Model Card: VC-1 (Visual Cortex ViT-Large)
Last updated: 2023-04-07
Version: 1.0
Code: https://github.com/facebookresearch/eai-vc
Other Links: VC-1 Website, VC-1 Blogpost, VC-1 Paper, VC-1 Demo
The VC-1 model is a vision transformer (ViT) pre-trained on over 4,000 hours of egocentric videos from 7 different sources, together with ImageNet. The model is trained using Masked Auto-Encoding (MAE) and…
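As a hedged illustration of the masked auto-encoding objective mentioned above: MAE splits each image into patches, randomly masks a large fraction (typically 75%), and feeds only the visible patches to the ViT encoder. A minimal numpy sketch of that masking step (illustrative only, not the actual VC-1 code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 224x224 image with 16x16 patches -> 14 * 14 = 196 patches.
num_patches, mask_ratio = 196, 0.75

# MAE-style random masking: shuffle patch indices and keep only
# the first (1 - mask_ratio) fraction as visible encoder input.
perm = rng.permutation(num_patches)
num_visible = int(num_patches * (1 - mask_ratio))
visible_idx = np.sort(perm[:num_visible])
masked_idx = np.sort(perm[num_visible:])

print(len(visible_idx), len(masked_idx))  # 49 visible, 147 masked
```

The decoder then reconstructs the masked patches, which is what forces the encoder to learn general-purpose visual features.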
SeamlessStreaming
SeamlessStreaming models
Evaluating SeamlessStreaming models
Seamless Streaming demo
Running on HF spaces
Running locally
Install backend seamless_server dependencies
Install frontend streaming-react-app dependencies
Running the server
Debugging
Citation
SeamlessStreaming
SeamlessStreaming is a multilingual streaming translation model. It supports:
Streaming Automatic Speech Recognition on 96 languages.
Simultaneous translation on 101…
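Streaming here means the model consumes audio incrementally and can emit partial transcriptions or translations before the utterance ends. A generic sketch of the chunked-input pattern such a system is built around (plain Python for illustration, not the seamless_server API):

```python
def chunks(samples, chunk_size):
    """Yield fixed-size chunks of an audio sample stream in arrival order."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]

# Simulate 1 second of 16 kHz mono audio fed in 160 ms (2560-sample) chunks,
# as a streaming model would receive it instead of the full waveform.
stream = list(range(16000))
pieces = list(chunks(stream, 2560))

print(len(pieces), len(pieces[-1]))  # 7 chunks; the final one is shorter
```

Each chunk updates the model's internal state, which is what lets output begin while the speaker is still talking.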
