
All Posts


TinyLLaVA-1.5B

TinyLLaVA: A Framework of Small-scale Large Multimodal Models 🎉 News [2024.02.25] Update evaluation scripts and docs! [2024.02.25] Data descriptions out. Release TinyLLaVA-1.5B and TinyLLaVA-2.0B! [2024.02.24] Example code on inference and model loading added! [2024.02.23] Evaluation code and scripts released! [2024.02.21] Creating the TinyLLaVABench…

yuren-baichuan-7b

YuRen-BaiChuan-7B (羽人-百川7B) is an open-source multi-modal large language model based on baichuan-inc/baichuan-7B, fine-tuned with multi-task supervised training and built on top of Pleisto's data-centric AI work. YuRen performs strongly on multi-turn dialogue, open-domain question answering, role-playing, text generation, text understanding, image understanding and other…

depth_anything_vits14

Depth Anything model, small The model card for our paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. You may also try our demo and visit our project page. Installation First, install the Depth Anything package: git clone https://github.com/LiheYoung/Depth-Anything cd Depth-Anything pip install -r requirements.txt Usage Here's how to run the model:…
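The preview above ends just before the usage code. As a hedged sketch (not taken from the card itself), the small Depth Anything model can also be run through the Hugging Face `transformers` depth-estimation pipeline; the checkpoint id `LiheYoung/depth-anything-small-hf` and the helper name `estimate_depth` below are assumptions for illustration:

```python
# Hedged sketch: running a small Depth Anything checkpoint via the
# Hugging Face `transformers` depth-estimation pipeline.
# The checkpoint id below is an assumption, not quoted from the card.

def estimate_depth(image_path: str):
    """Return the predicted depth map (a PIL image) for one input image."""
    # Imports are deferred so the sketch can be read (and the function
    # defined) without the heavyweight dependencies installed.
    from PIL import Image
    from transformers import pipeline

    depth_estimator = pipeline(
        task="depth-estimation",
        model="LiheYoung/depth-anything-small-hf",  # assumed checkpoint id
    )
    return depth_estimator(Image.open(image_path))["depth"]

if __name__ == "__main__":
    estimate_depth("example.jpg").save("depth.png")
```

The repo-based workflow quoted in the card (clone, install requirements) remains the authoritative path; the pipeline route is simply a one-call alternative when the model is available on the Hub.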

MediaPipe-Pose-Estimation

MediaPipe-Pose-Estimation: Optimized for Mobile Deployment Detect and track human body poses in real-time images and video streams Model Details Installation Configure Qualcomm® AI Hub to run this model on a cloud-hosted device Demo off target Run model on a cloud-hosted device How does this work? Deploying compiled model to Android View on Qualcomm®…

vc1-large

Model Card: VC-1 (Visual Cortex ViT-Large) Last updated: 2023-04-07 Version: 1.0 Code: https://github.com/facebookresearch/eai-vc Other Links: VC-1 Website, VC-1 Blogpost, VC-1 Paper, VC-1 Demo The VC-1 model is a vision transformer (ViT) pre-trained on over 4,000 hours of egocentric videos from 7 different sources, together with ImageNet. The model is trained using Masked Auto-Encoding (MAE) and…

seamless-streaming

SeamlessStreaming SeamlessStreaming models Evaluating SeamlessStreaming models Seamless Streaming demo Running on HF spaces Running locally Install backend seamless_server dependencies Install frontend streaming-react-app dependencies Running the server Debugging Citation SeamlessStreaming SeamlessStreaming is a multilingual streaming translation model. It supports: Streaming Automatic Speech Recognition on 96 languages. Simultaneous translation on 101…