Model
llava-internlm2-7b is a LLaVA model fine-tuned from InternLM2-Chat-7B and CLIP-ViT-Large-patch14-336 with LLaVA-Pretrain and LLaVA-Instruct by XTuner.
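For orientation, the following is a hedged sketch of loading the two base components named above with stock transformers APIs. It is not the XTuner fine-tuning recipe, and the LLaVA projector trained on LLaVA-Pretrain and LLaVA-Instruct is omitted.

# Hedged sketch: load the two building blocks named above with stock
# transformers APIs. This is NOT the XTuner pipeline; the LLaVA projector
# trained on LLaVA-Pretrain/LLaVA-Instruct is not shown.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, CLIPVisionModel, CLIPImageProcessor

# Language model: InternLM2-Chat-7B (requires trust_remote_code)
llm = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-7b", trust_remote_code=True, torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)

# Vision tower: CLIP ViT-Large/14 at 336px input resolution
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")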
Results
Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench…
Ziya-Visual-14B-Chat
Main Page: Fengshenbang
Github: Fengshenbang-LM
Ziya Series Models
Ziya-LLaMA-13B-v1.1
Ziya-LLaMA-13B-v1
Ziya-LLaMA-7B-Reward
Ziya-LLaMA-13B-Pretrain-v1
Software Dependencies
pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers
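With the dependencies above installed, a minimal loading sketch might look like the following. The repository id and the chat-style call are assumptions; the actual inference interface is defined by the model repository's custom code.

# Hedged sketch, assuming the checkpoint ships custom modeling code loadable
# via trust_remote_code. The chat(...) call below is hypothetical; consult
# the repository for the real inference interface.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IDEA-CCNL/Ziya-Visual-14B-Chat"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
# Hypothetical chat interface; name and signature are assumptions:
# answer = model.chat(tokenizer, image=image, query="What is in the picture?")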
Model Taxonomy
Demand | Task | Series | Model | Parameter…
YuRen BaiChuan 7B
YuRen BaiChuan 7B (羽人-百川7B) is an open-source multimodal large language model built by multi-task supervised fine-tuning of baichuan-inc/baichuan-7B, following Pleisto's data-centric AI approach. YuRen performs well across a range of tasks, including multi-turn dialogue, open-domain question answering, role-playing, text generation, text comprehension, and image understanding.
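As a rough illustration, the text side of a Baichuan-derived checkpoint like this one is typically loaded with the trust_remote_code pattern below. The repo id is an assumption and the image interface is repo-specific, so treat this as a sketch only.

# Hedged sketch for the text side only. Baichuan-derived checkpoints load
# through trust_remote_code; the multimodal (image) interface is defined by
# the model repository and is not shown here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pleisto/yuren-baichuan-7b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).eval()

prompt = "Introduce yourself briefly."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))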
Model card for Pix2Struct - Finetuned on Doc-VQA (Visual Question Answering over scanned documents) - large version
Table of Contents
TL;DR
Using the model
Contribution
Citation…
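The "Using the model" section is truncated here, so the following is a hedged usage sketch built on the standard Pix2Struct classes in transformers. The checkpoint id google/pix2struct-docvqa-large is inferred from the card title ("large version") and is an assumption.

# Hedged sketch using the standard Pix2Struct API in transformers.
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_id = "google/pix2struct-docvqa-large"  # assumed from "large version"
processor = Pix2StructProcessor.from_pretrained(model_id)
model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

image = Image.open("scanned_document.png").convert("RGB")
question = "What is the invoice total?"

# For DocVQA-style checkpoints, the question is passed as text alongside the image.
inputs = processor(images=image, text=question, return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(predictions[0], skip_special_tokens=True))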
Model card for Pix2Struct - Finetuned on AI2D (scientific diagram VQA)
Table of Contents
TL;DR
Using the model
Contribution
Citation
TL;DR
Pix2Struct is an…
Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2
Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2. It was introduced in the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Kim et…
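A hedged usage sketch with the standard ViLT VQA classes in transformers follows. The checkpoint id dandelin/vilt-b32-finetuned-vqa is the commonly used VQAv2 fine-tune and is assumed to be the one this card describes.

# Hedged sketch using the standard ViLT VQA classes in transformers.
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

model_id = "dandelin/vilt-b32-finetuned-vqa"  # assumed checkpoint id
processor = ViltProcessor.from_pretrained(model_id)
model = ViltForQuestionAnswering.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")
question = "How many cats are in the picture?"

# ViLT treats VQAv2 as classification over a fixed answer vocabulary.
inputs = processor(image, question, return_tensors="pt")
outputs = model(**inputs)
predicted_idx = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_idx])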
Ziya-BLIP2-14B-Visual-v1
Main Page: Fengshenbang
Github: Fengshenbang-LM
Ziya Series Models
Ziya-BLIP2-14B-Visual-v1
Ziya-LLaMA-13B-v1.1
Ziya-LLaMA-13B-v1
Ziya-LLaMA-7B-Reward
Ziya-LLaMA-13B-Pretrain-v1
Brief Introduction
The Ziya-Visual multimodal large model is trained on top of the Ziya general-purpose large model V1 (Ziya-LLaMA-13B-v1) and supports visual question answering and dialogue. In March of this year, OpenAI released GPT-4, a multimodal model capable of understanding images; unfortunately, to this day most users still have no access to GPT-4's image input. Drawing on excellent open-source implementations such as MiniGPT-4 and LLaVA, Ziya-Visual adds image understanding to Ziya, letting Chinese-speaking users experience the capabilities of a large model that combines the vision and language modalities.
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Model card for BLIP trained on visual question answering - base architecture (with ViT base backbone).
Pull figure from…
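The following is a hedged usage sketch with the standard BLIP VQA classes in transformers. The checkpoint id Salesforce/blip-vqa-base is inferred from "base architecture (with ViT base backbone)" and is an assumption.

# Hedged sketch using the standard BLIP VQA classes in transformers.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

model_id = "Salesforce/blip-vqa-base"  # assumed checkpoint id
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForQuestionAnswering.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")
question = "What color is the umbrella?"

inputs = processor(image, question, return_tensors="pt")
answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))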
