Microsoft Kosmos-1
A Multimodal Large Language Model
About Microsoft Kosmos-1
Microsoft has developed Kosmos-1, a multimodal large language model (MLLM). It responds to visual cues as well as language prompts, and can be used for tasks such as image captioning and visual question answering. Because Kosmos-1 accepts images alongside text, it goes beyond the text-only prompts that ChatGPT supported at launch.
Kosmos-1 is built to support language, perception-language, and vision tasks. Microsoft trained the model on web-scale datasets that include text data, image-text pairs, and documents in which images and text are interleaved. The model handles both natural language tasks and perception-intensive tasks, including visual dialogue, visual explanation, visual question answering, image captioning, simple math equations, OCR, and zero-shot image classification with descriptions.
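To make the three kinds of training data mentioned above more concrete, here is a minimal Python sketch of how text-only, image-text pair, and interleaved examples could all be flattened into one sequence format. This is an illustration only, not Microsoft's implementation: the `Text` and `Image` classes, the `flatten` function, the file names, and the `<s>` and `<image>` markers are all assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    path: str  # hypothetical file path; a real model would use image embeddings


Segment = Union[Text, Image]


def flatten(segments: List[Segment]) -> str:
    """Join text and image segments into a single marked-up sequence.

    The <s> and <image> markers are illustrative placeholders, not the
    model's actual vocabulary.
    """
    parts = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(f"<image>{seg.path}</image>")
        else:
            parts.append(seg.content)
    return "<s>" + " ".join(parts) + "</s>"


# One example of each data type described in the text (contents are made up).
text_only = [Text("Kosmos-1 is a multimodal model.")]
caption_pair = [Image("cat.jpg"), Text("A cat sitting on a sofa.")]
interleaved = [Text("Question: what is shown in"), Image("photo.jpg"), Text("Answer:")]

for example in (text_only, caption_pair, interleaved):
    print(flatten(example))
```

Representing every example as one sequence is what lets a single model treat captioning, visual question answering, and plain text as the same next-token task; the specifics of how Kosmos-1 encodes images are not covered by this sketch.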
Source: https://www.zdnet.com/article/now-microsoft-has-a-new-ai-model-kosmos-1/