WIT by Google AI

A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning

About WIT by Google AI

The WIT (Wikipedia-based Image Text) dataset is a large collection of more than 37 million image-text pairs across more than 100 languages. It was developed to help machine learning models learn the relationship between images and text.

Motivation

Multimodal visio-linguistic models need large datasets to train effectively. With WIT, Google AI aims to provide an expansive dataset that extends beyond English and has the potential to enable breakthroughs in multilingual understanding through images.

To that end, WIT was designed as a high-quality dataset with rigorous filtering applied. It contains 37.6 million image-text examples and covers 108 languages, each with at least 12K examples; 53 of those languages have more than 100K image-text pairs.
