Overview

The Workshop on Multimodal Superintelligence is a global gathering of researchers, engineers, and visionaries committed to accelerating progress in open-source multimodal intelligence. The initiative is designed to be both collaborative and competitive, encouraging breakthroughs at the intersection of vision, language, audio, and 3D. Researchers from every discipline and application area of multimodal learning are invited to participate.

The Workshop

The workshop invites the broader scientific community to contribute. Research areas of focus are highlighted below:

Accepted submissions will be featured on the website and presented during the culminating symposium.

The Grand Challenge - Language, Vision, Audio, 3D

The challenge, ending Dec. 10, 2025, is to build the blueprints of some of the most capable open-source multimodal superintelligence systems. The first stage (Dec. 10) focuses on enabling ideas and pushing open source forward. Winners of the first stage can have their models further supported by Lambda and trained for public foundation model release.

This is not just a competition; it is a movement to build AI that sees, hears, reads, speaks, and reasons.

Let's Build the Future of Multimodal Machine Learning — Together.