Chinese companies continue to release AI models that rival the capabilities of systems developed by OpenAI and other U.S.-based AI companies.
This week, MiniMax, a startup backed by Alibaba and Tencent that has raised approximately $850 million in venture capital and is valued at more than $2.5 billion, debuted three new models: MiniMax-Text-01, MiniMax-VL-01 and T2A-01-HD. MiniMax-Text-01 is a text-only model, while MiniMax-VL-01 can understand both images and text. The T2A-01-HD, on the other hand, generates audio, specifically speech.
MiniMax claims that MiniMax-Text-01, which is 456 billion parameters in size, performs better than models such as Google’s recently unveiled Gemini 2.0 Flash on benchmarks like MATH and SimpleQA, which measure a model’s ability to answer math problems and fact-based questions. Parameters roughly correspond to a model’s problem-solving capabilities, and models with more parameters generally perform better than those with fewer.
As for MiniMax-VL-01, MiniMax claims that it rivals Anthropic’s Claude 3.5 Sonnet on evaluations that require multimodal understanding, like ChartQA, which tasks models with answering questions about charts and diagrams (e.g., “What is the maximum value of the orange line on this graph?”). Granted, MiniMax-VL-01 doesn’t actually outperform Gemini 2.0 Flash on many of these tests. OpenAI’s GPT-4o and Meta’s Llama 3.1 also beat it on several.
Of note, MiniMax-Text-01 has an extremely large context window. A model’s context, or context window, refers to the input (e.g., text) that a model considers before generating output (additional text). With a context window of 4 million tokens, MiniMax-Text-01 can analyze approximately 3 million words in one go, or just over five copies of “War and Peace.”
For context (no pun intended), MiniMax-Text-01’s context window is approximately 31 times the size of GPT-4o’s and Llama 3.1’s.
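The arithmetic behind those comparisons can be checked with a short back-of-the-envelope sketch. The figures it relies on are assumptions rather than numbers published by MiniMax: a ratio of about 0.75 words per token (implied by the 4 million tokens and 3 million words above), roughly 587,000 words for an English edition of “War and Peace,” and a 128,000-token context window for GPT-4o and Llama 3.1.

```python
# Back-of-the-envelope check of the context-window comparisons.
# All constants except CONTEXT_TOKENS are assumptions for illustration.

CONTEXT_TOKENS = 4_000_000     # MiniMax-Text-01 context window, per the article
WORDS_PER_TOKEN = 0.75         # assumption: implied by 4M tokens ~ 3M words
WAR_AND_PEACE_WORDS = 587_000  # assumption: approximate English word count
BASELINE_TOKENS = 128_000      # assumption: GPT-4o / Llama 3.1 context window


def words_in_context(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Estimate how many words fit in a context window of the given size."""
    return tokens * words_per_token


words = words_in_context(CONTEXT_TOKENS)
copies = words / WAR_AND_PEACE_WORDS
ratio = CONTEXT_TOKENS / BASELINE_TOKENS

print(f"Approximate words in context: {words:,.0f}")
print(f"Copies of War and Peace: {copies:.1f}")
print(f"Size relative to a 128K window: {ratio:.2f}x")
```

Under these assumptions the estimate lands at about 3 million words, a little over five copies of the novel, and a window roughly 31 times larger than a 128,000-token one. Real token-to-word ratios vary by tokenizer and language, so the word count is only a rough guide.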
The last of the MiniMax models released this week, the T2A-01-HD, is an audio generator optimized for speech. The T2A-01-HD can generate a synthetic voice with adjustable cadence, pitch and tenor in approximately 17 different languages, including English and Chinese, and clone a voice from just 10 seconds of an audio recording.
MiniMax has not released benchmark results comparing the T2A-01-HD to other audio generation models. But to this reporter’s ears, the T2A-01-HD’s outputs sound on par with audio models from Meta and startups like PlayAI.
Except for the T2A-01-HD, which is exclusively available via the MiniMax API and Hailuo AI platform, the new MiniMax models can be downloaded from GitHub and the Hugging Face AI development platform.
Just because the models are “openly” available does not mean they aren’t locked down in certain respects. MiniMax-Text-01 and MiniMax-VL-01 are not truly open source in the sense that MiniMax has not released the components (e.g., training data) necessary to recreate them from scratch. Additionally, they are subject to MiniMax’s restrictive license, which prohibits developers from using the models to improve competing AI models and requires platforms with more than 100 million monthly active users to request a special license from MiniMax.
MiniMax was founded in 2021 by former employees of SenseTime, one of China’s largest AI companies. The company’s projects include apps such as Talkie, an AI role-playing platform in the style of Character AI, and text-to-video models that MiniMax has released in Hailuo.
Some MiniMax products have become the subject of minor controversy.
Talkie, which was removed from Apple’s App Store in December for unspecified “technical” reasons, features AI avatars of public figures including Donald Trump, Taylor Swift, Elon Musk and LeBron James, none of whom appear to have consented to being featured in the app.
In December, Broadcast magazine reported that MiniMax’s video generators can reproduce the logos of British TV channels, suggesting that MiniMax’s models were trained on content from those channels. And MiniMax is reportedly being sued by iQiyi, a Chinese video streaming service that alleges MiniMax illegally trained on iQiyi’s copyrighted recordings.
The new MiniMax models come days after the outgoing Biden administration proposed tighter export rules and restrictions on AI technologies for Chinese companies. Chinese companies were already barred from purchasing advanced AI chips, but if the new rules take effect as written, they will face stricter caps on both the semiconductor technology and the models needed to bootstrap sophisticated AI systems.
On Wednesday, the Biden administration announced additional measures aimed at preventing sophisticated chips from entering China. Chip foundries and packaging companies that want to export certain chips will be subject to broader licensing requirements unless they exercise greater scrutiny and due diligence to prevent their products from reaching Chinese customers.