Tether Unveils Open Dataset for AI Training

ForkLog

3 months ago

Tether Data’s AI division, QVAC, has significantly expanded the “world’s largest publicly available synthetic dataset” for artificial intelligence training.

QVAC Genesis II adds 107 billion new tokens, bringing the total to 148 billion across 19 educational fields. According to the company, this “substantially increases” the dataset’s scale, depth, and quality of reasoning.
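The two token figures also imply the size of the first release. A quick check, assuming the 148 billion total is simply Genesis I plus the new Genesis II tokens (the variable names are illustrative only):

```python
# Figures quoted in the article, in billions of tokens.
NEW_TOKENS_B = 107
TOTAL_TOKENS_B = 148

# Assumption: the total is Genesis I plus the new Genesis II tokens.
genesis_i_tokens_b = TOTAL_TOKENS_B - NEW_TOKENS_B
new_share_pct = round(100 * NEW_TOKENS_B / TOTAL_TOKENS_B, 1)

print(f"Implied Genesis I size: {genesis_i_tokens_b} billion tokens")
print(f"Share of the total added by Genesis II: {new_share_pct}%")
```

Under that assumption, the new release accounts for well over two thirds of the combined dataset.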

The second version builds on the foundation of the first. It covers 10 new areas, including chemistry, computer science, statistics, machine learning, astronomy, geography, econometrics, and electrical engineering.

QVAC Genesis II also regenerates the university-level physics material and, together with Genesis I, forms “the most comprehensive synthetic educational dataset ever made available to the public.”

The release is based on a new data-generation approach, Option-Level Reasoning, which is designed to extract structured reasoning from both a model’s errors and its correct answers.

“The result is training data that emphasizes clarity, causality, and decision-making, rather than just superficial correctness,” the company’s blog states.
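The article does not describe how Option-Level Reasoning is implemented. As a rough illustration only, the sketch below assumes a multiple-choice setting in which every answer option carries its own rationale, and a training text is built that explains the model's choice whether it was right or wrong. The class names, fields, and output format are all hypothetical and are not taken from QVAC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    label: str
    text: str
    rationale: str  # hypothetical field: why this option is right or wrong


@dataclass
class Question:
    prompt: str
    options: List[Option]
    correct_label: str


def build_training_text(question: Question, model_choice: str) -> str:
    """Render one question into an explanation covering every option."""
    lines = [f"Question: {question.prompt}"]
    for opt in question.options:
        verdict = "correct" if opt.label == question.correct_label else "incorrect"
        lines.append(f"Option {opt.label} ({opt.text}) is {verdict}: {opt.rationale}")
    if model_choice == question.correct_label:
        lines.append(f"The model chose {model_choice}, which is right.")
    else:
        lines.append(
            f"The model chose {model_choice}, but the answer is "
            f"{question.correct_label}."
        )
    return "\n".join(lines)


# Illustrative sample, not taken from the dataset.
sample = Question(
    prompt="Which gas makes up most of Earth's atmosphere?",
    options=[
        Option("A", "oxygen", "Oxygen is only about 21% of the atmosphere."),
        Option("B", "nitrogen", "Nitrogen is about 78% of the atmosphere."),
    ],
    correct_label="B",
)

text_wrong = build_training_text(sample, model_choice="A")
text_right = build_training_text(sample, model_choice="B")
print(text_wrong)
```

The point of the sketch is that a wrong answer and a right answer both yield a usable example, because the explanation is attached to each option rather than to the final answer alone.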

Tether emphasized that QVAC focuses on training models to think, reason, and explain rather than merely mimic.

“Today, most programs are optimized for fluency rather than understanding. With this release, we move beyond volume to structure, reasoning, and clarity,” stated the firm’s CEO, Paolo Ardoino.

In May, Tether announced a new QVAC platform for developing “infinite and ubiquitous intelligence,” which envisions “launching and evolving” AI agents on user devices rather than in the data centers of large companies.

In June, Ardoino predicted that a trillion AI agents would emerge within 15 years, using Bitcoin and USDT for settlements and transactions.