Google has introduced Gemma 4, a new family of open AI models designed for advanced reasoning and agentic workflows.
We just released Gemma 4 — our most intelligent open models to date.
Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows.
Released under a commercially… pic.twitter.com/W6Tvj9CuHW
— Google (@Google) April 2, 2026
“Gemma 4 is our most intelligent open model to date, providing an unprecedented level of intelligence per parameter,” the statement reads.
Since the launch of the first generation, developers have downloaded Gemma over 400 million times, creating more than 100,000 model variants within the Gemmaverse ecosystem. The latest version is built on the same research and technology as the Gemini 3 chatbot.
Various Sizes
The Gemma 4 neural network family includes four versions: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.
The compact E2B and E4B, with 2.3 billion and 4.5 billion active parameters respectively, focus on multimodality, low latency, and seamless integration. They can be run on a smartphone or a regular laptop.
The 26B MoE and flagship 31B (with 26 billion and 31 billion total parameters, respectively) require a graphics accelerator such as an NVIDIA H100 with 80 GB of memory. These models are optimized for researchers and developers.
The larger models perform well in benchmarks. In the global Arena AI rankings of open text models, the flagship 31B places third and the 26B sixth. According to the developers, the new lineup surpasses competing models up to 20 times its size.
Key Features
One of Gemma 4's main advantages is its advanced reasoning capability. The models can follow complex chains of logic and plan multi-step tasks; they show significant progress on mathematics benchmarks and follow instructions accurately.
Other features include:
- Agentic workflows — built-in support for function calling, structured output in JSON format, and system instructions enables autonomous assistants that interact with tools and APIs;
- Code generation — Gemma 4 generates high-quality code offline, turning a workstation into a local AI assistant;
- Vision and audio — all models process video and images with variable resolution, recognize text, and analyze diagrams. E2B and E4B also support speech recognition and understanding;
- Extended context window — compact versions support 128,000 tokens, while larger ones support up to 256,000. This is sufficient for processing entire repositories or large documents in a single request;
- Multilingualism — the model family can work with more than 140 languages.
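Agentic function calling of the kind listed above typically works by having the model emit a structured JSON tool call that the host application parses and dispatches. A minimal, model-agnostic sketch — the tool name and JSON shape here are illustrative assumptions, not Gemma's documented format:

```python
import json

# Hypothetical tool registry the assistant is allowed to call.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]           # e.g. "get_weather"
    return fn(**call["arguments"])     # e.g. {"city": "Berlin"}

# Simulated structured output from the model:
reply = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(reply))  # Sunny in Berlin
```

In a real agent loop, the tool's return value would be appended to the conversation and the model prompted again until it produces a final answer.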
Gemma 4 is already available in Google AI Studio and Google AI Edge Gallery. Integration is also supported by popular third-party tools and frameworks, including Hugging Face, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM, and LM Studio.
The models can be fine-tuned via Google Colab, Vertex AI, or on local graphics cards. For production, deployment is available on Google Cloud, including Cloud Run, GKE, and Sovereign Cloud.
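Several of the listed runtimes (Ollama, vLLM, llama.cpp's server) expose OpenAI-compatible chat endpoints, so a locally hosted model can be queried with a standard request payload. A minimal sketch, assuming a hypothetical `gemma4:e4b` model tag — actual tags will depend on the runtime:

```python
import json

# Assumed model tag for illustration; check your runtime's model list.
MODEL = "gemma4:e4b"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload,
    as accepted by local servers such as Ollama or vLLM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize this repository's README.")
print(json.dumps(payload, indent=2))
# POST this to e.g. http://localhost:11434/v1/chat/completions (Ollama's default port).
```

Keeping the payload in the OpenAI-compatible shape means the same client code works unchanged across these local runtimes.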
Earlier in April, Google introduced a new AI model for video generation — Veo 3.1 Lite.
