Gemma 4 – what is it?

2026-05-04

De Novo Cloud Expert

Gemma 4 from Google is a family of open AI models focused on reasoning, software development, multimodal scenarios and agentic workflows. It is a direct continuation of the open Gemma model series, specifically Gemma 3, and is distributed under the Apache 2.0 licence. This last factor is important for companies that need an AI model with open weights and the ability to deploy it in their own infrastructure. Unlike the closed Gemini models, this line is designed for a wide range of users. For example, it can be integrated into CI/CD pipelines for automated code review or used as the core of local AI development agents.

The Gemma 4 model is fairly compact, while still offering a large context window. Small models support a window of up to 128,000 tokens, while medium-sized models support up to 256,000 tokens. This is particularly important in scenarios where the model must retain context across several stages of work. In practice, this is optimal for root cause analysis in extensive cluster logs or for large-scale refactoring of monolithic applications.

There is also a particular emphasis on multimodality. All models in the family work with text and images. At the same time, Gemma 4 is relatively resource-intensive: stable operation requires powerful GPUs or TPUs and a suitable software environment. For that reason, frameworks such as vLLM or TGI are often used to optimise the deployment of such models, as they make more efficient use of GPU accelerator memory.