Gemini Nano and Gemma: Google’s Powerful AI That Runs on Your Device
From a 2B-parameter model that runs on a laptop to a multimodal model that works offline on a phone with 2 GB of RAM, Google’s on-device AI strategy is one of the most ambitious in the industry. This is the full story: where it started, how it evolved, and where it is going.
The Cloud Was Never the Only Answer
There is a version of AI that lives entirely in the cloud. You ask a question, the signal travels to a data center thousands of miles away, a model processes it, and an answer comes back. For most tasks, this works well. But it assumes you have a reliable internet connection, that latency does not matter, and that privacy concerns are not a blocker.
Google has been building a parallel path for years. What if the model ran on the device itself? On your phone, your laptop, your tablet, even eventually on a wearable or a sensor? On-device AI is not just a backup for when you go offline. It is a fundamentally different model for how AI fits into people’s lives: faster, more private, and available everywhere.
This post traces Google’s on-device AI journey through two main threads: Gemini Nano, the embedded model powering Pixel smartphones, and Gemma, the open-weight family that developers around the world can download, fine-tune, and deploy. Together, they add up to one of the most complete on-device AI strategies in the industry.
Before Gemma: Google’s Open Source DNA
Google’s commitment to open AI research predates the current era of large language models by more than a decade. The 2017 paper “Attention Is All You Need” from Google Research introduced the Transformer architecture that now underpins virtually every major AI model in the world. TensorFlow, JAX, BERT, T5, AlphaFold — the list of foundational technologies Google has released to the open community is long and consequential.
Gemma is the direct heir of this tradition. When Google DeepMind launched the first Gemma models in February 2024, the framing was explicit: these are open models built from the same research and technology used to create Gemini. Not a simplified or stripped-down version of Gemini shared as a goodwill gesture, but a genuine transfer of Google-grade model architecture into open weights that any developer can download, run locally, and fine-tune on their own data.
> Google’s open AI contributions span over a decade: Transformers (2017), TensorFlow, JAX, BERT, T5, AlphaFold, AlphaCode. Gemma is the continuation of this lineage in the era of generative AI.
Gemini Nano: AI That Never Leaves Your Phone
In December 2023, Google introduced Gemini, its most capable AI model family to date, in three sizes: Ultra, Pro, and Nano. While Ultra and Pro are cloud-based, Nano was designed from day one to run on-device, with no cloud call required.
The Pixel 8 Pro was the first smartphone ever engineered to run an on-device foundation model. It launched with Gemini Nano powering two features: Summarize in the Recorder app (meeting and lecture summaries generated entirely on the device) and Smart Reply in Gboard (contextual reply suggestions in WhatsApp, Line, and KakaoTalk).
Why On-Device Matters: Privacy and Speed
Gemini Nano running on Pixel has two structural advantages over cloud-based AI. First, privacy: for sensitive use cases like call transcription, scam detection, and message summarization, data never leaves the phone. Second, speed: there is no round-trip to a server, which means responses are faster and features work even without internet.
Scam Detection, launched in 2025 and powered by Gemini Nano, is a clear illustration of why on-device is the only viable architecture for this use case. The model analyzes your phone call in real time and alerts you if it detects patterns associated with fraud. Sending audio of every phone call to a cloud server would be both a privacy risk and a latency problem.
Gemini Nano Milestones on Pixel
- Dec 2023 — Pixel 8 Pro: Summarize in Recorder, Smart Reply in Gboard (first on-device foundation model on a smartphone)
- Jan 2024 — Magic Compose in Google Messages: on-device message rewriting in different styles
- Jun 2024 — Expanded to Pixel 8 and Pixel 8a as developer option
- Aug 2024 — Pixel 9 series: Call Notes (summarize phone calls), Pixel Screenshots (auto-organize screenshots)
- Nov 2024 — Gemini Nano with Multimodality: sight, sound, and language processing on-device
- Dec 2024 — Smarter Call Screen replies using Gemini Nano
- Mar 2025 — Scam Detection for calls and messages: real-time fraud alerts
- Aug 2025 — Pixel 10 with Tensor G5: Nano-powered Magic Cue, Voice Translate, Pro Res Zoom
Gemma: Open Weights, Google-Grade Intelligence
While Gemini Nano is an embedded component of Pixel devices, the Gemma family is a different kind of initiative. Gemma models are open-weight: you can download them to your own machine, inspect their architecture, fine-tune them on your own data, and deploy them without any dependency on Google’s infrastructure. They carry a commercially friendly license, which means startups, researchers, and enterprises can build production systems on top of them.
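What “open weights” means in practice: a few lines of code pull the model onto your own hardware and run it there. Here is a minimal sketch using Hugging Face transformers, assuming you have accepted the Gemma license on the Hub; the gemma-2-2b-it checkpoint shown is one of the published instruction-tuned variants.

```python
# A minimal local-inference sketch with Hugging Face transformers.
# Assumes the Gemma license has been accepted on the Hub; "google/gemma-2-2b-it"
# is one of the published instruction-tuned checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Gemma checkpoints ship with a chat template; apply it to build the prompt.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "In two sentences, why does on-device AI matter?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Everything in that snippet runs on your machine: once the weights are cached locally, no request ever leaves the device.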
The name comes from the Latin word for “precious stone.” The metaphor is apt: a Gemma model is a concentrated, portable piece of something that would otherwise require enormous resources to create. Gemini-grade architecture and training, distilled into a form that runs on a laptop.
The Gemma Family: Generation by Generation
| Generation | Sizes | Target Hardware | Key Capability Added |
|---|---|---|---|
| Gemma 1 | 2B, 7B | Laptop/workstation | Text only, pre-trained + instruction-tuned |
| Gemma 2 | 2B, 9B, 27B | Single GPU / laptop | Redesigned arch, knowledge distillation, SynthID |
| Gemma 3 | 1B, 4B, 12B, 27B | Phone to GPU | 128k context, 140 languages, multimodal (vision) |
| Gemma 3n | E2B, E4B | Phone, tablet, 2GB RAM | Audio + vision + text + video, offline, real-time |
| Gemma 3 270M | 270M | MCU / edge chip | Ultra-compact, hyper-efficient on minimal power |
Gemma 1.0 (February 2024): The Opening Move
The first Gemma release shipped two sizes: 2B and 7B parameters. Both were capable of running on a developer laptop or desktop without cloud access. The 7B model outperformed significantly larger open models on benchmarks including question answering, reasoning, mathematics, science, and coding.
Alongside the weights, Google released a Responsible Generative AI Toolkit, Colab and Kaggle notebooks, and integrations with Hugging Face, NVIDIA NeMo, and TensorRT-LLM. The message was clear: this was not just a model release, it was an invitation to an ecosystem.
Gemma 2 (June 2024): A Redesigned Architecture
Gemma 2 introduced a completely redesigned architecture built for both performance and inference efficiency. Available in 9B and 27B sizes (a 2B was added later), the 27B model delivered performance competitive with models more than twice its size and could run at full precision on a single NVIDIA H100 or Google Cloud TPU, significantly reducing deployment costs.
Gemma 2 also brought knowledge distillation into the family’s training recipe: a large teacher model transfers its “intuition” into a smaller, more efficient student by training the student to match the teacher’s output distributions. This technique would become central to subsequent generations. Gemma 2’s launch also marked the first time Google open-sourced its SynthID text watermarking capability.
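Google has not published Gemma 2’s exact distillation recipe, but the classic formulation is easy to sketch: the student learns to match the teacher’s softened output distribution. A minimal PyTorch sketch, assuming teacher and student logits over the same vocabulary:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions."""
    # Soften both distributions with a temperature so the student also learns
    # from the teacher's near-miss probabilities, not just the top token.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: in real training this term is mixed with the ordinary
# next-token cross-entropy loss. Shapes are [batch, vocab].
student_logits = torch.randn(4, 256000)
teacher_logits = torch.randn(4, 256000)
loss = distillation_loss(student_logits, teacher_logits)
```

The temperature controls how much of the teacher’s knowledge about plausible-but-wrong tokens the student sees, which is exactly how a small model inherits behavior from a much larger one.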
Gemma 3 (March 2025): The Single-Accelerator Champion
Gemma 3 arrived as Google’s most capable open model at the time of release. In preliminary human preference evaluations on LMArena’s leaderboard, Gemma 3 outperformed Llama 3-405B, DeepSeek-V3, and o3-mini while running on a single GPU or TPU.
Gemma 3 introduced four size options (1B, 4B, 12B, 27B), a 128,000-token context window, out-of-the-box support for over 35 languages with pretrained support for over 140, and multimodal vision capabilities for the first time in the Gemma family. Within a year of the original Gemma launch, the family had crossed 100 million downloads and spawned more than 60,000 community-created variants in what Google began calling the Gemmaverse.
Gemma 3n (May 2025, Google I/O): The Phone-Native Model
The most significant architectural leap in the Gemma family came at Google I/O 2025 with Gemma 3n. Engineered specifically for mobile, Gemma 3n introduced Per-Layer Embeddings (PLE), a technique that dramatically reduces the memory footprint during inference by keeping a large share of the model’s parameters out of the accelerator’s fast memory. The result: a genuinely multimodal model that handles audio, text, images, and video, and runs on phones and tablets with as little as 2 GB of RAM, fully offline.
Gemma 3n ships in two effective sizes, E2B and E4B: each carries more raw parameters than a traditional 2B or 4B model but runs with the memory footprint of one. It supports high-quality automatic speech recognition and can process all four modalities simultaneously. This is the model that enables a developer in a remote village to build an offline AI assistant that works without any internet infrastructure.
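For developers outside the Pixel ecosystem, the Gemma 3n weights are available through the usual channels. A minimal multimodal-inference sketch, assuming the google/gemma-3n-E2B-it checkpoint on Hugging Face and a transformers release with Gemma 3n support; the task name and message format follow the model card’s documented usage, and the image URL is a placeholder:

```python
# A multimodal inference sketch. Assumes the google/gemma-3n-E2B-it checkpoint
# and a transformers version with Gemma 3n support; the image URL is a placeholder.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3n-E2B-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/street-scene.jpg"},
        {"type": "text", "text": "Describe this scene in one sentence."},
    ],
}]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # the model's reply
```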
The Complete On-Device AI Timeline
The pace of this evolution is striking. From the first Gemini Nano on Pixel 8 Pro in December 2023 to FunctionGemma automating OS actions fully offline in December 2025, two years produced a complete architectural revolution in on-device AI.
| Date | Model | What Changed |
|---|---|---|
| Dec 2023 | Gemini Nano | Pixel 8 Pro – first on-device LLM on a smartphone (Summarize in Recorder, Smart Reply) |
| Feb 2024 | Gemma 1.0 | Open weights: 2B and 7B models. Laptop and desktop capable. Gemini DNA, open license. |
| May 2024 | Gemma 2 preview | New architecture, 9B and 27B. Outperforms models 2x its size on single GPU. |
| May 2024 | Gemini Nano Multimodal | Nano gains sight and hearing on Pixel: images, sounds, spoken language. |
| Jun 2024 | Gemma 2 release | Full release; a 2B size follows in July. CodeGemma, RecurrentGemma, PaliGemma broaden the family. |
| Mar 2025 | Gemma 3 | 1B, 4B, 12B, 27B. 128k context. 140 languages. Beats Llama 3-405B on LMArena. |
| May 2025 | Gemma 3n (I/O 2025) | Phones and tablets with 2 GB RAM. Audio, text, image, video. Fully offline capable. |
| May 2025 | MedGemma | Open multimodal model for health applications and medical image comprehension. |
| Aug 2025 | Gemma 3 270M | Hyper-efficient model. Runs on microcontrollers and ultra-low-power edge devices. |
| Dec 2025 | FunctionGemma | On-device agentic tool use. Automates OS tasks fully offline on mobile phones. |
| Dec 2025 | T5Gemma 2 | Encoder-decoder, multimodal, 128k context. As small as 370M params total. |
| Jan 2026 | TranslateGemma | 55 languages. 12B beats 27B baseline. Translates text inside images (multimodal). |
The Gemmaverse: Specialized Models for Every Domain
One of the most important things that happened in the two years following Gemma’s launch was not anything Google did internally. It was what the developer and research community did with the model weights. More than 60,000 Gemma variants were created and shared, in every language, domain, and form factor imaginable.
Google also released its own specialized variants, each targeting a specific domain where a general-purpose model may not be the most efficient or appropriate choice.
| Variant | Domain | What It Enables |
|---|---|---|
| MedGemma | Healthcare image and text AI | Analyze medical images, build health applications |
| ShieldGemma 2 | Safety classifier | Flag unsafe content, including images, in generative AI pipelines |
| FunctionGemma | On-device agentic tasks | Offline OS automation: calendar, contacts, settings |
| CodeGemma | Code generation and completion | Developer tools, IDE integrations, local coding assistant |
| PaliGemma | Vision-language model | Image question answering, image captioning |
| TranslateGemma | Specialized translation | 55 languages, beats larger models, translates text in images |
| T5Gemma 2 | Encoder-decoder model | Compact seq2seq tasks, 128k context, multimodal |
| SignGemma (upcoming) | Sign language translation | ASL to English, accessibility applications |
| DolphinGemma | Bioacoustics research | World’s first LLM trained on dolphin communications |
What On-Device AI Actually Enables: Real Stories
Benchmarks and architectural descriptions only go so far. The most compelling argument for on-device AI is what becomes possible when it is deployed in the real world, especially in contexts where cloud infrastructure is unavailable, expensive, or inappropriate.
Accessibility at Scale
Gemma Vision, a project from the Gemma 3n Impact Challenge, was built by a developer whose brother is blind. The app processes camera input from a phone strapped to the user’s chest, controlled by an 8-bit controller or voice commands, and provides real-time scene descriptions entirely on-device. The model runs through MediaPipe’s LLM Inference API, no internet required.
Another project, Vite Vere, transformed pictograms into full natural language expressions for Eva, a graphic designer with cerebral palsy who had been limited to simple commands for decades. Fine-tuned locally on Apple’s MLX framework, the model learned Eva’s specific communication patterns. These are applications that could not exist if every inference required a cloud call.
Education in Disconnected Regions
Lentera, another Impact Challenge submission, transformed affordable hardware into offline microservers that broadcast local WiFi hotspots. Students in regions without internet connectivity connect to the hotspot and interact with a Gemma 3n-powered educational assistant running locally via Ollama. No data center. No latency. No cost per query.
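This pattern is straightforward to reproduce. A minimal sketch of querying a locally served Gemma model through the Ollama Python client, assuming Ollama is installed and running and a Gemma 3n model has been pulled (the "gemma3n" tag is an assumption; check `ollama list` for what is installed):

```python
# An offline assistant query via the Ollama Python client. Assumes Ollama is
# running locally and a Gemma 3n model has been pulled; the "gemma3n" model
# tag is an assumption, so verify it with `ollama list`.
import ollama

response = ollama.chat(
    model="gemma3n",
    messages=[{"role": "user", "content": "Explain photosynthesis to a ten-year-old."}],
)
print(response["message"]["content"])
```

Point students’ devices at the machine running Ollama and every query stays on the local network: no data center, no latency, no cost per query.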
Science and Medicine
Perhaps the most striking real-world application came from research, not consumer use. Google DeepMind and Yale University used a 27B Gemma model to build C2S-Scale, a foundation model for single-cell biology analysis. The model generated a novel hypothesis about cancer cellular behavior: a specific drug combination that might make tumors more visible to the immune system. Lab tests confirmed the prediction. A Gemma-based model made a scientifically validated cancer research breakthrough.
The Scale of the Gemmaverse
The adoption numbers contextualize just how significant the Gemma initiative has become in the broader AI ecosystem.
- 100 million Gemma downloads by February 2025, within one year of the first release.
- 300 million downloads by the end of 2025, tripling over the course of a single year.
- 60,000 community-created Gemma variants as of the Gemma 3 launch in March 2025.
- Over 600 projects submitted to the Gemma 3n Impact Challenge on Kaggle alone.
- Gemma 3 270M introduced in August 2025 for hyper-efficient deployment on ultra-low-power edge devices.
- The Gemma Academic Program offers researchers $10,000 USD in Google Cloud credits per award to accelerate Gemma-based scientific research.
- Gemma models integrate with: Hugging Face, Ollama, NVIDIA NeMo, TensorRT-LLM, LiteRT, Google AI Edge, Vertex AI, Kaggle, Colab, and more.
What Comes Next: The Edge AI Trajectory
The direction is clear. Google’s on-device AI strategy is converging toward a world where every device carries meaningful AI capability, operates independently of network infrastructure for sensitive or latency-critical tasks, and serves as an intelligent node in a larger connected system when connectivity is available.
FunctionGemma, released in December 2025, points to the next frontier: on-device agents. Models small enough to run on a phone that can receive a natural language instruction, translate it into the correct OS action, and execute it without any cloud involvement. Calendar events, contacts, system settings, all handled by a local model that acts as an intelligent traffic controller between the user and the operating system.
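FunctionGemma’s exact output contract is not reproduced here, but the general function-calling pattern it relies on is simple: the model emits a structured tool call, and a thin dispatcher maps it to a device action. A purely illustrative Python sketch, in which the tool name and JSON schema are hypothetical:

```python
import json

# Purely illustrative: the tool name and JSON schema below are hypothetical,
# not FunctionGemma's actual output format.

def create_calendar_event(title: str, start: str) -> str:
    # In a real app this would call the OS calendar API.
    return f"Created event '{title}' at {start}"

TOOLS = {"create_calendar_event": create_calendar_event}

def dispatch(model_output: str) -> str:
    # The model emits a structured call, e.g. {"name": ..., "args": {...}};
    # the dispatcher looks up the tool and forwards the arguments.
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["args"])

print(dispatch('{"name": "create_calendar_event", "args": {"title": "Standup", "start": "09:00"}}'))
```

The point is that the language model never touches the OS directly: it only proposes a structured action, which the app validates and executes locally.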
SignGemma, announced at I/O 2025 and currently in development, will bring sign language translation directly to the device. The model translates American Sign Language into spoken English text in real time. This is not a cloud service with sign language capabilities bolted on. It is a purpose-built, on-device accessibility tool built on the Gemma architecture, designed to work even where internet infrastructure does not reach.
> The convergence point: Gemini Nano handles embedded, always-on device intelligence. Gemma handles developer-deployable, domain-specific, customizable open models. Together, they cover the full spectrum of on-device AI from consumer hardware to research clusters.
Powerful AI Does Not Require a Data Center
The story of Google’s on-device AI is a story about democratization in the most literal sense. Not making AI cheaper on the cloud, but making it possible without the cloud entirely. A developer in a region without reliable internet can download Gemma, fine-tune it on a local dataset, and deploy an application that runs offline indefinitely. A researcher can use Gemma to generate a hypothesis that leads to a scientifically validated cancer discovery. A person with a disability can have a communication tool trained specifically on the way they communicate.
Gemini Nano and Gemma are not competing with cloud AI. They are complementing it by reaching the places and use cases that cloud AI simply cannot serve. And as the models grow more capable, as the hardware becomes more efficient, and as the developer ecosystem around the Gemmaverse continues to expand, the distance between what is possible on a device and what is possible in a data center will continue to narrow.
From a Pixel 8 Pro in December 2023 to a 2 GB RAM phone running real-time audio, video, and text multimodality offline in 2025: this is what two years of focused, open AI development looks like.
> To get started with Gemma: download models on Kaggle or via Hugging Face. For on-device deployment, explore Google AI Edge at ai.google.dev/edge.
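And if you want to go a step beyond inference, adapting Gemma to your own data is typically done with parameter-efficient fine-tuning. A hedged sketch using the peft library’s LoRA support; the target module names are typical for Gemma’s attention blocks, so verify them against the checkpoint you actually load:

```python
# A parameter-efficient fine-tuning sketch with LoRA via the peft library.
# The target module names are typical for Gemma's attention blocks; verify
# them against the checkpoint you load.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of weights will train
# From here, train with transformers.Trainer or trl's SFTTrainer on your dataset.
```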
Ready to Transform How Your Teams Work with AI?
Gemma and Gemini Nano are just one piece of Google’s rapidly evolving AI ecosystem. Whether you are looking to integrate AI Search into your workflows, upskill your team, or explore what Google AI can do for your business, we are here to help.
Contact us today to learn more.
Author: Ata Güneş
Date Published: Apr 10, 2026
