Published: May 21, 2025
Summarization stands as one of the most common and vital AI tasks using large language models (LLMs). Summaries offer a critical means to quickly understand extensive content—from lengthy articles and dense chat logs to numerous reviews—saving time, enhancing productivity, and enabling faster, better-informed decision-making.
There are many different types of summaries, with varied levels of detail and formatting expectations. To meet these expectations, Chrome collaborated with Google Cloud to improve Gemini Nano's output.
We fine-tuned Gemini Nano with low-rank adaptation (LoRA) to enhance the experience and output quality, for all summary styles and lengths. Additionally, we implemented automatic and autorater evaluations on different aspects of summary quality, including factuality, coverage, format, and readability.
We've visualized what this difference looks like in practice. You can experiment with this implementation and take a look at a real-time demo that compares the outputs of Gemini Nano and Gemini Nano with LoRA.
What is the Summarizer API?
| Explainer | Web | Extensions | Chrome Status | Intent |
| --- | --- | --- | --- | --- |
| GitHub | | | View | Intent to Ship |
The Summarizer API condenses lengthy text content into brief, easy-to-digest summaries. The API is built into Chrome and uses Gemini Nano to perform inference.
Different sites may require summaries with a range of styles and lengths. For example, if you're a news site, you may want to offer a bulleted list of key points in your articles. Alternatively, users browsing product reviews could benefit from a quick and short summary of the review sentiment. To demonstrate, we've summarized the Wikipedia page on Welsh Corgis with the length set to `short`.
| Summary type | Output |
| --- | --- |
| `headline` | ## Welsh Corgi: A History of Royalty and Herding Dogs |
| `key-points` | * The Welsh Corgi is a small herding dog that originated in Wales. * There are two main breeds: Pembroke and Cardigan Welsh Corgi. * The Pembroke is more popular and has been associated with the British royal family. |
| `tl;dr` | The Welsh Corgi, a small herding dog with a long history in Wales and the British royal family, comes in two varieties: Pembroke and Cardigan, both known for their fox-like faces, short legs, and herding instincts. |
| `teaser` | Discover the history of the Welsh Corgi, from its humble origins as a herding dog for Welsh farmers to its rise as a symbol of the British royal family. |
You can experiment with other pages using the Summarizer API Playground.
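To request summaries like these from your own code, a sketch along these lines works. The `type`, `format`, and `length` values come from the Summarizer API; the input text and function name are placeholders for illustration.

```js
// Minimal sketch: request a short key-points summary, like the table above.
async function summarizeText(text) {
  // Check whether the model is available, downloadable, or unsupported.
  const availability = await Summarizer.availability();
  if (availability === 'unavailable') {
    throw new Error('The Summarizer API is not supported on this device.');
  }

  // Configure one of the four summary types and a length.
  const summarizer = await Summarizer.create({
    type: 'key-points', // also: 'headline', 'tl;dr', 'teaser'
    format: 'markdown',
    length: 'short',
  });

  return summarizer.summarize(text);
}
```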
Experiment with fine-tuning
Fine-tuning is only available behind a flag in Chrome Canary, starting from version 138.0.7180.0. To use this model:
- Open Chrome Canary.
- Go to `chrome://flags/#summarization-api-for-gemini-nano`.
- Select Enabled with Adaptation.
- Restart the browser.
- Open the DevTools Console and input `Summarizer.availability()`. This starts the download for the supplemental LoRA.
Once the download is complete, you can start experimenting.
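You can also trigger the download and watch its progress from script. The `monitor` callback and `downloadprogress` event are part of the Summarizer API; the logging here is just illustrative.

```js
// Sketch: trigger the model and LoRA download, and log its progress.
const availability = await Summarizer.availability();

if (availability === 'downloadable' || availability === 'downloading') {
  await Summarizer.create({
    monitor(m) {
      // Fires periodically while the model downloads; e.loaded is 0 to 1.
      m.addEventListener('downloadprogress', (e) => {
        console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
}
```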
Evaluating the summarizer's performance
We measured the performance improvement of the fine-tuned Gemini Nano primarily using two evaluation methods: automatic and autorater. Fine-tuning helps a model better perform specific tasks, such as:
- Translate medical text better.
- Generate images in a specific art style.
- Understand new slang.
In this case, we wanted to better meet the expectations of each summary type.
Automatic evaluation
Automatic evaluation uses software to judge a model's output quality. We used this technique to detect formatting errors, sentence repetition, and the presence of non-English characters in summaries of English input. A code sketch of these checks follows the list.
- Formatting errors: We check whether the summary responses follow the prompt's formatting instructions. For example, for the short key-points style, we check whether each bullet point starts with an asterisk (`*`) and that there are no more than three bullet points.
- Sentence repetition: We check whether the same sentence is repeated within a single summary response, as this indicates poor quality.
- Non-English characters: We check whether the response includes non-English characters when the input is meant to be in English.
- Hyperlinks in output: We check whether the response includes any hyperlinks, in Markdown format or in plain text, that don't exist in the input.
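As an illustration, checks like these can be expressed with a few string and regular-expression rules. This is a hypothetical sketch of the kinds of rules described above, not Chrome's actual evaluation code.

```js
// Hypothetical sketch of the automatic checks described above.
function evaluateSummary(summary, { maxBullets = 3 } = {}) {
  const lines = summary.split('\n').map((l) => l.trim()).filter(Boolean);

  // Format check for key-points: every line is a bullet, at most maxBullets.
  const bullets = lines.filter((l) => l.startsWith('*'));
  const formatError =
    bullets.length !== lines.length || bullets.length > maxBullets;

  // Sentence repetition: the same sentence appearing more than once.
  const sentences = summary.split(/(?<=[.!?])\s+/).map((s) => s.trim());
  const repetition = new Set(sentences).size !== sentences.length;

  // Non-English characters (rough ASCII heuristic for English input).
  const nonEnglish = /[^\x00-\x7F]/.test(summary);

  // Hyperlinks, as Markdown links or plain URLs. A fuller check would
  // also verify that any link found actually appears in the input.
  const hyperlink = /\[[^\]]*\]\([^)]*\)|https?:\/\//.test(summary);

  return { formatError, repetition, nonEnglish, hyperlink };
}
```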
We evaluated two styles of input: scraped articles and chat logs.
Automatic error rates for summaries of scraped articles:

| | Headline | TL;DR | Key-Points | Teaser |
| --- | --- | --- | --- | --- |
| | Base / With LoRA | Base / With LoRA | Base / With LoRA | Base / With LoRA |
| Format errors | 13.54% / 7.05% | 41.07% / 4.61% | 12.58% / 6.36% | 51.17% / 6.74% |
| Sentence repetition | 0.07% / 0.07% | 0.21% / 0.0% | 0.10% / 0.10% | 0.10% / 0.03% |
| Non-English errors | 3.95% / 0.03% | 1.38% / 0.0% | 2.41% / 0.03% | 1.44% / 0.0% |
| Hyperlinks | 0.07% / 0.0% | 0.14% / 0.0% | 0.14% / 0.0% | 0.34% / 0.0% |
Automatic error rates for summaries of chat logs:

| | Headline | TL;DR | Key-Points | Teaser |
| --- | --- | --- | --- | --- |
| | Base / With LoRA | Base / With LoRA | Base / With LoRA | Base / With LoRA |
| Format errors | 13.17% / 0.24% | 22.92% / 0.18% | 4.43% / 0.09% | 29.64% / 3.51% |
| Sentence repetition | 0.0% / 0.0% | 0.0% / 0.0% | 0.0% / 0.0% | 0.03% / 0.0% |
| Non-English errors | 0.15% / 0.0% | 0.15% / 0.0% | 0.03% / 0.0% | 0.06% / 0.0% |
| Hyperlinks | 0.0% / 0.0% | 0.0% / 0.0% | 0.0% / 0.0% | 0.0% / 0.0% |
After fine-tuning Gemini Nano, we saw a significant reduction in the format error rate across summary types, for both articles and chat logs.
Autorater evaluation
We used Gemini 1.5 Pro for autorater evaluation, to judge Gemini Nano's output quality. As each summary type has a different purpose, the criteria and their relative importance differed across types. All summary types were evaluated for:
- Coverage: Does the summary accurately capture the essential purpose of the input?
- Factuality: Is the summary truthful? Does the summary introduce new information that was not explicitly stated or implied in the text?
- Format: Is the summary formatted with valid Markdown syntax? Does the summary keep to the requested maximum number of sentences?
- Clarity: Is the summary repetitive? Does the summary accurately convey the core message in the fewest possible words?
As these summary types have different purposes, additional metrics apply to specific summary types:
- Engagement (`headline`): Is the summary immediately understandable to a general audience? Does the summary use a tone that is engaging and appealing to a general audience?
- Succinctness (`tl;dr`): Is the summary clear, concise, and immediately understandable to someone with a very short attention span? Does it effectively distill the core message into a readily digestible form for a quick read?
- Enticement (`teaser`): Does the summary effectively create intrigue and encourage the reader to want to learn more by reading the full text? Does it use language that is engaging and suggestive of interesting content?
We compared the output of the base model and the model with LoRA side by side, using the autorater. The autorater produced scores between 0 and 1, which we averaged and then assessed against a threshold value.
To ensure a well-grounded result, we reduced data variance and alleviated positional bias.
- Data variance reduction: We averaged the scores of three independent outputs per input, as independent runs may return slightly different results. We did this for both the base model and the fine-tuned Gemini Nano. While the score differences across outputs were small, averaging helps us more reliably understand large sets of data.
- Positional bias mitigation: To avoid favoring whichever summary was shared with the rater first, we evaluated the results twice, then averaged the final scores, as sketched after this list.
  - We evaluated the model with LoRA, then the base model.
  - Then, we reversed the order: the base model, followed by the model with LoRA.
  - We averaged the final scores.
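As a sketch, the de-biased score for a given summary could be computed like this; the function name, run counts, and numbers are illustrative, not our actual evaluation harness.

```js
// Illustrative sketch: average three runs per ordering, then both orderings.
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

function debiasedScore(scoresLoraFirst, scoresBaseFirst) {
  // Each argument holds autorater scores (0 to 1) from three independent runs.
  return (mean(scoresLoraFirst) + mean(scoresBaseFirst)) / 2;
}

// Example with made-up numbers:
debiasedScore([0.87, 0.86, 0.87], [0.89, 0.88, 0.89]); // ≈ 0.877
```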
| | Short | Medium | Long |
| --- | --- | --- | --- |
| | Base / With LoRA | Base / With LoRA | Base / With LoRA |
| LoRA first | 74.29% / 86.64% | 76.11% / 81.38% | 68.62% / 78.95% |
| Base model first | 68.02% / 88.60% | 64.97% / 87.58% | 58.25% / 86.35% |
| Average | 71.02% / 89.18% | 69.59% / 84.08% | 63.47% / 82.65% |

Winrates for the `key-points` summary type. Higher values are better results.
Across 500 articles, fine-tuned Gemini Nano performed significantly better than the base model.
| | Headline | TL;DR | Key-Points | Teaser |
| --- | --- | --- | --- | --- |
| | Base / With LoRA | Base / With LoRA | Base / With LoRA | Base / With LoRA |
| Short | 74.74% / 89.12% | 55.76% / 89.50% | 71.02% / 89.18% | 53.47% / 87.14% |
| Medium | 73.10% / 87.89% | 41.82% / 81.21% | 69.59% / 84.08% | 48.98% / 86.74% |
| Long | 60.99% / 89.32% | 50.51% / 84.85% | 63.47% / 82.65% | 62.65% / 87.55% |
The same was true in our evaluation of 500 chat logs: fine-tuned Gemini Nano outperformed the base model.
| | Headline | TL;DR | Key-Points | Teaser |
| --- | --- | --- | --- | --- |
| | Base / With LoRA | Base / With LoRA | Base / With LoRA | Base / With LoRA |
| Short | 70.59% / 96.15% | 66.27% / 97.79% | 81.60% / 97.40% | 67.48% / 96.14% |
| Medium | 76.67% / 95.13% | 56.02% / 94.98% | 82.60% / 97.20% | 50.41% / 96.95% |
| Long | 59.03% / 94.32% | 65.86% / 95.58% | 75.00% / 97.60% | 70.94% / 97.16% |
These results demonstrate that our fine-tuning improved the overall summary quality.
Better summaries with LoRA
Traditionally, fine-tuning is performed by adjusting the model's parameters. Modern AI models are huge, so this operation is slow, expensive, and requires storing a brand new copy of the model.
Instead of changing all the parameters, what if we add tiny extra pieces that steer the model in the direction we want? Their smaller size enables much faster training. This is the core tenet of low-rank adaptation (LoRA). With LoRA, the model's original parameters remain unchanged, often referred to as freezing the parameters. A small set of new weights is then introduced to apply corrections, focusing training efforts on these added pieces.
In fact, you can typically train as little as 2% of the original parameter count with LoRA and still see significant changes in your outputs.
Why does this work? Many behaviors (such as new styles, tasks, or knowledge) can be incorporated without altering every model parameter. These changes can be expressed as small patterns or directions within the model's existing knowledge. LoRA efficiently captures these patterns without touching the original model.
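To make the savings concrete with illustrative numbers (the sizes here are assumptions, not Gemini Nano's actual dimensions): for a frozen weight matrix W of size d × k, LoRA learns two small matrices, B (d × r) and A (r × k), and uses W + BA in place of W. With d = k = 4096 and rank r = 8, that's 8 × (4096 + 4096) = 65,536 trainable values instead of 4096 × 4096 ≈ 16.8 million, less than 0.4% of the original matrix.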
Take a look at some examples that compare the Gemini Nano base model to the results after adding LoRA.
We asked Gemini Nano to create a short headline that "captures attention and conveys the essence of the content" for the Joshua Tree National Park Wikipedia article.
| Gemini Nano | Gemini Nano with LoRA |
| --- | --- |
| Joshua Tree National Park's Unique Landscape and History | Joshua Tree: National Park in California, Straddling Deserts & Home to Iconic Trees |
Next, we prompted Gemini Nano for a short teaser for the Ratatouille article. "Write an enticing one sentence that encourages users to read the full text."
| Gemini Nano | Gemini Nano with LoRA |
| --- | --- |
| Ratatouille is a beloved French Provençal dish featuring stewed vegetables, often including tomatoes and garlic. Its origins can be traced to the 18th century, and its preparation has evolved over time, with some chefs emphasizing individual vegetable flavors while others prefer a smoother, more unified texture. Beyond its French origins, ratatouille has inspired similar dishes in various cuisines around the world, including piperade from Spain and tian from the Mediterranean. The film "Ratatouille" (2007) further popularized the dish, showcasing its culinary versatility and emotional resonance. | Ratatouille, a French Provençal dish of stewed vegetables, has a rich culinary history, evolving from a simple stew into a modern culinary masterpiece that has captivated audiences worldwide. |
Real-time inference demo
We built an interface that compares outputs from Gemini Nano and Gemini Nano with LoRA in real time.
We asked Gemini Nano to create a `tl;dr` summary with a `short` length for the Ocean Sunfish article. Remember that `tl;dr` and `short` require a one-sentence response that is a "quick read."
By implementing fine-tuning, Gemini Nano can better generate a summary that follows the specific instructions.
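In code, a request like the one in the demo could look like the following sketch; `articleText` is a placeholder for the page content, and the streaming call shows partial output as it's generated.

```js
// Sketch: a short tl;dr summary, streamed while it's generated.
const summarizer = await Summarizer.create({
  type: 'tl;dr',
  format: 'plain-text',
  length: 'short',
});

// articleText is a placeholder for the article content to summarize.
const stream = summarizer.summarizeStreaming(articleText);
for await (const chunk of stream) {
  console.log(chunk);
}
```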
Engage and share feedback
We're eager to hear your feedback on how your summaries are impacted by the fine-tuned Gemini Nano.
- Experiment with the updated model in Chrome Canary.
- Learn more about the Summarizer API.
- If you have feedback on Chrome's implementation, file a bug report or feature request.
Discover all of the built-in AI APIs that use models, including large language models, in the browser.