Blame - tools/metrics/histograms/README.md - chromium/src.git

# Histogram Guidelines

This document gives best practices for using histograms in code and for
documenting them for the dashboards. There are three general types
of histograms: [enumerated histograms](#Enum-Histograms),
[count histograms](#Count-Histograms) (for arbitrary numbers), and
[sparse histograms](#When-To-Use-Sparse-Histograms) (for cases where
precision matters over a wide range and/or the range cannot be specified
a priori).

[TOC]

## Defining Useful Metrics

### Directly Measure What You Want

Measure exactly what you want, whether that's the time used for a function call,
the number of bytes transmitted to fetch a page, the number of items in a list,
etc. Do not assume you can calculate what you want from other histograms, as
most ways of doing this are incorrect.

For example, suppose you want to measure the runtime of a function that just
calls two subfunctions, each of which is instrumented with histogram logging.
You might assume that you can simply sum the histograms for those two functions
to get the total time, but that results in misleading data. If we knew which
emissions came from which calls, we could pair them up and derive the total time
for the function. However, histograms are pre-aggregated client-side, which
means that there's no way to recover which emissions should be paired up. If you
simply add up the two histograms to get a total duration histogram, you're
implicitly assuming the two histograms' values are independent, which may not be
the case.

Directly measure what you care about; don't try to derive it from other data.

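The pairing problem can be made concrete with a small standalone sketch. It
is not Chromium code: the function names and the sample values are made up
for illustration, and the "independent" combination is one plausible reading
of what summing two pre-aggregated histograms implies.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Distribution of totals: total duration (ms) -> number of occurrences.
using Distribution = std::map<int, int>;

// Ground truth: pairs the i-th emission of each subfunction. This is only
// possible while the per-call pairing is still known.
Distribution PairedTotals(const std::vector<int>& first_ms,
                          const std::vector<int>& second_ms) {
  assert(first_ms.size() == second_ms.size());
  Distribution totals;
  for (std::size_t i = 0; i < first_ms.size(); ++i)
    ++totals[first_ms[i] + second_ms[i]];
  return totals;
}

// What combining two pre-aggregated histograms amounts to: every value of
// the first is combined with every value of the second, as if the two were
// independent.
Distribution IndependentTotals(const std::vector<int>& first_ms,
                               const std::vector<int>& second_ms) {
  Distribution totals;
  for (int a : first_ms)
    for (int b : second_ms)
      ++totals[a + b];
  return totals;
}
```

With first-subfunction times `{1, 10}` and second-subfunction times
`{10, 1}`, every real call took 11 ms, yet the independence assumption also
produces totals of 2 ms and 20 ms that never occurred.
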
### Provide Context

When defining a new metric, think ahead about how you will analyze the
data. Often, this will require providing context in order for the data to be
interpretable.

For enumerated histograms in particular, that often means including a bucket
that can be used as a baseline for understanding the data recorded to other
buckets: see the [enumerated histogram section](#Enum-Histograms).

### Naming Your Histogram

Histograms are taxonomized into categories, using dot (`.`) characters as
separators. Thus, histogram names should be in the form `Category.Name` or
`Category.Subcategory.Name`, etc., where each category organizes related
histograms.

It should be quite rare to introduce new top-level categories into the existing
taxonomy. If you're tempted to do so, please look through the existing
categories to see whether any matches the metric(s) that you are adding. To
create a new category, the CL must be reviewed by
[email protected].

## Coding (Emitting to Histograms)

Prefer the helper functions defined in
[histogram_functions.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_functions.h).
These functions take a lock and perform a map lookup, but the overhead is
generally insignificant. However, when recording metrics on the critical path
(e.g. called in a loop or logged multiple times per second), use the macros in
[histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h)
instead. These macros cache a pointer to the histogram object for efficiency,
though this comes at the cost of increased binary size: 130 bytes/macro usage
sounds small but quickly adds up.

### Don't Use the Same Histogram Logging Call in Multiple Places

These logging macros and functions have long names and sometimes include extra
parameters (defining the number of buckets, for example). Use a helper function
if possible. This leads to shorter, more readable code that's also more
resilient to problems that could be introduced when making changes. (One could,
for example, erroneously change the bucketing of the histogram in one call but
not the other.)

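As a rough sketch of the helper-function advice above: the logging backend
here (`FakeCustomCounts`, `FakeRecorder`) is a hypothetical stand-in so the
example compiles on its own, and the histogram name, callers, and bucketing
values are invented. In real code the helper would wrap a call from
histogram_functions.h instead.

```c++
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a logging call; it only remembers what was
// recorded and with which bucketing parameters.
struct RecordedSample {
  int sample;
  int min;
  int max;
  int bucket_count;
};

std::map<std::string, std::vector<RecordedSample>>& FakeRecorder() {
  static std::map<std::string, std::vector<RecordedSample>> recorder;
  return recorder;
}

void FakeCustomCounts(const std::string& name, int sample, int min, int max,
                      int bucket_count) {
  assert(min >= 1 && max > min && bucket_count > 0);
  FakeRecorder()[name].push_back({sample, min, max, bucket_count});
}

// The single place that knows the histogram name and its bucketing.
void RecordDownloadedItemCount(int item_count) {
  FakeCustomCounts("Example.Downloads.ItemCount", item_count, 1, 1000, 50);
}

// Callers share the helper instead of repeating the full logging call.
void OnBatchDownloadFinished(int item_count) {
  RecordDownloadedItemCount(item_count);
}
void OnSingleDownloadFinished() {
  RecordDownloadedItemCount(1);
}
```

Because the bucketing lives only in `RecordDownloadedItemCount`, a later
change to the min, max, or bucket count cannot be applied to one call site
and missed at the other.
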
### Use Fixed Strings When Using Histogram Macros

When using histogram macros (calls such as `UMA_HISTOGRAM_ENUMERATION`), you're
not allowed to construct your string dynamically so that it can vary at a
callsite. At a given callsite (preferably you have only one), the string
should be the same every time the macro is called. If you need to use dynamic
names, use the functions in histogram_functions.h instead of the macros.

### Don't Use Same String in Multiple Places

If you must use the histogram name in multiple places, use a compile-time
constant of appropriate scope that can be referenced everywhere. Using inline
strings in multiple places can lead to errors if you ever need to revise the
name and you update one location but forget another.

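A minimal sketch of the compile-time constant approach follows. The
histogram name and both functions are hypothetical, and `FakeSampleCounts`
is a stand-in for the real logging backend so that the example is
self-contained.

```c++
#include <cassert>
#include <map>
#include <string>

// Hypothetical histogram name, defined once and referenced everywhere.
constexpr char kTabCountHistogramName[] = "Example.Tabs.CountAtStartup";

// Hypothetical stand-in for the logging backend: number of samples by name.
std::map<std::string, int>& FakeSampleCounts() {
  static std::map<std::string, int> counts;
  return counts;
}

// Production code emits using the constant...
void RecordTabCountAtStartup(int tab_count) {
  assert(tab_count >= 0);
  ++FakeSampleCounts()[kTabCountHistogramName];
}

// ...and code that reads the histogram back (a test, for instance) uses the
// same constant, so a rename cannot leave the two out of sync.
int TabCountSamplesRecorded() {
  return FakeSampleCounts()[kTabCountHistogramName];
}
```
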
### Efficiency

Generally, don't be concerned about the processing cost of emitting to a
histogram (unless you're using [sparse
histograms](#When-To-Use-Sparse-Histograms)). The normal histogram code is
highly optimized. If you are recording to a histogram in particularly
performance-sensitive or "hot" code, make sure you're using the histogram
macros; see [reasons above](#Coding-Emitting-to-Histograms).

## Picking Your Histogram Type

### Enum Histograms

Enumerated histograms are most appropriate when you have a list of connected /
related states that should be analyzed jointly. For example, the set of actions
that can be done on the New Tab Page (use the omnibox, click a most visited
tile, click a bookmark, etc.) would make a good enumerated histogram.
If the total count of your histogram (i.e. the sum across all buckets) is
something meaningful, as it is in this example, that is generally a good sign.
However, the total count does not have to be meaningful for an enum histogram
to still be the right choice.

Enumerated histograms are also appropriate for counting events. Use a simple
boolean histogram. It's usually best if you have a comparison point in the same
histogram. For example, if you want to count pages opened from the history page,
it might be a useful comparison to have the same histogram record the number of
times the history page was opened.

In rarer cases, it's okay if you only log to one bucket (say, `true`). However,
think about whether this will provide enough [context](#Provide-Context). For
example, suppose we want to understand how often users interact with a button.
Just knowing that users clicked this particular button 1 million times in a day
is not very informative on its own: the size of Chrome's user base is constantly
changing, only a subset of users have consented to metrics reporting, different
platforms have different sampling rates for metrics reporting, and so on. The
data would be much easier to make sense of if it included a baseline: how often
is the button shown?

If only a few buckets are emitted to, consider using a [sparse
histogram](#When-To-Use-Sparse-Histograms).

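To illustrate the baseline advice for the button example, here is a small
standalone sketch. The enum, its name, and the helper are hypothetical; the
point is only that a rate can be computed when the baseline bucket lives in
the same histogram as the event.

```c++
#include <array>
#include <cassert>
#include <cstddef>

// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
// (Hypothetical enum: kShown is the baseline, kClicked is the event.)
enum class PromoButtonEvent {
  kShown = 0,
  kClicked = 1,
  kMaxValue = kClicked,
};

// Per-bucket counts, as they might be read off a dashboard.
using BucketCounts =
    std::array<long, static_cast<std::size_t>(PromoButtonEvent::kMaxValue) + 1>;

// Fraction of impressions that led to a click. This is only computable
// because the baseline bucket is recorded alongside the click bucket.
double ClickThroughRate(const BucketCounts& counts) {
  const long shown = counts[static_cast<std::size_t>(PromoButtonEvent::kShown)];
  const long clicked =
      counts[static_cast<std::size_t>(PromoButtonEvent::kClicked)];
  assert(shown > 0);
  return static_cast<double>(clicked) / static_cast<double>(shown);
}
```

A raw count of 1 million clicks means something quite different against
4 million impressions (a rate of 0.25) than against 400 million, and that
ratio stays comparable even as the reporting population changes.
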
#### Requirements

Enums logged in histograms must:

- be prefixed with the comment:
  ```c++
  // These values are persisted to logs. Entries should not be renumbered and
  // numeric values should never be reused.
  ```
- be numbered starting from `0`. Note this bullet point does not apply for
  enums logged with sparse histograms.
- have enumerators with explicit values (`= 0`, `= 1`, `= 2`) to make it clear
  that the actual values are important. This also makes it easy to match the
  values between the C++/Java definition and [histograms.xml](./histograms.xml).
- not renumber or reuse enumerator values. When adding a new enumerator, append
  the new enumerator to the end. When removing an unused enumerator, comment it
  out, making it clear the value was previously used.

If your enum histogram has a catch-all / miscellaneous bucket, put that bucket
first (`= 0`). This makes the bucket easy to find on the dashboard if additional
buckets are added later.

#### Usage

In C++, define an `enum class` with a `kMaxValue` enumerator:

```c++
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kMaxValue = kOpenBookmark,
};
```

`kMaxValue` is a special enumerator that must share the highest enumerator
value, typically done by aliasing it with the enumerator with the highest
value. Clang automatically checks that `kMaxValue` is correctly set for `enum
class`.

The histogram helpers use the `kMaxValue` convention, and the enum may be
logged with:

```c++
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action);
```

or:

```c++
UmaHistogramEnumeration("NewTabPageAction", action);
```

Logging histograms from Java should look similar:

```java
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
@IntDef({NewTabPageAction.USE_OMNIBOX, NewTabPageAction.CLICK_TITLE,
        NewTabPageAction.OPEN_BOOKMARK})
private @interface NewTabPageAction {
    int USE_OMNIBOX = 0;
    int CLICK_TITLE = 1;
    // int USE_SEARCHBOX = 2; // no longer used, combined into omnibox
    int OPEN_BOOKMARK = 3;
    int COUNT = 4;
}

// Using a helper function is optional, but avoids some boilerplate.
private static void logNewTabPageAction(@NewTabPageAction int action) {
    RecordHistogram.recordEnumeratedHistogram(
            "NewTabPageAction", action, NewTabPageAction.COUNT);
}
```

#### Legacy Enums

**Note: this method of defining histogram enums is deprecated. Do not use this
for new enums in C++.**

Many legacy enums define a `kCount` sentinel, relying on the compiler to
automatically update it when new entries are added:

```c++
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kCount,
};
```

These enums must be recorded using the legacy helpers:

```c++
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action, NewTabPageAction::kCount);
```

or:

```c++
UmaHistogramEnumeration("NewTabPageAction", action, NewTabPageAction::kCount);
```

### Flag Histograms

When adding a new flag in
[about_flags.cc](../../../chrome/browser/about_flags.cc), you need to add a
corresponding entry to [enums.xml](./enums.xml). This is automatically verified
by the `AboutFlagsHistogramTest` unit test.

To add a new entry:

1. Edit [enums.xml](./enums.xml), adding the feature to the `LoginCustomFlags`
   enum section, with any unique value (just make one up, although whatever it
   is needs to appear in sorted order; `pretty_print.py` can do this for you).
2. Build `unit_tests`, then run `unit_tests
   --gtest_filter='AboutFlagsHistogramTest.*'` to compute the correct value.
3. Update the entry in [enums.xml](./enums.xml) with the correct value, and move
   it so the list is sorted by value (`pretty_print.py` can do this for you).
4. Re-run the test to ensure the value and ordering are correct.

You can also use `tools/metrics/histograms/validate_format.py` to check the
ordering (but not that the value is correct).

Don't remove entries when removing a flag; they are still used to decode data
from previous Chrome versions.

### Count Histograms

[histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h)
provides macros for some common count types such as memory or elapsed time, in
addition to general count macros. These have reasonable default values; you
seldom need to choose the number of buckets or histogram min. However, you still
need to choose the histogram max (use the advice below).

If none of the default macros work well for you, please thoughtfully choose
a min, max, and bucket count for your histogram using the advice below.

#### Count Histograms: Choosing Min and Max

For histogram max, choose a value such that very few emissions to the histogram
exceed the max. If a metric emission is above the max value, it will get put
into an "overflow" bucket. If this bucket is too large, it can be difficult to
compute statistics. One rule of thumb is that at most 1% of samples should be in
the overflow bucket (and ideally, fewer). This allows analysis of the 99th
percentile. Err on the side of too large a range versus too short a range.
(Remember that if you choose poorly, you'll have to wait for another release
cycle to fix it.)

For histogram min, if you care about all possible values (zero and above),
choose a min of 1. All histograms have an underflow bucket for emitted zeros,
so a min of 1 is appropriate. Otherwise, choose the min appropriate for your
particular situation.

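The 1% rule of thumb above can be checked against locally collected samples
before settling on a max. The following is a sketch only: both function
names are invented, and treating every sample at or above the candidate max
as overflow is an assumption about how the boundary is handled.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the fraction of samples that would land in the overflow bucket
// for a candidate max. Assumption: a sample >= candidate_max overflows.
double OverflowFraction(const std::vector<int>& samples, int candidate_max) {
  assert(!samples.empty());
  std::size_t overflow = 0;
  for (int sample : samples) {
    if (sample >= candidate_max)
      ++overflow;
  }
  return static_cast<double>(overflow) / static_cast<double>(samples.size());
}

// Applies the rule of thumb: at most 1% of samples in the overflow bucket.
bool MaxIsLargeEnough(const std::vector<int>& samples, int candidate_max) {
  return OverflowFraction(samples, candidate_max) <= 0.01;
}
```

For samples covering 1 through 100 once each, a candidate max of 51 puts
half of them in the overflow bucket, so a larger max would be the safer
choice.
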
#### Count Histograms: Choosing Number of Buckets

Choose the smallest number of buckets that gives you the granularity you need.
By default, count histogram bucket sizes scale exponentially so you can get fine
granularity when the numbers are small yet still reasonable resolution for
larger numbers. The macros default to 50 buckets (or 100 buckets for histograms
with wide ranges), which is appropriate for most purposes. Because histograms
pre-allocate all the buckets, the number of buckets selected directly dictates
how much memory is used. Do not exceed 100 buckets without good reason (and
consider whether [sparse histograms](#When-To-Use-Sparse-Histograms) might work
better for you in that case, since they do not pre-allocate their buckets).

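To give a feel for exponentially scaled buckets, the sketch below computes
one possible set of bucket lower bounds. It is an illustration written for
this document, not the code Chromium uses; the function name and the exact
rounding rule are assumptions.

```c++
#include <cassert>
#include <cmath>
#include <vector>

// Returns the lower bound of each bucket. The first entry (0) is the
// underflow bucket and the last entry (max) is the overflow bucket.
// Boundaries grow roughly geometrically, but always by at least 1 so that
// small values keep one-unit resolution.
std::vector<int> ExponentialBucketBounds(int min, int max, int bucket_count) {
  assert(min >= 1 && max > min);
  assert(bucket_count >= 3 && bucket_count <= max - min + 2);
  std::vector<int> bounds = {0, min};
  int current = min;
  while (static_cast<int>(bounds.size()) < bucket_count) {
    // Spread the remaining log-distance to max evenly over the buckets
    // that still have to be placed.
    const int remaining = bucket_count - static_cast<int>(bounds.size());
    const double log_current = std::log(static_cast<double>(current));
    const double log_max = std::log(static_cast<double>(max));
    const double log_next = log_current + (log_max - log_current) / remaining;
    const int next = static_cast<int>(std::lround(std::exp(log_next)));
    current = next > current ? next : current + 1;
    bounds.push_back(current);
  }
  return bounds;
}
```

Calling `ExponentialBucketBounds(1, 1000, 10)` yields boundaries that are
only a unit or so apart near 1 and hundreds apart near 1000, which is the
trade-off described above: fine granularity for small values and coarser
resolution for large ones.
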
### Timing Histograms

You can easily emit a time duration (time delta) using the
`UMA_HISTOGRAM_TIMES`, `UMA_HISTOGRAM_MEDIUM_TIMES`, and
`UMA_HISTOGRAM_LONG_TIMES` macros and their friends, as well as helpers like
`SCOPED_UMA_HISTOGRAM_TIMER`. Many timing
histograms are used for performance monitoring; if this is the case for you,
please read [this document about how to structure timing histograms to make
them more useful and
actionable](https://chromium.googlesource.com/chromium/src/+/lkgr/docs/speed/diagnostic_metrics.md).

### Percentage or Ratio Histograms

You can easily emit a percentage histogram using the `UMA_HISTOGRAM_PERCENTAGE`
macro provided in
[histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h).
You can also easily emit any ratio as a linear histogram (for equally sized
buckets).

For such histograms, you want each value recorded to cover approximately the
same span of time. This typically means emitting values periodically at a set
time interval, such as every 5 minutes. We do not recommend recording a ratio at
the end of a video playback, as video lengths vary greatly.

It is okay to emit at the end of an animation sequence when what's being
animated is fixed / known. In this case, each value represents roughly the same
span of time.

Why? You typically cannot make decisions based on histograms whose values are
recorded in response to an event that varies in length because such metrics can
conflate heavy usage with light usage. It's easier to reason about metrics that
avoid this source of bias.

Many developers have been bitten by this. For example, it was previously common
to emit an actions-per-minute ratio whenever Chrome was backgrounded. Precisely,
these metrics computed the number of uses of a particular action during a Chrome
session, divided by the length of time Chrome had been open. Sometimes, the
recorded rate was based on a short interaction with Chrome: a few seconds or a
minute. Other times, the recorded rate was based on a long interaction: tens of
minutes or hours. These two situations are indistinguishable in the UMA logs;
the recorded values can be identical.

The inability to distinguish these two qualitatively different settings makes
such histograms effectively uninterpretable and not actionable. Emitting at a
regular interval avoids the issue. Each value represents the same amount of time
(e.g., one minute of video playback).

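The conflation described above can be seen in a small standalone sketch.
Both functions are hypothetical and only model what would be emitted; they
do not call any real logging API.

```c++
#include <cassert>
#include <vector>

// End-of-session approach (discouraged above): one value per session,
// regardless of how long the session lasted.
double ActionsPerMinuteAtSessionEnd(int action_count, int session_minutes) {
  assert(session_minutes > 0);
  return static_cast<double>(action_count) / session_minutes;
}

// Fixed-interval approach: one value per elapsed minute, so every emitted
// value covers the same span of time. `action_minutes` holds the zero-based
// minute in which each action happened.
std::vector<int> ActionsPerElapsedMinute(const std::vector<int>& action_minutes,
                                         int session_minutes) {
  assert(session_minutes > 0);
  std::vector<int> per_minute(session_minutes, 0);
  for (int minute : action_minutes) {
    assert(minute >= 0 && minute < session_minutes);
    ++per_minute[minute];
  }
  return per_minute;
}
```

A 1-minute session with 2 actions and a 120-minute session with 240 actions
both produce a single end-of-session value of 2 actions per minute, so the
two cannot be told apart. With per-minute emission the long session
contributes 120 values and the short one contributes 1, which weights the
data by actual usage.
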
### Local Histograms

Histograms can be added via [Local macros](https://codesearch.chromium.org/chromium/src/base/metrics/histogram_macros_local.h).
These still record locally, but are not uploaded to UMA and are therefore not
available for analysis. This can be useful for metrics only needed for local
debugging. We don't recommend using local histograms outside of that scenario.

### Multidimensional Histograms

It is common to be interested in logging multidimensional data, where multiple
pieces of information need to be logged together. For example, a developer may
be interested in the counts of features X and Y based on whether a user is in
state A or B. In this case, they want to know the count of X under state A,
as well as the other three combinations.

There is no general purpose solution for this type of analysis. We suggest
using the workaround of using an enum of length MxN, where you log each unique
pair {state, feature} as a separate entry in the same enum. If this causes a
large explosion in data (i.e. >100 enum entries), a [sparse histogram](#When-To-Use-Sparse-Histograms)
may be appropriate. If you are unsure of the best way to proceed, please contact
someone from the OWNERS file.

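One way to picture the MxN workaround is as a mapping from each
{state, feature} pair to a single bucket index. The sketch below uses
hypothetical enums and helper names purely for illustration.

```c++
#include <cassert>

// Hypothetical dimensions; names and values are for illustration only.
enum class UserState { kA = 0, kB = 1, kMaxValue = kB };
enum class Feature { kX = 0, kY = 1, kMaxValue = kY };

constexpr int kStateCount = static_cast<int>(UserState::kMaxValue) + 1;
constexpr int kFeatureCount = static_cast<int>(Feature::kMaxValue) + 1;
// Number of buckets in the combined enum (M x N).
constexpr int kCombinedBucketCount = kStateCount * kFeatureCount;

// Maps each {state, feature} pair to a unique bucket in [0, M*N).
constexpr int CombinedBucket(UserState state, Feature feature) {
  return static_cast<int>(state) * kFeatureCount + static_cast<int>(feature);
}

// Inverse mappings, useful when reading the combined histogram back.
UserState StateOfBucket(int bucket) {
  assert(bucket >= 0 && bucket < kCombinedBucketCount);
  return static_cast<UserState>(bucket / kFeatureCount);
}
Feature FeatureOfBucket(int bucket) {
  assert(bucket >= 0 && bucket < kCombinedBucketCount);
  return static_cast<Feature>(bucket % kFeatureCount);
}
```

Note that computing bucket values arithmetically renumbers existing buckets
whenever a feature is added, which conflicts with the enum requirements
earlier in this document. A logged enum would therefore more likely spell
out each pair as its own enumerator with an explicit value; the arithmetic
here only shows that every pair gets exactly one bucket.
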
## Histogram Expiry

Histogram expiry is specified by the `expires_after` attribute in histogram
descriptions in histograms.xml. The attribute can be specified as a date in
YYYY-MM-DD format or as a Chrome milestone in `M*` format (e.g. `M68`). In the
latter case, the actual expiry date is about 12 weeks after that branch is cut,
or basically when it is replaced on the "stable" channel by the following
release.

After a histogram expires, it ceases to be displayed on the dashboard.
Follow [these directions](#extending) to extend it.

Once a histogram has expired, the code that records it becomes dead code and
should be removed from the codebase, and the histogram definition should be
marked as obsolete.

In rare cases, the expiry can be set to "never". This is used to denote
metrics of critical importance that are, typically, used for other reports. For
example, all metrics of the
"[heartbeat](https://uma.googleplex.com/p/chrome/variations)" are set to never
expire. All metrics that never expire must have an XML comment describing why so
that it can be audited in the future. Setting an expiry to "never" must be
reviewed by [email protected].

```
<!-- expires-never: "heartbeat" metric (internal: go/uma-heartbeats) -->
```

For all new histograms, the use of the expiry attribute is strongly encouraged
and enforced by the Chrome Metrics team through reviews.

#### How to choose expiry for histograms

If you are adding a histogram to evaluate a feature launch, set an expiry date
consistent with the expected feature launch date. Otherwise, we recommend
choosing 3-6 months.

Here are some guidelines for common scenarios:

* If the listed owner moved to a different project, find a new owner.
* If neither the owner nor the team uses the histogram, remove it.
* If the histogram is not in use now, but might be useful in the far future,
  remove it.
* If the histogram is not in use now, but might be useful in the near
  future, pick ~3 months or ~2 milestones ahead.
* If the histogram is actively in use now and is useful in the short term,
  pick 3-6 months or 2-4 milestones ahead.
* If the histogram is actively in use and seems useful for an indefinite time,
  pick 1 year.

We also have a tool that automatically extends expiry dates. The 80% most
frequently accessed histograms are pushed out every Tuesday, to 6 months from
the date of the run. Googlers can view the [design
doc](https://docs.google.com/document/d/1IEAeBF9UnYQMDfyh2gdvE7WlUKsfIXIZUw7qNoU89A4).

#### How to extend an expired histogram {#extending}

You can revive an expired histogram by setting the expiration date to a
date in the future.

There's some leeway here. A client may continue to send data for that
histogram for some time after the official expiry date, so simply bumping
the `expires_after` date at HEAD may be sufficient to resurrect it without
any data discontinuity.

If a histogram expired more than a month ago (for histograms with an
expiration date) or more than one milestone ago (for histograms with
expiration milestones; this means top-of-tree is two or more milestones away
from the expired milestone), then you may be outside the safety window. In this
case, when extending the histogram, add to the histogram description a
message: "Warning: this histogram was expired from DATE to DATE; data may be
missing." (For milestones, write something similar.)

When reviving a histogram outside the safety window, keep in mind that the
change to histograms.xml that revives it rolls out with the binary release. It
takes some time to get to the stable channel.

If you need to revive it faster, the histogram can be re-enabled by adding it to
the [expired histogram allowlist](#Expired-histogram-allowlist).

Gayane Petrosyan	a6ee443c	2018-05-17 21:39:54	[diff] [blame]	452	### Expired histogram notifier
				453
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	454	The expired histogram notifier notifies histogram owners before their histograms
				455	expire by creating crbugs, which are assigned to owners. This allows owners to
				456	extend the lifetime of their histograms, if needed, or deprecate them. The
				457	notifier regularly checks all histograms across the histograms.xml files and
				458	identifies expired or soon-to-be expired histograms. It then creates or updates
				459	crbugs accordingly.
Gayane Petrosyan	a6ee443c	2018-05-17 21:39:54	[diff] [blame]	460
Caitlin Fischer	9f484105	2020-11-04 21:02:44	[diff] [blame]	461	### Expired histogram allowlist
Gayane Petrosyan	a6ee443c	2018-05-17 21:39:54	[diff] [blame]	462
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	463	If a histogram expires but turns out to be useful, you can add the histogram's
Caitlin Fischer	9f484105	2020-11-04 21:02:44	[diff] [blame]	464	name to the allowlist until the updated expiration date reaches the stable
				465	channel. When doing so, update the histogram's summary to document the period
				466	during which the histogram's data is incomplete. To add a histogram to the
				467	allowlist, see the internal documentation:
[Histogram Expiry](https://goto.google.com/histogram-expiry-gdoc).
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	469
mpearson	72a5c9139	2017-05-09 22:49:44	[diff] [blame]	470	## Testing
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	471
Test your histograms using `chrome://histograms`. Make sure each histogram is
emitted to when you expect and not at other times. Also check that the emitted
values are correct. Finally, for count histograms, make sure that buckets
capture enough precision for your needs over the range.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	476
Ivan Sandrk	8ffc583	2018-07-09 12:34:58	[diff] [blame]	477	Pro tip: You can filter the set of histograms shown on `chrome://histograms` by
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	478	specifying a prefix. For example, `chrome://histograms/Extensions.Load` shows
				479	only histograms whose names match the pattern "Extensions.Load*".
Ivan Sandrk	8ffc583	2018-07-09 12:34:58	[diff] [blame]	480
In addition to testing interactively, you can have unit tests examine the
values emitted to histograms. See [histogram_tester.h](https://cs.chromium.org/chromium/src/base/test/metrics/histogram_tester.h)
for details.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	484
Mark Pearson	4c4bc97	2018-05-16 20:01:06	[diff] [blame]	485	## Interpreting the Resulting Data
				486
The top of [go/uma-guide](http://go/uma-guide) has good advice on how to go
about analyzing and interpreting the results of UMA data uploaded by users. If
you're reading this page, you've probably just finished adding a histogram to
the Chromium source code and you're waiting for users to update their version of
Chrome to a version that includes your code. In this case, the best advice is
to remember that users who update frequently or quickly are a biased sample of
the overall user population. Take the initial statistics with a grain of salt;
they're probably mostly right but not entirely so.
				495
mpearson	72a5c9139	2017-05-09 22:49:44	[diff] [blame]	496	## Revising Histograms
				497
When changing the semantics of a histogram (when it's emitted, what the buckets
represent, the bucket range or number of buckets, etc.), create a new histogram
with a new name. Otherwise, analysis that mixes data from before and after the
change may be misleading. If the histogram name is still the best name choice,
the recommendation is to simply append a '2' to the name. See [Cleaning Up
Histogram Entries](#Cleaning-Up-Histogram-Entries) for details on how to handle
the XML changes.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	505
mpearson	72a5c9139	2017-05-09 22:49:44	[diff] [blame]	506	## Deleting Histograms
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	507
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	508	Please delete code that emits to histograms that are no longer needed.
				509	Histograms take up memory. Cleaning up histograms that you no longer care
				510	about is good! But see the note below on
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	511	[Cleaning Up Histogram Entries](#Cleaning-Up-Histogram-Entries).
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	512
				513	## Documenting Histograms
				514
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	515	Document histograms in [histograms.xml](./histograms.xml). There is also a
[google-internal version of the file](http://go/chrome-histograms-internal) for
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	517	the rare case in which the histogram is confidential (added only to Chrome code,
Mark Pearson	159c3897	2018-06-05 19:44:08	[diff] [blame]	518	not Chromium code; or, an accurate description about how to interpret the
				519	histogram would reveal information about Google's plans).
				520
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	521	### Add Histogram and Documentation in the Same Changelist
				522
vapier	52b9aba	2016-12-14 06:09:25	[diff] [blame]	523	If possible, please add the [histograms.xml](./histograms.xml) description in
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	524	the same changelist in which you add the histogram-emitting code. This has
				525	several benefits. One, it sometimes happens that the
vapier	52b9aba	2016-12-14 06:09:25	[diff] [blame]	526	[histograms.xml](./histograms.xml) reviewer has questions or concerns about the
				527	histogram description that reveal problems with interpretation of the data and
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	528	call for a different recording strategy. Two, it allows the histogram reviewer
vapier	52b9aba	2016-12-14 06:09:25	[diff] [blame]	529	to easily review the emission code to see if it comports with these best
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	530	practices and to look for other errors.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	531
				532	### Understandable to Everyone
				533
				534	Histogram descriptions should be roughly understandable to someone not familiar
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	535	with your feature. Please add a sentence or two of background if necessary.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	536
Robert Kaplow	cbc6fd6	2021-03-19 15:11:40	[diff] [blame]	537	Note any caveats associated with your histogram in the summary. For example, if
				538	the set of supported platforms is surprising, such as if a desktop feature is
				539	not available on Mac, the summary should explain where it is recorded. It is
				540	also common to have caveats along the lines of "this histogram is only recorded
				541	if X" (e.g., upon a successful connection to a service, a feature is enabled by
				542	the user).
				543
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	544
				545	### State When It Is Recorded
				546
				547	Histogram descriptions should clearly state when the histogram is emitted
				548	(profile open? network request received? etc.).
				549
Mark Pearson	d8fc9fd2	2021-03-12 20:18:58	[diff] [blame]	550	Some histograms record error conditions. These should be clear about whether
				551	all errors are recorded or only the first. If only the first, the histogram
				552	description should have text like:
				553	```
				554	In the case of multiple errors, only the first reason encountered is recorded. Refer
				555	to Class::FunctionImplementingLogic() for details.
				556	```
				557
Ilya Sherman	470c95a	2020-09-21 23:05:43	[diff] [blame]	558	### Provide Clear Units or Enum Labels
				559
				560	For enumerated histograms, including boolean and sparse histograms, provide an
				561	`enum=` attribute mapping enum values to semantically contentful labels. Define
				562	the `<enum>` in enums.xml if none of the existing enums are a good fit. Use
				563	labels whenever they would be clearer than raw numeric values.
				564
				565	For non-enumerated histograms, include a `units=` attribute. Be specific:
				566	e.g. distinguish "MB" vs. "MiB", refine generic labels like "counts" to more
				567	precise labels like "pages", etc.
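
As a rough illustration of the rule above, the following sketch flags entries
that carry neither an `enum=` nor a `units=` attribute. The sample metadata and
the function name are hypothetical; this is not one of the scripts in this
directory.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical metadata, for illustration only.
SAMPLE_XML = """
<histograms>
  <histogram name="Example.Outcome" enum="ExampleOutcome" expires_after="M95"/>
  <histogram name="Example.PageCount" units="pages" expires_after="M95"/>
  <histogram name="Example.Unlabeled" expires_after="M95"/>
</histograms>
"""


def histograms_missing_labels(xml_text: str) -> List[str]:
    """Returns names of histograms with neither enum= nor units=."""
    root = ET.fromstring(xml_text)
    return [
        h.get("name")
        for h in root.iter("histogram")
        if not h.get("enum") and not h.get("units")
    ]
```

Here only `Example.Unlabeled` would be reported, since the other two entries
each provide one of the two attributes.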
				568
jsbell	da3a66c	2017-02-09 21:40:32	[diff] [blame]	569	### Owners
rkaplow	8a62ef6	2016-10-06 14:42:34	[diff] [blame]	570
Caitlin Fischer	254a12f7	2019-07-31 20:57:03	[diff] [blame]	571	Histograms need owners, who are the experts on the metric and the points of
				572	contact for any questions or maintenance tasks, such as extending a histogram's
				573	expiry or deprecating the metric.
rkaplow	8a62ef6	2016-10-06 14:42:34	[diff] [blame]	574
Histograms must have a primary owner and may have secondary owners. A primary
owner is a Googler with an @google.com or @chromium.org email address, e.g.
`<owner>[email protected]</owner>`, who is ultimately responsible for maintaining
the metric. Secondary owners may be other individuals, team mailing lists, e.g.
`<owner>[email protected]</owner>`, or paths to OWNERS files, e.g.
`<owner>src/directory/OWNERS</owner>`.
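
The owner rules above could be checked along these lines. This is a sketch
under stated assumptions: the first `<owner>` listed is treated as the primary
owner, an OWNERS path is recognized purely by its shape, and all names and
addresses are hypothetical.

```python
import re
from typing import List

# Domains the text allows for a primary owner.
PRIMARY_DOMAINS = ("@google.com", "@chromium.org")
# Assumption: an OWNERS path is any entry whose last component is "OWNERS".
OWNERS_PATH = re.compile(r"^(?:[\w.-]+/)*OWNERS$")


def classify_owner(entry: str) -> str:
    """Classifies an <owner> value as an OWNERS path, an email, or unknown."""
    if OWNERS_PATH.match(entry):
        return "owners_file"
    if entry.endswith(PRIMARY_DOMAINS):
        return "email"
    return "unknown"


def check_owners(owners: List[str]) -> List[str]:
    """Returns a list of problems; an empty list means no problem was found."""
    if not owners:
        return ["histogram has no owners"]
    # Assumption: the first <owner> listed is the primary owner.
    if classify_owner(owners[0]) != "email":
        return ["primary owner must be an @google.com or @chromium.org address"]
    return []
```

A check like this cannot tell an individual from a mailing list, so it does not
replace a reviewer confirming that the primary owner is a person.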
Mark Pearson	74c5321	2019-03-08 00:34:08	[diff] [blame]	581
Caitlin Fischer	254a12f7	2019-07-31 20:57:03	[diff] [blame]	582	It's a best practice to list multiple owners, so that there's no single point
				583	of failure for histogram-related questions and maintenance tasks. If you are
				584	using a metric heavily and understand it intimately, feel free to add yourself
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	585	as an owner.
Mark Pearson	74c5321	2019-03-08 00:34:08	[diff] [blame]	586
Caitlin Fischer	254a12f7	2019-07-31 20:57:03	[diff] [blame]	587	Notably, owners are asked to determine whether histograms have outlived their
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	588	usefulness. When a histogram is nearing expiry, a robot files a reminder bug in
				589	Monorail. It's important that somebody familiar with the histogram notices and
				590	triages such bugs!
rkaplow	8a62ef6	2016-10-06 14:42:34	[diff] [blame]	591
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	592	Tip: When removing someone from the owner list for a histogram, it's a nice
				593	courtesy to ask them for approval.
				594
Caitlin Fischer	feafb439	2020-10-05 21:10:07	[diff] [blame]	595	### Components
				596
				597	Histograms may be associated with components, which can help make sure that
				598	histogram expiry bugs don't fall through the cracks.
				599
				600	There are two ways in which components may be associated with a histogram. The
				601	first and recommended way is to add a tag to a histogram or histogram suffix,
e.g. `<component>UI>Shell</component>`. The second way is to specify an OWNERS
				603	file as a secondary owner for a histogram. If the OWNERS file contains a
				604	component, then the component is associated with the histogram. If the specified
				605	OWNERS file doesn't have a component, but an OWNERS file in a parent directory
				606	does, then the parent directory's component is used.
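
The OWNERS-based lookup described above can be sketched as a walk up the
directory tree. The directory-to-component mapping below is hypothetical, and
the sketch assumes the nearest ancestor with a component wins; the real tooling
may differ in detail.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical mapping from a directory to the component declared in its
# OWNERS file; directories whose OWNERS file has no component are absent.
COMPONENT_BY_DIR: Dict[str, str] = {
    "src/ui": "UI",
    "src/ui/shell": "UI>Shell",
}


def component_for_owners_file(
        owners_path: str,
        component_by_dir: Dict[str, str]) -> Optional[str]:
    """Finds the component for an OWNERS path, falling back to parents."""
    directory = PurePosixPath(owners_path).parent
    # Check the OWNERS file's own directory first, then each ancestor.
    for candidate in (directory, *directory.parents):
        component = component_by_dir.get(str(candidate))
        if component is not None:
            return component
    return None
```

With this mapping, `src/ui/shell/widgets/OWNERS` resolves to `UI>Shell` through
its parent directory, and `src/net/OWNERS` resolves to no component at all.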
				607
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	608	### Cleaning Up Histogram Entries
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	609
Henrique Nakashima	78c4547c	2021-03-25 21:56:42	[diff] [blame]	610	Do not delete histograms from histograms.xml files or move them to
				611	obsolete_histograms.xml. Instead, mark unused histograms as obsolete and
				612	annotate them with the date or milestone in the `<obsolete>` tag entry. They
				613	will later get moved to obsolete_histograms.xml via tooling.
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	614
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	615	If deprecating only some variants of a
				616	[patterned histogram](#Patterned-Histograms), mark each deprecated `<variant>`
				617	as obsolete as well. Similarly, if the histogram used histogram suffixes, mark
				618	the suffix entry for the histogram as obsolete.
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	619
				620	If the histogram is being replaced by a new version:
				621
				622	* Note in the `<obsolete>` message the name of the replacement histogram.
				623
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	624	* Make sure the descriptions of the original and replacement histogram are
				625	different. It's never appropriate for them to be identical. Either the old
				626	description was wrong, and it should be revised to explain what it actually
				627	measured, or the old histogram was measuring something not as useful as the
				628	replacement, in which case the new histogram is measuring something different
				629	and needs to have a new description.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	630
Mark Pearson	a010912	2018-05-30 18:23:05	[diff] [blame]	631	A changelist that marks a histogram as obsolete should be reviewed by all
				632	current owners.
				633
Deleting histogram entries would be bad if someone were to accidentally reuse
your old histogram name and thereby corrupt new data with whatever old data is
still coming in. It's also useful to keep obsolete histogram descriptions in
[histograms.xml](./histograms.xml). That way, if someone is searching for a
histogram to answer a particular question, they can learn if there was a
histogram at some point that did so even if it isn't active now.
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	640
Ilya Sherman	8f0034a	2020-07-22 22:06:34	[diff] [blame]	641	Exception: It is ok to delete the metadata for any histogram that has never
				642	been recorded to. For example, it's fine to correct a typo where the histogram
				643	name in the metadata does not match the name in the Chromium source code.
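
To make the "mark as obsolete rather than delete" rule concrete, here is a
minimal sketch that adds an `<obsolete>` child to an entry. The entry, the
owner address, the message wording, and the placement of the tag as the first
child are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry, for illustration only.
ENTRY = """
<histogram name="Example.LoadTime" units="ms" expires_after="M95">
  <owner>someone@chromium.org</owner>
  <summary>Time taken to load the example widget.</summary>
</histogram>
"""


def mark_obsolete(histogram: ET.Element, when: str,
                  replacement: str = "") -> None:
    """Adds an <obsolete> child instead of deleting the entry."""
    if histogram.find("obsolete") is not None:
        return  # Already marked; leave the existing message alone.
    message = f"Removed in {when}."
    if replacement:
        # Name the replacement histogram, as recommended above.
        message += f" Replaced by {replacement}."
    obsolete = ET.Element("obsolete")
    obsolete.text = message
    # Assumption: <obsolete> is placed as the first child of the entry.
    histogram.insert(0, obsolete)


histogram = ET.fromstring(ENTRY)
mark_obsolete(histogram, "M95", replacement="Example.LoadTime2")
```

The entry keeps its name, owners, and summary, so anyone searching for the old
metric can still find what it measured and what replaced it.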
				644
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	645	### Patterned Histograms
Ilya Sherman	f54104b	2017-07-12 23:45:47	[diff] [blame]	646
				647	It is sometimes useful to record several closely related metrics, which measure
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	648	the same type of data, with some minor variations. You can declare the metadata
				649	for these concisely using patterned histograms. For example:
Ilya Sherman	f54104b	2017-07-12 23:45:47	[diff] [blame]	650
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	651	```xml
Robert Kaplow	e1430ce	2021-03-25 19:02:18	[diff] [blame]	652	<histogram name="Pokemon.{Character}.EfficacyAgainst{OpponentType}"
				653	units="multiplier" expires_after="M95">
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	654	<owner>[email protected]</owner>
				655	<owner>[email protected]</owner>
				656	<summary>
				657	The efficacy multiplier for {Character} against an opponent of
				658	{OpponentType} type.
				659	</summary>
				660	<token key="Character">
				661	<variant name="Bulbasaur"/>
				662	<variant name="Charizard"/>
				663	<variant name="Mewtwo"/>
				664	</token>
				665	<token key="OpponentType">
				666	<variant name="Dragon" summary="dragon"/>
				667	<variant name="Flying" summary="flappity-flap"/>
				668	<variant name="Psychic" summary="psychic"/>
				669	<variant name="Water" summary="water"/>
				670	</token>
				671	</histogram>
				672	```
				673
				674	This example defines metadata for 12 (= 3 x 4) concrete histograms, such as
				675
				676	```xml
Robert Kaplow	e1430ce	2021-03-25 19:02:18	[diff] [blame]	677	<histogram name="Pokemon.Charizard.EfficacyAgainstWater"
				678	units="multiplier" expires_after="M95">
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	679	<owner>[email protected]</owner>
				680	<owner>[email protected]</owner>
				681	<summary>
				682	The efficacy multiplier for Charizard against an opponent of water type.
				683	</summary>
				684	</histogram>
				685	```
				686
				687	Note that each token `<variant>` defines what text should be substituted for it,
				688	both in the histogram name and in the summary text. As shorthand, a `<variant>`
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	689	that omits the `summary` attribute substitutes the value of the `name` attribute
				690	in the histogram's `<summary>` text as well.
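
The substitution rule can be sketched in a few lines. This is not the actual
expansion code used by the tooling, just an illustration of how the 12 concrete
names and summaries fall out of the example above; the `expand` helper and the
`TOKENS` layout are hypothetical.

```python
from itertools import product

NAME = "Pokemon.{Character}.EfficacyAgainst{OpponentType}"
SUMMARY = ("The efficacy multiplier for {Character} against an opponent of "
           "{OpponentType} type.")

# Each token maps variant names to summary text; None means "reuse the name".
TOKENS = {
    "Character": {"Bulbasaur": None, "Charizard": None, "Mewtwo": None},
    "OpponentType": {
        "Dragon": "dragon",
        "Flying": "flappity-flap",
        "Psychic": "psychic",
        "Water": "water",
    },
}


def expand(name_pattern, summary_pattern, tokens):
    """Returns {concrete histogram name: concrete summary}."""
    keys = list(tokens)
    expanded = {}
    # Take one variant per token, covering every combination.
    for combo in product(*(tokens[k] for k in keys)):
        names = dict(zip(keys, combo))
        # A variant without summary text falls back to its name.
        summaries = {
            k: tokens[k][v] if tokens[k][v] is not None else v
            for k, v in names.items()
        }
        expanded[name_pattern.format(**names)] = (
            summary_pattern.format(**summaries))
    return expanded


histograms = expand(NAME, SUMMARY, TOKENS)
```

`histograms` then holds 12 entries, including
`Pokemon.Charizard.EfficacyAgainstWater` with the summary shown in the expanded
example.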
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	691
				692	*** promo
				693	Tip: You can declare an optional token by listing an empty name: `<variant
				694	name="" summary="aggregated across all breakdowns"/>`. This can be useful when
				695	recording a "parent" histogram that aggregates across a set of breakdowns.
				696	***
				697
				698	You can use the `<variants>` tag to define a set of `<variant>`s out-of-line.
				699	This is useful for token substitutions that are shared among multiple families
				700	of histograms. See
[histograms.xml](https://source.chromium.org/search?q=file:histograms.xml%20%3Cvariants)
				702	for examples.
				703
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	704	By default, a `<variant>` inherits the owners declared for the patterned
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	705	histogram. Each variant can optionally override the inherited list with custom
				706	owners:
				707	```xml
				708	<variant name="SubteamBreakdown" ...>
				709	<owner>[email protected]</owner>
				710	<owner>[email protected]</owner>
				711	</variant>
				712	```
Mark Pearson	a010912	2018-05-30 18:23:05	[diff] [blame]	713
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	714	As [with histogram entries](#Cleaning-Up-Histogram-Entries), never delete
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	715	variants. If the variant expansion is no longer used, mark it as `<obsolete>`.
Mark Pearson	2a311c5	2019-03-19 21:47:01	[diff] [blame]	716
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	717	*** promo
Tip: You can run `print_expanded_histograms.py --pattern=` to show all
histograms generated by patterned histograms or histogram suffixes, including
their summaries and owners. For example, this can be run (from the repo root) as:
				721	```
Oksana Zhuravlova	5242ad2	2021-02-19 00:14:20	[diff] [blame]	722	./tools/metrics/histograms/print_expanded_histograms.py --pattern=^UMA.A.B
Weilun Shi	bac61d9d3	2020-11-12 02:40:26	[diff] [blame]	723	```
				724	***
				725
				726	*** promo
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	727	Tip: You can run `print_histogram_names.py --diff` to enumerate all the
				728	histogram names that are generated by a particular CL. For example, this can be
				729	run (from the repo root) as:
Charlie Harrison	90407d9	2020-05-19 23:57:32	[diff] [blame]	730	```
				731	./tools/metrics/histograms/print_histogram_names.py --diff origin/master
				732	```
Ilya Sherman	9e22dea	2020-10-05 22:32:36	[diff] [blame]	733	***
				734
				735	For documentation about the `<histogram_suffixes>` syntax, which is deprecated,
				736	see
https://chromium.googlesource.com/chromium/src/+/refs/tags/87.0.4270.1/tools/metrics/histograms/one-pager.md#histogram-suffixes-deprecated-in-favor-of-pattern-histograms
Charlie Harrison	90407d9	2020-05-19 23:57:32	[diff] [blame]	738
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	739	## When To Use Sparse Histograms
				740
Caitlin Fischer	b5e9435	2020-10-27 17:34:50	[diff] [blame]	741	Sparse histograms are well-suited for recording counts of exact sample values
				742	that are sparsely distributed over a large range. They can be used with enums
Ilya Sherman	1eee82c4c	2017-12-08 01:22:19	[diff] [blame]	743	as well as regular integer values. It is often valuable to provide labels in
				744	[enums.xml](./enums.xml).
mpearson	2b5f7e0	2016-10-03 21:27:03	[diff] [blame]	745
The implementation uses a lock and a map, whereas other histogram types use a
vector and no lock. It is thus more costly to add values to, and each value
stored has more overhead, compared to the other histogram types. However, it
may be more efficient in memory if the number of distinct sample values is
small compared to the range of their values.
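
The memory tradeoff can be illustrated with a toy comparison. This is not how
the `base/` histogram classes are implemented; it only contrasts a layout with
one slot per possible value against a map with one entry per observed value,
using hypothetical samples.

```python
from collections import Counter

# Hypothetical samples: a handful of distinct values over a wide range.
samples = [3, 3, 404, 404, 404, 70000, 999999]

# Dense layout: one slot per possible value, as with a vector.
value_range = max(samples) + 1
dense = [0] * value_range
for s in samples:
    dense[s] += 1

# Sparse layout: one map entry per value actually seen.
sparse = Counter(samples)
```

The dense list needs a million slots to hold four distinct values, while the
map holds four entries, each with more per-entry overhead than a vector slot.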
				751
Mark Pearson	ed73f1f	2019-03-22 18:00:12	[diff] [blame]	752	Please talk with the metrics team if there are more than a thousand possible
				753	different values that you could emit.
				754
For more information, see [sparse_histogram.h](https://cs.chromium.org/chromium/src/base/metrics/sparse_histogram.h).
Caitlin Fischer	b466a04	2019-07-31 21:41:46	[diff] [blame]	756
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	757
Caitlin Fischer	b466a04	2019-07-31 21:41:46	[diff] [blame]	758	# Team Documentation
				759
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	760	## Reviewing Metrics CLs
				761
Robert Kaplow	cbc6fd6	2021-03-19 15:11:40	[diff] [blame]	762	If you are a metric OWNER, you have the serious responsibility of ensuring
				763	Chrome's data collection is following best practices. If there's any concern
				764	about an incoming metrics changelist, please escalate by assigning to
				765	[email protected].
				766
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	767	When reviewing metrics CLs, look at the following, listed in approximate order
				768	of importance:
				769
				770	### Privacy
				771
				772	Does anything tickle your privacy senses? (Googlers, see
[go/uma-privacy](https://goto.google.com/uma-privacy) for guidelines.)
				774
				775	Please escalate if there's any doubt!
				776
				777	### Clarity
				778
				779	Is the metadata clear enough for [all Chromies](#Understandable-to-Everyone) to
				780	understand what the metric is recording? Consider the histogram name,
				781	description, units, enum labels, etc.
				782
				783	It's really common for developers to forget to list [when the metric is
				784	recorded](#State-When-It-Is-Recorded). This is particularly important context,
				785	so please remind developers to clearly document it.
				786
				787	Note: Clarity is a bit less important for very niche metrics used only by a
				788	couple of engineers. However, it's hard to assess the metric design and
				789	correctness if the metadata is especially unclear.
				790
				791	### Metric design
				792
				793	* Does the metric definition make sense?
				794	* Will the resulting data be interpretable at analysis time?
				795
				796	### Correctness
				797
				798	Is the histogram being recorded correctly?
				799
				800	* Does the bucket layout look reasonable?
				801
				802	* The metrics APIs like base::UmaHistogram* have some sharp edges,
				803	especially for the APIs that require specifying the number of
				804	buckets. Check for off-by-one errors and unused buckets.
				805
				806	* Is the bucket layout efficient? Typically, push back if there are >50
				807	buckets -- this can be ok in some cases, but make sure that the CL author
				808	has consciously considered the tradeoffs here and is making a reasonable
				809	choice.
				810
				811	* For timing metrics, do the min and max bounds make sense for the duration
				812	that is being measured?
				813
				814	* The base::UmaHistogram* functions are
				815	[generally preferred](#Coding-Emitting-to-Histograms) over the
				816	UMA_HISTOGRAM_* macros. If using the macros, remember that names must be
				817	runtime constants!
				818
				819	Also, related to [clarity](#Clarity): Does the client logic correctly implement
				820	the metric described in the XML metadata? Some common errors to watch out for:
				821
* The metric is only emitted within an `if` statement (e.g., only if some data
is available) and this restriction isn't mentioned in the metadata description.
				824
				825	* The metric description states that it's recorded when X happens, but it's
				826	actually recorded when X is scheduled to occur, or only emitted when X
				827	succeeds (but omitted on failure), etc.
				828
				829	When the metadata and the client logic do not match, the appropriate solution
				830	might be to update the metadata, or it might be to update the client
				831	logic. Guide this decision by considering what data will be more easily
				832	interpretable and what data will have hidden surprises/gotchas.
				833
				834	### Sustainability
				835
Robert Kaplow	cd6e042	2021-04-07 21:58:53	[diff] [blame^]	836	* Is the CL adding a reasonable number of metrics/buckets?
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	837	* When reviewing a CL that is trying to add many metrics at once, guide the CL
				838	author toward an appropriate solution for their needs. For example,
				839	multidimensional metrics can be recorded via UKM, and we are currently
Robert Kaplow	cd6e042	2021-04-07 21:58:53	[diff] [blame^]	840	building support for structured metrics in UMA.
				841	* There's no hard rule, but anything above 20 separate histograms should be
				842	escalated by being assigned to [email protected].
				843	* Similarly, any histogram with more than 100 possible buckets should be
				844	escalated by being assigned to [email protected].
Ilya Sherman	f64bca25	2020-11-10 23:16:24	[diff] [blame]	845
				846	* Are expiry dates being set
				847	[appropriately](#How-to-choose-expiry-for-histograms)?
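
The numeric thresholds mentioned in this review guidance can be collected into
a small helper. This is a hypothetical sketch for reviewers, not an existing
presubmit check; as the text says, there is no hard rule, so treat the output
as a prompt for judgment.

```python
from typing import List

# Thresholds quoted in the review guidance above.
MAX_HISTOGRAMS_BEFORE_ESCALATION = 20
MAX_BUCKETS_BEFORE_ESCALATION = 100
BUCKETS_WORTH_PUSHING_BACK_ON = 50


def review_flags(num_new_histograms: int,
                 bucket_counts: List[int]) -> List[str]:
    """Returns reminders for a CL adding the given histograms and buckets."""
    flags = []
    if num_new_histograms > MAX_HISTOGRAMS_BEFORE_ESCALATION:
        flags.append("escalate: more than 20 separate histograms")
    # Only the largest histogram matters for the bucket thresholds.
    largest = max(bucket_counts, default=0)
    if largest > MAX_BUCKETS_BEFORE_ESCALATION:
        flags.append("escalate: a histogram has more than 100 possible buckets")
    elif largest > BUCKETS_WORTH_PUSHING_BACK_ON:
        flags.append("push back: a histogram has more than 50 buckets")
    return flags
```

For instance, a CL adding three histograms with at most 60 buckets each yields
only the push-back reminder.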
				848
				849	### Everything Else!
				850
				851	This document describes many other nuances that are important for defining and
				852	recording useful metrics. Check CLs for these other types of issues as well.
				853
				854	And, as you would with a language style guide, periodically re-review the doc to
				855	stay up to date on the details.
				856
				857	### Becoming a Metrics Owner
				858
				859	If you would like to be listed as one of the OWNERS for metrics metadata, reach
				860	out to one of the existing //base/metrics/OWNERS. Similar to language
				861	readability review teams, we have a reverse shadow onboarding process:
				862
				863	1. First, read through this document to get up to speed on best practices.
				864
				865	2. Partner up with an experienced reviewer from //base/metrics/OWNERS.
				866
				867	3. Join the cs/chrome-metrics.gwsq.
				868
				869	Note: This step is optional if you are not on the metrics team. Still,
				870	consider temporarily joining the metrics gwsq as a quick way to get a breadth
				871	of experience. You can remove yourself once your training is completed.
				872
				873	4. Start reviewing CLs! Once you're ready to approve a CL, add a comment like "I
				874	am currently ramping up as a metrics reviewer, +username for OWNERS approval"
				875	and add your partner as a reviewer on the CL. Once at a point where there's
				876	pretty good alignment in the code review feedback, your partner will add you
				877	to the OWNERS file.
				878
Caitlin Fischer	b466a04	2019-07-31 21:41:46	[diff] [blame]	879
				880	## Processing histograms.xml
				881
				882	When working with histograms.xml, verify whether you require fully expanded
				883	OWNERS files. Many scripts in this directory process histograms.xml, and
				884	sometimes OWNERS file paths are expanded and other times they are not. OWNERS
				885	paths are expanded when scripts make use of merge_xml's function MergeFiles;
				886	otherwise, they are not.