mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 1 | # Histogram Guidelines |
| 2 | |
| 3 | This document gives the best practices on how to use histograms in code and how |
rkaplow | 6dfcb89 | 2016-10-04 14:04:27 | [diff] [blame] | 4 | to document the histograms for the dashboards. There are three general types |
vapier | 52b9aba | 2016-12-14 06:09:25 | [diff] [blame] | 5 | of histograms: enumerated histograms, count histograms (for arbitrary numbers), |
| 6 | and sparse histograms (for cases where precision is important over a wide |
| 7 | range and/or the range cannot be specified a priori). |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 8 | |
| 9 | [TOC] |
| 10 | |
Mark Pearson | b1d608d | 2018-06-05 19:59:44 | [diff] [blame] | 11 | ## Naming Your Histogram |
| 12 | |
| 13 | Histogram names should be in the form Group.Name or Group.Subgroup.Name, |
| 14 | etc., where each group organizes related histograms. |
| 15 | |
Mark Pearson | 4c4bc97 | 2018-05-16 20:01:06 | [diff] [blame] | 16 | ## Coding (Emitting to Histograms) |
| 17 | |
Daniel Cheng | 01cd7593 | 2020-02-06 16:43:45 | [diff] [blame^] | 18 | Prefer the helper functions defined in |
Mark Pearson | ed73f1f | 2019-03-22 18:00:12 | [diff] [blame] | 19 | [histogram_functions.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_functions.h). |
Daniel Cheng | 01cd7593 | 2020-02-06 16:43:45 | [diff] [blame^] | 20 | These functions take a lock and perform a map lookup, but the overhead is |
| 21 | generally insignificant. However, when recording metrics on the critical path |
| 22 | (e.g. called in a loop or logged multiple times per second), use the macros in |
| 23 | [histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h) |
| 24 | instead. These macros cache a pointer to the histogram object for efficiency, |
| 25 | though this comes at the cost of increased binary size: 130 bytes/macro usage |
| 26 | sounds small but quickly adds up. |
Mark Pearson | 159c3897 | 2018-06-05 19:44:08 | [diff] [blame] | 27 | |
Mark Pearson | 4c4bc97 | 2018-05-16 20:01:06 | [diff] [blame] | 28 | ### Don't Use the Same Histogram Logging Call in Multiple Places |
| 29 | |
| 30 | These logging macros and functions have long names and sometimes include extra |
| 31 | parameters (defining the number of buckets for example). Use a helper function |
| 32 | if possible. This leads to shorter, more readable code that's also more |
| 33 | resilient to problems that could be introduced when making changes. (One could, |
| 34 | for example, erroneously change the bucketing of the histogram in one call but |
| 35 | not the other.) |
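
A minimal sketch of the helper-function idea follows. `FakeRecordCount` and `FakeHistograms` are hypothetical stand-ins so the example compiles on its own; real code would call a helper from histogram_functions.h instead. The histogram name `Tabs.CountAtStartup` and its min, max, and bucket count are also made up for illustration.

```c++
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory store: histogram name -> (sample -> count).
using FakeHistogramStore = std::map<std::string, std::map<int, int>>;

FakeHistogramStore& FakeHistograms() {
  static FakeHistogramStore store;
  return store;
}

// Hypothetical stand-in for a count-histogram logging call with explicit
// bucketing parameters.
void FakeRecordCount(const std::string& name, int sample, int min, int max,
                     int bucket_count) {
  assert(min >= 1 && min < max && bucket_count > 2);
  ++FakeHistograms()[name][sample];
}

// The only place that knows the histogram name and its bucketing. Every
// caller goes through this helper, so the parameters cannot drift apart.
void RecordTabCountAtStartup(int tab_count) {
  FakeRecordCount("Tabs.CountAtStartup", tab_count, /*min=*/1, /*max=*/1000,
                  /*bucket_count=*/50);
}
```

Callers then write `RecordTabCountAtStartup(n)` wherever the metric is needed, and a later change to the bucketing touches exactly one line.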
| 36 | |
| 37 | ### Use Fixed Strings When Using Histogram Macros |
| 38 | |
| 39 | When using histogram macros (calls such as `UMA_HISTOGRAM_ENUMERATION`), you're |
Victor-Gabriel Savu | b2afb6f4 | 2019-10-23 07:28:23 | [diff] [blame] | 40 | not allowed to construct your string dynamically so that it can vary at a |
Mark Pearson | 74c5321 | 2019-03-08 00:34:08 | [diff] [blame] | 41 | callsite. At a given callsite (preferably you have only one), the string |
| 42 | should be the same every time the macro is called. If you need to use dynamic |
| 43 | names, use the functions in histogram_functions.h instead of the macros. |
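
The sketch below illustrates why a name that varies at one callsite is a problem for a macro that caches a histogram pointer. `FAKE_CACHED_HISTOGRAM_ADD`, `FakeHistogram`, and `FakeLookup` are hypothetical and much simpler than the real macros in histogram_macros.h; they only mimic the idea of caching one pointer per callsite.

```c++
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a histogram object: just a sample counter.
struct FakeHistogram {
  int sample_count = 0;
};

// Hypothetical by-name registry lookup (the slow path).
FakeHistogram* FakeLookup(const std::string& name) {
  assert(!name.empty());
  static std::map<std::string, FakeHistogram> registry;
  return &registry[name];
}

// Simplified caching macro: the lookup runs only the first time a given
// callsite executes. Later executions reuse the cached pointer and never
// look at `name` again.
#define FAKE_CACHED_HISTOGRAM_ADD(name)              \
  do {                                               \
    static FakeHistogram* cached = FakeLookup(name); \
    ++cached->sample_count;                          \
  } while (0)

// One callsite, called with different names: every sample lands in
// whichever histogram was named on the first call.
void RecordWithDynamicName(const std::string& suffix) {
  FAKE_CACHED_HISTOGRAM_ADD("Feature.Event." + suffix);
}
```

Calling `RecordWithDynamicName("A")` and then `RecordWithDynamicName("B")` records both samples under `Feature.Event.A` in this sketch, which is the kind of silent misattribution the fixed-string rule prevents.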
Mark Pearson | 4c4bc97 | 2018-05-16 20:01:06 | [diff] [blame] | 44 | |
| 45 | ### Don't Use Same String in Multiple Places |
| 46 | |
| 47 | If you must use the histogram name in multiple places, use a compile-time |
| 48 | constant of appropriate scope that can be referenced everywhere. Using inline |
| 49 | strings in multiple places can lead to errors if you ever need to revise the |
| 50 | name and you update one location and forget another. |
| 51 | |
| 52 | ### Efficiency |
| 53 | |
Mark Pearson | ed73f1f | 2019-03-22 18:00:12 | [diff] [blame] | 54 | Generally, don't be concerned about the processing cost of emitting to a |
| 55 | histogram (unless you're using [sparse |
| 56 | histograms](#When-To-Use-Sparse-Histograms)). The normal histogram code is |
| 57 | highly optimized. If you are recording to a histogram in particularly |
| 58 | performance-sensitive or "hot" code, make sure you're using the histogram |
| 59 | macros; see [reasons above](#Coding-Emitting-to-Histograms). |
Mark Pearson | 4c4bc97 | 2018-05-16 20:01:06 | [diff] [blame] | 60 | |
| 61 | ## Picking Your Histogram Type |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 62 | |
| 63 | ### Directly Measure What You Want |
| 64 | |
| 65 | Measure exactly what you want, whether that's time used for a function call, |
| 66 | number of bytes transmitted to fetch a page, number of items in a list, etc. |
| 67 | Do not assume you can calculate what you want from other histograms. Most of |
| 68 | the ways to do this are incorrect. For example, if you want to know the time |
| 69 | taken by a function that does nothing but call two other functions, both of which |
| 70 | have histogram logging, you might think you can simply add up |
| 71 | the histograms for those functions to get the total time. This is wrong. |
| 72 | If we knew which emissions came from which calls, we could pair them up and |
| 73 | derive the total time for the function. However, histogram entries do not |
| 74 | come with timestamps, so we cannot pair them up appropriately. If you simply add up the |
| 75 | two histograms to get the total histogram, you're implicitly assuming those |
| 76 | values are independent, which may not be the case. Directly measure what you |
| 77 | care about; don't try to derive it from other data. |
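
A small worked example makes the pairing problem concrete. The sketch below uses made-up timings: two sets of calls produce identical per-callee histograms, yet their per-call totals differ, so the totals cannot be recovered from the two histograms alone. All names here are hypothetical.

```c++
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Each pair holds the (first callee, second callee) durations, in
// milliseconds, for one call of the outer function.
using CallTimes = std::vector<std::pair<int, int>>;
using HistogramPair = std::pair<std::vector<int>, std::vector<int>>;

// Sorted per-callee samples: all that two separate histograms retain.
HistogramPair PerCalleeSamples(const CallTimes& calls) {
  HistogramPair samples;
  for (const auto& call : calls) {
    assert(call.first >= 0 && call.second >= 0);
    samples.first.push_back(call.first);
    samples.second.push_back(call.second);
  }
  std::sort(samples.first.begin(), samples.first.end());
  std::sort(samples.second.begin(), samples.second.end());
  return samples;
}

// Sorted per-call totals: the quantity you actually wanted to measure.
std::vector<int> TotalSamples(const CallTimes& calls) {
  std::vector<int> totals;
  for (const auto& call : calls) totals.push_back(call.first + call.second);
  std::sort(totals.begin(), totals.end());
  return totals;
}
```

With calls `{10, 10}, {50, 50}` the totals are 20 and 100; with calls `{10, 50}, {50, 10}` they are 60 and 60, even though each callee's histogram contains exactly one 10 and one 50 in both cases.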
| 78 | |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 79 | ### Enum Histograms |
| 80 | |
| 81 | Enumerated histograms are most appropriate when you have a list of connected / |
| 82 | related states that should be analyzed jointly. For example, the set of |
| 83 | actions that can be done on the New Tab Page (use the omnibox, click a most |
| 84 | visited tile, click a bookmark, etc.) would make a good enumerated histogram. |
| 85 | If the total count of your histogram (i.e. the sum across all buckets) is |
| 86 | something meaningful--as it is in this example--that is generally a good sign. |
| 87 | However, the total count does not have to be meaningful for an enum histogram |
| 88 | to still be the right choice. |
| 89 | |
Mark Pearson | a768d022 | 2019-03-20 02:16:00 | [diff] [blame] | 90 | Enumerated histograms are also appropriate for counting events. Use a simple |
| 91 | boolean histogram. It's okay if you only log to one bucket (say, `true`). |
| 92 | It's usually best (though not necessary) to have a comparison point in |
| 93 | the same histogram. For example, if you want to count pages opened from the |
| 94 | history page, it might be a useful comparison to have the same histogram |
| 95 | record the number of times the history page was opened. |
| 96 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 97 | If only a few buckets will be emitted to, consider using a [sparse |
Mark Pearson | 4d0b463 | 2017-10-04 21:58:48 | [diff] [blame] | 98 | histogram](#When-To-Use-Sparse-Histograms). |
| 99 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 100 | #### Requirements |
| 101 | |
| 102 | Enums logged in histograms must: |
| 103 | |
| 104 | - be prefixed with the comment: |
| 105 | ```c++ |
| 106 | // These values are persisted to logs. Entries should not be renumbered and |
| 107 | // numeric values should never be reused. |
| 108 | ``` |
| 109 | - be numbered starting from `0`. Note this bullet point does *not* apply for |
| 110 | enums logged with sparse histograms. |
| 111 | - have enumerators with explicit values (`= 0`, `= 1`, `= 2`), to make it clear |
| 112 | that the actual values are important. This also makes it easy to match the |
| 113 | values between the C++/Java definition and [histograms.xml](./histograms.xml). |
| 114 | - not renumber or reuse enumerator values. When adding a new enumerator, append |
| 115 | the new enumerator to the end. When removing an unused enumerator, comment it |
| 116 | out, making it clear the value was previously used. |
| 117 | |
| 118 | If your enum histogram has a catch-all / miscellaneous bucket, put that bucket |
| 119 | first (`= 0`). This will make the bucket easy to find on the dashboard if |
| 120 | additional buckets are added later. |
| 121 | |
| 122 | #### Usage |
| 123 | |
| 124 | Define an `enum class` with a `kMaxValue` enumerator: |
| 125 | |
Steven Holte | ecf841d | 2018-08-10 00:53:34 | [diff] [blame] | 126 | ```c++ |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 127 | enum class NewTabPageAction { |
| 128 | kUseOmnibox = 0, |
| 129 | kClickTitle = 1, |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 130 | // kUseSearchbox = 2, // no longer used, combined into omnibox |
| 131 | kOpenBookmark = 3, |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 132 | kMaxValue = kOpenBookmark, |
| 133 | }; |
| 134 | ``` |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 135 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 136 | `kMaxValue` is a special enumerator that must share the highest enumerator |
| 137 | value, typically done by aliasing it with the enumerator with the highest |
| 138 | value. Clang automatically checks that `kMaxValue` is correctly set for `enum |
| 139 | class`. |
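
As an illustration of why `kMaxValue` must alias the highest enumerator, the sketch below derives the number of buckets from it. `ExclusiveMax` and `BucketIndex` are hypothetical helpers written for this example, assuming the bucket count is one more than `kMaxValue`; they are not the actual histogram helpers.

```c++
#include <cassert>

// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kMaxValue = kOpenBookmark,
};

// Hypothetical helper: one past the highest valid enumerator value.
template <typename Enum>
constexpr int ExclusiveMax() {
  return static_cast<int>(Enum::kMaxValue) + 1;
}

// Hypothetical helper: maps a sample to its bucket and checks the range.
template <typename Enum>
int BucketIndex(Enum sample) {
  const int index = static_cast<int>(sample);
  assert(index >= 0 && index < ExclusiveMax<Enum>());
  return index;
}

static_assert(ExclusiveMax<NewTabPageAction>() == 4,
              "kMaxValue must alias the highest enumerator");
```

If `kMaxValue` lagged behind a newly appended enumerator, the derived bound would be too small and the new value would fall outside the valid range.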
| 140 | |
| 141 | The histogram helpers use the `kMaxValue` convention, and the enum may be |
| 142 | logged with: |
| 143 | |
| 144 | ```c++ |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 145 | UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action); |
| 146 | ``` |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 147 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 148 | or: |
| 149 | |
Steven Holte | ecf841d | 2018-08-10 00:53:34 | [diff] [blame] | 150 | ```c++ |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 151 | UmaHistogramEnumeration("NewTabPageAction", action); |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 152 | ``` |
Steven Holte | ecf841d | 2018-08-10 00:53:34 | [diff] [blame] | 153 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 154 | #### Legacy Enums |
| 155 | |
| 156 | **Note: this method of defining histogram enums is deprecated. Do not use this |
| 157 | for new enums.** |
| 158 | |
| 159 | Many legacy enums define a `kCount` sentinel, relying on the compiler to |
| 160 | automatically update it when new entries are added: |
| 161 | |
Steven Holte | ecf841d | 2018-08-10 00:53:34 | [diff] [blame] | 162 | ```c++ |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 163 | enum class NewTabPageAction { |
| 164 | kUseOmnibox = 0, |
| 165 | kClickTitle = 1, |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 166 | // kUseSearchbox = 2, // no longer used, combined into omnibox |
| 167 | kOpenBookmark = 3, |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 168 | kCount, |
| 169 | }; |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 170 | ``` |
Steven Holte | ecf841d | 2018-08-10 00:53:34 | [diff] [blame] | 171 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 172 | These enums must be recorded using the legacy helpers: |
| 173 | |
| 174 | ```c++ |
Daniel Cheng | cda1df5b | 2018-03-30 21:30:16 | [diff] [blame] | 175 | UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action, NewTabPageAction::kCount); |
| 176 | ``` |
| 177 | |
Daniel Cheng | 914170d2 | 2019-05-08 09:46:32 | [diff] [blame] | 178 | or: |
| 179 | |
| 180 | ```c++ |
| 181 | UmaHistogramEnumeration("NewTabPageAction", action, NewTabPageAction::kCount); |
| 182 | ``` |
mpearson | b36013be | 2017-02-10 20:10:54 | [diff] [blame] | 183 | |
Matt Giuca | f3e0e253 | 2017-10-03 23:07:52 | [diff] [blame] | 184 | ### Flag Histograms |
| 185 | |
| 186 | When adding a new flag in |
| 187 | [about_flags.cc](../../../chrome/browser/about_flags.cc), you need to add a |
| 188 | corresponding entry to [enums.xml](./enums.xml). This will be automatically |
| 189 | verified by the `AboutFlagsHistogramTest` unit test. |
| 190 | |
| 191 | To add a new entry: |
| 192 | |
| 193 | 1. Edit [enums.xml](./enums.xml), adding the feature to the `LoginCustomFlags` |
Brett Wilson | f4d5877 | 2017-10-30 21:37:57 | [diff] [blame] | 194 | enum section, with any unique value (just make one up, although whatever it |
Dave Schuyler | 988e1a47 | 2018-01-04 02:21:11 | [diff] [blame] | 195 | is needs to appear in sorted order; `pretty_print.py` will do this for you). |
Matt Giuca | f3e0e253 | 2017-10-03 23:07:52 | [diff] [blame] | 196 | 2. Build `unit_tests`, then run `unit_tests |
| 197 | --gtest_filter='AboutFlagsHistogramTest.*'` to compute the correct value. |
| 198 | 3. Update the entry in [enums.xml](./enums.xml) with the correct value, and move |
Brett Wilson | f4d5877 | 2017-10-30 21:37:57 | [diff] [blame] | 199 | it so the list is sorted by value (`pretty_print.py` will do this for you). |
Matt Giuca | f3e0e253 | 2017-10-03 23:07:52 | [diff] [blame] | 200 | 4. Re-run the test to ensure the value and ordering are correct. |
| 201 | |
| 202 | You can also use `tools/metrics/histograms/validate_format.py` to check the |
| 203 | ordering (but not that the value is correct). |
| 204 | |
| 205 | Don't remove entries when removing a flag; they are still used to decode data |
| 206 | from previous Chrome versions. |
| 207 | |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 208 | ### Count Histograms |
| 209 | |
| 210 | [histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h) |
| 211 | provides macros for some common count types such as memory or elapsed time, in |
| 212 | addition to general count macros. These have reasonable default values; you |
| 213 | will not often need to choose number of buckets or histogram min. You still |
| 214 | will need to choose the histogram max (use the advice below). |
| 215 | |
| 216 | If none of the default macros work well for you, please thoughtfully choose |
| 217 | a min, max, and bucket count for your histogram using the advice below. |
| 218 | |
rkaplow | 6dfcb89 | 2016-10-04 14:04:27 | [diff] [blame] | 219 | #### Count Histograms: Choosing Min and Max |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 220 | |
| 221 | For histogram max, choose a value so that very few emissions to the histogram |
| 222 | will exceed the max. If many emissions hit the max, it can be difficult to |
| 223 | compute statistics such as average. One rule of thumb is at most 1% of samples |
| 224 | should be in the overflow bucket. This allows analysis of the 99th percentile. |
vapier | 52b9aba | 2016-12-14 06:09:25 | [diff] [blame] | 225 | Err on the side of too large a range versus too short a range. (Remember that |
| 226 | if you choose poorly, you'll have to wait for another release cycle to fix it.) |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 227 | |
| 228 | For histogram min, if you care about all possible values (zero and above), |
| 229 | choose a min of 1. (All histograms have an underflow bucket; emitted zeros |
| 230 | will go there. That's why a min of 1 is appropriate.) Otherwise, choose the |
| 231 | min appropriate for your particular situation. |
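
The 1% rule of thumb can be checked against locally collected samples with something like the sketch below. It assumes that a sample equal to or greater than the chosen max lands in the overflow bucket; the function names are hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Counts samples that would land in the overflow bucket, assuming every
// sample >= max overflows.
std::size_t CountOverflow(const std::vector<int>& samples, int max) {
  std::size_t overflow = 0;
  for (int sample : samples) {
    if (sample >= max) ++overflow;
  }
  return overflow;
}

// True if at most 1% of the samples overflow. Integer arithmetic avoids
// floating-point rounding at the boundary.
bool OverflowWithinOnePercent(const std::vector<int>& samples, int max) {
  assert(!samples.empty());
  return CountOverflow(samples, max) * 100 <= samples.size();
}
```

Running this on a representative set of values before landing the histogram is cheaper than waiting a release cycle to discover the max was too small.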
| 232 | |
rkaplow | 6dfcb89 | 2016-10-04 14:04:27 | [diff] [blame] | 233 | #### Count Histograms: Choosing Number of Buckets |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 234 | |
| 235 | Choose the smallest number of buckets that will get you the granularity you |
| 236 | need. By default, count histogram bucket sizes scale exponentially so you can |
rkaplow | 6dfcb89 | 2016-10-04 14:04:27 | [diff] [blame] | 237 | get fine granularity when the numbers are small yet still reasonable resolution |
rkaplow | 8a62ef6 | 2016-10-06 14:42:34 | [diff] [blame] | 238 | for larger numbers. The macros default to 50 buckets (or 100 buckets for |
| 239 | histograms with wide ranges) which is appropriate for most purposes. Because |
| 240 | histograms pre-allocate all the buckets, the number of buckets selected |
| 241 | directly dictates how much memory is used. Do not exceed 100 buckets without |
Mark Pearson | f0312e9 | 2019-09-26 18:56:22 | [diff] [blame] | 242 | good reason (and consider whether [sparse |
| 243 | histograms](#When-To-Use-Sparse-Histograms) might work better for you in that |
| 244 | case--they do not pre-allocate their buckets). |
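
To get a feel for exponentially scaled buckets, the sketch below spaces bucket lower bounds geometrically between a min and a max. This is a simplified illustration, not the bucketing algorithm the histogram code actually uses, and `GeometricBucketLowerBounds` is a hypothetical name.

```c++
#include <cassert>
#include <cmath>
#include <vector>

// Returns `bucket_count` lower bounds spaced geometrically from `min` to
// `max`. Illustrative only.
std::vector<int> GeometricBucketLowerBounds(int min, int max,
                                            int bucket_count) {
  assert(min >= 1 && max > min && bucket_count >= 2);
  const double ratio =
      std::pow(static_cast<double>(max) / min, 1.0 / (bucket_count - 1));
  std::vector<int> bounds;
  double next = min;
  for (int i = 0; i < bucket_count; ++i) {
    int bound = static_cast<int>(std::lround(next));
    // Integer bounds must strictly increase, so the smallest buckets end up
    // exactly one unit wide.
    if (!bounds.empty() && bound <= bounds.back()) bound = bounds.back() + 1;
    bounds.push_back(bound);
    next *= ratio;
  }
  return bounds;
}
```

With a min of 1, a max of 1000, and 50 buckets, the first buckets are one unit wide while the last one spans more than a hundred units, which is the fine-then-coarse granularity described above.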
rkaplow | 8a62ef6 | 2016-10-06 14:42:34 | [diff] [blame] | 245 | |
Mark Pearson | 6be2f35c | 2018-08-14 07:06:02 | [diff] [blame] | 246 | ### Timing Histograms |
| 247 | |
| 248 | You can easily emit a time duration (time delta) using UMA_HISTOGRAM_TIMES, |
| 249 | UMA_HISTOGRAM_MEDIUM_TIMES, and UMA_HISTOGRAM_LONG_TIMES macros, and their |
| 250 | friends, as well as helpers such as SCOPED_UMA_HISTOGRAM_TIMER. Many timing |
| 251 | histograms are used for performance monitoring; if this is the case for you, |
| 252 | please read [this document about how to structure timing histograms to make |
| 253 | them more useful and |
Paul Jensen | 5107d9c | 2018-10-22 22:24:06 | [diff] [blame] | 254 | actionable](https://chromium.googlesource.com/chromium/src/+/lkgr/docs/speed/diagnostic_metrics.md). |
Mark Pearson | 6be2f35c | 2018-08-14 07:06:02 | [diff] [blame] | 255 | |
Mark Pearson | 49928ec | 2018-06-05 20:15:49 | [diff] [blame] | 256 | ### Percentage or Ratio Histograms |
| 257 | |
| 258 | You can easily emit a percentage histogram using the |
| 259 | UMA_HISTOGRAM_PERCENTAGE macro provided in |
| 260 | [histogram_macros.h](https://cs.chromium.org/chromium/src/base/metrics/histogram_macros.h). |
| 261 | You can also easily emit any ratio as a linear histogram (for equally |
| 262 | sized buckets). |
| 263 | |
| 264 | For such histograms, you should think carefully about _when_ the values are |
| 265 | emitted. Normally, you should emit values periodically at a set time interval, |
| 266 | such as every 5 minutes. Conversely, we strongly discourage emitting values |
| 267 | based on event triggers. For example, we do not recommend recording a ratio |
| 268 | at the end of a video playback. |
| 269 | |
| 270 | Why? You typically cannot make decisions based on histograms whose values are |
| 271 | recorded in response to an event, because such metrics can conflate heavy usage |
| 272 | with light usage. It's easier to reason about metrics that route around this |
| 273 | source of bias. |
| 274 | |
| 275 | Many developers have been bitten by this. For example, it was previously common |
| 276 | to emit an actions-per-minute ratio whenever Chrome was backgrounded. |
| 277 | Precisely, these metrics computed the number of uses of a particular action |
| 278 | during a Chrome session, divided by length of time Chrome had been open. |
| 279 | Sometimes, the recorded rate was based on a short interaction with Chrome – a |
| 280 | few seconds or a minute. Other times, the recorded rate was based on a long |
| 281 | interaction, tens of minutes or hours. These two situations are |
| 282 | indistinguishable in the UMA logs – the recorded values can be identical. |
| 283 | |
| 284 | This inability to distinguish these two qualitatively different settings makes |
| 285 | such histograms effectively uninterpretable and not actionable. Emitting at a |
| 286 | regular interval avoids the issue. Each value will represent the same amount of |
| 287 | time (e.g., one minute of video playback). |
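
The difference between the two emission strategies can be sketched as follows, using made-up action timestamps. Both function names are hypothetical. The whole-session rate yields one value per session regardless of length, while the periodic version yields one value per complete minute.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Event-triggered style: a single actions-per-minute value for the whole
// session, however long it lasted.
int WholeSessionRate(const std::vector<int>& action_times_sec,
                     int session_sec) {
  assert(session_sec >= 60);
  return static_cast<int>(action_times_sec.size()) / (session_sec / 60);
}

// Periodic style: one sample per complete minute, each covering the same
// amount of time.
std::vector<int> PerMinuteCounts(const std::vector<int>& action_times_sec,
                                 int session_sec) {
  assert(session_sec >= 60);
  std::vector<int> counts(static_cast<std::size_t>(session_sec / 60), 0);
  for (int t : action_times_sec) {
    if (t < 0) continue;
    const std::size_t minute = static_cast<std::size_t>(t) / 60;
    if (minute < counts.size()) ++counts[minute];
  }
  return counts;
}
```

A one-minute session with two actions and a three-minute session with six actions both produce a whole-session rate of 2. The periodic version emits one sample for the first and three for the second, so heavier usage carries proportionally more weight in the histogram.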
| 288 | |
rkaplow | 8a62ef6 | 2016-10-06 14:42:34 | [diff] [blame] | 289 | ### Local Histograms |
| 290 | |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 291 | Histograms can be added via [Local macros](https://codesearch.chromium.org/chromium/src/base/metrics/histogram_macros_local.h). |
| 292 | These will still record locally, but will not be uploaded to UMA and will |
| 293 | therefore not be available for analysis. This can be useful for metrics only |
| 294 | needed for local debugging. We don't recommend using local histograms outside |
| 295 | of that scenario. |
rkaplow | 8a62ef6 | 2016-10-06 14:42:34 | [diff] [blame] | 296 | |
| 297 | ### Multidimensional Histograms |
| 298 | |
| 299 | It is common to be interested in logging multidimensional data - where multiple |
| 300 | pieces of information need to be logged together. For example, a developer may |
| 301 | be interested in the counts of features X and Y based on whether a user is in |
| 302 | state A or B. In this case, they want to know the count of X under state A, |
| 303 | as well as the other three combinations. |
| 304 | |
| 305 | There is no general purpose solution for this type of analysis. We suggest |
| 306 | using an enum of length MxN (M states, N features), where you log each unique |
| 307 | pair {state, feature} as a separate entry in the same enum. If this causes a |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 308 | large explosion in data (i.e. >100 enum entries), a [sparse histogram](#When-To-Use-Sparse-Histograms) |
| 309 | may be appropriate. If you are unsure of the best way to proceed, please |
| 310 | contact someone from the OWNERS file. |
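
A sketch of the combined-enum workaround for the two-state, two-feature example is shown below. All enum and function names are hypothetical. The mapping is written out explicitly rather than computed arithmetically, so that adding a state or feature later does not renumber existing entries.

```c++
#include <cassert>

enum class UserState { kStateA, kStateB };
enum class Feature { kFeatureX, kFeatureY };

// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
enum class StateAndFeature {
  kStateAFeatureX = 0,
  kStateAFeatureY = 1,
  kStateBFeatureX = 2,
  kStateBFeatureY = 3,
  kMaxValue = kStateBFeatureY,
};

// Maps each {state, feature} pair to its own entry in the combined enum.
StateAndFeature Combine(UserState state, Feature feature) {
  assert(state == UserState::kStateA || state == UserState::kStateB);
  if (state == UserState::kStateA) {
    return feature == Feature::kFeatureX ? StateAndFeature::kStateAFeatureX
                                         : StateAndFeature::kStateAFeatureY;
  }
  return feature == Feature::kFeatureX ? StateAndFeature::kStateBFeatureX
                                       : StateAndFeature::kStateBFeatureY;
}
```

The result of `Combine(state, feature)` would then be logged to a single enumerated histogram, giving one bucket per pair.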
| 311 | |
| 312 | ## Histogram Expiry |
| 313 | |
| 314 | Histogram expiry is specified by the **'expires_after'** attribute in histogram |
| 315 | descriptions in histograms.xml. The attribute can be specified as a date in |
Brian White | fa0a3fa | 2019-05-13 16:58:11 | [diff] [blame] | 316 | **YYYY-MM-DD** format or as a Chrome milestone in **M**\* (e.g. M68) format. In the |
| 317 | latter case, the actual expiry date is about 12 weeks after that branch is cut, |
| 318 | or basically when it is replaced on the "stable" channel by the following |
| 319 | release. |
| 320 | |
| 321 | After a histogram expires, it will cease to be displayed on the dashboard. |
| 322 | However, the client may continue to send data for that histogram for some time |
| 323 | after the official expiry date so simply bumping the 'expires_after' date in |
| 324 | HEAD may be sufficient to resurrect it without any discontinuity. If too much |
| 325 | time has passed and the client is no longer sending data, it can be re-enabled |
| 326 | via Finch: see [Expired Histogram Whitelist](#Expired-histogram-whitelist). |
| 327 | |
| 328 | Once a histogram has expired, the code to record it becomes dead code and should |
| 329 | be removed from the codebase along with marking the histogram definition as |
| 330 | obsolete. |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 331 | |
Brian White | 8614f81 | 2019-02-07 21:07:01 | [diff] [blame] | 332 | In **rare** cases, the expiry can be set to "never". This is used to denote |
| 333 | metrics of critical importance that are, typically, used for other reports. |
| 334 | For example, all metrics of the "[heartbeat](https://uma.googleplex.com/p/chrome/variations)" |
| 335 | are set to never expire. All metrics that never expire must have an XML |
| 336 | comment describing why, so that the decision can be audited in the future. |
| 337 | |
| 338 | ``` |
| 339 | <!-- expires-never: "heartbeat" metric (internal: go/uma-heartbeats) --> |
| 340 | ``` |
| 341 | |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 342 | For all new histograms, use of the expiry attribute will be strongly |
| 343 | encouraged and enforced by the Chrome metrics team through reviews. |
| 344 | |
| 345 | #### How to choose expiry for histograms |
| 346 | |
Ilya Sherman | 67418ea | 2019-11-27 01:28:23 | [diff] [blame] | 347 | If you are adding a histogram that will be used to evaluate a feature launch, |
| 348 | set an expiry date consistent with the expected feature launch date. Otherwise, |
| 349 | we recommend choosing 3-6 months. |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 350 | |
Ilya Sherman | 67418ea | 2019-11-27 01:28:23 | [diff] [blame] | 351 | Here are some guidelines for common scenarios: |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 352 | |
Ilya Sherman | 67418ea | 2019-11-27 01:28:23 | [diff] [blame] | 353 | * If the listed owner moved to a different project, find a new owner. |
| 354 | * If neither the owner nor the team uses the histogram, remove it. |
| 355 | * If the histogram is not in use now, but might be useful in the far future, |
| 356 | remove it. |
| 357 | * If the histogram is not in use now, but might be useful in the near |
| 358 | future, pick ~3 months or ~2 milestones ahead. |
| 359 | * If the histogram is actively in use now and useful for a short term, pick |
| 360 | 3-6 months or 2-4 milestones ahead. |
| 361 | * If the histogram is actively in use and seems useful for an indefinite time, |
| 362 | pick 1 year. |
| 363 | |
| 364 | We also have a tool that automatically extends expiry dates. Every Tuesday, the |
| 365 | most frequently accessed 80% of histograms have their expiry pushed out to 6 |
| 366 | months from the date of the run. Googlers can view the [design |
| 367 | doc](https://docs.google.com/document/d/1IEAeBF9UnYQMDfyh2gdvE7WlUKsfIXIZUw7qNoU89A4). |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 368 | |
| 369 | ### Expired histogram notifier |
| 370 | |
| 371 | The expired histogram notifier will notify owners in advance by creating crbugs so |
| 372 | that the owners can extend the lifetime of the histogram if needed or deprecate |
| 373 | it. It will regularly check all the histograms in histograms.xml to identify |
| 374 | those that have expired or will expire soon, and will then create or update |
| 375 | crbugs assigned to the histogram owners. |
| 376 | |
| 377 | ### Expired histogram whitelist |
| 378 | |
| 379 | If a histogram expires but turns out to be useful, you can add the histogram name |
| 380 | to the whitelist until the updated expiration date reaches the stable |
| 381 | channel. To add a histogram to the whitelist, see the internal documentation: |
| 382 | [Histogram Expiry](https://goto.google.com/histogram-expiry-gdoc). |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 383 | |
mpearson | 72a5c9139 | 2017-05-09 22:49:44 | [diff] [blame] | 384 | ## Testing |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 385 | |
vapier | 52b9aba | 2016-12-14 06:09:25 | [diff] [blame] | 386 | Test your histograms using `chrome://histograms`. Make sure they're being |
rkaplow | 6dfcb89 | 2016-10-04 14:04:27 | [diff] [blame] | 387 | emitted to when you expect and not emitted to at other times. Also check that |
| 388 | the values emitted to are correct. Finally, for count histograms, make sure |
| 389 | that buckets capture enough precision for your needs over the range. |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 390 | |
Ivan Sandrk | 8ffc583 | 2018-07-09 12:34:58 | [diff] [blame] | 391 | Pro tip: You can filter the set of histograms shown on `chrome://histograms` by |
| 392 | specifying a prefix. For example, `chrome://histograms/Extensions.Load` will |
| 393 | show only histograms whose names match the pattern "Extensions.Load*". |
| 394 | |
mpearson | 72a5c9139 | 2017-05-09 22:49:44 | [diff] [blame] | 395 | In addition to testing interactively, you can have unit tests examine the |
Devlin Cronin | 15291e9c | 2018-06-07 21:37:48 | [diff] [blame] | 396 | values emitted to histograms. See [histogram_tester.h](https://cs.chromium.org/chromium/src/base/test/metrics/histogram_tester.h) |
mpearson | 72a5c9139 | 2017-05-09 22:49:44 | [diff] [blame] | 397 | for details. |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 398 | |
Mark Pearson | 4c4bc97 | 2018-05-16 20:01:06 | [diff] [blame] | 399 | ## Interpreting the Resulting Data |
| 400 | |
| 401 | The top of [go/uma-guide](http://go/uma-guide) has good advice on how to go |
| 402 | about analyzing and interpreting the results of UMA data uploaded by users. If |
| 403 | you're reading this page, you've probably just finished adding a histogram to |
| 404 | the Chromium source code and you're waiting for users to update their version of |
| 405 | Chrome to a version that includes your code. In this case, the best advice is |
| 406 | to remind you that users who update frequently / quickly are a biased sample. Best take |
| 407 | the initial statistics with a grain of salt; they're probably *mostly* right but |
| 408 | not entirely so. |
| 409 | |
mpearson | 72a5c9139 | 2017-05-09 22:49:44 | [diff] [blame] | 410 | ## Revising Histograms |
| 411 | |
| 412 | When changing the semantics of a histogram (when it's emitted, what buckets |
| 413 | mean, etc.), make it into a new histogram with a new name. Otherwise the |
| 414 | "Everything" view on the dashboard will be mixing two different |
rkaplow | 8a62ef6 | 2016-10-06 14:42:34 | [diff] [blame] | 415 | interpretations of the data and make no sense. |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 416 | |
mpearson | 72a5c9139 | 2017-05-09 22:49:44 | [diff] [blame] | 417 | ## Deleting Histograms |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 418 | |
| 419 | Please delete the code that emits to histograms that are no longer needed. |
Gayane Petrosyan | a6ee443c | 2018-05-17 21:39:54 | [diff] [blame] | 420 | Histograms take up memory. Cleaning up histograms that you no longer care |
Mark Pearson | 2a311c5 | 2019-03-19 21:47:01 | [diff] [blame] | 421 | about is good! But see the note below on |
| 422 | [Cleaning Up Histogram Entries](#Cleaning-Up-Histogram-Entries). |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 423 | |
| 424 | ## Documenting Histograms |
| 425 | |
Mark Pearson | 159c3897 | 2018-06-05 19:44:08 | [diff] [blame] | 426 | Document histograms in [histograms.xml](./histograms.xml). There is also a |
| 427 | [Google-internal version of the file](http://go/chrome-histograms-internal) for |
| 428 | the rare case when the histogram is confidential (added only to Chrome code, |
| 429 | not Chromium code; or, an accurate description about how to interpret the |
| 430 | histogram would reveal information about Google's plans). |
| 431 | |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 432 | ### Add Histogram and Documentation in the Same Changelist |
| 433 | |
vapier | 52b9aba | 2016-12-14 06:09:25 | [diff] [blame] | 434 | If possible, please add the [histograms.xml](./histograms.xml) description in |
| 435 | the same changelist in which you add the histogram-emitting code. This has |
| 436 | several benefits. One, it sometimes happens that the |
| 437 | [histograms.xml](./histograms.xml) reviewer has questions or concerns about the |
| 438 | histogram description that reveal problems with interpretation of the data and |
| 439 | call for a different recording strategy. Two, it allows the histogram reviewer |
| 440 | to easily review the emission code to see if it comports with these best |
| 441 | practices, and to look for other errors. |
mpearson | 2b5f7e0 | 2016-10-03 21:27:03 | [diff] [blame] | 442 | |
### Understandable to Everyone

Histogram descriptions should be roughly understandable to someone not familiar
with your feature. Please add a sentence or two of background if necessary.

It is good practice to note caveats associated with your histogram in the
description, such as which platforms are supported (if the set of supported
platforms is surprising). For example, note when a desktop feature happens not
to be logged on Mac.

### State When It Is Recorded

Histogram descriptions should clearly state when the histogram is emitted
(for example, when a profile is opened or when a network request is received).

### Owners

Histograms need owners, who are the experts on the metric and the points of
contact for any questions or maintenance tasks, such as extending a histogram's
expiry or deprecating the metric.

Histograms must have a primary owner and may have secondary owners. A primary
owner is an individual, e.g. `<owner>[email protected]</owner>`, who is
ultimately responsible for maintaining the metric. Secondary owners may be
other individuals, team mailing lists, e.g. `<owner>[email protected]</owner>`,
or paths to OWNERS files, e.g. `<owner>src/directory/OWNERS</owner>`.

It's a best practice to list multiple owners, so that there's no single point
of failure for histogram-related questions and maintenance tasks. If you are
using a metric heavily and understand it intimately, feel free to add yourself
as an owner. For individuals, @chromium.org email addresses are preferred.

Notably, owners are asked to determine whether histograms have outlived their
usefulness. When a histogram is nearing expiry, a robot will file a reminder
bug in Monorail. It's important that somebody familiar with the histogram
notices and triages such bugs!

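The owner rules above can be sketched as a small check. This is only an
illustration using Python's standard `xml.etree.ElementTree`, not a tool that
exists in the tree: the histogram name, the addresses, the OWNERS path, and the
helpers `classify_owner` and `check_owners` are all hypothetical, and the
`name`, `units`, and `<summary>` parts of the entry are assumed rather than
taken from this section. Note that a pattern match cannot tell an individual's
address from a team mailing list, so the check only confirms that the first
owner is an email address at all.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry: the name, addresses, and path are made up.
ENTRY = """<histogram name="Example.Feature.LoadTime" units="ms">
  <owner>primary@chromium.org</owner>
  <owner>example-team@chromium.org</owner>
  <owner>src/example/OWNERS</owner>
  <summary>Time to load the example feature. Recorded once per load.</summary>
</histogram>"""

# Rough shape of an email address; deliberately loose.
EMAIL = re.compile(r"^[^@\s/]+@[^@\s/]+$")


def classify_owner(text):
    """Returns 'email', 'owners_file', or 'unknown' for an <owner> value."""
    text = text.strip()
    if EMAIL.match(text):
        return "email"
    if text == "OWNERS" or text.endswith("/OWNERS"):
        return "owners_file"
    return "unknown"


def check_owners(histogram):
    """Returns a list of problems with a <histogram> element's owners."""
    owners = [owner.text or "" for owner in histogram.findall("owner")]
    problems = []
    if not owners:
        problems.append("no owner listed")
    elif classify_owner(owners[0]) != "email":
        problems.append("primary owner should be an individual's email")
    if len(owners) == 1:
        problems.append("consider adding a secondary owner")
    return problems


print(check_owners(ET.fromstring(ENTRY)))  # prints []
```
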
### Cleaning Up Histogram Entries

Do not delete histograms from histograms.xml. Instead, mark unused
histograms as obsolete and annotate them with the date or milestone in
the `<obsolete>` tag entry.

If the histogram used [histogram suffixes](#Histogram-Suffixes), mark
the suffix entry for the histogram as obsolete as well.

If the histogram is being replaced by a new version:

* Note in the `<obsolete>` message the name of the replacement histogram.

* Make sure the descriptions of the original and replacement histogram
  are different. It's never appropriate for them to be identical. Either
  the old description was wrong, and it should be revised to explain what
  it actually measured, or the old histogram was measuring something not
  as useful as the replacement, in which case the new histogram is
  measuring something different and needs to have a new description.

A changelist that marks a histogram as obsolete should be reviewed by all
current owners.

Deleting a histogram entry is harmful because someone could accidentally reuse
the old histogram name and thereby corrupt new data with whatever old data is
still coming in. It's also useful to keep obsolete histogram descriptions in
[histograms.xml](./histograms.xml): if someone is searching for a
histogram to answer a particular question, they can learn whether a
histogram once did so, even if it isn't active now.

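As a rough illustration of marking an entry obsolete instead of deleting it,
the sketch below edits a made-up entry with Python's standard
`xml.etree.ElementTree`. The histogram names, the owner address, the milestone
in the message, and the `mark_obsolete` helper are hypothetical; `name`,
`units`, and `<summary>` are assumed parts of an entry that this section does
not define. In practice the `<obsolete>` note is usually written by hand in the
XML file; the code only shows the resulting shape.

```python
import xml.etree.ElementTree as ET


def mark_obsolete(histogram, message):
    """Adds an <obsolete> note as the first child unless one already exists."""
    if histogram.find("obsolete") is None:
        note = ET.Element("obsolete")
        note.text = message
        histogram.insert(0, note)
    return histogram


# Hypothetical entry and replacement name, for illustration only.
old = ET.fromstring(
    '<histogram name="Example.OldMetric" units="ms">'
    "<owner>someone@chromium.org</owner>"
    "<summary>Time spent in the old code path.</summary>"
    "</histogram>"
)

# The message carries the date or milestone and names the replacement.
mark_obsolete(old, "Removed in M80. Replaced by Example.NewMetric.")
print(ET.tostring(old, encoding="unicode"))
```

The entry stays in the file with its original name, so the name cannot be
reused by accident, and the message points readers at the replacement.
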
### Histogram Suffixes

It is sometimes useful to record several closely related metrics that measure
the same type of data with minor variations. In that case, one or more
`<histogram_suffixes>` elements can save on redundant verbosity
in [histograms.xml](./histograms.xml). If a root `<histogram>` or a `<suffix>`
element is used only to construct a partial name, to be completed by further
suffixes, annotate the element with the attribute `base="true"`. This instructs
tools not to treat the partial base name as a distinct histogram. Note that
suffixes can be applied recursively.

You can also declare ownership of `<histogram_suffixes>`. If there's no owner
specified, the generated histograms will inherit owners from the parents.

As [with histogram entries](#Cleaning-Up-Histogram-Entries), never delete
histogram suffixes. If the suffix expansion is no longer used, mark it as
obsolete. You can also mark individual histograms within the suffix as
obsolete, indicating that the expansion for that histogram is obsolete while
the expansion for other histograms with the same suffix is not.

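To make the effect of `base="true"` concrete, here is a minimal model of suffix
expansion in Python. It is a sketch under stated assumptions, not the logic of
the real tooling: the `expand_suffixes` function, the histogram and suffix
names, and the `.` separator are all hypothetical, and recursive application of
suffixes is not modeled.

```python
def expand_suffixes(histograms, suffixes, separator="."):
    """Returns the histogram names that exist after applying the suffixes.

    `histograms` maps each name to True when that entry is marked
    base="true". A base name only serves to build longer names, so it is
    not reported as a histogram of its own.
    """
    names = []
    for name, is_base in histograms.items():
        if not is_base:
            names.append(name)
        for suffix in suffixes:
            names.append(name + separator + suffix)
    return names


# Made-up names: one ordinary histogram and one partial (base) name.
expanded = expand_suffixes(
    {"Example.LoadTime": False, "Example.Partial": True},
    ["Cold", "Warm"],
)
for name in expanded:
    print(name)
```

Here `Example.LoadTime` is listed alongside its two suffixed variants, while
`Example.Partial` appears only in its suffixed forms.
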
### Enum labels

_All_ histograms, including boolean and sparse histograms, may have enum labels
provided via [enums.xml](./enums.xml). Using labels is encouraged whenever
labels would be clearer than raw numeric values.

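The sketch below shows why labels help when reading recorded values. It assumes
an enum is written as an `<enum>` element containing `<int value="..."
label="..."/>` children; that shape, the enum name, its labels, and the helpers
`load_labels` and `describe` are assumptions made for this example and should
be checked against [enums.xml](./enums.xml).

```python
import xml.etree.ElementTree as ET

# Hypothetical enum; the name, values, and labels are made up.
ENUM = """<enum name="ExampleResult">
  <int value="0" label="Success"/>
  <int value="1" label="Network error"/>
  <int value="7" label="Timed out"/>
</enum>"""


def load_labels(enum_xml):
    """Maps each integer value of an <enum> element to its label."""
    enum = ET.fromstring(enum_xml)
    return {int(e.get("value")): e.get("label") for e in enum.findall("int")}


def describe(value, labels):
    """Returns the label for a value, or a fallback for unlabeled values."""
    return labels.get(value, "unlabeled value %d" % value)


labels = load_labels(ENUM)
print(describe(7, labels))
print(describe(3, labels))
```
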
## When To Use Sparse Histograms

Sparse histograms are well suited for recording counts of exact sample values
that are sparsely distributed over a large range. They can be used with enums
as well as regular integer values. It is often valuable to provide labels in
[enums.xml](./enums.xml).

The implementation uses a lock and a map, whereas other histogram types use a
vector and no lock. It is thus more costly to add values to, and each value
stored has more overhead, compared to the other histogram types. However, it
may be more efficient in memory if the number of distinct sample values is small
compared to the range of their values.

Please talk with the metrics team if there are more than a thousand possible
different values that you could emit.

For more information, see [sparse_histogram.h](https://cs.chromium.org/chromium/src/base/metrics/sparse_histogram.h).

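The memory trade-off described above can be seen in a toy model. This Python
sketch is not how the histogram classes are implemented: it compares a map
keyed by exact value with a list holding one slot per possible value, whereas
the real non-sparse types group values into buckets. The sample values are
made up.

```python
from collections import Counter

# Made-up samples: a few distinct codes spread over a wide range.
samples = [200, 404, 404, 500, 404, 10053]

# Map-based storage: one entry per distinct value actually seen.
sparse = Counter(samples)

# Vector-based storage: one slot per possible value from 0 to the maximum.
dense = [0] * (max(samples) + 1)
for sample in samples:
    dense[sample] += 1

print(len(sparse), len(dense))  # prints 4 10054
```

Both structures hold the same counts, but the map stores 4 entries where the
list reserves 10054 slots. Each map entry costs more than a list slot, so the
advantage disappears once most values in the range are actually observed.
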
# Team Documentation

This section contains useful information for folks on Chrome Metrics.

## Processing histograms.xml

When working with histograms.xml, verify whether you require fully expanded
OWNERS files. Many scripts in this directory process histograms.xml, and
sometimes OWNERS file paths are expanded and other times they are not. OWNERS
paths are expanded when a script uses merge_xml's `MergeFiles` function;
otherwise, they are not.
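
One way to check which kind of input a script has received is to look for
`<owner>` values that are still paths. The sketch below assumes that expansion
replaces an OWNERS path with the addresses listed in that file, which this
section does not spell out. The `unexpanded_owners_paths` helper and the sample
document, including its `<histograms>` root element, are hypothetical and do
not call merge_xml.

```python
import xml.etree.ElementTree as ET


def unexpanded_owners_paths(xml_text):
    """Returns <owner> values that still look like OWNERS file paths."""
    root = ET.fromstring(xml_text)
    return [
        owner.text.strip()
        for owner in root.iter("owner")
        if owner.text and owner.text.strip().endswith("OWNERS")
    ]


# Made-up input; a real script would read histograms.xml instead.
SAMPLE = """<histograms>
  <histogram name="Example.Metric">
    <owner>someone@chromium.org</owner>
    <owner>src/example/OWNERS</owner>
  </histogram>
</histograms>"""

print(unexpanded_owners_paths(SAMPLE))  # prints ['src/example/OWNERS']
```

An empty result suggests that the paths were already expanded, or that the
file never listed any.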