With concurrent segment search, the max_docs_per_value parameter is evaluated at the slice level instead of at the shard level, so the aggregations return a higher document count than expected.
See: #10046
Also, the docs for this agg type do not even mention this parameter.
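
To illustrate why per-slice evaluation inflates the count, here is a minimal toy model in Python. It is not the actual implementation: the helper names (`sample_with_cap`, `shard_level`, `slice_level`) and the sample data are hypothetical, and it ignores scoring and any other sampling limits. It only shows that applying the cap independently in each slice and then merging can exceed the cap that a single shard-wide pass would enforce.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical document shape: (doc_id, value of the diversifying field).
Doc = Tuple[int, str]


def sample_with_cap(docs: Iterable[Doc], max_docs_per_value: int) -> List[Doc]:
    """Keep at most max_docs_per_value documents for each distinct value."""
    seen: Counter = Counter()
    kept: List[Doc] = []
    for doc_id, value in docs:
        if seen[value] < max_docs_per_value:
            seen[value] += 1
            kept.append((doc_id, value))
    return kept


def shard_level(slices: List[List[Doc]], max_docs_per_value: int) -> List[Doc]:
    """Apply the cap once across every document in the shard (expected behaviour)."""
    all_docs = [doc for one_slice in slices for doc in one_slice]
    return sample_with_cap(all_docs, max_docs_per_value)


def slice_level(slices: List[List[Doc]], max_docs_per_value: int) -> List[Doc]:
    """Apply the cap independently in each slice, then merge (the behaviour reported here)."""
    merged: List[Doc] = []
    for one_slice in slices:
        merged.extend(sample_with_cap(one_slice, max_docs_per_value))
    return merged


# Assumed example: one shard split into two slices.
slices = [
    [(1, "a"), (2, "a"), (3, "b")],
    [(4, "a"), (5, "b"), (6, "b")],
]

print(len(shard_level(slices, 1)))  # 2: one doc per value across the shard
print(len(slice_level(slices, 1)))  # 4: one doc per value in each of two slices
```

In this model the over-count grows with the number of slices, which would also explain why tests that assert an exact document count become flaky when the slice count varies.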
Related flaky test issues: