diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index 46ab8744a3734b26723d991555d154113f47e969..b570c8d0b4c92b6989189e2ee681e4d66ad9a061 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -30,38 +30,39 @@ specifically the [Before you start](_index.md#before-you-start) and [Deciding wh
> - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with)
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|------------------------------------------|-------|-------------------------|------------------|----------------|-----------|
-| External load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL1 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Gitaly5 | 3 | 16 vCPU, 60 GB memory6 | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq7 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails7 | 3 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| External load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL2 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Gitaly67 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq8 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails8 | 3 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
The sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+8. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
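+
+To illustrate footnote 1 above, a node such as PostgreSQL (8 vCPU, 30 GB memory) can be provisioned with any machine type that meets that specification. The following is a minimal sketch only; the instance name, zone, and `n2-standard-8` type are illustrative, and Arm-based types such as `t2a-standard-8` can also be used where available:
+
+```shell
+# Illustrative only: any machine type meeting the listed 8 vCPU / 30 GB memory works.
+gcloud compute instances create gitlab-postgresql-1 \
+  --machine-type=n2-standard-8 \
+  --zone=us-central1-a
+```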
@@ -160,30 +161,41 @@ monitor .[#7FFFD4,norank]u--> elb
## Requirements
-Before starting, see the [requirements](_index.md#requirements) for reference architectures.
+Before proceeding, review the [requirements](_index.md#requirements) for the reference architectures.
## Testing methodology
-The 10k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 200 RPS / 10k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 200 RPS
-- Web: 20 RPS
-- Git (Pull): 20 RPS
-- Git (Push): 4 RPS
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 200 RPS |
+| Web | 20 RPS |
+| Git (Pull) | 20 RPS |
+| Git (Push) | 4 RPS |
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Performance considerations
-Testing is done regularly by using the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+You may need additional adjustments if your environment has:
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
+
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
+
+These selections do not represent a specific requirement or recommendation, as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
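+
+As a minimal sketch of the kind of HAProxy setup used in testing (hostnames, IP addresses, and the single HTTP frontend are hypothetical; production setups also need TLS, SSH, and further health-check tuning):
+
+```plaintext
+# Illustrative HAProxy snippet: routes HTTP traffic to the GitLab Rails nodes
+# and uses the GitLab readiness endpoint for health checks.
+frontend gitlab_http
+    mode http
+    bind *:80
+    default_backend gitlab_rails
+
+backend gitlab_rails
+    mode http
+    balance roundrobin
+    option httpchk GET /-/readiness
+    server rails1 10.0.0.11:80 check
+    server rails2 10.0.0.12:80 check
+    server rails3 10.0.0.13:80 check
+```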
## Set up components
@@ -1200,7 +1212,7 @@ designated the primary, and failover occurs automatically if the primary node go
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1549,7 +1561,7 @@ requirements that are dependent on data and load.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -2331,9 +2343,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 12.6 vCPU
28 GB memory (request)
56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` |
| Supporting services | 8 vCPU
30 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
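+
+For example, one way to reach the Sidekiq target node pool total on GKE, assuming a hypothetical cluster name and zone (any node pool design that meets the targets and lets all pods schedule is equally valid):
+
+```shell
+# Illustrative only: creates the 4 x n1-standard-4 Sidekiq pool from the example above.
+gcloud container node-pools create sidekiq \
+  --cluster=gitlab-cluster \
+  --machine-type=n1-standard-4 \
+  --num-nodes=4 \
+  --zone=us-central1-a
+```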
@@ -2342,32 +2353,33 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|------------------------------------------|-------|-----------------------|------------------|--------------|
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| PostgreSQL1 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Gitaly5 | 3 | 16 vCPU, 60 GB memory6 | `n1-standard-16` | `m5.4xlarge` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Object storage4 | - | - | - | - |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| PostgreSQL2 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Gitaly67 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Object storage5 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
Also, the sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md
index fdb787962f8636ee4c5be059b2478397695199da..376075892666779c6ffd13ca02562151b587f7ca 100644
--- a/doc/administration/reference_architectures/1k_users.md
+++ b/doc/administration/reference_architectures/1k_users.md
@@ -25,15 +25,16 @@ For a full list of reference architectures, see
> can follow a [modified hybrid reference architecture](#cloud-native-hybrid-reference-architecture-with-helm-charts).
> - **Unsure which Reference Architecture to use?** For more information, see [deciding which architecture to start with](_index.md#deciding-which-architecture-to-start-with).
-| Users | Configuration | GCP | AWS | Azure |
+| Users | Configuration | GCP example1 | AWS example1 | Azure example1 |
|--------------|----------------------|----------------|--------------|----------|
-| Up to 1,000 or 20 RPS | 8 vCPU, 16 GB memory | `n1-standard-8`1 | `c5.2xlarge` | `F8s v2` |
+| Up to 1,000 or 20 RPS | 8 vCPU, 16 GB memory | `n1-standard-8`2 | `c5.2xlarge` | `F8s v2` |
**Footnotes:**
-1. For GCP, the closest and equivalent standard machine type has been selected that matches the recommended requirement of 8 vCPU and 16 GB of RAM. A [custom machine type](https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type) can also be used if desired.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. For GCP, the closest and equivalent standard machine type has been selected that matches the recommended requirement of 8 vCPU and 16 GB of RAM. A [custom machine type](https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type) can also be used if desired.
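+
+To illustrate footnote 2 above, a [custom machine type](https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type) matching the 8 vCPU / 16 GB requirement can be used instead of `n1-standard-8`. A minimal sketch (instance name and zone are illustrative):
+
+```shell
+# Illustrative only: a GCP custom machine type sized to the 20 RPS / 1k user requirement.
+gcloud compute instances create gitlab-single-node \
+  --custom-cpu=8 \
+  --custom-memory=16GB \
+  --zone=us-central1-a
+```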
The following diagram shows that while GitLab can be installed on a single server, it is internally composed of multiple services. When an instance scales, these services are separated and independently scaled according to their specific demands.
@@ -76,32 +77,42 @@ monitor .[#7FFFD4,norank]--> redis
## Requirements
-Before starting, see the [requirements](_index.md#requirements) for reference architectures.
+Before proceeding, review the [requirements](_index.md#requirements) for the reference architectures.
{{< alert type="warning" >}}
**The node's specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
-**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads), it might *significantly* impact the performance of the environment.**
-If this applies to you, [further adjustments might be required](_index.md#scaling-an-environment). See the linked documentation and reach out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads), they might *significantly* impact the performance of the environment.**
+If this applies to you, [further adjustments might be required](_index.md#scaling-an-environment). See the linked documentation and contact us for further guidance if required.
{{< /alert >}}
## Testing methodology
-The 1k architecture is designed to cover a large majority of workflows. It is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 20 RPS / 1k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 20 RPS
-- Web: 2 RPS
-- Git (Pull): 2 RPS
-- Git (Push): 1 RPS
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 20 RPS |
+| Web | 2 RPS |
+| Git (Pull) | 2 RPS |
+| Git (Push) | 1 RPS |
-These targets are selected based on the real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-Testing is done regularly by using our [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy, see [validation and test results](_index.md#validation-and-test-results).
+### Performance considerations
+
+You may need additional adjustments if your environment has:
+
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
## Setup instructions
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 1b200fb345c6a902e9978ede2ab748388ccfc864..824efa5486575bee4d91b49e4b826c18ef0668f3 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -30,38 +30,39 @@ specifically the [Before you start](_index.md#before-you-start) and [Deciding wh
> - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with)
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|------------------------------------------|-------|-------------------------|------------------|--------------|-----------|
-| External load balancer3 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5n.2xlarge` | `F8s v2` |
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL1 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancer3 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5n.2xlarge` | `F8s v2` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Gitaly5 | 3 | 32 vCPU, 120 GB memory6 | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
-| Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq7 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails7 | 5 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| External load balancer4 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5n.2xlarge` | `F8s v2` |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL2 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | `D16s v3` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancer4 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5n.2xlarge` | `F8s v2` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Gitaly67 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
+| Praefect6 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq8 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails8 | 5 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
Also, the sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+8. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
@@ -164,26 +165,37 @@ Before starting, see the [requirements](_index.md#requirements) for reference ar
## Testing methodology
-The 25k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 500 RPS / 25k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 500 RPS
-- Web: 50 RPS
-- Git (Pull): 50 RPS
-- Git (Push): 10 RPS
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 500 RPS |
+| Web | 50 RPS |
+| Git (Pull) | 50 RPS |
+| Git (Push) | 10 RPS |
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Performance considerations
-Testing is done regularly by using the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+You may need additional adjustments if your environment has:
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
+
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
+
+These selections do not represent a specific requirement or recommendation, as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
## Set up components
@@ -1208,7 +1220,7 @@ designated the primary, and failover occurs automatically if the primary node go
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1555,7 +1567,7 @@ requirements that are dependent on data and load.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -2339,9 +2351,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 12.6 vCPU
28 GB memory (request)
56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` |
| Supporting services | 8 vCPU
30 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
@@ -2350,31 +2361,32 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|------------------------------------------|-------|------------------------|------------------|--------------|
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| PostgreSQL1 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Internal load balancer3 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Gitaly5 | 3 | 32 vCPU, 120 GB memory6 | `n1-standard-32` | `m5.8xlarge` |
-| Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Object storage4 | - | - | - | - |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| PostgreSQL2 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Internal load balancer4 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Gitaly67 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` |
+| Praefect6 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Object storage5 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+4. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 04633ca649f543150d81cf3a6108f3368211c0b0..b31144c055ddc73193116fe903fac10d2ea82d85 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -24,30 +24,31 @@ For a full list of reference architectures, see
> - **Cloud Native Hybrid:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with).
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|------------------------------------|-------|------------------------|-----------------|--------------|----------|
-| External Load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| PostgreSQL1 | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Redis2 | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` | `D2s v3` |
-| Gitaly5 | 1 | 4 vCPU, 15 GB memory5 | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Sidekiq6 | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails6 | 2 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
+| External Load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| PostgreSQL2 | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Redis3 | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` | `D2s v3` |
+| Gitaly6 | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Sidekiq7 | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails7 | 2 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS).
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS).
Sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. See [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly specifications are based on the use of normal-sized repositories in good health.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly specifications are based on the use of normal-sized repositories in good health.
However, if you have large monorepos (larger than several gigabytes) this can **significantly** impact Git and Gitaly performance and an increase of specifications will likely be required.
Refer to [large monorepos](_index.md#large-monorepos) for more information.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+7. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
@@ -100,30 +101,41 @@ monitor .[#7FFFD4]u-> sidekiq
## Requirements
-Before starting, see the [requirements](_index.md#requirements) for reference architectures.
+Before proceeding, review the [requirements](_index.md#requirements) for the reference architectures.
## Testing methodology
-The 2k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 40 RPS / 2k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
+
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 40 RPS |
+| Web | 4 RPS |
+| Git (Pull) | 4 RPS |
+| Git (Push) | 1 RPS |
+
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
+
+### Performance considerations
+
+You may need additional adjustments if your environment has:
-- API: 40 RPS
-- Web: 4 RPS
-- Git (Pull): 4 RPS
-- Git (Push): 1 RPS
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Testing tools and results
-Testing is done regularly by using our [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
## Set up components
@@ -436,7 +448,7 @@ specifically the number of projects and those projects' sizes.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1171,9 +1183,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 3.6 vCPU<br>8 GB memory (request)<br>16 GB memory (limit) | 2 x `n1-standard-4` | 2 x `m5.xlarge` |
| Supporting services | 4 vCPU<br>15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment and any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
@@ -1182,20 +1193,24 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|-----------------------------|-------|------------------------|-----------------|-------------|
-| PostgreSQL1 | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
-| Redis2 | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` |
-| Gitaly | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Object storage3 | - | - | - | - |
+| PostgreSQL2 | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
+| Redis3 | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` |
+| Gitaly5 | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Object storage4 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-3. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information. An illustrative configuration sketch is shown after these footnotes.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) and [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+5. Gitaly specifications are based on the use of normal-sized repositories in good health.
+ However, if you have large monorepos (larger than several gigabytes) this can **significantly** impact Git and Gitaly performance and an increase of specifications will likely be required.
+ Refer to [large monorepos](_index.md#large-monorepos) for more information.
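+
+For footnote 2, the following is a minimal, illustrative `/etc/gitlab/gitlab.rb` sketch for pointing GitLab Rails at an external PostgreSQL service. The host, port, and password values are placeholders only; follow [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for the authoritative steps:
+
+```ruby
+# Disable the bundled PostgreSQL and point Rails at an external service.
+# All values below are illustrative placeholders.
+postgresql['enable'] = false
+gitlab_rails['db_adapter'] = 'postgresql'
+gitlab_rails['db_encoding'] = 'unicode'
+gitlab_rails['db_host'] = 'postgres.example.com'
+gitlab_rails['db_port'] = 5432
+gitlab_rails['db_password'] = 'DB_PASSWORD_HERE'
+```
+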
{{< alert type="note" >}}
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 29ef2bac0806e7ab7dc42d01cfab748b308640a0..748ee59d57a7db8dcaf657dfe368365a197ded60 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -27,36 +27,37 @@ For a full list of reference architectures, see
> - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with).
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|-------------------------------------------|-------|-----------------------|-----------------|--------------|----------|
-| External load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL1 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Redis/Sentinel2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Gitaly5 | 3 | 4 vCPU, 15 GB memory6 | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq7 | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D2s v3` |
-| GitLab Rails7 | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
+| External load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Redis/Sentinel3 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Gitaly67 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq8 | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D2s v3` |
+| GitLab Rails8 | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` |
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
Sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-1. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+8. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
@@ -152,30 +153,41 @@ monitor .[#7FFFD4,norank]u--> elb
## Requirements
-Before starting, see the [requirements](_index.md#requirements) for reference architectures.
+Before proceeding, review the [requirements](_index.md#requirements) for the reference architectures.
## Testing methodology
-The 3k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 60 RPS / 3k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 60 RPS
-- Web: 6 RPS
-- Git (Pull): 6 RPS
-- Git (Push): 1 RPS
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 60 RPS |
+| Web | 6 RPS |
+| Git (Pull) | 6 RPS |
+| Git (Push) | 1 RPS |
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Performance considerations
-Testing is done regularly by using our [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+You may need additional adjustments if your environment has:
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
+
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
## Set up components
@@ -1036,7 +1048,7 @@ designated the primary, and failover occurs automatically if the primary node go
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1382,7 +1394,7 @@ requirements that are dependent on data and load.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -2231,9 +2243,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 7.2 vCPU<br>16 GB memory (request)<br>32 GB memory (limit) | 3 x `n1-standard-4` | 3 x `m5.xlarge` |
| Supporting services | 4 vCPU<br>15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
@@ -2242,30 +2253,31 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|-------------------------------------------|-------|-----------------------|-----------------|-------------|
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| PostgreSQL1 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
-| Redis/Sentinel2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
-| Gitaly5 | 3 | 4 vCPU, 15 GB memory6 | `n1-standard-4` | `m5.xlarge` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Object storage4 | - | - | - | - |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| PostgreSQL2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
+| Redis/Sentinel3 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
+| Gitaly67 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Object storage5 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information. An illustrative configuration sketch is shown after these footnotes.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
Sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
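+
+For footnote 3, the following is a minimal, illustrative `/etc/gitlab/gitlab.rb` sketch for pointing GitLab Rails at an external Redis service in place of the Redis/Sentinel nodes. The host and password values are placeholders only; follow [Provide your own Redis instance](#provide-your-own-redis-instance) for the authoritative steps:
+
+```ruby
+# Disable the bundled Redis and point Rails at an external service.
+# All values below are illustrative placeholders.
+redis['enable'] = false
+gitlab_rails['redis_host'] = 'redis.example.com'
+gitlab_rails['redis_port'] = 6379
+gitlab_rails['redis_password'] = 'REDIS_PASSWORD_HERE'
+```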
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index 662f5d8314ca1d88338830a1ae6532f18a13be04..326e5a8c5d6a1e3612bf001315f8aba9573296a7 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -30,37 +30,38 @@ specifically the [Before you start](_index.md#before-you-start) and [Deciding wh
> - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with)
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|------------------------------------------|-------|-------------------------|------------------|---------------|-----------|
-| External load balancer3 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2` |
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL1 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancer3 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| Gitaly5 | 3 | 64 vCPU, 240 GB memory6 | `n1-standard-64` | `m5.16xlarge` | `D64s v3` |
-| Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq7 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| GitLab Rails7 | 12 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
+| External load balancer4 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2` |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL2 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | `D32s v3` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancer4 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| Gitaly67 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | `D64s v3` |
+| Praefect6 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq8 | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| GitLab Rails8 | 12 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | `F32s v2` |
| Monitoring node | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | `F4s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+4. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+8. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
@@ -163,26 +164,37 @@ Before starting, see the [requirements](_index.md#requirements) for reference ar
## Testing methodology
-The 50k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 1000 RPS / 50k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 1000 RPS
-- Web: 100 RPS
-- Git (Pull): 100 RPS
-- Git (Push): 20 RPS
+| Endpoint type | Target throughput |
+| ------------- | ----------------- |
+| API | 1000 RPS |
+| Web | 100 RPS |
+| Git (Pull) | 100 RPS |
+| Git (Push) | 20 RPS |
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Performance considerations
-Testing is done regularly by using the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+You may need additional adjustments if your environment has:
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
+
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
## Set up components
@@ -1215,7 +1227,7 @@ designated the primary, and failover occurs automatically if the primary node go
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1562,7 +1574,7 @@ requirements that are dependent on data and load.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -2354,9 +2366,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 12.6 vCPU<br>28 GB memory (request)<br>56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` |
| Supporting services | 8 vCPU<br>30 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
@@ -2365,31 +2376,32 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|------------------------------------------|-------|------------------------|------------------|---------------|
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| PostgreSQL1 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Internal load balancer3 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` |
-| Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| Gitaly5 | 3 | 64 vCPU, 240 GB memory6 | `n1-standard-64` | `m5.16xlarge` |
-| Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Object storage4 | - | - | - | - |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| PostgreSQL2 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Internal load balancer4 | 1 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` |
+| Redis/Sentinel - Cache3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Redis/Sentinel - Persistent3 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| Gitaly67 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` |
+| Praefect6 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Object storage5 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instances](#provide-your-own-redis-instances) for more information.
- Redis is primarily single threaded and doesn't significantly benefit from an increase in CPU cores. For this size of architecture it's strongly recommended having separate Cache and Persistent instances as specified to achieve optimum performance.
-3. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+4. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](_index.md#recommended-cloud-providers-and-services) for more information.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information. An illustrative configuration sketch is shown after these footnotes.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
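+
+For footnote 5, the following is a minimal, illustrative `/etc/gitlab/gitlab.rb` sketch of the consolidated object storage form on AWS S3. The bucket names, region, and credential approach are placeholders only; follow [Configure the object storage](#configure-the-object-storage) for the authoritative steps:
+
+```ruby
+# Consolidated object storage pointing at a cloud provider (illustrative values).
+gitlab_rails['object_store']['enabled'] = true
+gitlab_rails['object_store']['proxy_download'] = true
+gitlab_rails['object_store']['connection'] = {
+  'provider' => 'AWS',
+  'region' => 'us-east-1',
+  'use_iam_profile' => true
+}
+gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
+gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
+gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
+```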
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index c2e24087ef39e2e4989bd3ddec78bdc3c47bde72..9709499933f01dfdd19894d2b1ffb73c0b38279c 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -30,36 +30,37 @@ specifically the [Before you start](_index.md#before-you-start) and [Deciding wh
> - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](_index.md#deciding-which-architecture-to-start-with)
-| Service | Nodes | Configuration | GCP | AWS | Azure |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 | Azure example1 |
|-------------------------------------------|-------|-------------------------|-----------------|--------------|----------|
-| External load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| PostgreSQL1 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
-| Redis/Sentinel2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
-| Gitaly5 | 3 | 8 vCPU, 30 GB memory6 | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Sidekiq7 | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D2s v3` |
-| GitLab Rails7 | 3 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2`|
+| External load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| PostgreSQL2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` | `F4s v2` |
+| Redis/Sentinel3 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` |
+| Gitaly67 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | `D8s v3` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
+| Sidekiq8 | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D2s v3` |
+| GitLab Rails8 | 3 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | `F16s v2`|
| Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` |
-| Object storage4 | - | - | - | - | - |
+| Object storage5 | - | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS) which can provide HA capabilities.
Also, the sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
+8. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](_index.md#autoscaling-of-stateful-nodes).
However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
@@ -155,30 +156,41 @@ monitor .[#7FFFD4,norank]u--> elb
## Requirements
-Before starting, see the [requirements](_index.md#requirements) for reference architectures.
+Before proceeding, review the [requirements](_index.md#requirements) for the reference architectures.
## Testing methodology
-The 5k architecture is designed to cover a large majority of workflows and is regularly
-[smoke and performance tested](_index.md#validation-and-test-results) by the Test Platform team
-against the following endpoint throughput targets:
+The 100 RPS / 5k user reference architecture is designed to accommodate most common workflows. The [Framework](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/) team regularly conducts smoke and performance testing against the following endpoint throughput targets:
-- API: 100 RPS
-- Web: 10 RPS
-- Git (Pull): 10 RPS
-- Git (Push): 2 RPS
+| Endpoint Type | Target Throughput |
+| ------------- | ----------------- |
+| API | 100 RPS |
+| Web | 10 RPS |
+| Git (Pull) | 10 RPS |
+| Git (Push) | 2 RPS |
-The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads.
+These targets are based on actual customer data reflecting total environmental loads for the specified user count, including CI pipelines and other workloads.
-If you have metrics to suggest that you have regularly higher throughput against the above endpoint targets, [large monorepos](_index.md#large-monorepos)
-or notable [additional workloads](_index.md#additional-workloads) these can notably impact the performance environment and [further adjustments may be required](_index.md#scaling-an-environment).
-If this applies to you, we strongly recommended referring to the linked documentation and reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+### Performance considerations
-Testing is done regularly by using the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) and its dataset, which is available for anyone to use.
-The results of this testing are [available publicly on the GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information on our testing strategy [refer to this section of the documentation](_index.md#validation-and-test-results).
+You may need additional adjustments if your environment has:
-The load balancers used for testing were HAProxy for Linux package environments or equivalent Cloud Provider services with NGINX Ingress for Cloud Native Hybrids. These selections do not represent a specific requirement or recommendation as most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
+- Consistently higher throughput than the listed targets
+- [Large monorepos](_index.md#large-monorepos)
+- Significant [additional workloads](_index.md#additional-workloads)
+
+In these cases, refer to [scaling an environment](_index.md#scaling-an-environment) for more information. If you believe these considerations may apply to you, contact us for additional guidance as required.
+
+### Testing tools and results
+
+We use the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance) for testing, which includes a publicly available dataset. You can view detailed test results on the [GPT wiki](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest). For more information about our testing methodology, see the [validation and test results](_index.md#validation-and-test-results) section.
+
+### Load balancer configuration
+
+Our testing environment uses:
+
+- HAProxy for Linux package environments
+- Cloud Provider equivalents with NGINX Ingress for Cloud Native Hybrids
+
+These selections do not represent a specific requirement or recommendation; most [reputable load balancers are expected to work](#configure-the-external-load-balancer).
## Set up components
@@ -1039,7 +1051,7 @@ designated the primary, and failover occurs automatically if the primary node go
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -1386,7 +1398,7 @@ requirements that are dependent on data and load.
**Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.**
**However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact the performance of the environment and further adjustments may be required.**
-If this applies to you, we strongly recommended referring to the linked documentation as well as reaching out to your [Customer Success Manager](https://handbook.gitlab.com/job-families/sales/customer-success-management/) or our [Support team](https://about.gitlab.com/support/) for further guidance.
+If you believe this applies to you, contact us for additional guidance as required.
{{< /alert >}}
@@ -2204,9 +2216,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
| Sidekiq | 7.2 vCPU<br>16 GB memory (request)<br>32 GB memory (limit) | 3 x `n1-standard-4` | 3 x `m5.xlarge` |
| Supporting services | 4 vCPU<br>15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` |
-- For this setup, we **recommend** and regularly [test](_index.md#validation-and-test-results)
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
-- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
+- For this setup, we regularly [test](_index.md#validation-and-test-results) and recommend [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
+- Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported. See [Supported machine types](_index.md#supported-machine-types) for more information.
- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
@@ -2215,30 +2226,31 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements
Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-| Service | Nodes | Configuration | GCP | AWS |
+| Service | Nodes | Configuration | GCP example1 | AWS example1 |
|-------------------------------------------|-------|-----------------------|-----------------|--------------|
-| Consul1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| PostgreSQL1 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
-| PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Internal load balancer3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
-| Redis/Sentinel2 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
-| Gitaly5 | 3 | 8 vCPU, 30 GB memory6 | `n1-standard-8` | `m5.2xlarge` |
-| Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
-| Object storage4 | - | - | - | - |
+| Consul2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| PostgreSQL2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` |
+| PgBouncer2 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Internal load balancer4 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5n.xlarge` |
+| Redis/Sentinel3 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` |
+| Gitaly67 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` |
+| Praefect6 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Praefect PostgreSQL2 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` |
+| Object storage5 | - | - | - | - |
**Footnotes:**
-1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
-2. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
-3. Recommended to be run with a reputable third-party load balancer or service (LB PaaS).
+1. Machine type examples are given for illustration purposes. These types are used in [validation and testing](_index.md#validation-and-test-results) but are not intended as prescriptive defaults. Switching to other machine types that meet the requirements as listed is supported, including ARM variants if available. See [Supported machine types](_index.md#supported-machine-types) for more information.
+2. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. See [Provide your own PostgreSQL instance](#provide-your-own-postgresql-instance) for more information.
+3. Can be optionally run on reputable third-party external PaaS Redis solutions. See [Provide your own Redis instance](#provide-your-own-redis-instance) for more information.
+4. Recommended to be run with a reputable third-party load balancer or service (LB PaaS).
Also, the sizing depends on selected Load Balancer and additional factors such as Network Bandwidth. Refer to [Load Balancers](_index.md#load-balancers) for more information.
-4. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
-5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
+5. Should be run on reputable Cloud Provider or Self Managed solutions. See [Configure the object storage](#configure-the-object-storage) for more information.
+6. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/_index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
-6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
+7. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
However, if you have [large monorepos](_index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](_index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
diff --git a/doc/administration/reference_architectures/_index.md b/doc/administration/reference_architectures/_index.md
index d24bea0c84e0671624269e60a8985b76efb06f94..34d45d1594bd0da24146e654efacfe57e6cab4ce 100644
--- a/doc/administration/reference_architectures/_index.md
+++ b/doc/administration/reference_architectures/_index.md
@@ -308,19 +308,17 @@ linkStyle default fill:none,stroke:#7759C2
Before implementing a reference architecture, see the following requirements and guidance.
-### Supported CPUs
+### Supported machine types
-The architectures are built and tested across various cloud providers, primarily GCP and AWS.
-To ensure the widest range of compatibility, CPU targets are intentionally set to the lowest common denominator across these platforms:
+The architectures are designed to be flexible in terms of machine type selection while ensuring consistent performance. While we provide specific machine type examples in each reference architecture, these are not intended to be prescriptive defaults.
-- The [`n1` series](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines) for GCP.
-- The [`m5` series](https://aws.amazon.com/ec2/instance-types/) for AWS.
+You can use any machine types that meet or exceed the specified requirements for each component, such as:
-Depending on other requirements such as memory or network bandwidth and cloud provider availability, different machine types are used accordingly throughout the architectures. We expect that the target CPUs above perform well.
+- Newer generation machine types (like GCP `n2` series or AWS `m6` series)
+- Different architectures like ARM-based instances (such as AWS Graviton)
+- Alternative machine type families that better match your specific workload characteristics (such as higher network bandwidth)
-If you want, you can select a newer machine type series and have improved performance as a result.
-
-Additionally, ARM CPUs are supported for Linux package environments and for any [cloud provider services](#cloud-provider-services).
+This guidance also applies to any Cloud Provider services, such as AWS RDS.
{{< alert type="note" >}}
@@ -328,6 +326,12 @@ Any "burstable" instance types are not recommended due to inconsistent performan
{{< /alert >}}
+{{< alert type="note" >}}
+
+For details about what machine types we test against and how, refer to [validation and test results](#validation-and-test-results).
+
+{{< /alert >}}
+
### Supported disk types
Most standard disk types are expected to work for GitLab. However, be aware of the following specific call-outs:
@@ -613,39 +617,25 @@ For deploying GitLab over multiple data centers or regions, we offer [GitLab Geo
## Validation and test results
-The [Test Platform team](https://handbook.gitlab.com/handbook/engineering/quality/)
+The [Framework team](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/gitlab-delivery/framework/)
does regular smoke and performance tests for these architectures to ensure they
remain compliant.
-### Why we perform the tests
-
-The Quality Department measures and improves the performance of GitLab. They create and validate architectures
-to ensure reliable configurations for GitLab Self-Managed.
-
-For more information, see our [handbook page](https://handbook.gitlab.com/handbook/engineering/infrastructure/test-platform/performance-and-scalability/).
-
### How we perform the tests
-Testing occurs against all architectures and cloud providers in an automated and ad-hoc fashion. Two tools are used for testing:
+Testing is conducted with the following tools, using specific coded workloads derived from sample customer data:
-- The [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/gitlab-environment-toolkit) Terraform and Ansible scripts for building the environments.
-- The [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) for performance testing.
+- [GitLab Environment Toolkit (GET)](https://gitlab.com/gitlab-org/gitlab-environment-toolkit): Terraform and Ansible scripts for building the environments.
+- [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance): Test wrapper tool based on k6.
-Network latency on the test environments between components on all cloud providers were measured at <5 ms. This an observation, not a recommendation.
+We test architectures across cloud providers, primarily GCP and AWS, using the following as baseline machine types:
-We aim to have a _test smart_ approach where architectures tested have a good range and can also apply to others. Testing focuses on installing a 10k Linux package
-on GCP. This approach serves as a reliable indicator for other architectures, cloud providers, and Cloud Native Hybrids.
+- The [`n1` series](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines) for GCP
+- The [`m5` series](https://aws.amazon.com/ec2/instance-types/) for AWS
-The architectures are cross-platform. Everything runs on VMs through [the Linux package](https://docs.gitlab.com/omnibus/). Testing occurs primarily on GCP.
-However, they perform similarly on hardware with equivalent specifications on other cloud providers or if run on-premises (bare-metal).
+These machine types were selected as a lowest common denominator target to ensure broad compatibility. Using different or newer machine types that meet the CPU and memory requirements is fully supported. See [Supported machine types](#supported-machine-types) for more information. The architectures are expected to perform similarly on any hardware meeting the specifications, whether on other cloud providers or on-premises.
-GitLab tests these architectures using the
-[GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance).
-We use specific coded workloads based on sample customer data. Select the
-[architecture](#available-reference-architectures) that matches your scale.
-
-Each endpoint type is tested with the following number of RPS
-per 1,000 users:
+Each reference architecture is tested against specific throughput targets based on real customer data. For every 1,000 users, we test:
- API: 20 RPS
- Web: 2 RPS
@@ -654,6 +644,12 @@ per 1,000 users:
The above RPS targets were selected based on real customer data of total environmental loads corresponding to the user count, including CI and other workloads.
+{{< alert type="note" >}}
+
+Network latency between components in test environments was observed at under 5 ms, but this is an observation, not a hard requirement.
+
+{{< /alert >}}
+
### How to interpret the results
{{< alert type="note" >}}
@@ -892,6 +888,10 @@ The following is a history of notable updates for reference architectures (2021-
You can find a full history of changes [on the GitLab project](https://gitlab.com/gitlab-org/gitlab/-/merge_requests?scope=all&state=merged&label_name%5B%5D=Reference%20Architecture&label_name%5B%5D=documentation).
+**2025:**
+
+- [2025-02](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/181145): Added further clarity around supported machine types and that the listed examples are not intended as prescriptive defaults.
+
**2024:**
- [2024-12](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/175854): Added _Start Large_ section as further guidance for choosing initial sizing.