diff --git a/Makefile b/Makefile index bdce5f1b..0ec46e8d 100644 --- a/Makefile +++ b/Makefile @@ -38,6 +38,9 @@ help: ## Display this help. clean: ## Remove any locally built or downloaded files rm -rf $(shell pwd)/bin +docs: crd-ref-docs ## Generate CRD reference docs + $(CRD_REF_DOCS) --config docs/config/config.yaml --renderer markdown --output-path docs/crd-reference.md --source-path api/v1beta1 + ##@ Development manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects. @@ -111,7 +114,7 @@ undeploy: ## Undeploy controller from the K8s cluster specified in ~/.kube/confi ##@ Dependencies -deps: controller-gen kustomize kind kpt golangci-lint ## Download the following dependencies locally (in './bin') if necessary +deps: controller-gen kustomize kind kpt golangci-lint crd-ref-docs ## Download the following dependencies locally (in './bin') if necessary CONTROLLER_GEN = $(shell pwd)/bin/controller-gen controller-gen: ## Download controller-gen locally if necessary. @@ -138,6 +141,10 @@ GOLANGCI_LINT = $(shell pwd)/bin/golangci-lint golangci-lint: ## Downlaod 'golangci-lint' locally if necessary $(call go-get-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/cmd/golangci-lint@v1.44.0) +CRD_REF_DOCS = $(shell pwd)/bin/crd-ref-docs +crd-ref-docs: ## Download 'crd-ref-docs' locally if necessary + $(call go-get-tool,$(CRD_REF_DOCS),github.com/elastic/crd-ref-docs@v0.0.8) + # go-get-tool will 'go get' any package $2 and install it to $1. PROJECT_DIR := $(shell dirname $(abspath $(lastword $(MAKEFILE_LIST)))) define go-get-tool diff --git a/README.md b/README.md index dd3cc751..3f24ea90 100644 --- a/README.md +++ b/README.md @@ -9,21 +9,37 @@ Spanner Autoscaler is a [Kubernetes Operator](https://coreos.com/operators/) to ## Overview [Cloud Spanner](https://cloud.google.com/spanner) is scalable. 
-When CPU utilization gets high, we can [reduce CPU utilization by increasing compute capacity](https://cloud.google.com/spanner/docs/cpu-utilization?hl=en#add-compute-capacity). +When CPU utilization becomes high, we can [reduce it by increasing compute capacity](https://cloud.google.com/spanner/docs/cpu-utilization?hl=en#add-compute-capacity). -Spanner Autoscaler is created to reconcile Cloud Spanner compute capacity like [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) by configuring `minNodes`, `maxNodes`, and `targetCPUUtilization`. +Spanner Autoscaler is created to reconcile Cloud Spanner compute capacity like [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) by configuring a compute capacity range and `targetCPUUtilization`. -When CPU Utilization(High Priority) is above `targetCPUUtilization`, Spanner Autoscaler calculates desired compute capacity and increases compute capacity. +When CPU Utilization (High Priority) rises above (or falls below) `targetCPUUtilization`, Spanner Autoscaler calculates the desired compute capacity and then increases (or decreases) compute capacity to bring utilization back to the target. The [pricing of Cloud Spanner](https://cloud.google.com/spanner/pricing) states that any compute capacity which is provisioned will be billed for a minimum of one hour, so Spanner Autoscaler maintains the increased compute capacity for about an hour. Spanner Autoscaler has `--scale-down-interval` flag (default: 55min) for achieving this. -While scaling down, if Spanner Autoscaler reduces a lot of compute capacity at once like 10000 PU -> 1000 PU, it will cause a latency increase. Spanner Autoscaler decreases the compute capacity in steps to avoid such large disruptions. This step size can be provided with the `maxScaleDownNodes` parameter (default: 2). 
+While scaling down, removing large amounts of compute capacity at once (like 10000 PU -> 1000 PU) can cause a latency increase. Therefore, Spanner Autoscaler decreases the compute capacity in steps to avoid such large disruptions. This step size can be provided with the `scaledownStepSize` parameter (default: 2000 PU). +### Scheduled scaling feature + +If batch jobs or other compute-intensive tasks run periodically on Cloud Spanner, it is now possible to raise the scaling range only for a specified duration. For example, the following `SpannerAutoscaleSchedule` will add an extra compute capacity of 600 Processing Units to the Spanner instance every day at 02:00, for 3 hours: +```yaml +apiVersion: spanner.mercari.com/v1beta1 +kind: SpannerAutoscaleSchedule +metadata: + name: spannerautoscaleschedule-sample + namespace: your-namespace +spec: + targetResource: spannerautoscaler-sample + additionalProcessingUnits: 600 + schedule: + cron: "0 2 * * *" + duration: 3h +``` ## Installation @@ -54,33 +70,14 @@ Spanner Autoscaler can be installed using [KPT](https://kpt.dev/installation/) b ```console $ kubectl apply -f spanner-autoscaler/samples ``` - Examples of CRDs can be found [below](#examples).\ + Examples of CustomResources can be found [below](#examples).\ For authentication using a GCP service account JSON key, follow [these steps](#gcp-setup) to create a k8s secret with credentials. 
-## `SpannerAutoscaler` CRD reference +## CRD reference -Following is a reference of the parameters which can be provided in the `spec` section of the `SpannerAutoscaler` CRD: - -Parameter | Type | Required | Description ---- | --- | --- | --- -`scaleTargetRef` | object | yes | Spanner Instance which will be auto scaled -`scaleTargetRef.projectId` | string | yes | GCP Project ID -`scaleTargetRef.instanceId` | string | yes | Cloud Spanner Instance ID -`serviceAccountSecretRef` | object | no | Secret created [here](#authenticate-with-service-account-json-key) -`serviceAccountSecretRef.name` | string | yes | Name of the k8s secret -`serviceAccountSecretRef.namespace` | string | yes | Namespace of the k8s secret -`serviceAccountSecretRef.key` | string | yes | Name of the key in the secret which holds the authentication information -`impersonateConfig` | object | no | Impersonation config -`impersonateConfog.targetServiceAccount` | string | yes | Email address of the service account to impersonate ([`GSA_SPANNER`](#using-service-accounts-with-workload-identity-and-impersonation)) -`impersonateConfog.delegates` | list of string | yes | List of target service account emails in a delegation chain ([Ref](https://pkg.go.dev/google.golang.org/api/impersonate#CredentialsConfig)) -`minProcessingUnits` | integer | no | Minimum [processing units](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) -`maxProcessingUnits` | integer | no | Maximum [processing units](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) -`minNodes` | integer | no | Equals [`minProcessingUnits / 1000`](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) -`maxNodes` | integer | no | Equals [`maxProcessingUnits / 1000`](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) -`maxScaleDownNodes` | integer | no | Maximum number of nodes to remove in one scale-down cycle -`targetCPUUtilization` | object | yes | Spanner [CPU 
utilization metrics](https://cloud.google.com/spanner/docs/cpu-utilization) -`targetCPUUtilization.highPriority` | integer | yes | High Priority CPU Utilization value +- [`SpannerAutoscaler` CRD reference](docs/crd-reference.md#spannerautoscaler) +- [`SpannerAutoscaleSchedule` CRD reference](docs/crd-reference.md#spannerautoscaleschedule) ## Examples @@ -88,64 +85,72 @@ Parameter | Type | Required | Description #### Single Service Account using Workload Identity: ```yaml -apiVersion: spanner.mercari.com/v1alpha1 +apiVersion: spanner.mercari.com/v1beta1 kind: SpannerAutoscaler metadata: name: spannerautoscaler-sample namespace: your-namespace spec: - scaleTargetRef: + targetInstance: projectId: your-gcp-project-id instanceId: your-spanner-instance-id - minNodes: 1 - maxNodes: 4 - maxScaleDownNodes: 1 - targetCPUUtilization: - highPriority: 60 + scaleConfig: + processingUnits: + min: 1000 + max: 4000 + scaledownStepSize: 1000 + targetCPUUtilization: + highPriority: 60 ``` #### Using Service Account JSON key for each `SpannerAutoscaler`: ```diff - apiVersion: spanner.mercari.com/v1alpha1 + apiVersion: spanner.mercari.com/v1beta1 kind: SpannerAutoscaler metadata: name: spannerautoscaler-sample namespace: your-namespace spec: - scaleTargetRef: + targetInstance: projectId: your-gcp-project-id instanceId: your-spanner-instance-id -+ serviceAccountSecretRef: -+ namespace: your-namespace -+ name: spanner-autoscaler-gcp-sa -+ key: service-account - minNodes: 1 - maxNodes: 4 - maxScaleDownNodes: 1 - targetCPUUtilization: - highPriority: 60 ++ authentication: ++ iamKeySecret: ++ namespace: your-namespace ++ name: spanner-autoscaler-gcp-sa ++ key: service-account + scaleConfig: + processingUnits: + min: 1000 + max: 4000 + scaledownStepSize: 1000 + targetCPUUtilization: + highPriority: 60 ``` #### Using Service Accounts with Workload Identity and impersonation: ```diff - apiVersion: spanner.mercari.com/v1alpha1 + apiVersion: spanner.mercari.com/v1beta1 kind: SpannerAutoscaler 
metadata: name: spannerautoscaler-sample namespace: your-namespace spec: - scaleTargetRef: + targetInstance: projectId: your-gcp-project-id instanceId: your-spanner-instance-id -+ impersonateConfig: -+ targetServiceAccount: GSA_SPANNER@TENANT_PROJECT.iam.gserviceaccount.com - minNodes: 1 - maxNodes: 4 - maxScaleDownNodes: 1 - targetCPUUtilization: - highPriority: 60 ++ authentication: ++ impersonateConfig: ++ targetServiceAccount: GSA_SPANNER@TENANT_PROJECT.iam.gserviceaccount.com + scaleConfig: + processingUnits: + min: 1000 + max: 4000 + scaledownStepSize: 1000 + targetCPUUtilization: + highPriority: 60 ``` @@ -250,34 +255,15 @@ Following are some other advanced methods which can also be used for GCP authent -## Development - -Run `make help` for a list of useful targets. The installation basically has 3 steps: - -``` -## 1. Installation of CRD -$ make install - -## 2. Deployment of the operator -$ make deploy - -## 3. Creation of a CRD -$ kubectl apply -f config/samples -``` +## Development and Contribution -Test the operator with `make test` +See [docs/development.md](docs/development.md) and [CONTRIBUTING.md](.github/CONTRIBUTING.md) respectively. -> :warning: **Migration from `v0.1.5`:** Names of some resources (`Deployment`, `serviceAccount`,`Role` etc) have changed since version `0.1.5`. Thus, you must first uninstall the old version before installing the new version. To uninstall the old version: -> ```console -> $ git checkout v0.1.5 -> $ kustomize build config/default | kubectl delete -f - -> ``` -> Specifically, the kubernetes service account used for running the spanner-autoscaler has changed from `default` to `spanner-autoscaler-controller-manager`. Please keep this in mind. It is recommended to follow the below configuration steps and re-create any resources if needed. 
+### :information_source: Migration from `0.3.0` to `0.4.0`: +The older version `0.3.0` (with `apiVersion: spanner.mercari.com/v1alpha1`) is now deprecated in favor of `0.4.0` (with `apiVersion: spanner.mercari.com/v1beta1`). -## Contribution - -See [CONTRIBUTING.md](.github/CONTRIBUTING.md). +Version `0.4.0` is backward compatible with `0.3.0`, but there is a restructuring of the `SpannerAutoscaler` resource definition and names of many fields have changed. Thus it is recommended to go through the [`SpannerAutoscaler` CRD reference](docs/crd-reference.md#spannerautoscaler) and replace `v1alpha1` resources with `v1beta1` spec definition. ## License @@ -290,6 +276,8 @@ Spanner Autoscaler is released under the [Apache License 2.0](./LICENSE). 1. It doesn't check [the storage size and the number of databases](https://cloud.google.com/spanner/quotas?hl=en#database_limits) as well. You must take care of these metrics by yourself. +:information_source: More information and background of spanner-autoscaler is available on [this blog](https://engineering.mercari.com/en/blog/entry/20211222-kubernetes-based-spanner-autoscaler)! + [actions-workflow-test]: https://github.com/mercari/spanner-autoscaler/actions?query=workflow%3ATest diff --git a/api/v1beta1/spannerautoscaler_types.go b/api/v1beta1/spannerautoscaler_types.go index ef4b6a32..10e4c243 100644 --- a/api/v1beta1/spannerautoscaler_types.go +++ b/api/v1beta1/spannerautoscaler_types.go @@ -85,10 +85,10 @@ type ScaleConfig struct { // This is only used at the time of CustomResource creation. If compute capacity is provided in `nodes`, then it is automatically converted to `processing-units` at the time of resource creation, and internally, only `ProcessingUnits` are used for computations and scaling. ComputeType ComputeType `json:"computeType,omitempty"` - // If `nodes` are provided at the time of resource creation, then they are automatically converted to `processing-units`. 
So it is recommended to use only the processing units. + // If `nodes` are provided at the time of resource creation, then they are automatically converted to `processing-units`. So it is recommended to use only the processing units. Ref: [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) Nodes ScaleConfigNodes `json:"nodes,omitempty"` - // ProcessingUnits for scaling of the Spanner instance: https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity + // ProcessingUnits for scaling of the Spanner instance. Ref: [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) ProcessingUnits ScaleConfigPUs `json:"processingUnits,omitempty"` // The maximum number of processing units which can be deleted in one scale-down operation @@ -96,24 +96,32 @@ type ScaleConfig struct { // +kubebuilder:validation:MultipleOf=1000 ScaledownStepSize int `json:"scaledownStepSize,omitempty"` - // The CPU utilization which the autoscaling will try to achieve + // The CPU utilization which the autoscaling will try to achieve. Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority) TargetCPUUtilization TargetCPUUtilization `json:"targetCPUUtilization"` } +// Compute capacity in terms of Nodes type ScaleConfigNodes struct { + // Minimum number of Nodes for the autoscaling range Min int `json:"min,omitempty"` + + // Maximum number of Nodes for the autoscaling range Max int `json:"max,omitempty"` } +// Compute capacity in terms of Processing Units type ScaleConfigPUs struct { + // Minimum number of Processing Units for the autoscaling range // +kubebuilder:validation:MultipleOf=100 Min int `json:"min"` + // Maximum number of Processing Units for the autoscaling range // +kubebuilder:validation:MultipleOf=100 Max int `json:"max"` } type TargetCPUUtilization struct { + // Desired CPU utilization for 'High Priority' CPU consumption category. 
Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority) // +kubebuilder:validation:Minimum=0 // +kubebuilder:validation:Maximum=100 // +kubebuilder:validation:ExclusiveMinimum=true @@ -123,9 +131,14 @@ type TargetCPUUtilization struct { // SpannerAutoscalerSpec defines the desired state of SpannerAutoscaler type SpannerAutoscalerSpec struct { + // The Spanner instance which will be managed for autoscaling TargetInstance TargetInstance `json:"targetInstance"` - Authentication Authentication `json:"authentication"` - ScaleConfig ScaleConfig `json:"scaleConfig"` + + // Authentication details for the Spanner instance + Authentication Authentication `json:"authentication,omitempty"` + + // Details of the autoscaling parameters for the Spanner instance + ScaleConfig ScaleConfig `json:"scaleConfig"` } type InstanceState string @@ -143,22 +156,31 @@ const ( InstanceStateReady InstanceState = "ready" ) +// A `SpannerAutoscaleSchedule` which is currently active and will be used for calculating the autoscaling range. 
type ActiveSchedule struct { - ScheduleName string `json:"name"` - EndTime metav1.Time `json:"endTime"` - AdditionalPU int `json:"additionalPU"` + // Name of the `SpannerAutoscaleSchedule` + ScheduleName string `json:"name"` + + // The time until when this schedule will remain active + EndTime metav1.Time `json:"endTime"` + + // The extra compute capacity which will be added because of this schedule + AdditionalPU int `json:"additionalPU"` } // SpannerAutoscalerStatus defines the observed state of SpannerAutoscaler type SpannerAutoscalerStatus struct { - Schedules []string `json:"schedules,omitempty"` + // List of schedules which are registered with this spanner-autoscaler instance + Schedules []string `json:"schedules,omitempty"` + + // List of all the schedules which are currently active and will be used in calculating compute capacity CurrentlyActiveSchedules []ActiveSchedule `json:"currentlyActiveSchedules,omitempty"` - // Last time the SpannerAutoscaler scaled the number of Spanner nodes + // Last time the `SpannerAutoscaler` scaled the number of Spanner nodes. 
// Used by the autoscaler to control how often the number of nodes are changed LastScaleTime metav1.Time `json:"lastScaleTime,omitempty"` - // Last time the SpannerAutoscaler fetched and synced this status + // Last time the `SpannerAutoscaler` fetched and synced the metrics from Spanner LastSyncTime metav1.Time `json:"lastSyncTime,omitempty"` // Current number of processing-units in the Spanner instance @@ -173,6 +195,7 @@ type SpannerAutoscalerStatus struct { // Maximum number of processing units based on the currently active schedules DesiredMaxPUs int `json:"desiredMaxPUs,omitempty"` + // State of the Cloud Spanner instance InstanceState InstanceState `json:"instanceState,omitempty"` // Current average CPU utilization for high priority task, represented as a percentage diff --git a/api/v1beta1/spannerautoscaleschedule_types.go b/api/v1beta1/spannerautoscaleschedule_types.go index fe081ac1..725ab724 100644 --- a/api/v1beta1/spannerautoscaleschedule_types.go +++ b/api/v1beta1/spannerautoscaleschedule_types.go @@ -20,18 +20,25 @@ import ( metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" ) -// TODO: Add comments for each struct - +// The recurring frequency and the length of time for which a schedule will remain active type Schedule struct { - Cron string `json:"cron"` + // The recurring frequency of the schedule in [standard cron](https://en.wikipedia.org/wiki/Cron) format. Examples and verification utility: https://crontab.guru + Cron string `json:"cron"` + + // The length of time for which this schedule will remain active each time the cron is triggered. 
Duration string `json:"duration"` } // SpannerAutoscaleScheduleSpec defines the desired state of SpannerAutoscaleSchedule type SpannerAutoscaleScheduleSpec struct { - TargetResource string `json:"targetResource"` - AdditionalProcessingUnits int `json:"additionalProcessingUnits"` - Schedule Schedule `json:"schedule"` + // The `SpannerAutoscaler` resource name with which this schedule will be registered + TargetResource string `json:"targetResource"` + + // The extra compute capacity which will be added when this schedule is active + AdditionalProcessingUnits int `json:"additionalProcessingUnits"` + + // The details of when and for how long this schedule will be active + Schedule Schedule `json:"schedule"` } // SpannerAutoscaleScheduleStatus defines the observed state of SpannerAutoscaleSchedule diff --git a/config/crd/bases/spanner.mercari.com_spannerautoscalers.yaml b/config/crd/bases/spanner.mercari.com_spannerautoscalers.yaml index cea90f0f..fa5c6901 100644 --- a/config/crd/bases/spanner.mercari.com_spannerautoscalers.yaml +++ b/config/crd/bases/spanner.mercari.com_spannerautoscalers.yaml @@ -312,23 +312,30 @@ spec: - processing-units type: string nodes: - description: If `nodes` are provided at the time of resource creation, - then they are automatically converted to `processing-units`. - So it is recommended to use only the processing units. + description: 'If `nodes` are provided at the time of resource + creation, then they are automatically converted to `processing-units`. + So it is recommended to use only the processing units. 
Ref: + [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity)' properties: max: + description: Maximum number of Nodes for the autoscaling range type: integer min: + description: Minimum number of Nodes for the autoscaling range type: integer type: object processingUnits: - description: 'ProcessingUnits for scaling of the Spanner instance: - https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity' + description: 'ProcessingUnits for scaling of the Spanner instance. + Ref: [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity)' properties: max: + description: Maximum number of Processing Units for the autoscaling + range multipleOf: 100 type: integer min: + description: Minimum number of Processing Units for the autoscaling + range multipleOf: 100 type: integer required: @@ -342,10 +349,12 @@ spec: multipleOf: 1000 type: integer targetCPUUtilization: - description: The CPU utilization which the autoscaling will try - to achieve + description: 'The CPU utilization which the autoscaling will try + to achieve. Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority)' properties: highPriority: + description: 'Desired CPU utilization for ''High Priority'' + CPU consumption category. 
Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority)' exclusiveMaximum: true exclusiveMinimum: true maximum: 100 @@ -371,7 +380,6 @@ spec: - projectId type: object required: - - authentication - scaleConfig - targetInstance type: object @@ -386,14 +394,22 @@ spec: description: Current number of processing-units in the Spanner instance type: integer currentlyActiveSchedules: + description: List of all the schedules which are currently active + and will be used in calculating compute capacity items: + description: A `SpannerAutoscaleSchedule` which is currently active + and will be used for calculating the autoscaling range. properties: additionalPU: + description: The extra compute capacity which will be added + because of this schedule type: integer endTime: + description: The time until when this schedule will remain active format: date-time type: string name: + description: Name of the `SpannerAutoscaleSchedule` type: string required: - additionalPU @@ -413,19 +429,22 @@ spec: description: Desired number of processing-units in the Spanner instance type: integer instanceState: + description: State of the Cloud Spanner instance type: string lastScaleTime: - description: Last time the SpannerAutoscaler scaled the number of - Spanner nodes Used by the autoscaler to control how often the number + description: Last time the `SpannerAutoscaler` scaled the number of + Spanner nodes. 
Used by the autoscaler to control how often the number of nodes are changed format: date-time type: string lastSyncTime: - description: Last time the SpannerAutoscaler fetched and synced this - status + description: Last time the `SpannerAutoscaler` fetched and synced + the metrics from Spanner format: date-time type: string schedules: + description: List of schedules which are registered with this spanner-autoscaler + instance items: type: string type: array diff --git a/config/crd/bases/spanner.mercari.com_spannerautoscaleschedules.yaml b/config/crd/bases/spanner.mercari.com_spannerautoscaleschedules.yaml index 2b41bdb8..2de5c00f 100644 --- a/config/crd/bases/spanner.mercari.com_spannerautoscaleschedules.yaml +++ b/config/crd/bases/spanner.mercari.com_spannerautoscaleschedules.yaml @@ -23,6 +23,9 @@ spec: - jsonPath: .spec.schedule.duration name: Duration type: string + - jsonPath: .spec.additionalProcessingUnits + name: Additional PU + type: integer name: v1beta1 schema: openAPIV3Schema: @@ -46,18 +49,29 @@ spec: SpannerAutoscaleSchedule properties: additionalProcessingUnits: + description: The extra compute capacity which will be added when this + schedule is active type: integer schedule: + description: The details of when and for how long this schedule will + be active properties: cron: + description: 'The recurring frequency of the schedule in [standard + cron](https://en.wikipedia.org/wiki/Cron) format. Examples and + verification utility: https://crontab.guru' type: string duration: + description: The length of time for which this schedule will remain + active each time the cron is triggered. 
type: string required: - cron - duration type: object targetResource: + description: The `SpannerAutoscaler` resource name with which this + schedule will be registered type: string required: - additionalProcessingUnits diff --git a/docs/config/config.yaml b/docs/config/config.yaml new file mode 100644 index 00000000..19ff86a2 --- /dev/null +++ b/docs/config/config.yaml @@ -0,0 +1,10 @@ +## config file for `crd-ref-docs`: https://github.com/elastic/crd-ref-docs#configuration + +processor: + ignoreTypes: + - "(SpannerAutoscaler|SpannerAutoscaleSchedule)List$" + ignoreFields: + - "TypeMeta$" + +render: + kubernetesVersion: 1.22 diff --git a/docs/crd-reference.md b/docs/crd-reference.md new file mode 100644 index 00000000..d326f0ca --- /dev/null +++ b/docs/crd-reference.md @@ -0,0 +1,262 @@ +# API Reference + +## Packages +- [spanner.mercari.com/v1beta1](#spannermercaricomv1beta1) + + +## spanner.mercari.com/v1beta1 + +Package v1beta1 contains API Schema definitions for the spanner v1beta1 API group + +### Resource Types +- [SpannerAutoscaleSchedule](#spannerautoscaleschedule) +- [SpannerAutoscaler](#spannerautoscaler) + + + +#### ActiveSchedule + + + +A `SpannerAutoscaleSchedule` which is currently active and will be used for calculating the autoscaling range. + +_Appears in:_ +- [SpannerAutoscalerStatus](#spannerautoscalerstatus) + +| Field | Description | +| --- | --- | +| `name` _string_ | Name of the `SpannerAutoscaleSchedule` | +| `endTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta)_ | The time until when this schedule will remain active | +| `additionalPU` _integer_ | The extra compute capacity which will be added because of this schedule | + + +#### Authentication + + + +Authentication details for the Spanner instance + +_Appears in:_ +- [SpannerAutoscalerSpec](#spannerautoscalerspec) + +| Field | Description | +| --- | --- | +| `type` _AuthType_ | Authentication method to be used for GCP authentication. 
If `ImpersonateConfig` as well as `IAMKeySecret` is nil, this will be set to use ADC by default. | +| `impersonateConfig` _[ImpersonateConfig](#impersonateconfig)_ | Details of the GCP service account which will be impersonated, for authentication to GCP. This can be used only on GKE clusters, when workload identity is enabled. Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity This is a pointer because structs with string slices can not be compared for zero values | +| `iamKeySecret` _[IAMKeySecret](#iamkeysecret)_ | Details of the k8s secret which contains the GCP service account authentication key (in JSON). Ref: https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform This is a pointer because structs with string slices can not be compared for zero values | + + +#### IAMKeySecret + + + +Details of the secret which has the GCP service account key for authentication + +_Appears in:_ +- [Authentication](#authentication) + +| Field | Description | +| --- | --- | +| `name` _string_ | | +| `namespace` _string_ | | +| `key` _string_ | | + + +#### ImpersonateConfig + + + +Details of the impersonation service account for GCP authentication + +_Appears in:_ +- [Authentication](#authentication) + +| Field | Description | +| --- | --- | +| `targetServiceAccount` _string_ | | +| `delegates` _string array_ | | + + +#### ScaleConfig + + + +Details of the autoscaling parameters for the Spanner instance + +_Appears in:_ +- [SpannerAutoscalerSpec](#spannerautoscalerspec) + +| Field | Description | +| --- | --- | +| `computeType` _ComputeType_ | Whether to use `nodes` or `processing-units` for scaling. This is only used at the time of CustomResource creation. If compute capacity is provided in `nodes`, then it is automatically converted to `processing-units` at the time of resource creation, and internally, only `ProcessingUnits` are used for computations and scaling. 
| +| `nodes` _[ScaleConfigNodes](#scaleconfignodes)_ | If `nodes` are provided at the time of resource creation, then they are automatically converted to `processing-units`. So it is recommended to use only the processing units. Ref: [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) | +| `processingUnits` _[ScaleConfigPUs](#scaleconfigpus)_ | ProcessingUnits for scaling of the Spanner instance. Ref: [Spanner Compute Capacity](https://cloud.google.com/spanner/docs/compute-capacity#compute_capacity) | +| `scaledownStepSize` _integer_ | The maximum number of processing units which can be deleted in one scale-down operation | +| `targetCPUUtilization` _[TargetCPUUtilization](#targetcpuutilization)_ | The CPU utilization which the autoscaling will try to achieve. Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority) | + + +#### ScaleConfigNodes + + + +Compute capacity in terms of Nodes + +_Appears in:_ +- [ScaleConfig](#scaleconfig) + +| Field | Description | +| --- | --- | +| `min` _integer_ | Minimum number of Nodes for the autoscaling range | +| `max` _integer_ | Maximum number of Nodes for the autoscaling range | + + +#### ScaleConfigPUs + + + +Compute capacity in terms of Processing Units + +_Appears in:_ +- [ScaleConfig](#scaleconfig) + +| Field | Description | +| --- | --- | +| `min` _integer_ | Minimum number of Processing Units for the autoscaling range | +| `max` _integer_ | Maximum number of Processing Units for the autoscaling range | + + +#### Schedule + + + +The recurring frequency and the length of time for which a schedule will remain active + +_Appears in:_ +- [SpannerAutoscaleScheduleSpec](#spannerautoscaleschedulespec) + +| Field | Description | +| --- | --- | +| `cron` _string_ | The recurring frequency of the schedule in [standard cron](https://en.wikipedia.org/wiki/Cron) format. 
Examples and verification utility: https://crontab.guru | +| `duration` _string_ | The length of time for which this schedule will remain active each time the cron is triggered. | + + +#### SpannerAutoscaleSchedule + + + +SpannerAutoscaleSchedule is the Schema for the spannerautoscaleschedules API + + + +| Field | Description | +| --- | --- | +| `apiVersion` _string_ | `spanner.mercari.com/v1beta1` +| `kind` _string_ | `SpannerAutoscaleSchedule` +| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | +| `spec` _[SpannerAutoscaleScheduleSpec](#spannerautoscaleschedulespec)_ | | +| `status` _[SpannerAutoscaleScheduleStatus](#spannerautoscaleschedulestatus)_ | | + + +#### SpannerAutoscaleScheduleSpec + + + +SpannerAutoscaleScheduleSpec defines the desired state of SpannerAutoscaleSchedule + +_Appears in:_ +- [SpannerAutoscaleSchedule](#spannerautoscaleschedule) + +| Field | Description | +| --- | --- | +| `targetResource` _string_ | The `SpannerAutoscaler` resource name with which this schedule will be registered | +| `additionalProcessingUnits` _integer_ | The extra compute capacity which will be added when this schedule is active | +| `schedule` _[Schedule](#schedule)_ | The details of when and for how long this schedule will be active | + + + + +#### SpannerAutoscaler + + + +SpannerAutoscaler is the Schema for the spannerautoscalers API + + + +| Field | Description | +| --- | --- | +| `apiVersion` _string_ | `spanner.mercari.com/v1beta1` +| `kind` _string_ | `SpannerAutoscaler` +| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. 
| +| `spec` _[SpannerAutoscalerSpec](#spannerautoscalerspec)_ | | +| `status` _[SpannerAutoscalerStatus](#spannerautoscalerstatus)_ | | + + +#### SpannerAutoscalerSpec + + + +SpannerAutoscalerSpec defines the desired state of SpannerAutoscaler + +_Appears in:_ +- [SpannerAutoscaler](#spannerautoscaler) + +| Field | Description | +| --- | --- | +| `targetInstance` _[TargetInstance](#targetinstance)_ | The Spanner instance which will be managed for autoscaling | +| `authentication` _[Authentication](#authentication)_ | Authentication details for the Spanner instance | +| `scaleConfig` _[ScaleConfig](#scaleconfig)_ | Details of the autoscaling parameters for the Spanner instance | + + +#### SpannerAutoscalerStatus + + + +SpannerAutoscalerStatus defines the observed state of SpannerAutoscaler + +_Appears in:_ +- [SpannerAutoscaler](#spannerautoscaler) + +| Field | Description | +| --- | --- | +| `schedules` _string array_ | List of schedules which are registered with this spanner-autoscaler instance | +| `currentlyActiveSchedules` _[ActiveSchedule](#activeschedule) array_ | List of all the schedules which are currently active and will be used in calculating compute capacity | +| `lastScaleTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta)_ | Last time the `SpannerAutoscaler` scaled the number of Spanner nodes. 
Used by the autoscaler to control how often the number of nodes are changed | +| `lastSyncTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta)_ | Last time the `SpannerAutoscaler` fetched and synced the metrics from Spanner | +| `currentProcessingUnits` _integer_ | Current number of processing-units in the Spanner instance | +| `desiredProcessingUnits` _integer_ | Desired number of processing-units in the Spanner instance | +| `desiredMinPUs` _integer_ | Minimum number of processing units based on the currently active schedules | +| `desiredMaxPUs` _integer_ | Maximum number of processing units based on the currently active schedules | +| `instanceState` _InstanceState_ | State of the Cloud Spanner instance | +| `currentHighPriorityCPUUtilization` _integer_ | Current average CPU utilization for high priority task, represented as a percentage | + + +#### TargetCPUUtilization + + + + + +_Appears in:_ +- [ScaleConfig](#scaleconfig) + +| Field | Description | +| --- | --- | +| `highPriority` _integer_ | Desired CPU utilization for 'High Priority' CPU consumption category. 
Ref: [Spanner CPU utilization](https://cloud.google.com/spanner/docs/cpu-utilization#task-priority) | + + +#### TargetInstance + + + +The Spanner instance which will be managed for autoscaling + +_Appears in:_ +- [SpannerAutoscalerSpec](#spannerautoscalerspec) + +| Field | Description | +| --- | --- | +| `projectId` _string_ | The GCP Project id of the Spanner instance | +| `instanceId` _string_ | The instance id of the Spanner instance | + + diff --git a/docs/webhook_development.md b/docs/development.md similarity index 58% rename from docs/webhook_development.md rename to docs/development.md index 75399b71..0cbd8d80 100644 --- a/docs/webhook_development.md +++ b/docs/development.md @@ -1,29 +1,26 @@ -# Webhook development +# Development (when webhooks are enabled) -This doc explains how to setup a local environment for development and testing of conversion, validation or mutation webhooks. This approach can also be used for development and testing of the k8s controller too. +This doc explains how to set up a local environment for development and testing of the spanner-autoscaler and the spanner-autoscale-schedule controllers. Since there are conversion, validation and mutation webhooks enabled for some CRDs, we need to perform some additional steps to ensure that k8s can communicate with our webhook servers correctly. The default approach is to: -1. make any desired chagnes to the webhook code in `api//_webhook.go` +1. make any desired changes to the code 1.
build and deploy the docker image and the CRDs: ```console $ make kind-cluster-reset - $ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml $ export IMG=mercari/spanner-autoscaler:local - $ make docker-build - $ kind load docker-image --name spanner-autoscaler $IMG + $ make docker-build kind-load-docker-image $ make deploy $ kubectl apply -f config/samples ``` -If there are any errors in the conversion or the validation, then the `kubectl apply` command above, will fail with the corresponding errors. +If there are any errors in the conversion or the validation of the sample CRs, then the `kubectl apply` command above will fail with the corresponding errors. -While this approach is useful, it is very time consuming for testing minor changes in real time during development. Thus, during development, it is preferable to run the controller and the webhooks (since they are all part of the same binary) locally, and forward any requests for these components from the k8s cluster to the locally running server. +While this approach is useful, it is very time-consuming for testing minor changes in real time during development (because the `make docker-build` step can take a very long time). Thus, during development, it is preferable to run the controller and the webhooks locally (since they are all part of the same binary), and forward any requests for these components from the k8s cluster to the locally running server. This can be achieved in the following way: -1. Modify the `/etc/hosts` to add a DNS record for your LAN IP (by adding `192.168. dummy.local` to the `/etc/hosts` file) +1. Modify your `/etc/hosts` to add a DNS record for your LAN IP (by adding `192.168. dummy.local` to the `/etc/hosts` file) 1.
Deploy the CRDs and the other resources to the kind cluster: ```console $ make kind-cluster-reset - $ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml $ make deploy-dev ``` 1. Save the TLS certificates which the local server can use: @@ -48,6 +45,9 @@ This can be achieved in the following way: + CertDir: "./bin/dummytls", }) ``` -1. Continue with development and testing by running the local server with `make run` command. To test any new changes, make the desired changes, stop the controller with `Ctrl-C` and the run `make run` again. +1. Continue with development and testing by running the local server with the `make run` command. To test any new changes, make the desired changes, stop the controller with `Ctrl-C` and then run `make run` again. This will deploy the CRDs and other resources to the cluster, but will forward any controller or webhook related requests from k8s cluster to our locally running controller. + +### Things to take care of before sending a PR +- Run `make manifests docs` if you made any changes to any of the CRD definitions (related files: `api/*/*_types.go`)
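
The pre-PR step above can be turned into a quick local check. The following is only a sketch, not something this change adds: `check_generated` is a hypothetical helper name, and it assumes that your own edits are already committed, so that any remaining working-tree change comes from the regenerated manifests or CRD reference docs.

```shell
# Hypothetical helper (not part of the repository): regenerate the CRD
# manifests and reference docs, then report whether anything changed.
# Assumption: your own edits are already committed, so any remaining
# working-tree change was produced by `make manifests docs`.
check_generated() {
  # Regenerate the manifests and docs from the current API types
  make manifests docs || return 1

  # `git status --porcelain` prints one line per modified or untracked file
  local changed
  changed="$(git status --porcelain)"

  if [ -n "$changed" ]; then
    echo "Generated files are out of date; commit the following changes:" >&2
    echo "$changed" >&2
    return 1
  fi

  echo "Generated files are up to date."
}
```

Calling `check_generated` from the repository root returns a non-zero status when regenerated files still need to be committed, so it could also serve as a CI guard if that is desired.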