Docs for v0.10.5 #91

Merged
merged 4 commits into from
Dec 22, 2023
Changes from 2 commits
15 changes: 15 additions & 0 deletions content/changelog.md
@@ -2,6 +2,21 @@

## v0.10.x

### v0.10.5

#### Fix

- `TemplateGroupInstance` controller now correctly updates the its status and the namespace count upon the deletion of a namespace.

[LanguageTool] reported by reviewdog 🐶
A determiner cannot be combined with a possessive pronoun. Did you mean simply “the” or “its”? (A_MY[136])
Suggestions: the, its
Rule: https://community.languagetool.org/rule/show/A_MY?lang=en-US&subId=136
Category: COLLOCATIONS

- Conflict between the `TemplateGroupInstance` controller and `kube-controller-manager` over secret names listed in the `secrets` or `imagePullSecrets` fields of `ServiceAccounts` has been fixed by temporarily ignoring updates to or from `ServiceAccounts`.

#### Enhanced

- Privileged service accounts mentioned in the `IntegrationConfig` now have access to all types of namespaces. Previously, operations were denied on orphaned namespaces (namespaces that are part of neither the privileged nor the tenant scope). More info in [FAQs](./faq.md).
- `TemplateGroupInstance` controller now ensures that its underlying resources are force-synced when a namespace is created or deleted.
- Optimizations were made to ensure the reconciler in the TGI controller runs only once per watch event, reducing reconcile times.
- The `TemplateGroupInstance` reconcile flow has been refined to process only the namespace for which the event was received, streamlining resource creation/deletion and improving overall efficiency.
- Introduced new metrics to enhance the monitoring capabilities of the operator. Details at [TGI Metrics Explanation](./explanation/logs-metrics.md)

### v0.10.0

#### Feature
77 changes: 77 additions & 0 deletions content/explanation/logs-metrics.md
@@ -0,0 +1,77 @@
# Metrics and Logs Documentation

This document offers an overview of the Prometheus metrics implemented by the `multi_tenant_operator` controllers, along with an interpretation guide for the logs and statuses generated by these controllers. Each metric is designed to provide specific insights into the controllers' operational performance, while the log interpretation guide aids in understanding their behavior and workflow processes. Additionally, the status descriptions for custom resources provide operational snapshots. Together, these elements form a comprehensive toolkit for monitoring and enhancing the performance and health of the controllers.

## Metrics List

**`multi_tenant_operator_resources_deployed_total`**

- **Description**: Tracks the total number of resources deployed by the operator.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`
- **Usage**: Helps to understand the overall workload managed by the operator.

**`multi_tenant_operator_resources_deployed`**

- **Description**: Monitors resources currently deployed by the operator.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`, `type`
- **Usage**: Useful for tracking the current state and type of resources managed by the operator.

**`multi_tenant_operator_reconcile_error`**

- **Description**: Indicates resources in an error state, broken down by resource kind, name, and namespace.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`, `state`, `errors`
- **Usage**: Essential for identifying and analyzing errors in resource management.

**`multi_tenant_operator_reconcile_count`**

- **Description**: Counts the number of reconciliations performed for a template group instance, categorized by name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Provides insight into the frequency of reconciliation processes.

**`multi_tenant_operator_reconcile_seconds`**

- **Description**: Represents the cumulative duration, in seconds, taken to reconcile a template group instance, categorized by instance name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Critical for assessing the time efficiency of the reconciliation process.

**`multi_tenant_operator_reconcile_seconds_total`**

- **Description**: Tracks the total duration, in seconds, for all reconciliation processes of a template group instance, categorized by instance name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Useful for understanding the overall time spent on reconciliation processes.
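
As a rough illustration of how the reconcile metrics above could be combined, the sketch below derives an average reconcile duration per template group instance from scraped text lines. It assumes that `multi_tenant_operator_reconcile_seconds_total` and `multi_tenant_operator_reconcile_count` cover the same set of reconciliations. The helper names (`parseSample`, `averageReconcileSeconds`) and the sample label values are hypothetical, and the parser only handles simple label values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is one parsed line of Prometheus text exposition output.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// parseSample parses a line such as `metric{a="x",b="y"} 3`.
// It is deliberately simple: it assumes label values contain no commas,
// quotes, or braces, which is enough for this illustration.
func parseSample(line string) (sample, error) {
	line = strings.TrimSpace(line)
	open := strings.Index(line, "{")
	end := strings.LastIndex(line, "}")
	if open < 0 || end < open {
		return sample{}, fmt.Errorf("no label set in %q", line)
	}
	value, err := strconv.ParseFloat(strings.TrimSpace(line[end+1:]), 64)
	if err != nil {
		return sample{}, err
	}
	labels := map[string]string{}
	for _, pair := range strings.Split(line[open+1:end], ",") {
		key, val, ok := strings.Cut(pair, "=")
		if !ok {
			continue
		}
		labels[strings.TrimSpace(key)] = strings.Trim(val, `"`)
	}
	return sample{name: line[:open], labels: labels, value: value}, nil
}

// averageReconcileSeconds divides total reconcile time by reconcile count
// for each instance name found in the scraped lines.
func averageReconcileSeconds(lines []string) map[string]float64 {
	totals := map[string]float64{}
	counts := map[string]float64{}
	for _, line := range lines {
		s, err := parseSample(line)
		if err != nil {
			continue // skip comments and lines this sketch cannot parse
		}
		switch s.name {
		case "multi_tenant_operator_reconcile_seconds_total":
			totals[s.labels["name"]] = s.value
		case "multi_tenant_operator_reconcile_count":
			counts[s.labels["name"]] = s.value
		}
	}
	avg := map[string]float64{}
	for name, total := range totals {
		if counts[name] > 0 {
			avg[name] = total / counts[name]
		}
	}
	return avg
}

func main() {
	// Hypothetical scrape output; the label values are made up.
	lines := []string{
		`multi_tenant_operator_reconcile_seconds_total{kind="TemplateGroupInstance",name="example-tgi"} 12`,
		`multi_tenant_operator_reconcile_count{kind="TemplateGroupInstance",name="example-tgi"} 4`,
	}
	fmt.Println(averageReconcileSeconds(lines))
}
```

In a real setup the same ratio would more likely be computed in the monitoring system itself; the Go version is only meant to make the relationship between the two metrics concrete.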

## Custom Resource Status

In this section, we delve into the status of various custom resources managed by our controllers. The `kubectl describe` command can be used to fetch the status of these resources.

### Template Group Instance

Status from the `templategroupinstances.tenantoperator.stakater.com` custom resource:

- **Current Operational State**: Provides a snapshot of the resource's current condition.
- **Conditions**: Offers a detailed view of the resource's status, which includes:
    - `InstallSucceeded`: Indicates the success of the instance's installation.
    - `Ready`: Shows the readiness of the instance, with details on the last reconciliation process, its duration, and relevant messages.
    - `Running`: Reports on active processes like ongoing resource reconciliation.
- **Deployed Namespaces**: Enumerates the namespaces where the instance has been deployed, along with their statuses and associated template manifests.
- **Manifest Hashes**: Includes the `Template Manifests Hash` and `Resource Mapping Hash`, which provide versioning and change tracking for template manifests and resource mappings.

## Log Interpretation Guide

### Template Group Instance Controller

Logs from the `tenant-operator-templategroupinstance-controller`:

- **Reconciliation Process**: Logs starting with `Reconciling!` mark the beginning of a reconciliation process for a TemplateGroupInstance. Subsequent actions like `Creating/Updating TemplateGroupInstance` and `Retrieving list of namespaces Matching to TGI` outline the reconciliation steps.
- **Namespace and Resource Management**: Logs such as `Namespaces test-namespace-1 is new or failed...` and `Creating/Updating resource...` detail the management of Kubernetes resources in specific namespaces.
- **Worker Activities**: Logs labeled `[Worker X]` show tasks being processed in parallel, including steps like `Validating parameters`, `Gathering objects from manifest`, and `Apply manifests`.
- **Reconciliation Completion**: Entries like `End Reconciling` and `Defering XXth Reconciling, with duration XXXms` indicate the end of a reconciliation process and its duration, aiding in performance analysis.
- **Watcher Events**: Logs from `Watcher` such as `Delete call received for object...` and `Following resource is recreated...` are key for tracking changes to Kubernetes objects.

These logs are crucial for tracking the system's behavior, diagnosing issues, and comprehending the resource management workflow.
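
The categories above can be turned into a small triage helper. The sketch below groups log lines using only the message fragments quoted in this guide; the function name `classify`, the category labels, and the sample lines are hypothetical, and the real log format may carry extra fields such as timestamps or levels.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps a log line to one of the categories described in the guide,
// using only the message fragments quoted there.
func classify(line string) string {
	switch {
	case strings.Contains(line, "[Worker "):
		return "worker"
	case strings.Contains(line, "Delete call received for object"),
		strings.Contains(line, "Following resource is recreated"):
		return "watcher"
	// Checked before "Reconciling!" so end-of-reconcile lines are not
	// mistaken for the start of a reconciliation.
	case strings.Contains(line, "End Reconciling"),
		strings.Contains(line, "Defering"):
		return "reconcile-end"
	case strings.Contains(line, "Reconciling!"):
		return "reconcile-start"
	case strings.Contains(line, "is new or failed"),
		strings.Contains(line, "Creating/Updating resource"):
		return "resource-management"
	default:
		return "other"
	}
}

func main() {
	// Hypothetical log lines assembled from the fragments in the guide.
	lines := []string{
		"Reconciling!",
		"Namespaces test-namespace-1 is new or failed...",
		"[Worker 1] Validating parameters",
		"End Reconciling",
	}
	for _, l := range lines {
		fmt.Printf("%-20s %s\n", classify(l), l)
	}
}
```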
10 changes: 10 additions & 0 deletions content/faq.md
@@ -1,5 +1,15 @@
# FAQs

## Pod Creation Error

### Q. Errors in ReplicaSet Events about pods not being able to schedule on Openshift because scc annotation is not found

Check failure on line 5 in content/faq.md

GitHub Actions / vale

[vale] content/faq.md#L5

[Vale.Terms] Use 'OpenShift' instead of 'Openshift'.
Raw output
{"message": "[Vale.Terms] Use 'OpenShift' instead of 'Openshift'.", "location": {"path": "content/faq.md", "range": {"start": {"line": 5, "column": 77}}}, "severity": "ERROR"}

```terminal
unable to find annotation openshift.io/sa.scc.uid-range
```

**Answer.** Openshift recently updated its process of handling SCC and it's now managed by annotations like `openshift.io/sa.scc.uid-range` on the namespaces. Absense of them wont let pods schedule. The fix for the above error is to make sure ServiceAccount `system:serviceaccount:openshift-infra.` regex is always mentioned in `PrivilegedServiceAccounts` section of `IntegrationConfig`. This regex will allow operations from all `ServiceAccounts` present in `openshift-infra` namespace. More info at [Privileged Service Accounts](./integration-config.md#privileged-serviceaccounts)

Check failure on line 11 in content/faq.md

GitHub Actions / vale

[vale] content/faq.md#L11

[Vale.Terms] Use 'OpenShift' instead of 'Openshift'.
Raw output
{"message": "[Vale.Terms] Use 'OpenShift' instead of 'Openshift'.", "location": {"path": "content/faq.md", "range": {"start": {"line": 11, "column": 13}}}, "severity": "ERROR"}

Check failure on line 11 in content/faq.md

GitHub Actions / vale

[vale] content/faq.md#L11

[Vale.Spelling] Did you really mean 'Absense'?
Raw output
{"message": "[Vale.Spelling] Did you really mean 'Absense'?", "location": {"path": "content/faq.md", "range": {"start": {"line": 11, "column": 160}}}, "severity": "ERROR"}

[LanguageTool] reported by reviewdog 🐶
Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short). (COMMA_COMPOUND_SENTENCE[1])
Suggestions: , and
URL: https://languagetool.org/insights/post/types-of-sentences/#compound-sentence
Rule: https://community.languagetool.org/rule/show/COMMA_COMPOUND_SENTENCE?lang=en-US&subId=1
Category: PUNCTUATION


## Namespace Admission Webhook

### Q. Error received while performing Create, Update or Delete action on Namespace
9 changes: 6 additions & 3 deletions content/how-to-guides/integration-config.md
@@ -262,12 +262,15 @@ users:
For example:

- To ignore the `default` namespace, we can specify `^default$`
- To ignore all namespaces starting with the `openshift-` prefix, we can specify `^openshift-*`.
- To ignore any namespace containing `stakater` in its name, we can specify `stakater`. (A constant word given as a regex pattern will match any namespace containing that word.)
- To ignore all namespaces starting with the `openshift-` prefix, we can specify `^openshift-.*`.
- To ignore any namespace containing `stakater` in its name, we can specify `^stakater.`. (A constant word given as a regex pattern will match any namespace containing that word.)

### Privileged ServiceAccounts

`privilegedServiceAccounts:` Contains the list of `ServiceAccounts` ignored by MTO. MTO will not manage the `ServiceAccounts` in this list. Values in this list are regex patterns. For example, to ignore all `ServiceAccounts` starting with the `system:serviceaccount:openshift-` prefix, we can use `^system:serviceaccount:openshift-*`; and to ignore the `system:serviceaccount:builder` service account we can use `^system:serviceaccount:builder$.`
`privilegedServiceAccounts:` Contains the list of `ServiceAccounts` ignored by MTO. MTO will not manage the `ServiceAccounts` in this list. Values in this list are regex patterns. For example, to ignore all `ServiceAccounts` starting with the `system:serviceaccount:openshift-` prefix, we can use `^system:serviceaccount:openshift-.*`; and to ignore a specific service account like `system:serviceaccount:builder`, we can use `^system:serviceaccount:builder$`.

!!! note
    `stakater`, `stakater.` and `stakater.*` will have the same effect. To check out the combinations, go to [Regex101](https://regex101.com/), select Golang, and type your expected regex and test string.
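
The patterns in this section can also be checked locally with Go's standard `regexp` package. The sketch below assumes MTO evaluates each pattern with unanchored matching, as `regexp.MatchString` does; that is an assumption about the operator, not something confirmed here. The helper `matches` and the sample namespace names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in name, which is how
// Go's MatchString behaves when the pattern carries no anchors.
// Assumption: the operator evaluates patterns the same way.
func matches(pattern, name string) bool {
	return regexp.MustCompile(pattern).MatchString(name)
}

func main() {
	// Hypothetical namespace names used only for illustration.
	names := []string{"default", "openshift-infra", "openshiftfoo", "stakater", "stakater-dev", "my-stakater-ns"}
	patterns := []string{"^default$", "^openshift-*", "^openshift-.*", "stakater", "stakater.", "stakater.*", "^stakater."}

	for _, p := range patterns {
		for _, n := range names {
			fmt.Printf("%-16s %-16s %v\n", p, n, matches(p, n))
		}
	}
}
```

Under that assumption, `^openshift-*` also matches a name such as `openshiftfoo`, because `-*` means zero or more hyphens, which is why `^openshift-.*` is the safer prefix pattern. Likewise, `stakater` and `stakater.*` behave identically, while `stakater.` requires one more character after the word and so would not match a namespace named exactly `stakater`.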

### Namespace Access Policy

9 changes: 6 additions & 3 deletions content/integration-config.md
@@ -262,12 +262,15 @@ users:
For example:

- To ignore the `default` namespace, we can specify `^default$`
- To ignore all namespaces starting with the `openshift-` prefix, we can specify `^openshift-*`.
- To ignore any namespace containing `stakater` in its name, we can specify `stakater`. (A constant word given as a regex pattern will match any namespace containing that word.)
- To ignore all namespaces starting with the `openshift-` prefix, we can specify `^openshift-.*`.
- To ignore any namespace containing `stakater` in its name, we can specify `^stakater.`. (A constant word given as a regex pattern will match any namespace containing that word.)

### Privileged ServiceAccounts

`privilegedServiceAccounts:` Contains the list of `ServiceAccounts` ignored by MTO. MTO will not manage the `ServiceAccounts` in this list. Values in this list are regex patterns. For example, to ignore all `ServiceAccounts` starting with the `system:serviceaccount:openshift-` prefix, we can use `^system:serviceaccount:openshift-*`; and to ignore the `system:serviceaccount:builder` service account we can use `^system:serviceaccount:builder$.`
`privilegedServiceAccounts:` Contains the list of `ServiceAccounts` ignored by MTO. MTO will not manage the `ServiceAccounts` in this list. Values in this list are regex patterns. For example, to ignore all `ServiceAccounts` starting with the `system:serviceaccount:openshift-` prefix, we can use `^system:serviceaccount:openshift-.*`; and to ignore a specific service account like `system:serviceaccount:builder`, we can use `^system:serviceaccount:builder$`.

!!! note
    `stakater`, `stakater.` and `stakater.*` will have the same effect. To check out the combinations, go to [Regex101](https://regex101.com/), select Golang, and type your expected regex and test string.

### Namespace Access Policy

3 changes: 1 addition & 2 deletions content/reference-guides/resource-sync-by-tgi.md
@@ -51,8 +51,7 @@ As we can see, in our TGI, we have a field `spec.sync` which is set to `true`. T
- If, for any reason, the underlying resource gets updated or deleted, `TemplateGroupInstance` CR will try to revert it back to the state mentioned in the `Template` CR.

!!! note
If the updated field of the deployed manifest is not mentioned in the Template, it will not get reverted.
For example, if `secrets` field is not mentioned in ServiceAcoount in the above Template, it will not get reverted if changed
Updates to ServiceAccounts are ignored by both, reconciler and informers, in an attempt to avoid conflict between the TGI controller and Kube Controller Manager. ServiceAccounts are only reverted back in case of unexpected deletions when sync is true.

[LanguageTool] reported by reviewdog 🐶
Consider using just “reverted”. (RETURN_BACK[1])
Suggestions: reverted
Rule: https://community.languagetool.org/rule/show/RETURN_BACK?lang=en-US&subId=1
Category: REDUNDANCY


## Ignore Resources Updates on Resources

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -88,6 +88,7 @@ nav:
- explanation/console.md
- explanation/auth.md
- explanation/why-argocd-multi-tenancy.md
- explanation/logs-metrics.md
- faq.md
- changelog.md
- eula.md