# GKE-ENKI-GitLab-agent

GKE-ENKI-GitLab-agent is a project to manage and configure GitLab Kubernetes Agent for the ENKI Google Cloud cluster and to configure all required cluster tools for production, monitoring, and backup.


## Contents
▪️ [Future roadmap](#future-roadmap)  
▪️ [Installed cluster components (dependent order)](#installed-cluster-components-dependent-order)  
▪️ [CI/CD for automated deployment and maintenance](#cicd-for-automated-deployment-and-maintenance)  
▪️ [Some useful kubectl commands](#some-useful-kubectl-commands)  
▪️ [Kubernetes cluster configuration](#kubernetes-cluster-configuration)  
▪️ [GitLab Kubernetes agent installation](#gitlab-kubernetes-agent-installation)  
▪️ [Tearing down and reinstalling the agent](#tearing-down-and-reinstalling-the-agent)  


## Future roadmap 
- Fully integrate GitLab Kubernetes Agent for GitOps as an alternative to using GitLab Runner and Helm.  
  The agent has to mature to handle sequenced YAML deploys, and the agent must operate with cluster-wide admin privileges to make this integration possible.
- Consider adding the following:
    - User billing and tracking (using Kubecost)
    - Runbooks for JupyterLab (notebook-based) GitOps using Rubix/Nurtch
    - Cloudwatch integration
    - Elastic Container Service

- Investigate the Google Cloud Run serverless platform.
  Port the Knative Geobarometer and MELTS web services to remove any dependence on the Kubernetes cluster.

## Installed cluster components (dependent order)

1. **GitLab Kubernetes Agent**
    Entity that attaches a GKE cluster to this repository (configuration notes below).
1. **GitLab Runner**
    GitLab Runner allows CI jobs to run on the cluster in privileged mode, which lets us execute *kubectl* and *helm* commands to perform GitOps tasks using YAML files stored in this repository. Basically, the runner gives us the functionality of Google Cloud Shell, or of a desktop gcloud/kubectl connection, from within GitLab CI.
1. **Kubernetes NGINX Ingress Controller**
    The ingress controller is used to expose service endpoints on external ports. There are multiple ingress controllers operating on the cluster: this one exposes the Grafana and Kasten K10 endpoints, while another is built into JupyterHub to expose that endpoint.
1. **Cert Manager**
    Used by the ingress controller to acquire and attach TLS certificates to ingress external endpoints so that ports can support *https* and encrypted traffic.
1. **Prometheus** and **Grafana** (exposed at https://cluster.enki-portal.org/)
    The Kube Prometheus stack (with Grafana) monitors the cluster and exposes metrics at an external endpoint so that cluster performance can be assessed.
1. **Google Cloud Storage**
    Storage independent of the Kubernetes cluster that is used for backups of cluster resources. The backup service (Kasten K10) can restore and migrate the cluster using this independent storage.
1. **Kasten K10** (exposed at https://k10.enki-portal.org/k10/)
    Backup, restoration, and migration tool for Kubernetes.
1. **JupyterHub**
    Service that hosts the ENKI server. JupyterHub exposes single-user pods that host the ThermoEngine Docker container image with a JupyterLab user interface. It also allocates and maintains access to user-based persistent storage.
    1. Testing installation
        This installation is for testing options and configuring possible upgrades to the production server. For cost reasons, it is normally not running.
    1. Production installation
        This installation is the production server exposed at https://server.enki-portal.org/.
1. **Knative** web services
    Service that exposes stateless, scalable web services. These services should probably be moved outside the cluster and exposed using the Google Cloud Run serverless platform. See *Future roadmap* above.
1. **MySQL** (exposed at http://mysql.enki-portal.org:3306/)
    Database server that currently holds the LEPR/TraceDs databases as well as some smaller databases (Stixrude, Berman, Inforex, etc.) that are used by cluster apps.

## CI/CD for automated deployment and maintenance
The *.gitlab-ci.yml* YAML file performs a number of functions:
- Deploys manifests using GitLab Kubernetes Agent to perform GitOps tasks
- Runs *helm* and *kubectl* jobs on the cluster to perform GitOps tasks
- Functions as the downstream pipeline for related projects that generate content related to the cluster (see the GitLab project https://gitlab.com/ENKI-portal/jupyterhub_custom)
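For orientation only, a GitOps-style job in *.gitlab-ci.yml* might look like the sketch below; the job name, stage, image, and rule are hypothetical, and the repository's actual file is the source of truth:

```
deploy-manifests:                # hypothetical job name
  stage: deploy
  image: google/cloud-sdk:slim   # any image that provides kubectl
  script:
    # The job runs on the cluster's own GitLab Runner, so kubectl is
    # already authorized against the cluster.
    - kubectl apply --recursive -f generated-manifests/
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```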

## Some useful kubectl commands
- Commands for managing namespaces and their resources:
    ```
    kubectl create ns gitlab-runner
    kubectl delete all --all -n {namespace}
    ``` 
- Get GitLab usernames associated with persistent storage volumes:
    ```
    kubectl --namespace jhub describe persistentvolumeclaims | grep "hub.jupyter.org/username"
    ```
- Restart the hub on the cluster using Google Cloud Shell in order to update ENKI-portal/jupyterhub_custom and amend the login page:
    ```
    helm upgrade --cleanup-on-fail jhub jupyterhub/jupyterhub --version=1.1.3 --namespace jhub --reuse-values
    ```
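The username filter above can be exercised offline; the sketch below pipes fabricated sample text (the claim names and layout are hypothetical, only mimicking `kubectl describe` output) through the same `grep`:

```
# Offline sketch: run the same grep filter against sample text that
# imitates 'kubectl describe persistentvolumeclaims' output.
sample='Name:   claim-jdoe
Labels: hub.jupyter.org/username=jdoe
Name:   claim-asmith
Labels: hub.jupyter.org/username=asmith'
matches=$(printf '%s\n' "$sample" | grep "hub.jupyter.org/username")
printf '%s\n' "$matches"
```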

## Kubernetes cluster configuration 
The following Google Cloud setup instructions are from the **Zero to JupyterHub** document https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/google/step-zero-gcp.html, as found in October 2021.
1. Using Google Cloud Shell, install **kubectl** and **helm** using **gcloud** after enabling the Kubernetes Engine API.
1. Create a managed Kubernetes cluster with a default node pool:
    ```
    gcloud container clusters create \
      --machine-type n1-standard-2 \
      --enable-autoscaling \
      --max-nodes=6 \
      --min-nodes=2 \
      --zone <compute zone from the list linked below> \
      --cluster-version latest \
      <CLUSTERNAME>
    ```
    - *\<CLUSTERNAME\>* is **enkiserver**
    - *\<compute zone from the list linked below\>* is **us-west1-a**
1. Elevate the user Google Cloud account for administrative functions:
    ```
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user=<GOOGLE-EMAIL-ACCOUNT>
    ```
    - *\<GOOGLE-EMAIL-ACCOUNT\>* is the *email address* of the Google Cloud account owner
1. Create a node pool for users (the pool must be created in the cluster's zone):
    ```
    gcloud beta container node-pools create user-pool \
      --machine-type n1-standard-2 \
      --num-nodes 0 \
      --enable-autoscaling \
      --min-nodes 0 \
      --max-nodes 6 \
      --node-labels hub.jupyter.org/node-purpose=user \
      --node-taints hub.jupyter.org_dedicated=user:NoSchedule \
      --zone us-west1-a \
      --cluster <CLUSTERNAME>
    ```
After you complete these steps, two node pools are up and running. The default node pool is used to run cluster-wide apps, while the tainted user node pool is used to launch nodes for single-user Jupyter pods. Six nodes in the user pool should be able to accommodate about 100 users doing small-scale ENKI-related modeling.
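As a rough sanity check of the 100-user figure, assume an n1-standard-2 node (7.5 GB RAM), about 1.5 GB reserved per node for system overhead, and roughly 0.35 GB guaranteed per user pod; the overhead and per-user numbers are assumptions for illustration, not values from this project's configuration:

```
# Back-of-envelope user capacity for the user pool (assumed numbers).
nodes=6            # --max-nodes for the user pool
node_mb=7500       # n1-standard-2 memory
reserved_mb=1500   # assumed system/kubelet overhead per node
per_user_mb=350    # assumed per-user memory guarantee
echo $(( nodes * (node_mb - reserved_mb) / per_user_mb ))   # about 100
```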

## GitLab Kubernetes Agent installation
The following instructions are from the GitLab document https://docs.gitlab.com/ee/user/clusters/agent/#set-up-the-kubernetes-agent-server, as found in October 2021.
1. Create a config.yaml file in the repository at *.gitlab/agents/primary-agent* with the contents:
    ```
    gitops:
      manifest_projects:
      - id: "enki-portal/gke-enki-gitlab-agent"
        paths:
        - glob: 'generated-manifests/**/*.{yaml,yml,json}'
        inventory_policy: adopt_if_no_inventory
    ```
    - The *id* is the name of the repository that contains the manifest files (this repository).
    - The *glob* is altered from the default suggestion to look only at YAML files in the folder and subfolders of *generated-manifests*.
    - The *inventory_policy* is changed from the default suggestion to allow the agent to inherit the management of applications that are already running on the cluster when their YAML manifests are added to the *generated-manifests* file hierarchy.

    Multiple manifest projects can be defined; future plans will allow these to be *private* repositories.

    Currently the agent repository must be public; future plans will allow the agent to be associated with a *group*.
1. Create the agent in GitLab (*Infrastructure > Kubernetes clusters*) and generate a *secret token*. Assign this token to a pipeline environment variable (*Settings* > *CI/CD* > *Variables*) with the name *GITLAB_AGENT_TOKEN*. Make sure that the value is *protected* and *masked* in order to keep it hidden in pipeline logs.
1. In Google Cloud Shell, execute the following to create a namespace for the agent:
    ```
    kubectl create ns gitlab-kubernetes-agent
    ```
    Then install the agent, with the appropriate token value substituted for *$(GITLAB_AGENT_TOKEN)*:
    ```
    docker run --pull=always --rm \
        registry.gitlab.com/gitlab-org/cluster-integration/gitlab-agent/cli:stable generate \
        --agent-token=$(GITLAB_AGENT_TOKEN) \
        --kas-address=wss://kas.gitlab.com \
        --agent-version stable \
        --namespace gitlab-kubernetes-agent | kubectl apply -f -
    ```
    
1. Upgrade the GitLab agent service account to have a cluster-admin role (so that it can create *secrets*, *pods*, *config maps*, etc. in arbitrary cluster *namespaces*) by first executing in Google Cloud Shell:
    ```
    kubectl get rolebindings,clusterrolebindings --all-namespaces  \
        -o custom-columns='KIND:kind,NAMESPACE:metadata.namespace,NAME:metadata.name,SERVICE_ACCOUNTS:subjects[?(@.kind=="ServiceAccount")].name' | grep gitlab-agent
    ```
    Note that this critical step is missing from the GitLab documentation. The command gives the output:
    ```
    ClusterRoleBinding   <none>          cilium-alert-read                                      gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-read-all                           gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-write-all                          gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-read-binding                              gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-write-binding                             gitlab-agent
    ```
1. Apply the binding and verify it with the commands:
    ```
    kubectl create clusterrolebinding gitlab-agent-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=default:gitlab-agent
    kubectl get clusterrolebinding | grep gitlab-agent
    ```
    The verification command gives output such as the following:
    ```
    gitlab-agent-cluster-admin-binding                     ClusterRole/cluster-admin                                          12s
    gitlab-agent-gitops-read-all                           ClusterRole/gitlab-agent-gitops-read-all                           162d
    gitlab-agent-gitops-write-all                          ClusterRole/gitlab-agent-gitops-write-all                          162d
    gitlab-agent-read-binding                              ClusterRole/gitlab-agent-read                                      162d
    gitlab-agent-write-binding                             ClusterRole/gitlab-agent-write                                     162d
    ```
The agent is now installed.
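The *glob* in the agent configuration (step 1 above) can be sanity-checked offline; the sketch below uses `find` to emulate which files `generated-manifests/**/*.{yaml,yml,json}` would select, with hypothetical file names:

```
# Offline sketch: emulate the agent's manifest glob with find.
mkdir -p demo/generated-manifests/sub
touch demo/generated-manifests/app.yaml \
      demo/generated-manifests/sub/db.json \
      demo/generated-manifests/notes.md        # would NOT be matched
matched=$(cd demo && find generated-manifests -type f \
    \( -name '*.yaml' -o -name '*.yml' -o -name '*.json' \) | sort)
printf '%s\n' "$matched"
rm -rf demo
```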

## Tearing down and reinstalling the agent
This process is tricky and not automated by GitLab. Occasionally, reinstalling the agent is necessary, as the agent does not tolerate errors in YAML manifests very well and can enter a condition in which it is unresponsive.  

Follow this procedure in Google Cloud Shell:
1. Delete all resources associated with the agent in its namespace:
    ```
    kubectl delete all --all -n gitlab-kubernetes-agent
    ```
1. Delete the namespace:
    ```
    kubectl delete ns gitlab-kubernetes-agent
    ```
1. Delete the inventory file in the default namespace that the agent uses to track managed installations (this resource is not automatically removed with the agent's namespace resources):
    1. Go to the Google Cloud Platform console, and choose *Kubernetes Engine* > *Configuration* from the upper left menu.
    1. In the *default* namespace, delete the *Config Map* named *inventory-nnn*, where *nnn* is a string of numbers and dashes.
    1. In the *default* namespace, delete the *secret* *gitlab-agent-token-nnn*, where *nnn* is some arbitrary hexadecimal number.
1. Reinstall the agent following the above instructions, utilizing the same authorization token.