# GKE-ENKI-GitLab-agent

GKE-ENKI-GitLab-agent is a project to manage and configure GitLab Kubernetes Agent for the ENKI Google Cloud cluster and to configure all required cluster tools for production, monitoring, and backup.


## Contents
▪️ [Future roadmap](#future-roadmap)  
▪️ [Installed cluster components (dependent order)](#installed-cluster-components-dependent-order)  
▪️ [CI/CD for automated deployment and maintenance](#cicd-for-automated-deployment-and-maintenance)  
▪️ [Some useful kubectl commands](#some-useful-kubectl-commands)  
▪️ [Kubernetes cluster configuration](#kubernetes-cluster-configuration)  
▪️ [GitLab Kubernetes agent installation](#gitlab-kubernetes-agent-installation)  
▪️ [Tearing down and reinstalling the agent](#tearing-down-and-reinstalling-the-agent)  


## Future roadmap 
- Fully integrate GitLab Kubernetes Agent for GitOps as an alternative to using GitLab Runner and Helm.  
  The agent has to mature to handle sequenced YAML deploys, and the agent must operate with cluster-wide admin privileges to make this integration possible.
- Consider adding the following:
    - User billing and tracking (using Kubecost)
    - Runbooks for JupyterLab (notebook-based) GitOps using Rubix/Nurtch
    - Cloudwatch integration
    - Elastic Container Service

- Investigate the Google Cloud Run serverless platform.
  Port the Knative Geobarometer and MELTS web services to remove any dependence on the Kubernetes cluster.

## Installed cluster components (dependent order)

1. **GitLab Kubernetes Agent**
    Entity that attaches a GKE cluster to this repository (configuration notes below).
1. **GitLab Runner**
    GitLab Runner allows CI jobs to run on the cluster in privileged mode, which lets us execute *kubectl* and *helm* commands to perform GitOps tasks using YAML files stored in this repository. Basically, the runner gives us the functionality of Google Cloud Shell, or of a desktop gcloud/kubectl connection, from within GitLab CI.
1. **Kubernetes NGINX Ingress Controller**
    The ingress controller is used to expose service endpoints on external ports. There are multiple ingress controllers operating on the cluster: this one exposes the Grafana and Kasten K10 endpoints, while another is built into JupyterHub to expose that endpoint.
1. **Cert Manager**
    Used by the ingress controller to acquire and attach TLS certificates to ingress external endpoints so that ports can support *https* and encrypted traffic.
1. **Prometheus** and **Grafana** (exposed at https://cluster.enki-portal.org/)
    The Kube Prometheus stack (with Grafana) monitors the cluster and exposes metrics at an external endpoint so that cluster performance can be assessed.
1. **Google Cloud Storage**
    Storage independent of the Kubernetes cluster that is used for backups of cluster resources. The backup service (Kasten K10) can restore and migrate the cluster using this independent storage.
1. **Kasten K10** (exposed at https://k10.enki-portal.org/k10/)
    Backup, restoration, and migration tool for Kubernetes.
1. **JupyterHub**
    Service that hosts the ENKI server. JupyterHub exposes single-user pods that host the ThermoEngine Docker container image with a JupyterLab user interface. It also allocates and maintains access to user-based persistent storage.
    1. Testing installation
        This installation is for testing options and configuring possible upgrades to the production server. For cost reasons, it is normally not running.
    1. Production installation
        This installation is the production server exposed at https://server.enki-portal.org/.
1. **Knative** web services
    Service that exposes stateless, scalable web services. These services should probably be moved outside the cluster and exposed using the Google Cloud Run serverless platform. See *Future roadmap* above.
1. **MySQL** (exposed at http://mysql.enki-portal.org:3306/)
    Database server that currently holds the LEPR/TraceDs databases as well as some smaller databases (Stixrude, Berman, Inforex, etc.) that are used by cluster apps.

## CI/CD for automated deployment and maintenance
The *.gitlab-ci.yml* YAML file performs a number of functions:
- Deploys manifests using GitLab Kubernetes Agent to perform GitOps tasks
- Runs *helm* and *kubectl* jobs on the cluster to perform GitOps tasks
- Functions as the downstream pipeline for related projects that generate content related to the cluster (see the GitLab project https://gitlab.com/ENKI-portal/jupyterhub_custom)
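For orientation only, a GitOps-style job in *.gitlab-ci.yml* might look like the sketch below; the job name, stage, image, and rule are hypothetical, and the repository's actual file is the source of truth:

```
deploy-manifests:                # hypothetical job name
  stage: deploy
  image: google/cloud-sdk:slim   # any image that provides kubectl
  script:
    # The job runs on the cluster's own GitLab Runner, so kubectl is
    # already authorized against the cluster.
    - kubectl apply --recursive -f generated-manifests/
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```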

## Some useful kubectl commands
- Commands for managing namespaces and their resources:
    ```
    kubectl create ns gitlab-runner
    kubectl delete all --all -n {namespace}
    ``` 
- Get GitLab usernames associated with persistent storage volumes:
    ```
    kubectl --namespace jhub describe persistentvolumeclaims | grep "hub.jupyter.org/username"
    ```
- Restart the hub on the cluster using Google Cloud Shell in order to update ENKI-portal/jupyterhub_custom and amend the login page:
    ```
    helm upgrade --cleanup-on-fail jhub jupyterhub/jupyterhub --version=1.1.3 --namespace jhub --reuse-values
    ```
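The username filter above can be exercised offline; the sketch below pipes fabricated sample text (the claim names and layout are hypothetical, only mimicking `kubectl describe` output) through the same `grep`:

```
# Offline sketch: run the same grep filter against sample text that
# imitates 'kubectl describe persistentvolumeclaims' output.
sample='Name:   claim-jdoe
Labels: hub.jupyter.org/username=jdoe
Name:   claim-asmith
Labels: hub.jupyter.org/username=asmith'
matches=$(printf '%s\n' "$sample" | grep "hub.jupyter.org/username")
printf '%s\n' "$matches"
```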

## Kubernetes cluster configuration 
The following Google Cloud setup instructions are from the **Zero to JupyterHub** document https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/google/step-zero-gcp.html, as found in October 2021.
1. Using Google Cloud Shell, install **kubectl** and **helm** using **gcloud** after enabling the Kubernetes Engine API.
1. Create a managed Kubernetes cluster with a default node pool:
    ```
    gcloud container clusters create \
      --machine-type n1-standard-2 \
      --enable-autoscaling \
      --max-nodes=6 \
      --min-nodes=2 \
      --zone <compute zone from the list linked below> \
      --cluster-version latest \
      <CLUSTERNAME>
    ```
    - *\<CLUSTERNAME\>* is **enkiserver**
    - *\<compute zone from the list linked below\>* is **us-west1-a**
1. Elevate the user Google Cloud account for administrative functions:
    ```
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user=<GOOGLE-EMAIL-ACCOUNT>
    ```
    - *\<GOOGLE-EMAIL-ACCOUNT\>* is the *email address* of the Google Cloud account owner
1. Create a node pool for users (the pool must be created in the cluster's zone):
    ```
    gcloud beta container node-pools create user-pool \
      --machine-type n1-standard-2 \
      --num-nodes 0 \
      --enable-autoscaling \
      --min-nodes 0 \
      --max-nodes 6 \
      --node-labels hub.jupyter.org/node-purpose=user \
      --node-taints hub.jupyter.org_dedicated=user:NoSchedule \
      --zone us-west1-a \
      --cluster <CLUSTERNAME>
    ```
After you complete these steps, two node pools are up and running. The default node pool is used to run cluster-wide apps, while the tainted user node pool is used to launch nodes for single-user Jupyter pods. Six nodes in the user pool should be able to accommodate about 100 users doing small-scale ENKI-related modeling.
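As a rough sanity check of the 100-user figure, assume an n1-standard-2 node (7.5 GB RAM), about 1.5 GB reserved per node for system overhead, and roughly 0.35 GB guaranteed per user pod; the overhead and per-user numbers are assumptions for illustration, not values from this project's configuration:

```
# Back-of-envelope user capacity for the user pool (assumed numbers).
nodes=6            # --max-nodes for the user pool
node_mb=7500       # n1-standard-2 memory
reserved_mb=1500   # assumed system/kubelet overhead per node
per_user_mb=350    # assumed per-user memory guarantee
echo $(( nodes * (node_mb - reserved_mb) / per_user_mb ))   # about 100
```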

## GitLab Kubernetes Agent installation
The following instructions are from the GitLab document https://docs.gitlab.com/ee/user/clusters/agent/#set-up-the-kubernetes-agent-server, as found in October 2021.
1. Create a config.yaml file in the repository at *.gitlab/agents/primary-agent* with the contents:
    ```
    gitops:
      manifest_projects:
      - id: "enki-portal/gke-enki-gitlab-agent"
        paths:
        - glob: 'generated-manifests/**/*.{yaml,yml,json}'
        inventory_policy: adopt_if_no_inventory
    ```
    - The *id* is the name of the repository that contains the manifest files (this repository).
    - The *glob* is altered from the default suggestion to look only at YAML files in the folder and subfolders of *generated-manifests*.
    - The *inventory_policy* is changed from the default suggestion to allow the agent to inherit the management of applications that are already running on the cluster when their YAML manifests are added to the *generated-manifests* file hierarchy.

    Multiple manifest projects can be defined; future plans will allow these to be *private* repositories.

    Currently the agent repository must be public; future plans will allow the agent to be associated with a *group*.
1. Create the agent in GitLab (*Infrastructure > Kubernetes clusters*) and generate a *secret token*. Assign this token to a pipeline environment variable (*Settings* > *CI/CD* > *Variables*) with the name *GITLAB_AGENT_TOKEN*. Make sure that the value is *protected* and *masked* in order to keep it hidden in pipeline logs.
1. In Google Cloud Shell, execute the following to create a namespace for the agent:
    ```
    kubectl create ns gitlab-kubernetes-agent
    ```
    Then install the agent, with the appropriate token value substituted for *$(GITLAB_AGENT_TOKEN)*:
    ```
    docker run --pull=always --rm \
        registry.gitlab.com/gitlab-org/cluster-integration/gitlab-agent/cli:stable generate \
        --agent-token=$(GITLAB_AGENT_TOKEN) \
        --kas-address=wss://kas.gitlab.com \
        --agent-version stable \
        --namespace gitlab-kubernetes-agent | kubectl apply -f -
    ```
    
1. Upgrade the GitLab agent service account to have a cluster-admin role (so that it can create *secrets*, *pods*, *config maps*, etc. in arbitrary cluster *namespaces*) by first executing in Google Cloud Shell:
    ```
    kubectl get rolebindings,clusterrolebindings --all-namespaces  \
        -o custom-columns='KIND:kind,NAMESPACE:metadata.namespace,NAME:metadata.name,SERVICE_ACCOUNTS:subjects[?(@.kind=="ServiceAccount")].name' | grep gitlab-agent
    ```
    Note that this critical step is missing from the GitLab documentation. The command gives the output:
    ```
    ClusterRoleBinding   <none>          cilium-alert-read                                      gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-read-all                           gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-gitops-write-all                          gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-read-binding                              gitlab-agent
    ClusterRoleBinding   <none>          gitlab-agent-write-binding                             gitlab-agent
    ```
1. Apply the binding and verify it with the commands:
    ```
    kubectl create clusterrolebinding gitlab-agent-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=default:gitlab-agent
    kubectl get clusterrolebinding | grep gitlab-agent
    ```
    The verification command gives output such as the following:
    ```
    gitlab-agent-cluster-admin-binding                     ClusterRole/cluster-admin                                          12s
    gitlab-agent-gitops-read-all                           ClusterRole/gitlab-agent-gitops-read-all                           162d
    gitlab-agent-gitops-write-all                          ClusterRole/gitlab-agent-gitops-write-all                          162d
    gitlab-agent-read-binding                              ClusterRole/gitlab-agent-read                                      162d
    gitlab-agent-write-binding                             ClusterRole/gitlab-agent-write                                     162d
    ```
The agent is now installed.
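The *glob* in the agent configuration (step 1 above) can be sanity-checked offline; the sketch below uses `find` to emulate which files `generated-manifests/**/*.{yaml,yml,json}` would select, with hypothetical file names:

```
# Offline sketch: emulate the agent's manifest glob with find.
mkdir -p demo/generated-manifests/sub
touch demo/generated-manifests/app.yaml \
      demo/generated-manifests/sub/db.json \
      demo/generated-manifests/notes.md        # would NOT be matched
matched=$(cd demo && find generated-manifests -type f \
    \( -name '*.yaml' -o -name '*.yml' -o -name '*.json' \) | sort)
printf '%s\n' "$matched"
rm -rf demo
```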

## Tearing down and reinstalling the agent
This process is tricky and not automated by GitLab. Occasionally, reinstalling the agent is necessary, as the agent does not tolerate errors in YAML manifests very well and can enter a condition in which it is unresponsive.  

Follow this procedure in Google Cloud Shell:
1. Delete all resources associated with the agent in its namespace:
    ```
    kubectl delete all --all -n gitlab-kubernetes-agent
    ```
1. Delete the namespace:
    ```
    kubectl delete ns gitlab-kubernetes-agent
    ```
1. Delete the inventory file in the default namespace that the agent uses to track managed installations (this resource is not automatically removed with the agent's namespace resources):
    1. Go to the Google Cloud Platform console, and choose *Kubernetes Engine* > *Configuration* from the upper left menu.
    1. In the *default* namespace, delete the *Config Map* named *inventory-nnn*, where *nnn* is a string of numbers and dashes.
    1. In the *default* namespace, delete the *secret* *gitlab-agent-token-nnn*, where *nnn* is some arbitrary hexadecimal number.
1. Reinstall the agent following the above instructions, utilizing the same authorization token.