fix(backend): add PORT env to all rust deployments (was crashing on boot)

16 of 20 Rust services had no PORT env var set; their main.rs calls
std::env::var("PORT").expect("PORT must be a valid u16"), which panicked
on startup because PORT was unset. This commit adds env.PORT matching the existing containerPort
for each service. Service ports: gateway=9100 users=9101 companies=9102
jobs=9103 job_seekers=9104 customers=9105 employees=9106 photographers=9107
tutors=9108 makeup_artists=9109 developers=9110 video_editors=9111
graphic_designers=9112 social_media_managers=9113 fitness_trainers=9114
catering_services=9115 payments=9116 ugc_content_creators=9117 leads=9118
Ashwin Kumar Sivakumar 2026-06-10 23:20:45 +05:30
parent e27f82e996
commit 37a589fa87
28 changed files with 594 additions and 55 deletions

OTP_ISSUE_FIX_PROMPT.md

@ -0,0 +1,101 @@
# OTP Issue Fix for NXTGAUGE Signup Flow
## Problem Statement
Users get an "unable to create account" error when they try to sign up in the frontend-solid application. OTP (One-Time Password) verification during signup is broken.
## Root Cause Analysis
The OTP fixes were implemented, but subsequent commits overwrote them. All services were then switched to the `high-performance-latest` tag, which does not include the OTP functionality.
## Historical Context
### April 16, 17:30 - Initial OTP Fixes (Working)
- Frontend commit: `152f918` - Fixed resend-otp API endpoint path
- Backend users commit: `31d4570` - Updated email footer
- These fixes made OTP work correctly
### April 16, 18:06 - v1 API + Legacy OTP Support (Enhanced)
- Gateway commit: `d084491` - Added /api/v1/users routing + legacy resend-otp endpoint for backward compatibility
- Backend users commit: `d084491` - Updated to support v1 API
- Enhanced OTP support with backward compatibility
### April 16, 21:33 - Infrastructure Override (Broke OTP)
- Frontend: `152f918` → `d26f0bf` (lost OTP fix)
- Backend users: `d084491` → `9444056` (lost v1 API/OTP support)
- These crane mirror builds overwrote the OTP fixes
### April 17, 05:25 - Current State (Still Broken)
- All services switched to `high-performance-latest` tag
- Frontend: `high-performance-latest` (missing OTP fix from `152f918`)
- Gateway: `high-performance-latest` (missing legacy OTP support from `d084491`)
- Backend users: `high-performance-latest` (missing v1 API/OTP from `d084491`)
## Current GitOps Configuration
### Backend Kustomization (apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml)
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: high-performance-latest
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: high-performance-latest
  - name: registry.nxtgauge-frontend-solid
    newTag: high-performance-latest
```
### Frontend Kustomization (apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml)
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: high-performance-latest
```
## Required Fix
### Option 1: Revert to Known Working Commits (Recommended)
Update the kustomization files to use the specific commits that included the OTP fixes:
1. Frontend: Change back to `152f918` (contains the OTP endpoint fix)
2. Gateway: Change back to `d084491` (contains legacy OTP support)
3. Backend users: Change back to `d084491` (contains v1 API + OTP support)
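As a minimal sketch of Option 1, the backend overlay could pin both images as shown here. It assumes the registry still holds images tagged with the short commit hash `d084491`, which has not been verified; any other entries in the `images` list would stay unchanged.

```yaml
# apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml (sketch, not the committed file)
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: d084491 # assumed tag: gateway build with legacy resend-otp support
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: d084491 # assumed tag: users build with v1 API + OTP support
```

Pinning to a fixed tag rather than a moving one such as `high-performance-latest` also keeps a later image rebuild from silently replacing the fix.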
### Option 2: Fix high-performance-latest Branch
If there's a `high-performance-latest` branch in the respective repositories, ensure the OTP fixes from commits `152f918` and `d084491` are merged/rebased into it.
## Files to Modify
1. `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
2. `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
## Expected Behavior After Fix
1. User enters email during signup
2. Frontend calls OTP generation endpoint
3. Backend generates and sends OTP via email
4. User enters received OTP
5. Frontend calls OTP verification endpoint
6. Backend verifies OTP and creates account
7. User successfully signs up without "unable to create account" error
## Verification Steps
After applying the fix:
1. Trigger ArgoCD sync for both applications
2. Wait for pods to restart with new images
3. Test signup flow: enter email → receive OTP → verify OTP → account created
4. Check logs if signup still fails
5. Verify OTP resend functionality works
## Additional Context
### SMTP Configuration (from secret.yaml)
- SMTP_HOST: "smtp.zeptomail.in"
- SMTP_PORT: "587"
- SMTP_FROM_EMAIL: "support@nxtgauge.com"
- SMTP_SECURE: "false"
### Gateway Configuration
- Gateway URL: "http://nxtgauge-rust-gateway:9100"
- API URL: "http://nxtgauge-rust-gateway:9100/api"
- Users Service URL: "http://nxtgauge-rust-users:9101"
Please analyze the codebase, identify the exact OTP endpoints that need to work, and provide the necessary fixes to restore the signup functionality.

ROUTE_ISSUE_ANALYSIS.md

@ -0,0 +1,193 @@
# Route Issue Analysis for NXTGAUGE Frontend-Solid Signup
## Current Status: ❌ NOT FIXED
The route issues from the frontend-solid signup pages are **still not resolved**. Users experience "unable to create account" errors during signup due to API endpoint path mismatches.
## Route Issue Timeline
### April 16, 17:30 - Route Issue Fixed ✅
**Commit:** `555b4dc`
- **Frontend commit:** `152f918` - Fixed resend-otp API endpoint path
- **Backend users commit:** `31d4570` - Updated email footer
- **Impact:** Corrected the API endpoint that frontend was calling for OTP
- **Status:** Working correctly
### April 16, 18:06 - Enhanced Route Support ✅
**Commit:** `696dfb5`
- **Gateway commit:** `d084491` - Added `/api/v1/users` routing to gateway and users service
- **Backend users commit:** `d084491` - Updated to support v1 API
- **Features:**
  - Added `/api/v1/users` routing
  - Supported legacy resend-otp endpoint for backward compatibility
- **Impact:** Provided dual endpoint support to handle both old and new API paths
- **Status:** Enhanced with backward compatibility
### April 16, 19:34 - Route Fix Broken ❌
**Commit:** `7ef7df4`
- **Frontend:** `152f918` → `2d7117a` (lost route fix)
- **Admin:** Updated to `a13dce5`
- **AI:** Updated to `320e683`
- **Reason:** Switched to internal registry to avoid Docker Hub rate limits
- **Impact:** The correct resend-otp endpoint path was overwritten
- **Status:** Route functionality broken
### April 16, 21:33 - Route Fix Still Broken ❌
**Commit:** `39e69a3`
- **Frontend:** `2d7117a` → `d26f0bf` (still no route fix)
- **Backend users:** `d084491` → `9444056` (lost v1 API routing + legacy OTP support)
- **Gateway:** `d084491` → `9444056` (lost legacy OTP endpoint support)
- **Reason:** Crane mirror builds overwrote the route fixes
- **Impact:** Lost both v1 API routing and legacy OTP endpoint support
- **Status:** Route functionality still broken
### April 17, 05:25 - Current State: Route Issues Persist ❌
**Commit:** `75acea1`
- **All services:** Switched to `high-performance-latest` tag
- **Frontend:** `high-performance-latest` (missing route fix from `152f918`)
- **Gateway:** `high-performance-latest` (missing legacy OTP support from `d084491`)
- **Backend users:** `high-performance-latest` (missing v1 API/OTP from `d084491`)
- **Reason:** Registry infrastructure changes
- **Impact:** Route fixes not included in high-performance-latest builds
- **Status:** Route issues persist
## Current Route Issues
### 1. Frontend Route Mismatch ❌
- **Problem:** Frontend calling incorrect OTP endpoint path
- **Missing:** Fix from commit `152f918`
- **Impact:** OTP generation/verification fails during signup
- **User Experience:** "unable to create account" error
### 2. Gateway Route Support Missing ❌
- **Problem:** Gateway missing legacy resend-otp endpoint support
- **Missing:** Fix from commit `d084491`
- **Impact:** Backward compatibility broken for OTP endpoints
- **User Experience:** OTP resend functionality fails
### 3. Backend API Routing Missing ❌
- **Problem:** Backend missing `/api/v1/users` routing
- **Missing:** Fix from commit `d084491`
- **Impact:** v1 API endpoints not accessible
- **User Experience:** Signup and user management functions fail
## Current GitOps Configuration
### Backend Kustomization
**File:** `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: high-performance-latest # ❌ Missing d084491
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: high-performance-latest # ❌ Missing d084491
```
### Frontend Kustomization
**File:** `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: high-performance-latest # ❌ Missing 152f918
```
## Verification Status
### Confirmation of Route Issues ❌
The route issues are confirmed **NOT FIXED** because:
1. **Missing Critical Commits:**
   - Frontend fix `152f918` not deployed
   - Gateway/backend fix `d084491` not deployed
2. **Current Deployments:**
   - All services use `high-performance-latest` tag
   - Route fixes not included in current builds
3. **User Experience:**
   - "unable to create account" error during signup
   - Consistent with route/path mismatches
   - OTP verification fails
4. **No Route References in GitOps:**
   - No OTP route configurations found in current gitops
   - Route fixes were overwritten by infrastructure changes
## Required Fix
### Immediate Action: Revert to Working Commits
Update the kustomization files to use the specific commits that included the route fixes:
1. **Frontend:** Change to `152f918`
   - Contains correct OTP endpoint path
   - File: `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
2. **Gateway:** Change to `d084491`
   - Contains legacy OTP endpoint support
   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
3. **Backend users:** Change to `d084491`
   - Contains v1 API routing
   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
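A minimal sketch of the frontend half of this change, assuming an image tagged `152f918` exists in the registry (unverified):

```yaml
# apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml (sketch, not the committed file)
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: 152f918 # assumed tag: frontend build with the corrected resend-otp path
```

The backend overlay would change the same way, with `newTag: d084491` on the gateway and users entries.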
### Alternative: Fix high-performance-latest Branch
Ensure the route fixes from commits `152f918` and `d084491` are merged into the `high-performance-latest` branch in respective repositories.
## Expected Behavior After Fix
1. User enters email during signup
2. Frontend calls correct OTP endpoint: `/api/v1/users/resend-otp`
3. Gateway routes request to users service with proper path mapping
4. Backend generates and sends OTP via email
5. User enters received OTP
6. Frontend calls OTP verification endpoint
7. Backend verifies OTP and creates account
8. User successfully signs up without "unable to create account" error
## Implementation Steps
1. **Update GitOps Configuration:**
   - Modify `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
   - Modify `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
2. **Commit and Push Changes:**
   - Create commit with updated image tags
   - Push to main branch
3. **Trigger ArgoCD Sync:**
   - Sync `nxtgauge-frontend-solid` application
   - Sync `nxtgauge-backend-rust` application
4. **Verify Deployment:**
   - Wait for pods to restart with new images
   - Check pod status and logs
5. **Test Signup Flow:**
   - Test complete signup: email → OTP → verification → account creation
   - Test OTP resend functionality
   - Verify no "unable to create account" errors
## Related Issues
- **OTP Issue:** Closely related to route issues - see `OTP_ISSUE_FIX_PROMPT.md`
- **Email Configuration:** SMTP settings are correct in `apps/nxtgauge-backend-rust/base/secret.yaml`
- **Gateway Configuration:** Gateway service properly configured in `apps/nxtgauge-backend-rust/base/gateway-service.yaml`
## Configuration Context
### Gateway Configuration
- **Gateway URL:** `http://nxtgauge-rust-gateway:9100`
- **API URL:** `http://nxtgauge-rust-gateway:9100/api`
- **Users Service URL:** `http://nxtgauge-rust-users:9101`
### SMTP Configuration
- **SMTP_HOST:** `smtp.zeptomail.in`
- **SMTP_PORT:** `587`
- **SMTP_FROM_EMAIL:** `support@nxtgauge.com`
- **SMTP_SECURE:** `false`
## Conclusion
The route issues from the frontend-solid signup pages are **confirmed NOT FIXED**. The specific commits that contained the route corrections (`152f918` and `d084491`) are not currently deployed, and all services are using `high-performance-latest` which doesn't include these critical route fixes.
**Action Required:** Revert to the working commits to restore proper route functionality and fix the signup flow.

@ -0,0 +1,61 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-guard
  namespace: nxtgauge-ai
  labels:
    app: ai-guard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-guard
  template:
    metadata:
      labels:
        app: ai-guard
    spec:
      containers:
        - name: ai-guard
          image: registry.nxtgauge.com/ai-guard:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: RUST_LOG
              value: "info"
            - name: PORT
              value: "8080"
            - name: OLLAMA_BASE_URL
              value: "http://ollama.nxtgauge-ai.svc.cluster.local:11434"
            - name: OLLAMA_CHAT_MODEL
              value: "gemma3:270m"
            - name: LLM_GUARD_URL
              value: "http://llm-guard.nxtgauge-ai.svc.cluster.local:8000"
            - name: PRESIDIO_URL
              value: "http://presidio.nxtgauge-ai.svc.cluster.local:3000"
            - name: AI_SERVICE_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-guard-secrets
                  key: ai-service-key
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10

@ -1,6 +1,7 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
-  - serviceaccount-default.yaml
+  - namespace.yaml
+  - secret.yaml
+  - deployment.yaml
+  - service.yaml

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: nxtgauge-ai

@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
  name: ai-guard-secrets
  namespace: nxtgauge-ai
type: Opaque
stringData:
  ai-service-key: ""

@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: ai-guard
  namespace: nxtgauge-ai
  labels:
    app: ai-guard
spec:
  type: ClusterIP
  selector:
    app: ai-guard
  ports:
    - name: http
      port: 8080
      targetPort: 8080
      protocol: TCP

@ -0,0 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: registry.nxtgauge.com/ai-guard
    newTag: latest

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9115"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9102"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9105"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9110"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9106"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9114"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9112"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9104"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9109"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9116"
           readinessProbe:
             tcpSocket:
               port: 9116

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9107"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9113"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9108"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9117"
           readinessProbe:
             httpGet:
               path: /health

@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9111"
           readinessProbe:
             httpGet:
               path: /health

@ -1,21 +1,24 @@
 apiVersion: argoproj.io/v1alpha1
 kind: Application
 metadata:
-  name: woodpecker-registry-pull
+  name: ai-guard
   namespace: argocd
+  annotations:
+    argocd-image-updater.argoproj.io/image-list: guard=registry.nxtgauge.com/ai-guard:latest
+    argocd-image-updater.argoproj.io/guard.update-strategy: digest
+    argocd-image-updater.argoproj.io/write-back-method: argocd
 spec:
-  destination:
-    namespace: woodpecker
-    server: https://kubernetes.default.svc
   project: default
   source:
-    path: ops/woodpecker-registry-pull
     repoURL: https://github.com/Traceworks2023/nxtgauge-gitops.git
     targetRevision: main
+    path: apps/ai-guard/overlays/prod
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: nxtgauge-ai
   syncPolicy:
     automated:
       prune: true
       selfHeal: true
     syncOptions:
       - CreateNamespace=true

@ -0,0 +1,143 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: openobserve-alerts-additional
  namespace: openobserve
data:
  additional-alerts.sh: |
    #!/usr/bin/env sh
    set -eu

    ORG_ID="${ORG_ID:-default}"
    BASE_URL="${BASE_URL:-http://o2-openobserve-standalone.openobserve.svc.cluster.local:5080}"
    STREAM_NAME="${STREAM_NAME:-default}"
    TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"

    AUTH="$(printf '%s:%s' "$ZO_ROOT_USER_EMAIL" "$ZO_ROOT_USER_PASSWORD" | base64 | tr -d '\n')"
    auth_hdr="Authorization: Basic $AUTH"

    api() {
      curl -sfS -H "$auth_hdr" -H "Content-Type: application/json" "$@"
    }

    ensure_alert() {
      alert_name="$1"
      sql="$2"
      period_minutes="$3"
      frequency_minutes="$4"
      silence_minutes="$5"
      row_template="$6"

      existing_id="$(
        api "$BASE_URL/api/v2/$ORG_ID/alerts" \
          | jq -r --arg n "$alert_name" '.list[] | select(.name == $n) | .alert_id' \
          | head -n 1
      )"

      payload="$(jq -n \
        --arg name "$alert_name" \
        --arg stream "$STREAM_NAME" \
        --arg sql "$sql" \
        --argjson period "$period_minutes" \
        --argjson frequency "$frequency_minutes" \
        --argjson silence "$silence_minutes" \
        --arg row_template "$row_template" \
        '{
          name: $name,
          stream_type: "logs",
          stream_name: $stream,
          is_real_time: false,
          enabled: true,
          tz_offset: 330,
          destinations: ["nxtgauge_telegram"],
          row_template: $row_template,
          row_template_type: "String",
          query_condition: { type: "sql", sql: $sql },
          trigger_condition: {
            period: $period,
            operator: ">=",
            threshold: 1,
            frequency: $frequency,
            frequency_type: "minutes",
            silence: $silence
          }
        }')"

      if [ -n "$existing_id" ] && [ "$existing_id" != "null" ]; then
        api -X PUT "$BASE_URL/api/v2/$ORG_ID/alerts/$existing_id" -d "$payload" >/dev/null
        echo "updated alert=$alert_name"
      else
        api -X POST "$BASE_URL/api/v2/$ORG_ID/alerts" -d "$payload" >/dev/null
        echo "created alert=$alert_name"
      fi
    }

    # API Health
    ensure_alert \
      "api-health-failures" \
      "SELECT service, endpoint, status_code, COUNT(*) as count FROM \"default\" WHERE service ILIKE '%api%' AND (status_code >= 500 OR status_code = 0) GROUP BY service, endpoint, status_code ORDER BY count DESC LIMIT 50" \
      5 1 15 \
      "{service}/{endpoint} status={status_code} count={count}"

    # Database Health
    ensure_alert \
      "database-connection-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%connection refused%' OR body ILIKE '%database%' OR body ILIKE '%postgres%' OR body ILIKE '%sqlx%' ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Redis Health
    ensure_alert \
      "redis-connection-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%redis%' OR body ILIKE '%cache%' OR body ILIKE '%connection timeout%' ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Pod Failures
    ensure_alert \
      "pod-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, body_object_reason, body_object_message FROM \"default\" WHERE body_object_reason IN ('Failed', 'Evicted', 'NodeAffinity', 'UnexpectedAdmissionError') ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name} {body_object_reason}: {body_object_message}"

    # CPU High
    ensure_alert \
      "cpu-high-usage" \
      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(cpu_usage_cores) as avg_cpu FROM \"default\" WHERE cpu_usage_cores > 0.8 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_cpu DESC LIMIT 50" \
      10 2 30 \
      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} CPU={avg_cpu}"

    # Memory High
    ensure_alert \
      "memory-high-usage" \
      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(memory_usage_bytes) as avg_mem FROM \"default\" WHERE memory_usage_bytes > 1073741824 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_mem DESC LIMIT 50" \
      10 2 30 \
      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} MEM={avg_mem}"

    # Disk Full
    ensure_alert \
      "disk-full-warning" \
      "SELECT k8s_node_name, k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%disk full%' OR body ILIKE '%no space left%' OR body ILIKE '%DiskPressure%' ORDER BY _timestamp DESC LIMIT 50" \
      10 2 60 \
      "{k8s_node_name}/{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Longhorn Health
    ensure_alert \
      "longhorn-volume-errors" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'longhorn-system' AND (body ILIKE '%error%' OR body ILIKE '%degraded%' OR body ILIKE '%faulted%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 30 \
      "longhorn/{k8s_pod_name}: {msg}"

    # ArgoCD Health
    ensure_alert \
      "argocd-app-sync-failures" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'argocd' AND (body ILIKE '%SyncFailed%' OR body ILIKE '%ComparisonError%' OR body ILIKE '%ResourceQuota%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 30 \
      "argocd/{k8s_pod_name}: {msg}"

    # Registry Health
    ensure_alert \
      "registry-push-failures" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'registry' AND (body ILIKE '%413%' OR body ILIKE '%payload too large%' OR body ILIKE '%unauthorized%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 60 \
      "registry/{k8s_pod_name}: {msg}"

@ -154,10 +154,10 @@ data:
       "argocd/{k8s_pod_name} {k8s_container_name}: {msg}"
     ensure_alert \
-      "woodpecker-errors" \
-      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'woodpecker' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%') ORDER BY _timestamp DESC LIMIT 50" \
+      "gitea-runner-errors" \
+      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'gitea' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%' OR body ILIKE '%job failed%') ORDER BY _timestamp DESC LIMIT 50" \
       10 2 30 \
-      "woodpecker/{k8s_pod_name} {k8s_container_name}: {msg}"
+      "gitea/{k8s_pod_name} {k8s_container_name}: {msg}"
     ensure_alert \
       "registry-errors" \


@ -1,35 +0,0 @@
# Woodpecker: allow pulling from private registry
Woodpecker pipelines run as Kubernetes pods in the `woodpecker` namespace. If pipeline step images use `registry.nxtgauge.com/...` (private, Basic auth), kubelet needs an `imagePullSecret`.
This is required for base images (example `registry.nxtgauge.com/rust:alpine`) and also for any mirrored plugin images (example `registry.nxtgauge.com/kaniko:2.1.1`).
## Required secret
Create this once:
```bash
kubectl -n woodpecker create secret docker-registry registry-nxtgauge-pull \
--docker-server=registry.nxtgauge.com \
--docker-username="<REGISTRY_USERNAME>" \
--docker-password="<REGISTRY_PASSWORD>" \
--docker-email="ci@nxtgauge.com"
```
## Mirroring common plugin images (optional)
If your pipelines reference plugin images from the internal registry (example `registry.nxtgauge.com/kaniko:2.1.1`) make sure those images exist in the registry.
Example mirror from Docker Hub to internal:
```bash
docker pull woodpeckerci/plugin-kaniko:2.1.1
docker tag woodpeckerci/plugin-kaniko:2.1.1 registry.nxtgauge.com/kaniko:2.1.1
docker push registry.nxtgauge.com/kaniko:2.1.1
```
## What this kustomize applies
It patches/ensures the `default` ServiceAccount in `woodpecker` includes:
- `imagePullSecrets: [registry-nxtgauge-pull]`

@ -1,8 +0,0 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: woodpecker
imagePullSecrets:
  - name: registry-nxtgauge-pull