diff --git a/OTP_ISSUE_FIX_PROMPT.md b/OTP_ISSUE_FIX_PROMPT.md
new file mode 100644
index 0000000..91ef71b
--- /dev/null
+++ b/OTP_ISSUE_FIX_PROMPT.md
@@ -0,0 +1,101 @@
+# OTP Issue Fix for NXTGAUGE Signup Flow
+
+## Problem Statement
+Users get an "unable to create account" error when trying to sign up in the frontend-solid application. The OTP (One-Time Password) verification functionality during signup is broken.
+
+## Root Cause Analysis
+The OTP fixes were implemented but were overwritten by subsequent commits. Finally, all services were switched to the `high-performance-latest` tag, which doesn't include the OTP functionality.
+
+## Historical Context
+
+### April 16, 17:30 - Initial OTP Fixes (Working)
+- Frontend commit: `152f918` - Fixed resend-otp API endpoint path
+- Backend users commit: `31d4570` - Updated email footer
+- These fixes made OTP work correctly
+
+### April 16, 18:06 - v1 API + Legacy OTP Support (Enhanced)
+- Gateway commit: `d084491` - Added /api/v1/users routing + legacy resend-otp endpoint for backward compatibility
+- Backend users commit: `d084491` - Updated to support v1 API
+- Enhanced OTP support with backward compatibility
+
+### April 16, 21:33 - Infrastructure Override (Broke OTP)
+- Frontend: `152f918` → `d26f0bf` (lost OTP fix)
+- Backend users: `d084491` → `9444056` (lost v1 API/OTP support)
+- These crane mirror builds overwrote the OTP fixes
+
+### April 17, 05:25 - Current State (Still Broken)
+- All services switched to `high-performance-latest` tag
+- Frontend: `high-performance-latest` (missing OTP fix from `152f918`)
+- Gateway: `high-performance-latest` (missing legacy OTP support from `d084491`)
+- Backend users: `high-performance-latest` (missing v1 API/OTP from `d084491`)
+
+## Current GitOps Configuration
+
+### Backend Kustomization (apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml)
+```yaml
+images:
+  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
+    newTag: high-performance-latest
+  - name: registry.nxtgauge.com/nxtgauge-rust-users
+    newTag: high-performance-latest
+  - name: registry.nxtgauge-frontend-solid
+    newTag: high-performance-latest
+```
+
+### Frontend Kustomization (apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml)
+```yaml
+images:
+  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
+    newTag: high-performance-latest
+```
+
+## Required Fix
+
+### Option 1: Revert to Known Working Commits (Recommended)
+Update the kustomization files to use the specific commits that included the OTP fixes:
+
+1. Frontend: Change back to `152f918` (contains the OTP endpoint fix)
+2. Gateway: Change back to `d084491` (contains legacy OTP support)
+3. Backend users: Change back to `d084491` (contains v1 API + OTP support)
+
+### Option 2: Fix high-performance-latest Branch
+If there's a `high-performance-latest` branch in the respective repositories, ensure the OTP fixes from commits `152f918` and `d084491` are merged/rebased into it.
+
+## Files to Modify
+
+1. `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
+2. `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
+
+## Expected Behavior After Fix
+
+1. User enters email during signup
+2. Frontend calls OTP generation endpoint
+3. Backend generates and sends OTP via email
+4. User enters received OTP
+5. Frontend calls OTP verification endpoint
+6. Backend verifies OTP and creates account
+7. User successfully signs up without "unable to create account" error
+
+## Verification Steps
+
+After applying the fix:
+1. Trigger ArgoCD sync for both applications
+2. Wait for pods to restart with new images
+3. Test signup flow: enter email → receive OTP → verify OTP → account created
+4. Check logs if signup still fails
+5. Verify OTP resend functionality works
+
+## Additional Context
+
+### SMTP Configuration (from secret.yaml)
+- SMTP_HOST: "smtp.zeptomail.in"
+- SMTP_PORT: "587"
+- SMTP_FROM_EMAIL: "support@nxtgauge.com"
+- SMTP_SECURE: "false"
+
+### Gateway Configuration
+- Gateway URL: "http://nxtgauge-rust-gateway:9100"
+- API URL: "http://nxtgauge-rust-gateway:9100/api"
+- Users Service URL: "http://nxtgauge-rust-users:9101"
+
+Please analyze the codebase, identify the exact OTP endpoints that need to work, and provide the necessary fixes to restore the signup functionality.
\ No newline at end of file
diff --git a/ROUTE_ISSUE_ANALYSIS.md b/ROUTE_ISSUE_ANALYSIS.md
new file mode 100644
index 0000000..f130f3f
--- /dev/null
+++ b/ROUTE_ISSUE_ANALYSIS.md
@@ -0,0 +1,193 @@
+# Route Issue Analysis for NXTGAUGE Frontend-Solid Signup
+
+## Current Status: ❌ NOT FIXED
+
+The route issues from the frontend-solid signup pages are **still not resolved**. Users experience "unable to create account" errors during signup due to API endpoint path mismatches.
+
+## Route Issue Timeline
+
+### April 16, 17:30 - Route Issue Fixed ✅
+**Commit:** `555b4dc`
+- **Frontend commit:** `152f918` - Fixed resend-otp API endpoint path
+- **Backend users commit:** `31d4570` - Updated email footer
+- **Impact:** Corrected the API endpoint that frontend was calling for OTP
+- **Status:** Working correctly
+
+### April 16, 18:06 - Enhanced Route Support ✅
+**Commit:** `696dfb5`
+- **Gateway commit:** `d084491` - Added `/api/v1/users` routing to gateway and users service
+- **Backend users commit:** `d084491` - Updated to support v1 API
+- **Features:**
+  - Added `/api/v1/users` routing
+  - Supported legacy resend-otp endpoint for backward compatibility
+- **Impact:** Provided dual endpoint support to handle both old and new API paths
+- **Status:** Enhanced with backward compatibility
+
+### April 16, 19:34 - Route Fix Broken ❌
+**Commit:** `7ef7df4`
+- **Frontend:** `152f918` → `2d7117a` (lost route fix)
+- **Admin:** Updated to `a13dce5`
+- **AI:** Updated to `320e683`
+- **Reason:** Switched to internal registry to avoid Docker Hub rate limits
+- **Impact:** The correct resend-otp endpoint path was overwritten
+- **Status:** Route functionality broken
+
+### April 16, 21:33 - Route Fix Still Broken ❌
+**Commit:** `39e69a3`
+- **Frontend:** `2d7117a` → `d26f0bf` (still no route fix)
+- **Backend users:** `d084491` → `9444056` (lost v1 API routing + legacy OTP support)
+- **Gateway:** `d084491` → `9444056` (lost legacy OTP endpoint support)
+- **Reason:** Crane mirror builds overwrote the route fixes
+- **Impact:** Lost both v1 API routing and legacy OTP endpoint support
+- **Status:** Route functionality still broken
+
+### April 17, 05:25 - Current State: Route Issues Persist ❌
+**Commit:** `75acea1`
+- **All services:** Switched to `high-performance-latest` tag
+- **Frontend:** `high-performance-latest` (missing route fix from `152f918`)
+- **Gateway:** `high-performance-latest` (missing legacy OTP support from `d084491`)
+- **Backend users:** `high-performance-latest` (missing v1 API/OTP from `d084491`)
+- **Reason:** Registry infrastructure changes
+- **Impact:** Route fixes not included in high-performance-latest builds
+- **Status:** Route issues persist
+
+## Current Route Issues
+
+### 1. Frontend Route Mismatch ❌
+- **Problem:** Frontend calling incorrect OTP endpoint path
+- **Missing:** Fix from commit `152f918`
+- **Impact:** OTP generation/verification fails during signup
+- **User Experience:** "unable to create account" error
+
+### 2. Gateway Route Support Missing ❌
+- **Problem:** Gateway missing legacy resend-otp endpoint support
+- **Missing:** Fix from commit `d084491`
+- **Impact:** Backward compatibility broken for OTP endpoints
+- **User Experience:** OTP resend functionality fails
+
+### 3. Backend API Routing Missing ❌
+- **Problem:** Backend missing `/api/v1/users` routing
+- **Missing:** Fix from commit `d084491`
+- **Impact:** v1 API endpoints not accessible
+- **User Experience:** Signup and user management functions fail
+
+## Current GitOps Configuration
+
+### Backend Kustomization
+**File:** `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
+```yaml
+images:
+  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
+    newTag: high-performance-latest # ❌ Missing d084491
+  - name: registry.nxtgauge.com/nxtgauge-rust-users
+    newTag: high-performance-latest # ❌ Missing d084491
+```
+
+### Frontend Kustomization
+**File:** `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
+```yaml
+images:
+  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
+    newTag: high-performance-latest # ❌ Missing 152f918
+```
+
+## Verification Status
+
+### Confirmation of Route Issues ❌
+The route issues are confirmed **NOT FIXED** because:
+
+1. **Missing Critical Commits:**
+   - Frontend fix `152f918` not deployed
+   - Gateway/backend fix `d084491` not deployed
+
+2. **Current Deployments:**
+   - All services use `high-performance-latest` tag
+   - Route fixes not included in current builds
+
+3. **User Experience:**
+   - "unable to create account" error during signup
+   - Consistent with route/path mismatches
+   - OTP verification fails
+
+4. **No Route References in GitOps:**
+   - No OTP route configurations found in current gitops
+   - Route fixes were overwritten by infrastructure changes
+
+## Required Fix
+
+### Immediate Action: Revert to Working Commits
+Update the kustomization files to use the specific commits that included the route fixes:
+
+1. **Frontend:** Change to `152f918`
+   - Contains correct OTP endpoint path
+   - File: `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
+
+2. **Gateway:** Change to `d084491`
+   - Contains legacy OTP endpoint support
+   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
+
+3. **Backend users:** Change to `d084491`
+   - Contains v1 API routing
+   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
+
+### Alternative: Fix high-performance-latest Branch
+Ensure the route fixes from commits `152f918` and `d084491` are merged into the `high-performance-latest` branch in respective repositories.
+
+## Expected Behavior After Fix
+
+1. User enters email during signup
+2. Frontend calls correct OTP endpoint: `/api/v1/users/resend-otp`
+3. Gateway routes request to users service with proper path mapping
+4. Backend generates and sends OTP via email
+5. User enters received OTP
+6. Frontend calls OTP verification endpoint
+7. Backend verifies OTP and creates account
+8. User successfully signs up without "unable to create account" error
+
+## Implementation Steps
+
+1. **Update GitOps Configuration:**
+   - Modify `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
+   - Modify `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
+
+2. **Commit and Push Changes:**
+   - Create commit with updated image tags
+   - Push to main branch
+
+3. **Trigger ArgoCD Sync:**
+   - Sync `nxtgauge-frontend-solid` application
+   - Sync `nxtgauge-backend-rust` application
+
+4. **Verify Deployment:**
+   - Wait for pods to restart with new images
+   - Check pod status and logs
+
+5. **Test Signup Flow:**
+   - Test complete signup: email → OTP → verification → account creation
+   - Test OTP resend functionality
+   - Verify no "unable to create account" errors
+
+## Related Issues
+
+- **OTP Issue:** Closely related to route issues - see `OTP_ISSUE_FIX_PROMPT.md`
+- **Email Configuration:** SMTP settings are correct in `apps/nxtgauge-backend-rust/base/secret.yaml`
+- **Gateway Configuration:** Gateway service properly configured in `apps/nxtgauge-backend-rust/base/gateway-service.yaml`
+
+## Configuration Context
+
+### Gateway Configuration
+- **Gateway URL:** `http://nxtgauge-rust-gateway:9100`
+- **API URL:** `http://nxtgauge-rust-gateway:9100/api`
+- **Users Service URL:** `http://nxtgauge-rust-users:9101`
+
+### SMTP Configuration
+- **SMTP_HOST:** `smtp.zeptomail.in`
+- **SMTP_PORT:** `587`
+- **SMTP_FROM_EMAIL:** `support@nxtgauge.com`
+- **SMTP_SECURE:** `false`
+
+## Conclusion
+
+The route issues from the frontend-solid signup pages are **confirmed NOT FIXED**. The specific commits that contained the route corrections (`152f918` and `d084491`) are not currently deployed, and all services are using `high-performance-latest` which doesn't include these critical route fixes.
+
+**Action Required:** Revert to the working commits to restore proper route functionality and fix the signup flow.
\ No newline at end of file
diff --git a/apps/ai-guard/base/deployment.yaml b/apps/ai-guard/base/deployment.yaml
new file mode 100644
index 0000000..e86f27b
--- /dev/null
+++ b/apps/ai-guard/base/deployment.yaml
@@ -0,0 +1,61 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ai-guard
+  namespace: nxtgauge-ai
+  labels:
+    app: ai-guard
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: ai-guard
+  template:
+    metadata:
+      labels:
+        app: ai-guard
+    spec:
+      containers:
+        - name: ai-guard
+          image: registry.nxtgauge.com/ai-guard:latest
+          imagePullPolicy: Always
+          ports:
+            - containerPort: 8080
+              name: http
+          env:
+            - name: RUST_LOG
+              value: "info"
+            - name: PORT
+              value: "8080"
+            - name: OLLAMA_BASE_URL
+              value: "http://ollama.nxtgauge-ai.svc.cluster.local:11434"
+            - name: OLLAMA_CHAT_MODEL
+              value: "gemma3:270m"
+            - name: LLM_GUARD_URL
+              value: "http://llm-guard.nxtgauge-ai.svc.cluster.local:8000"
+            - name: PRESIDIO_URL
+              value: "http://presidio.nxtgauge-ai.svc.cluster.local:3000"
+            - name: AI_SERVICE_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: ai-guard-secrets
+                  key: ai-service-key
+          resources:
+            requests:
+              cpu: 100m
+              memory: 128Mi
+            limits:
+              cpu: 1000m
+              memory: 512Mi
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: 8080
+            initialDelaySeconds: 10
+            periodSeconds: 20
+          readinessProbe:
+            httpGet:
+              path: /health
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
diff --git a/ops/woodpecker-registry-pull/kustomization.yaml b/apps/ai-guard/base/kustomization.yaml
similarity index 51%
rename from ops/woodpecker-registry-pull/kustomization.yaml
rename to apps/ai-guard/base/kustomization.yaml
index b53baee..52241e5 100644
--- a/ops/woodpecker-registry-pull/kustomization.yaml
+++ b/apps/ai-guard/base/kustomization.yaml
@@ -1,6 +1,7 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
-
 resources:
-  - serviceaccount-default.yaml
-
+  - namespace.yaml
+  - secret.yaml
+  - deployment.yaml
+  - service.yaml
diff --git a/apps/ai-guard/base/namespace.yaml b/apps/ai-guard/base/namespace.yaml
new file mode 100644
index 0000000..f1f5f6e
--- /dev/null
+++ b/apps/ai-guard/base/namespace.yaml
@@ -0,0 +1,4 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: nxtgauge-ai
diff --git a/apps/ai-guard/base/secret.yaml b/apps/ai-guard/base/secret.yaml
new file mode 100644
index 0000000..59c0402
--- /dev/null
+++ b/apps/ai-guard/base/secret.yaml
@@ -0,0 +1,8 @@
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ai-guard-secrets
+  namespace: nxtgauge-ai
+type: Opaque
+stringData:
+  ai-service-key: ""
diff --git a/apps/ai-guard/base/service.yaml b/apps/ai-guard/base/service.yaml
new file mode 100644
index 0000000..9a82334
--- /dev/null
+++ b/apps/ai-guard/base/service.yaml
@@ -0,0 +1,16 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: ai-guard
+  namespace: nxtgauge-ai
+  labels:
+    app: ai-guard
+spec:
+  type: ClusterIP
+  selector:
+    app: ai-guard
+  ports:
+    - name: http
+      port: 8080
+      targetPort: 8080
+      protocol: TCP
diff --git a/apps/ai-guard/overlays/prod/kustomization.yaml b/apps/ai-guard/overlays/prod/kustomization.yaml
new file mode 100644
index 0000000..6ad21cf
--- /dev/null
+++ b/apps/ai-guard/overlays/prod/kustomization.yaml
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - ../../base
+images:
+  - name: registry.nxtgauge.com/ai-guard
+    newTag: latest
diff --git a/apps/nxtgauge-backend-rust/base/catering-services-deployment.yaml b/apps/nxtgauge-backend-rust/base/catering-services-deployment.yaml
index 615191a..dc53440 100644
--- a/apps/nxtgauge-backend-rust/base/catering-services-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/catering-services-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9115"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/companies-deployment.yaml b/apps/nxtgauge-backend-rust/base/companies-deployment.yaml
index 47ce2ce..42de9e9 100644
--- a/apps/nxtgauge-backend-rust/base/companies-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/companies-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9102"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/customers-deployment.yaml b/apps/nxtgauge-backend-rust/base/customers-deployment.yaml
index f181b85..d71562e 100644
--- a/apps/nxtgauge-backend-rust/base/customers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/customers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9105"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/developers-deployment.yaml b/apps/nxtgauge-backend-rust/base/developers-deployment.yaml
index bf6f871..6ffa19b 100644
--- a/apps/nxtgauge-backend-rust/base/developers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/developers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9110"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/employees-deployment.yaml b/apps/nxtgauge-backend-rust/base/employees-deployment.yaml
index 17b5fa2..5d397fd 100644
--- a/apps/nxtgauge-backend-rust/base/employees-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/employees-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9106"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/fitness-trainers-deployment.yaml b/apps/nxtgauge-backend-rust/base/fitness-trainers-deployment.yaml
index fe31d0d..74c7ab7 100644
--- a/apps/nxtgauge-backend-rust/base/fitness-trainers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/fitness-trainers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9114"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/graphic-designers-deployment.yaml b/apps/nxtgauge-backend-rust/base/graphic-designers-deployment.yaml
index 906c405..e401b5c 100644
--- a/apps/nxtgauge-backend-rust/base/graphic-designers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/graphic-designers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9112"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/job-seekers-deployment.yaml b/apps/nxtgauge-backend-rust/base/job-seekers-deployment.yaml
index 6e036a6..4ccccf9 100644
--- a/apps/nxtgauge-backend-rust/base/job-seekers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/job-seekers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9104"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/makeup-artists-deployment.yaml b/apps/nxtgauge-backend-rust/base/makeup-artists-deployment.yaml
index b0c9951..c167565 100644
--- a/apps/nxtgauge-backend-rust/base/makeup-artists-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/makeup-artists-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9109"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/payments-deployment.yaml b/apps/nxtgauge-backend-rust/base/payments-deployment.yaml
index 1d1c241..b50fd20 100644
--- a/apps/nxtgauge-backend-rust/base/payments-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/payments-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9116"
           readinessProbe:
             tcpSocket:
               port: 9116
diff --git a/apps/nxtgauge-backend-rust/base/photographers-deployment.yaml b/apps/nxtgauge-backend-rust/base/photographers-deployment.yaml
index 187d0b1..1ad10ee 100644
--- a/apps/nxtgauge-backend-rust/base/photographers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/photographers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9107"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/social-media-managers-deployment.yaml b/apps/nxtgauge-backend-rust/base/social-media-managers-deployment.yaml
index e820c71..b7c95dd 100644
--- a/apps/nxtgauge-backend-rust/base/social-media-managers-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/social-media-managers-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9113"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/tutors-deployment.yaml b/apps/nxtgauge-backend-rust/base/tutors-deployment.yaml
index 3fe7974..00d4875 100644
--- a/apps/nxtgauge-backend-rust/base/tutors-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/tutors-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9108"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/ugc-content-creators-deployment.yaml b/apps/nxtgauge-backend-rust/base/ugc-content-creators-deployment.yaml
index 2fd3ed0..48c78e6 100644
--- a/apps/nxtgauge-backend-rust/base/ugc-content-creators-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/ugc-content-creators-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9117"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/apps/nxtgauge-backend-rust/base/video-editors-deployment.yaml b/apps/nxtgauge-backend-rust/base/video-editors-deployment.yaml
index 67ba728..4e42f6a 100644
--- a/apps/nxtgauge-backend-rust/base/video-editors-deployment.yaml
+++ b/apps/nxtgauge-backend-rust/base/video-editors-deployment.yaml
@@ -28,6 +28,9 @@ spec:
                 name: nxtgauge-backend-rust-config
             - secretRef:
                 name: nxtgauge-backend-rust-secrets
+          env:
+            - name: PORT
+              value: "9111"
           readinessProbe:
             httpGet:
               path: /health
diff --git a/argocd/woodpecker-registry-pull-application.yaml b/argocd/ai-guard-application.yaml
similarity index 54%
rename from argocd/woodpecker-registry-pull-application.yaml
rename to argocd/ai-guard-application.yaml
index bcc3ad1..a0b8a2b 100644
--- a/argocd/woodpecker-registry-pull-application.yaml
+++ b/argocd/ai-guard-application.yaml
@@ -1,21 +1,24 @@
 apiVersion: argoproj.io/v1alpha1
 kind: Application
 metadata:
-  name: woodpecker-registry-pull
+  name: ai-guard
   namespace: argocd
+  annotations:
+    argocd-image-updater.argoproj.io/image-list: guard=registry.nxtgauge.com/ai-guard:latest
+    argocd-image-updater.argoproj.io/guard.update-strategy: digest
+    argocd-image-updater.argoproj.io/write-back-method: argocd
 spec:
-  destination:
-    namespace: woodpecker
-    server: https://kubernetes.default.svc
   project: default
   source:
-    path: ops/woodpecker-registry-pull
     repoURL: https://github.com/Traceworks2023/nxtgauge-gitops.git
     targetRevision: main
+    path: apps/ai-guard/overlays/prod
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: nxtgauge-ai
   syncPolicy:
     automated:
       prune: true
       selfHeal: true
     syncOptions:
       - CreateNamespace=true
-
diff --git a/ops/openobserve-alerts/configmap-additional.yaml b/ops/openobserve-alerts/configmap-additional.yaml
new file mode 100644
index 0000000..9875015
--- /dev/null
+++ b/ops/openobserve-alerts/configmap-additional.yaml
@@ -0,0 +1,143 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: openobserve-alerts-additional
+  namespace: openobserve
+data:
+  additional-alerts.sh: |
+    #!/usr/bin/env sh
+    set -eu
+
+    ORG_ID="${ORG_ID:-default}"
+    BASE_URL="${BASE_URL:-http://o2-openobserve-standalone.openobserve.svc.cluster.local:5080}"
+    STREAM_NAME="${STREAM_NAME:-default}"
+    TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"
+
+    AUTH="$(printf '%s:%s' "$ZO_ROOT_USER_EMAIL" "$ZO_ROOT_USER_PASSWORD" | base64 | tr -d '\n')"
+    auth_hdr="Authorization: Basic $AUTH"
+
+    api() {
+      curl -sfS -H "$auth_hdr" -H "Content-Type: application/json" "$@"
+    }
+
+    ensure_alert() {
+      alert_name="$1"
+      sql="$2"
+      period_minutes="$3"
+      frequency_minutes="$4"
+      silence_minutes="$5"
+      row_template="$6"
+
+      existing_id="$(
+        api "$BASE_URL/api/v2/$ORG_ID/alerts" \
+          | jq -r --arg n "$alert_name" '.list[] | select(.name == $n) | .alert_id' \
+          | head -n 1
+      )"
+
+      payload="$(jq -n \
+        --arg name "$alert_name" \
+        --arg stream "$STREAM_NAME" \
+        --arg sql "$sql" \
+        --argjson period "$period_minutes" \
+        --argjson frequency "$frequency_minutes" \
+        --argjson silence "$silence_minutes" \
+        --arg row_template "$row_template" \
+        '{
+          name: $name,
+          stream_type: "logs",
+          stream_name: $stream,
+          is_real_time: false,
+          enabled: true,
+          tz_offset: 330,
+          destinations: ["nxtgauge_telegram"],
+          row_template: $row_template,
+          row_template_type: "String",
+          query_condition: { type: "sql", sql: $sql },
+          trigger_condition: {
+            period: $period,
+            operator: ">=",
+            threshold: 1,
+            frequency: $frequency,
+            frequency_type: "minutes",
+            silence: $silence
+          }
+        }')"
+
+      if [ -n "$existing_id" ] && [ "$existing_id" != "null" ]; then
+        api -X PUT "$BASE_URL/api/v2/$ORG_ID/alerts/$existing_id" -d "$payload" >/dev/null
+        echo "updated alert=$alert_name"
+      else
+        api -X POST "$BASE_URL/api/v2/$ORG_ID/alerts" -d "$payload" >/dev/null
+        echo "created alert=$alert_name"
+      fi
+    }
+
+    # API Health
+    ensure_alert \
+      "api-health-failures" \
+      "SELECT service, endpoint, status_code, COUNT(*) as count FROM \"default\" WHERE service ILIKE '%api%' AND (status_code >= 500 OR status_code = 0) GROUP BY service, endpoint, status_code ORDER BY count DESC LIMIT 50" \
+      5 1 15 \
+      "{service}/{endpoint} status={status_code} count={count}"
+
+    # Database Health
+    ensure_alert \
+      "database-connection-failures" \
+      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%connection refused%' OR body ILIKE '%database%' OR body ILIKE '%postgres%' OR body ILIKE '%sqlx%' ORDER BY _timestamp DESC LIMIT 50" \
+      5 1 15 \
+      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"
+
+    # Redis Health
+    ensure_alert \
+      "redis-connection-failures" \
+      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%redis%' OR body ILIKE '%cache%' OR body ILIKE '%connection timeout%' ORDER BY _timestamp DESC LIMIT 50" \
+      5 1 15 \
+      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"
+
+    # Pod Failures
+    ensure_alert \
+      "pod-failures" \
+      "SELECT k8s_namespace_name, k8s_pod_name, body_object_reason, body_object_message FROM \"default\" WHERE body_object_reason IN ('Failed', 'Evicted', 'NodeAffinity', 'UnexpectedAdmissionError') ORDER BY _timestamp DESC LIMIT 50" \
+      5 1 15 \
+      "{k8s_namespace_name}/{k8s_pod_name} {body_object_reason}: {body_object_message}"
+
+    # CPU High
+    ensure_alert \
+      "cpu-high-usage" \
+      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(cpu_usage_cores) as avg_cpu FROM \"default\" WHERE cpu_usage_cores > 0.8 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_cpu DESC LIMIT 50" \
+      10 2 30 \
+      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} CPU={avg_cpu}"
+
+    # Memory High
+    ensure_alert \
+      "memory-high-usage" \
+      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(memory_usage_bytes) as avg_mem FROM \"default\" WHERE memory_usage_bytes > 1073741824 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_mem DESC LIMIT 50" \
+      10 2 30 \
+      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} MEM={avg_mem}"
+
+    # Disk Full
+    ensure_alert \
+      "disk-full-warning" \
+      "SELECT k8s_node_name, k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%disk full%' OR body ILIKE '%no space left%' OR body ILIKE '%DiskPressure%' ORDER BY _timestamp DESC LIMIT 50" \
+      10 2 60 \
+      "{k8s_node_name}/{k8s_namespace_name}/{k8s_pod_name}: {msg}"
+
+    # Longhorn Health
+    ensure_alert \
+      "longhorn-volume-errors" \
+      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'longhorn-system' AND (body ILIKE '%error%' OR body ILIKE '%degraded%' OR body ILIKE '%faulted%') ORDER BY _timestamp DESC LIMIT 50" \
+      10 2 30 \
+      "longhorn/{k8s_pod_name}: {msg}"
+
+    # ArgoCD Health
+    ensure_alert \
+      "argocd-app-sync-failures" \
+      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'argocd' AND (body ILIKE '%SyncFailed%' OR body ILIKE '%ComparisonError%' OR body ILIKE '%ResourceQuota%') ORDER BY _timestamp DESC LIMIT 50" \
+      10 2 30 \
+      "argocd/{k8s_pod_name}: {msg}"
+
+    # Registry Health
+    ensure_alert \
+      "registry-push-failures" \
+      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'registry' AND (body ILIKE '%413%' OR body ILIKE '%payload too large%' OR body ILIKE '%unauthorized%') ORDER BY _timestamp DESC LIMIT 50" \
+      10 2 60 \
+      "registry/{k8s_pod_name}: {msg}"
\ No newline at end of file
diff --git a/ops/openobserve-alerts/configmap.yaml b/ops/openobserve-alerts/configmap.yaml
index ac15412..01a7042 100644
--- a/ops/openobserve-alerts/configmap.yaml
+++ b/ops/openobserve-alerts/configmap.yaml
@@ -154,10 +154,10 @@ data:
       "argocd/{k8s_pod_name} {k8s_container_name}: {msg}"
 
     ensure_alert \
-      "woodpecker-errors" \
-      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'woodpecker' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%') ORDER BY _timestamp DESC LIMIT 50" \
+      "gitea-runner-errors" \
+      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'gitea' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%' OR body ILIKE '%job failed%') ORDER BY _timestamp DESC LIMIT 50" \
       10 2 30 \
-      "woodpecker/{k8s_pod_name} {k8s_container_name}: {msg}"
+      "gitea/{k8s_pod_name} {k8s_container_name}: {msg}"
 
     ensure_alert \
       "registry-errors" \
diff --git a/ops/woodpecker-registry-pull/README.md b/ops/woodpecker-registry-pull/README.md
deleted file mode 100644
index 5bf1808..0000000
--- a/ops/woodpecker-registry-pull/README.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Woodpecker: allow pulling from private registry
-
-Woodpecker pipelines run as Kubernetes pods in the `woodpecker` namespace. If pipeline step images use `registry.nxtgauge.com/...` (private, Basic auth), kubelet needs an `imagePullSecret`.
-
-This is required for base images (example `registry.nxtgauge.com/rust:alpine`) and also for any mirrored plugin images (example `registry.nxtgauge.com/kaniko:2.1.1`).
-
-## Required secret
-
-Create this once:
-
-```bash
-kubectl -n woodpecker create secret docker-registry registry-nxtgauge-pull \
-  --docker-server=registry.nxtgauge.com \
-  --docker-username="" \
-  --docker-password="" \
-  --docker-email="ci@nxtgauge.com"
-```
-
-## Mirroring common plugin images (optional)
-
-If your pipelines reference plugin images from the internal registry (example `registry.nxtgauge.com/kaniko:2.1.1`) make sure those images exist in the registry.
-
-Example mirror from Docker Hub to internal:
-
-```bash
-docker pull woodpeckerci/plugin-kaniko:2.1.1
-docker tag woodpeckerci/plugin-kaniko:2.1.1 registry.nxtgauge.com/kaniko:2.1.1
-docker push registry.nxtgauge.com/kaniko:2.1.1
-```
-
-## What this kustomize applies
-
-It patches/ensures the `default` ServiceAccount in `woodpecker` includes:
-
-- `imagePullSecrets: [registry-nxtgauge-pull]`
diff --git a/ops/woodpecker-registry-pull/serviceaccount-default.yaml b/ops/woodpecker-registry-pull/serviceaccount-default.yaml
deleted file mode 100644
index eedc19e..0000000
--- a/ops/woodpecker-registry-pull/serviceaccount-default.yaml
+++ /dev/null
@@ -1,8 +0,0 @@
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: default
-  namespace: woodpecker
-imagePullSecrets:
-  - name: registry-nxtgauge-pull
-
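
Note on the revert recommended in `ROUTE_ISSUE_ANALYSIS.md` and `OTP_ISSUE_FIX_PROMPT.md`: this diff adds the two analysis documents but does not itself change the image tags in either prod overlay. A minimal sketch of what the `images` sections might look like after pinning, assuming the registry publishes images tagged with the short commit SHAs named in the analysis (this diff does not confirm that such tags exist, so check the registry before applying anything like it):

```yaml
# Sketch only. The tag values are assumptions taken from the commit hashes
# quoted in ROUTE_ISSUE_ANALYSIS.md, not tags confirmed to exist in the registry.

# apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml (images section)
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: 152f918  # frontend commit with the resend-otp path fix
---
# apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml (images section)
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: d084491  # gateway commit with /api/v1/users routing and legacy resend-otp
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: d084491  # users commit with v1 API support
```

Pinning to a fixed tag rather than the moving `high-performance-latest` tag would also make any later overwrite show up as an explicit change to these two files in this repository.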