mirror of
https://github.com/Traceworks2023/nxtgauge-gitops.git
synced 2026-06-10 21:18:54 +00:00
fix(backend): add PORT env to all rust deployments (was crashing on boot)
16 of 20 rust services had no PORT env var set; their main.rs calls
std::env::var('PORT').expect('PORT must be a valid u16') which panicked
on startup. This commit adds env.PORT matching the existing containerPort
for each service. Service ports: gateway=9100 users=9101 companies=9102
jobs=9103 job_seekers=9104 customers=9105 employees=9106 photographers=9107
tutors=9108 makeup_artists=9109 developers=9110 video_editors=9111
graphic_designers=9112 social_media_managers=9113 fitness_trainers=9114
catering_services=9115 payments=9116 ugc_content_creators=9117 leads=9118
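
The startup failure described above can be sketched as follows. This is a minimal illustration and not the services' actual `main.rs`, which is not shown in this commit; `parse_port` is a hypothetical helper. Note that real Rust needs double quotes (`"PORT"`); the single quotes in the message are shorthand. The sketch reports a missing or invalid `PORT` as an error instead of panicking through `.expect(...)`.

```rust
use std::env;

/// Hypothetical helper: turn a raw PORT value into a u16,
/// returning a descriptive error instead of panicking.
fn parse_port(raw: Option<&str>) -> Result<u16, String> {
    // A missing variable is the case that crashed the deployments.
    let value = raw.ok_or_else(|| String::from("PORT is not set"))?;
    value
        .trim()
        .parse::<u16>()
        .map_err(|e| format!("PORT must be a valid u16, got {value:?}: {e}"))
}

fn main() {
    // env::var returns Err when the variable is unset or not valid Unicode;
    // .ok() folds both cases into None.
    let raw = env::var("PORT").ok();
    match parse_port(raw.as_deref()) {
        Ok(port) => println!("would bind to 0.0.0.0:{port}"),
        Err(msg) => {
            eprintln!("startup error: {msg}");
            std::process::exit(1);
        }
    }
}
```

With `PORT` set in the Deployment (for example `"9101"` for users), the first branch is taken; without it, the process exits with a readable message rather than a panic backtrace.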
parent e27f82e996
commit 37a589fa87
28 changed files with 594 additions and 55 deletions
101
OTP_ISSUE_FIX_PROMPT.md
Normal file
@ -0,0 +1,101 @@
# OTP Issue Fix for NXTGAUGE Signup Flow

## Problem Statement
Users get "unable to create account" error when trying to sign up in the frontend-solid application. The OTP (One-Time Password) verification functionality during signup is broken.

## Root Cause Analysis
The OTP fixes were implemented but got overwritten by subsequent commits and finally all services were switched to `high-performance-latest` tag which doesn't include the OTP functionality.

## Historical Context

### April 16, 17:30 - Initial OTP Fixes (Working)
- Frontend commit: `152f918` - Fixed resend-otp API endpoint path
- Backend users commit: `31d4570` - Updated email footer
- These fixes made OTP work correctly

### April 16, 18:06 - v1 API + Legacy OTP Support (Enhanced)
- Gateway commit: `d084491` - Added /api/v1/users routing + legacy resend-otp endpoint for backward compatibility
- Backend users commit: `d084491` - Updated to support v1 API
- Enhanced OTP support with backward compatibility

### April 16, 21:33 - Infrastructure Override (Broke OTP)
- Frontend: `152f918` → `d26f0bf` (lost OTP fix)
- Backend users: `d084491` → `9444056` (lost v1 API/OTP support)
- These crane mirror builds overwrote the OTP fixes

### April 17, 05:25 - Current State (Still Broken)
- All services switched to `high-performance-latest` tag
- Frontend: `high-performance-latest` (missing OTP fix from `152f918`)
- Gateway: `high-performance-latest` (missing legacy OTP support from `d084491`)
- Backend users: `high-performance-latest` (missing v1 API/OTP from `d084491`)

## Current GitOps Configuration

### Backend Kustomization (apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml)
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: high-performance-latest
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: high-performance-latest
  - name: registry.nxtgauge-frontend-solid
    newTag: high-performance-latest
```

### Frontend Kustomization (apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml)
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: high-performance-latest
```

## Required Fix

### Option 1: Revert to Known Working Commits (Recommended)
Update the kustomization files to use the specific commits that included the OTP fixes:

1. Frontend: Change back to `152f918` (contains the OTP endpoint fix)
2. Gateway: Change back to `d084491` (contains legacy OTP support)
3. Backend users: Change back to `d084491` (contains v1 API + OTP support)

### Option 2: Fix high-performance-latest Branch
If there's a `high-performance-latest` branch in the respective repositories, ensure the OTP fixes from commits `152f918` and `d084491` are merged/rebased into it.

## Files to Modify

1. `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
2. `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`

## Expected Behavior After Fix

1. User enters email during signup
2. Frontend calls OTP generation endpoint
3. Backend generates and sends OTP via email
4. User enters received OTP
5. Frontend calls OTP verification endpoint
6. Backend verifies OTP and creates account
7. User successfully signs up without "unable to create account" error

## Verification Steps

After applying the fix:
1. Trigger ArgoCD sync for both applications
2. Wait for pods to restart with new images
3. Test signup flow: enter email → receive OTP → verify OTP → account created
4. Check logs if signup still fails
5. Verify OTP resend functionality works

## Additional Context

### SMTP Configuration (from secret.yaml)
- SMTP_HOST: "smtp.zeptomail.in"
- SMTP_PORT: "587"
- SMTP_FROM_EMAIL: "support@nxtgauge.com"
- SMTP_SECURE: "false"

### Gateway Configuration
- Gateway URL: "http://nxtgauge-rust-gateway:9100"
- API URL: "http://nxtgauge-rust-gateway:9100/api"
- Users Service URL: "http://nxtgauge-rust-users:9101"

Please analyze the codebase, identify the exact OTP endpoints that need to work, and provide the necessary fixes to restore the signup functionality.
193
ROUTE_ISSUE_ANALYSIS.md
Normal file
@ -0,0 +1,193 @@
# Route Issue Analysis for NXTGAUGE Frontend-Solid Signup

## Current Status: ❌ NOT FIXED

The route issues from the frontend-solid signup pages are **still not resolved**. Users experience "unable to create account" errors during signup due to API endpoint path mismatches.

## Route Issue Timeline

### April 16, 17:30 - Route Issue Fixed ✅
**Commit:** `555b4dc`
- **Frontend commit:** `152f918` - Fixed resend-otp API endpoint path
- **Backend users commit:** `31d4570` - Updated email footer
- **Impact:** Corrected the API endpoint that frontend was calling for OTP
- **Status:** Working correctly

### April 16, 18:06 - Enhanced Route Support ✅
**Commit:** `696dfb5`
- **Gateway commit:** `d084491` - Added `/api/v1/users` routing to gateway and users service
- **Backend users commit:** `d084491` - Updated to support v1 API
- **Features:**
  - Added `/api/v1/users` routing
  - Supported legacy resend-otp endpoint for backward compatibility
- **Impact:** Provided dual endpoint support to handle both old and new API paths
- **Status:** Enhanced with backward compatibility

### April 16, 19:34 - Route Fix Broken ❌
**Commit:** `7ef7df4`
- **Frontend:** `152f918` → `2d7117a` (lost route fix)
- **Admin:** Updated to `a13dce5`
- **AI:** Updated to `320e683`
- **Reason:** Switched to internal registry to avoid Docker Hub rate limits
- **Impact:** The correct resend-otp endpoint path was overwritten
- **Status:** Route functionality broken

### April 16, 21:33 - Route Fix Still Broken ❌
**Commit:** `39e69a3`
- **Frontend:** `2d7117a` → `d26f0bf` (still no route fix)
- **Backend users:** `d084491` → `9444056` (lost v1 API routing + legacy OTP support)
- **Gateway:** `d084491` → `9444056` (lost legacy OTP endpoint support)
- **Reason:** Crane mirror builds overwrote the route fixes
- **Impact:** Lost both v1 API routing and legacy OTP endpoint support
- **Status:** Route functionality still broken

### April 17, 05:25 - Current State: Route Issues Persist ❌
**Commit:** `75acea1`
- **All services:** Switched to `high-performance-latest` tag
- **Frontend:** `high-performance-latest` (missing route fix from `152f918`)
- **Gateway:** `high-performance-latest` (missing legacy OTP support from `d084491`)
- **Backend users:** `high-performance-latest` (missing v1 API/OTP from `d084491`)
- **Reason:** Registry infrastructure changes
- **Impact:** Route fixes not included in high-performance-latest builds
- **Status:** Route issues persist

## Current Route Issues

### 1. Frontend Route Mismatch ❌
- **Problem:** Frontend calling incorrect OTP endpoint path
- **Missing:** Fix from commit `152f918`
- **Impact:** OTP generation/verification fails during signup
- **User Experience:** "unable to create account" error

### 2. Gateway Route Support Missing ❌
- **Problem:** Gateway missing legacy resend-otp endpoint support
- **Missing:** Fix from commit `d084491`
- **Impact:** Backward compatibility broken for OTP endpoints
- **User Experience:** OTP resend functionality fails

### 3. Backend API Routing Missing ❌
- **Problem:** Backend missing `/api/v1/users` routing
- **Missing:** Fix from commit `d084491`
- **Impact:** v1 API endpoints not accessible
- **User Experience:** Signup and user management functions fail

## Current GitOps Configuration

### Backend Kustomization
**File:** `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-rust-gateway
    newTag: high-performance-latest # ❌ Missing d084491
  - name: registry.nxtgauge.com/nxtgauge-rust-users
    newTag: high-performance-latest # ❌ Missing d084491
```

### Frontend Kustomization
**File:** `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
```yaml
images:
  - name: registry.nxtgauge.com/nxtgauge-frontend-solid
    newTag: high-performance-latest # ❌ Missing 152f918
```

## Verification Status

### Confirmation of Route Issues ❌
The route issues are confirmed **NOT FIXED** because:

1. **Missing Critical Commits:**
   - Frontend fix `152f918` not deployed
   - Gateway/backend fix `d084491` not deployed

2. **Current Deployments:**
   - All services use `high-performance-latest` tag
   - Route fixes not included in current builds

3. **User Experience:**
   - "unable to create account" error during signup
   - Consistent with route/path mismatches
   - OTP verification fails

4. **No Route References in GitOps:**
   - No OTP route configurations found in current gitops
   - Route fixes were overwritten by infrastructure changes

## Required Fix

### Immediate Action: Revert to Working Commits
Update the kustomization files to use the specific commits that included the route fixes:

1. **Frontend:** Change to `152f918`
   - Contains correct OTP endpoint path
   - File: `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`

2. **Gateway:** Change to `d084491`
   - Contains legacy OTP endpoint support
   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`

3. **Backend users:** Change to `d084491`
   - Contains v1 API routing
   - File: `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`

### Alternative: Fix high-performance-latest Branch
Ensure the route fixes from commits `152f918` and `d084491` are merged into the `high-performance-latest` branch in respective repositories.

## Expected Behavior After Fix

1. User enters email during signup
2. Frontend calls correct OTP endpoint: `/api/v1/users/resend-otp`
3. Gateway routes request to users service with proper path mapping
4. Backend generates and sends OTP via email
5. User enters received OTP
6. Frontend calls OTP verification endpoint
7. Backend verifies OTP and creates account
8. User successfully signs up without "unable to create account" error

## Implementation Steps

1. **Update GitOps Configuration:**
   - Modify `apps/nxtgauge-frontend-solid/overlays/prod/kustomization.yaml`
   - Modify `apps/nxtgauge-backend-rust/overlays/prod/kustomization.yaml`

2. **Commit and Push Changes:**
   - Create commit with updated image tags
   - Push to main branch

3. **Trigger ArgoCD Sync:**
   - Sync `nxtgauge-frontend-solid` application
   - Sync `nxtgauge-backend-rust` application

4. **Verify Deployment:**
   - Wait for pods to restart with new images
   - Check pod status and logs

5. **Test Signup Flow:**
   - Test complete signup: email → OTP → verification → account creation
   - Test OTP resend functionality
   - Verify no "unable to create account" errors

## Related Issues

- **OTP Issue:** Closely related to route issues - see `OTP_ISSUE_FIX_PROMPT.md`
- **Email Configuration:** SMTP settings are correct in `apps/nxtgauge-backend-rust/base/secret.yaml`
- **Gateway Configuration:** Gateway service properly configured in `apps/nxtgauge-backend-rust/base/gateway-service.yaml`

## Configuration Context

### Gateway Configuration
- **Gateway URL:** `http://nxtgauge-rust-gateway:9100`
- **API URL:** `http://nxtgauge-rust-gateway:9100/api`
- **Users Service URL:** `http://nxtgauge-rust-users:9101`

### SMTP Configuration
- **SMTP_HOST:** `smtp.zeptomail.in`
- **SMTP_PORT:** `587`
- **SMTP_FROM_EMAIL:** `support@nxtgauge.com`
- **SMTP_SECURE:** `false`

## Conclusion

The route issues from the frontend-solid signup pages are **confirmed NOT FIXED**. The specific commits that contained the route corrections (`152f918` and `d084491`) are not currently deployed, and all services are using `high-performance-latest` which doesn't include these critical route fixes.

**Action Required:** Revert to the working commits to restore proper route functionality and fix the signup flow.
61
apps/ai-guard/base/deployment.yaml
Normal file
@ -0,0 +1,61 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-guard
  namespace: nxtgauge-ai
  labels:
    app: ai-guard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-guard
  template:
    metadata:
      labels:
        app: ai-guard
    spec:
      containers:
        - name: ai-guard
          image: registry.nxtgauge.com/ai-guard:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: RUST_LOG
              value: "info"
            - name: PORT
              value: "8080"
            - name: OLLAMA_BASE_URL
              value: "http://ollama.nxtgauge-ai.svc.cluster.local:11434"
            - name: OLLAMA_CHAT_MODEL
              value: "gemma3:270m"
            - name: LLM_GUARD_URL
              value: "http://llm-guard.nxtgauge-ai.svc.cluster.local:8000"
            - name: PRESIDIO_URL
              value: "http://presidio.nxtgauge-ai.svc.cluster.local:3000"
            - name: AI_SERVICE_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-guard-secrets
                  key: ai-service-key
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
@ -1,6 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - serviceaccount-default.yaml

  - namespace.yaml
  - secret.yaml
  - deployment.yaml
  - service.yaml
4
apps/ai-guard/base/namespace.yaml
Normal file
@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: nxtgauge-ai
8
apps/ai-guard/base/secret.yaml
Normal file
@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
  name: ai-guard-secrets
  namespace: nxtgauge-ai
type: Opaque
stringData:
  ai-service-key: ""
16
apps/ai-guard/base/service.yaml
Normal file
@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: ai-guard
  namespace: nxtgauge-ai
  labels:
    app: ai-guard
spec:
  type: ClusterIP
  selector:
    app: ai-guard
  ports:
    - name: http
      port: 8080
      targetPort: 8080
      protocol: TCP
7
apps/ai-guard/overlays/prod/kustomization.yaml
Normal file
@ -0,0 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: registry.nxtgauge.com/ai-guard
    newTag: latest
@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9115"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9102"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9105"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9110"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9106"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9114"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9112"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9104"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9109"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9116"
          readinessProbe:
            tcpSocket:
              port: 9116

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9107"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9113"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9108"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9117"
          readinessProbe:
            httpGet:
              path: /health

@ -28,6 +28,9 @@ spec:
                name: nxtgauge-backend-rust-config
            - secretRef:
                name: nxtgauge-backend-rust-secrets
          env:
            - name: PORT
              value: "9111"
          readinessProbe:
            httpGet:
              path: /health

@ -1,21 +1,24 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: woodpecker-registry-pull
  name: ai-guard
  namespace: argocd
  annotations:
    argocd-image-updater.argoproj.io/image-list: guard=registry.nxtgauge.com/ai-guard:latest
    argocd-image-updater.argoproj.io/guard.update-strategy: digest
    argocd-image-updater.argoproj.io/write-back-method: argocd
spec:
  destination:
    namespace: woodpecker
    server: https://kubernetes.default.svc
  project: default
  source:
    path: ops/woodpecker-registry-pull
    repoURL: https://github.com/Traceworks2023/nxtgauge-gitops.git
    targetRevision: main
    path: apps/ai-guard/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: nxtgauge-ai
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
143
ops/openobserve-alerts/configmap-additional.yaml
Normal file
@ -0,0 +1,143 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: openobserve-alerts-additional
  namespace: openobserve
data:
  additional-alerts.sh: |
    #!/usr/bin/env sh
    set -eu

    ORG_ID="${ORG_ID:-default}"
    BASE_URL="${BASE_URL:-http://o2-openobserve-standalone.openobserve.svc.cluster.local:5080}"
    STREAM_NAME="${STREAM_NAME:-default}"
    TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"

    AUTH="$(printf '%s:%s' "$ZO_ROOT_USER_EMAIL" "$ZO_ROOT_USER_PASSWORD" | base64 | tr -d '\n')"
    auth_hdr="Authorization: Basic $AUTH"

    api() {
      curl -sfS -H "$auth_hdr" -H "Content-Type: application/json" "$@"
    }

    ensure_alert() {
      alert_name="$1"
      sql="$2"
      period_minutes="$3"
      frequency_minutes="$4"
      silence_minutes="$5"
      row_template="$6"

      existing_id="$(
        api "$BASE_URL/api/v2/$ORG_ID/alerts" \
          | jq -r --arg n "$alert_name" '.list[] | select(.name == $n) | .alert_id' \
          | head -n 1
      )"

      payload="$(jq -n \
        --arg name "$alert_name" \
        --arg stream "$STREAM_NAME" \
        --arg sql "$sql" \
        --argjson period "$period_minutes" \
        --argjson frequency "$frequency_minutes" \
        --argjson silence "$silence_minutes" \
        --arg row_template "$row_template" \
        '{
          name: $name,
          stream_type: "logs",
          stream_name: $stream,
          is_real_time: false,
          enabled: true,
          tz_offset: 330,
          destinations: ["nxtgauge_telegram"],
          row_template: $row_template,
          row_template_type: "String",
          query_condition: { type: "sql", sql: $sql },
          trigger_condition: {
            period: $period,
            operator: ">=",
            threshold: 1,
            frequency: $frequency,
            frequency_type: "minutes",
            silence: $silence
          }
        }')"

      if [ -n "$existing_id" ] && [ "$existing_id" != "null" ]; then
        api -X PUT "$BASE_URL/api/v2/$ORG_ID/alerts/$existing_id" -d "$payload" >/dev/null
        echo "updated alert=$alert_name"
      else
        api -X POST "$BASE_URL/api/v2/$ORG_ID/alerts" -d "$payload" >/dev/null
        echo "created alert=$alert_name"
      fi
    }

    # API Health
    ensure_alert \
      "api-health-failures" \
      "SELECT service, endpoint, status_code, COUNT(*) as count FROM \"default\" WHERE service ILIKE '%api%' AND (status_code >= 500 OR status_code = 0) GROUP BY service, endpoint, status_code ORDER BY count DESC LIMIT 50" \
      5 1 15 \
      "{service}/{endpoint} status={status_code} count={count}"

    # Database Health
    ensure_alert \
      "database-connection-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%connection refused%' OR body ILIKE '%database%' OR body ILIKE '%postgres%' OR body ILIKE '%sqlx%' ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Redis Health
    ensure_alert \
      "redis-connection-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%redis%' OR body ILIKE '%cache%' OR body ILIKE '%connection timeout%' ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Pod Failures
    ensure_alert \
      "pod-failures" \
      "SELECT k8s_namespace_name, k8s_pod_name, body_object_reason, body_object_message FROM \"default\" WHERE body_object_reason IN ('Failed', 'Evicted', 'NodeAffinity', 'UnexpectedAdmissionError') ORDER BY _timestamp DESC LIMIT 50" \
      5 1 15 \
      "{k8s_namespace_name}/{k8s_pod_name} {body_object_reason}: {body_object_message}"

    # CPU High
    ensure_alert \
      "cpu-high-usage" \
      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(cpu_usage_cores) as avg_cpu FROM \"default\" WHERE cpu_usage_cores > 0.8 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_cpu DESC LIMIT 50" \
      10 2 30 \
      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} CPU={avg_cpu}"

    # Memory High
    ensure_alert \
      "memory-high-usage" \
      "SELECT k8s_namespace_name, k8s_pod_name, k8s_container_name, AVG(memory_usage_bytes) as avg_mem FROM \"default\" WHERE memory_usage_bytes > 1073741824 GROUP BY k8s_namespace_name, k8s_pod_name, k8s_container_name ORDER BY avg_mem DESC LIMIT 50" \
      10 2 30 \
      "{k8s_namespace_name}/{k8s_pod_name}/{k8s_container_name} MEM={avg_mem}"

    # Disk Full
    ensure_alert \
      "disk-full-warning" \
      "SELECT k8s_node_name, k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE body ILIKE '%disk full%' OR body ILIKE '%no space left%' OR body ILIKE '%DiskPressure%' ORDER BY _timestamp DESC LIMIT 50" \
      10 2 60 \
      "{k8s_node_name}/{k8s_namespace_name}/{k8s_pod_name}: {msg}"

    # Longhorn Health
    ensure_alert \
      "longhorn-volume-errors" \
      "SELECT k8s_namespace_name, k8s_pod_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'longhorn-system' AND (body ILIKE '%error%' OR body ILIKE '%degraded%' OR body ILIKE '%faulted%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 30 \
      "longhorn/{k8s_pod_name}: {msg}"

    # ArgoCD Health
    ensure_alert \
      "argocd-app-sync-failures" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'argocd' AND (body ILIKE '%SyncFailed%' OR body ILIKE '%ComparisonError%' OR body ILIKE '%ResourceQuota%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 30 \
      "argocd/{k8s_pod_name}: {msg}"

    # Registry Health
    ensure_alert \
      "registry-push-failures" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'registry' AND (body ILIKE '%413%' OR body ILIKE '%payload too large%' OR body ILIKE '%unauthorized%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 60 \
      "registry/{k8s_pod_name}: {msg}"
@ -154,10 +154,10 @@ data:
      "argocd/{k8s_pod_name} {k8s_container_name}: {msg}"

    ensure_alert \
      "woodpecker-errors" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'woodpecker' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%') ORDER BY _timestamp DESC LIMIT 50" \
      "gitea-runner-errors" \
      "SELECT k8s_pod_name, k8s_container_name, substring(body, 1, 220) AS msg FROM \"default\" WHERE k8s_namespace_name = 'gitea' AND (body ILIKE '%error%' OR body ILIKE '%ERROR%' OR body ILIKE '%failed%' OR body ILIKE '%job failed%') ORDER BY _timestamp DESC LIMIT 50" \
      10 2 30 \
      "woodpecker/{k8s_pod_name} {k8s_container_name}: {msg}"
      "gitea/{k8s_pod_name} {k8s_container_name}: {msg}"

    ensure_alert \
      "registry-errors" \

@ -1,35 +0,0 @@
# Woodpecker: allow pulling from private registry

Woodpecker pipelines run as Kubernetes pods in the `woodpecker` namespace. If pipeline step images use `registry.nxtgauge.com/...` (private, Basic auth), kubelet needs an `imagePullSecret`.

This is required for base images (example `registry.nxtgauge.com/rust:alpine`) and also for any mirrored plugin images (example `registry.nxtgauge.com/kaniko:2.1.1`).

## Required secret

Create this once:

```bash
kubectl -n woodpecker create secret docker-registry registry-nxtgauge-pull \
  --docker-server=registry.nxtgauge.com \
  --docker-username="<REGISTRY_USERNAME>" \
  --docker-password="<REGISTRY_PASSWORD>" \
  --docker-email="ci@nxtgauge.com"
```

## Mirroring common plugin images (optional)

If your pipelines reference plugin images from the internal registry (example `registry.nxtgauge.com/kaniko:2.1.1`) make sure those images exist in the registry.

Example mirror from Docker Hub to internal:

```bash
docker pull woodpeckerci/plugin-kaniko:2.1.1
docker tag woodpeckerci/plugin-kaniko:2.1.1 registry.nxtgauge.com/kaniko:2.1.1
docker push registry.nxtgauge.com/kaniko:2.1.1
```

## What this kustomize applies

It patches/ensures the `default` ServiceAccount in `woodpecker` includes:

- `imagePullSecrets: [registry-nxtgauge-pull]`

@ -1,8 +0,0 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: woodpecker
imagePullSecrets:
  - name: registry-nxtgauge-pull