Why We Run Four Kubernetes Scanners Simultaneously (and Score Them)
Pluto catches deprecated APIs. Polaris scores policy compliance. kube-score grades reliability. OSV.dev surfaces CVEs. No single tool is enough — here's how we combine them into one ShrikeOps score.
Here's the uncomfortable truth about Kubernetes scanners: every one of them is wrong alone. Pluto will clear a manifest that Polaris flags as high-risk. Polaris will pass a chart that kube-score grades D for reliability. kube-score happily shrugs at images with known CVEs. Pick any single tool and you'll ship something unsafe — just wearing a different blindfold each time.
ShrikeOps solves it by running four scanners in parallel and blending their verdicts into one A–F grade. This post is the blend.
What each scanner actually catches
- Pluto (Fairwinds) — deprecated + removed Kubernetes APIs.
apps/v1beta2 Deployment still in your Helm chart? Pluto finds it in 50ms.
- Polaris (Fairwinds) — 30+ security + reliability best practices. Privileged containers, missing resource limits, hostPath mounts, runAsRoot, the lot.
- kube-score (Zegl) — reliability + PodDisruptionBudget scoring. Catches the subtle things: missing PDB on a 2-replica Deployment, no preStop hook, no graceful shutdown.
- OSV.dev — CVE database for container image contents (we query by digest). The only one that looks *inside* your images.
Each tool is narrow by design. Stack them, and the Venn-diagram overlap is where the real signal lives.
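Before the orchestration loop below, it helps to see the shape every runner normalizes into. The struct here is a sketch with assumed field names, not the actual ShrikeOps type:
// Severity drives how many points a finding subtracts from the score.
type Severity string

const (
    SevCritical Severity = "critical"
    SevHigh     Severity = "high"
    SevMedium   Severity = "medium"
    SevLow      Severity = "low"
    SevInfo     Severity = "info"
)

// Finding is one normalized result from any of the four scanners.
type Finding struct {
    Tool     string   // "pluto", "polaris", "kube-score", "osv"
    Severity Severity // drives the score subtraction
    RuleID   string   // e.g. a Polaris check name or a CVE/OSV ID
    Message  string   // human-readable description
    Object   string   // kind/name of the offending Kubernetes object
}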
The orchestration loop
We fan out all four tools in parallel, merge findings by severity, and collapse the verdict into one letter grade. The orchestrator is ~200 lines of Go.
// ops/shrikeops/scanner/internal/engine/scanner.go (simplified)
func Scan(ctx context.Context, manifest []byte) (*Report, error) {
    var wg sync.WaitGroup
    findings := make(chan []Finding, 4) // buffered so no runner ever blocks

    // Fan out: one goroutine per scanner, all fed the same manifest.
    wg.Add(4)
    go func() { defer wg.Done(); findings <- runPluto(manifest) }()
    go func() { defer wg.Done(); findings <- runPolaris(manifest) }()
    go func() { defer wg.Done(); findings <- runKubeScore(manifest) }()
    go func() { defer wg.Done(); findings <- runOSV(manifest) }()
    wg.Wait()
    close(findings)

    // Fan in: merge every runner's findings into one flat list.
    all := []Finding{}
    for batch := range findings {
        all = append(all, batch...)
    }

    return &Report{
        Score:    gradeFindings(all), // 0–100
        Grade:    letterGrade(all),   // A–F
        Findings: all,
    }, nil
}

The trick: each runner is graceful-degrading. Binary missing? Log and return empty findings. Tool returns a non-zero exit but the output still parses? Keep the output, drop the exit code. In 18 months of running this we've never crashed a scan because a sub-scanner had a bad day.
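A sketch of what that degradation looks like for one generic runner. The helper below (runTool, its logging, and the parse callback) is an assumption for illustration, not the actual ShrikeOps code:
import (
    "context"
    "log"
    "os/exec"
)

// runTool shells out to one scanner binary and degrades gracefully:
// a missing binary or a hard failure with no output yields zero findings,
// never an error that could sink the whole scan.
func runTool(ctx context.Context, bin string, args []string, parse func([]byte) []Finding) []Finding {
    if _, err := exec.LookPath(bin); err != nil {
        log.Printf("scanner %s not installed, skipping", bin)
        return nil
    }
    out, err := exec.CommandContext(ctx, bin, args...).Output()
    if err != nil && len(out) == 0 {
        // Hard failure with nothing to parse: log it and move on.
        log.Printf("scanner %s failed with no output: %v", bin, err)
        return nil
    }
    // Many scanners exit non-zero when they find problems; if stdout still
    // parses, keep the findings and ignore the exit code.
    return parse(out)
}
Each runX wrapper then just supplies its binary name, the flags it needs, and a parser for that tool's JSON output.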
How the letter grade actually composes
Every finding has a severity (critical, high, medium, low, info). Starting from 100, we subtract 20 for each critical, 10 per high, 5 per medium, 2 per low, and 0 for info (those are just breadcrumbs). Floor at 0.
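A minimal sketch of that subtraction, reusing the assumed Finding and Severity types from earlier (again, an illustration rather than the real ShrikeOps struct):
// gradeFindings collapses a flat findings list into a 0–100 score.
func gradeFindings(findings []Finding) int {
    penalty := map[Severity]int{
        SevCritical: 20,
        SevHigh:     10,
        SevMedium:   5,
        SevLow:      2,
        SevInfo:     0, // breadcrumbs only, no deduction
    }
    score := 100
    for _, f := range findings {
        score -= penalty[f.Severity]
    }
    if score < 0 {
        score = 0 // floor at zero
    }
    return score
}
The letter grade then falls out of fixed thresholds: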
switch {
case score >= 90: return score, "A"
case score >= 75: return score, "B"
case score >= 60: return score, "C"
case score >= 45: return score, "D"
default: return score, "F"
}
Where each tool's verdict wins (and loses)
Pluto wins at: deprecated APIs
Pluto is the cheapest, fastest, narrowest check. If you ship autoscaling/v2beta1 in 2026 it catches you in 40ms. But Pluto has zero opinion on whether your app is *safe* to deploy — only whether it's *deprecated*.
Polaris wins at: workload misconfigurations
Polaris is our heaviest signal contributor — 30+ checks against Pod Security Standards + reliability. When a Polaris check fires as critical, it's usually a legitimate emergency (privileged container, hostPath mount into a sensitive dir). We trust its critical bucket the most.
kube-score wins at: reliability you'd never catch in review
This is the tool that surfaces the boring problems: 'your Deployment has 1 replica with no PDB' or 'Pod has no preStop hook so terminating gracefully is a prayer'. Nobody reviews these in a PR — kube-score catches them every time.
OSV.dev wins at: CVEs you didn't know you inherited
The only scanner that sees into your image layers. An ubuntu:22.04 base with a 2-year-old libxml2? OSV will tell you. Polaris/Pluto/kube-score won't — they don't look inside the image.
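For a feel of the OSV side, here's a minimal query against OSV.dev's public v1/query endpoint. This sketch asks about a single package and version rather than an image digest, so it illustrates the API rather than the exact digest-based lookup described above:
import (
    "bytes"
    "context"
    "encoding/json"
    "io"
    "net/http"
)

// queryOSV asks OSV.dev for known vulnerabilities in one package version.
func queryOSV(ctx context.Context, ecosystem, name, version string) ([]byte, error) {
    body, err := json.Marshal(map[string]any{
        "version": version,
        "package": map[string]string{"name": name, "ecosystem": ecosystem},
    })
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://api.osv.dev/v1/query", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    // The response is JSON with a "vulns" array of matching advisories.
    return io.ReadAll(resp.Body)
}
Feed it a package version pulled out of an image layer and anything that comes back under "vulns" becomes a finding.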
The overlap is where the bugs live
Here's the case that changed how we think about this: a customer's Helm chart was graded A by kube-score, B by Polaris, A by Pluto. Three tools said ship it. OSV found a critical CVE in the base image (glibc, remotely exploitable). ShrikeOps dropped the overall grade to D — because one critical finding is more load-bearing than three passing reports.
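To make the arithmetic concrete (the customer's exact finding mix isn't reproduced here, so treat these numbers as an illustration): if the Polaris B already reflected one high and three mediums, that's 25 points gone; the OSV critical takes another 20, landing at 55, squarely in the D band.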
The inverse happens too. A manifest with a single high from Polaris (missing resource limits) but immaculate elsewhere still lands at 90, right on the A cutoff, because a single high only costs 10 points. That's correct! It's a real issue, but it's not urgent like a CVE.
What we baked into the Starling runtime
The Starling ops image ships all four (plus Trivy, Kubescape, conftest, Kustomize) pre-installed. Scan requests land at /api/ops/scan → proxy to in-cluster shrikeops-scanner service → four goroutines → A–F in <2 seconds for a single-doc manifest.
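If you want to call it from code, a request looks roughly like the sketch below. Only the /api/ops/scan path comes from the paragraph above; the base URL, content type, and response fields are assumptions for illustration:
import (
    "bytes"
    "context"
    "net/http"
)

// scanManifest posts a raw manifest to a ShrikeOps-style scan endpoint.
// Hypothetical client call; the payload and response schema are assumptions.
func scanManifest(ctx context.Context, baseURL string, manifest []byte) (*http.Response, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        baseURL+"/api/ops/scan", bytes.NewReader(manifest))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/yaml")
    return http.DefaultClient.Do(req)
}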
There's also scan_cluster, which runs the same four scanners against an entire running cluster via kubeconfig.
What we'd skip if we rebuilt this
- JSON-RPC batch for the four scanners — we implemented it, no caller used it. The goroutine fan-out is clearer than a batched invocation.
- Per-scanner weighting — started as 40% security / 20% stability / 25% reliability / 15% lint. Linear severity-based scoring beats weighted category scoring every time in blind A/B tests we ran with 5 engineers.
- Scanner selection flags — we briefly had --skip-pluto-style flags. Nobody used them. If a scanner isn't pulling its weight, delete it.
Takeaways
- No single Kubernetes scanner is sufficient. Run at least three from different philosophical angles (deprecation, policy, reliability, CVE).
- Blend their verdicts into one number, linearly. Weighted heuristics feel sophisticated and consistently score worse.
- Make each scanner graceful-degrading. A missing binary should log and return empty — never crash the overall scan.
- The overlap between scanners is where the real bugs hide. Design your scoring so a single critical from any one tool dominates three clean reports from the others.
ShrikeOps is the default scan engine inside every Starling MCP deployment. Try it live at warblecloud.com/ops — paste a manifest, get the four-scanner report back in under 2 seconds.