How do I check my Supabase RLS policies for security holes?

AuditYourApp performs an automated deep-scan of your Postgres schema. It simulates attacker behavior to identify tables with missing Row Level Security (RLS) policies, permissive "true" policies, and unauthenticated public access risks.

Is AuditYourApp safe to run on a production database?

Yes. The scanner operates in a read-only context for 95% of checks. For write-access verification, it uses transaction rollbacks to ensure no dummy data is left behind.

Does it provide code to fix the security issues?

Yes. Unlike standard scanners that only report errors, AuditYourApp generates the exact SQL snippets and Policy definitions needed to patch vulnerabilities and secure your Supabase project.

What is the difference between the Single Snapshot and Continuous Guard?

The Single Snapshot ($49) is a one-time audit perfect for pre-launch verification. Continuous Guard ($29/mo) runs weekly scans to detect security regressions, ensuring new code pushes do not accidentally expose private data.

Does it offer manual expert review?

Yes, AuditYourApp offers an Expert Architecture Review ($499) which includes manual logic analysis by a human security engineer.

CVSS Score Calculator: A Practical Guide for Developers

Your scanner finishes. The report lands in Slack. One finding says “High”, another says “Medium”, and a third looks harmless until someone notices it touches production auth data. The hard part isn't finding issues. It's deciding what gets fixed today, what waits for the sprint, and what only matters in theory.

That's where a CVSS score calculator helps, but only if you use it properly. A lot of teams treat CVSS as a severity label generator. They copy the number into a ticket and move on. In practice, that misses the point. CVSS is useful because it gives developers, security engineers, and product owners a shared way to describe exploitability and impact. The score is the start of the conversation, not the end of it.

For teams building on Supabase, Firebase, and mobile backends, that distinction matters. A public RPC, an over-broad rule, or a leaked key might look routine in a generic scan. In your stack, it could affect user data, admin workflows, or regulated records. The same technical flaw can be a nuisance in one app and a business-critical problem in another.

Why a CVSS Score Is More Than Just a Number

A CVSS score exists to cut through vague language. “Serious” means different things to a developer, a pentester, and a CTO. A standard score gives everyone the same baseline for discussing risk.

The Common Vulnerability Scoring System (CVSS), maintained by FIRST, ranks vulnerability severity on a standardised scale from 0.0 to 10.0, with 9.0 to 10.0 classified as Critical in CVSS 4.0. That gives teams an international benchmark instead of ad hoc labels from individual tools or auditors, as noted in PlexTrac's CVSS overview.
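As a minimal sketch, the qualitative bands on that scale can be written as a simple lookup. The thresholds follow the qualitative severity rating scale FIRST publishes for CVSS v3.x and v4.0; the function name `severity_rating` is only illustrative.

```python
def severity_rating(score: float) -> str:
    """Map a CVSS score (0.0-10.0) to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


for s in (0.0, 3.9, 6.5, 7.5, 9.8):
    print(s, severity_rating(s))
```

The label is only the last step. Everything that matters for prioritisation happens in the metrics that produce the number.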

Why teams misread the score

The most common mistake is to treat the score as self-contained. It isn't. A score only helps if you understand what produced it.

A “Medium” can mean a flaw is awkward to exploit but still dangerous in your environment. A “High” can mean the issue is severe, yet partly contained by your existing controls. If you only sort findings by number, you can end up fixing the wrong thing first.

CVSS works best when teams ask two questions together. How exploitable is this vulnerability, and what happens if someone uses it here?

That second part is where modern app teams often get tripped up. Cloud backends blur boundaries. A single exposed function may touch user profiles, billing data, or internal workflows. In a mobile app, a backend weakness can be hidden behind a polished client experience, while still being directly reachable by an attacker.

What the number should trigger

A CVSS score should trigger a practical review, not a reflex.

When I review findings with development teams, the useful discussion usually sounds like this:

Where is it exposed? Public internet, internal network, or authenticated path.
What does exploitation buy an attacker? Read access, data tampering, privilege escalation, or service disruption.
What protects us already? Row-level security, rate limits, approval gates, or network restrictions.
What breaks for the business? User trust, compliance posture, incident response load, or release plans.

That's why a CVSS score is more like a risk language than a label. If you use it that way, it sharpens prioritisation. If you use it as a badge on a ticket, it becomes noise.

Deconstructing the Three CVSS Metric Groups

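A CVSS calculator only helps if the inputs reflect the actual finding and the environment around it. CVSS v3.x splits that job into three metric groups: Base, Temporal, and Environmental. (CVSS v4.0 renames Temporal to Threat and adds a Supplemental group, but the same reasoning applies.)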

Teams usually know the Base score. The mistakes show up when they stop there.

Base metrics

The Base group captures the technical characteristics of the vulnerability itself. These values describe how the flaw can be exploited and what kind of impact it can have, regardless of which company is affected.

That means questions such as: is the issue reachable over the network, does the attacker need privileges, is user interaction required, and what happens to confidentiality, integrity, and availability if exploitation succeeds? Those answers should stay stable whether the flaw sits in a test app, a Supabase-backed production API, or a mobile backend used by thousands of customers.

Base scoring gives teams a common severity reference. It does not tell you whether the vulnerable component sits behind row-level security, whether exploit code is already circulating, or whether the affected data would trigger UK GDPR reporting obligations.
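Base answers are usually recorded as a vector string such as `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N`. The sketch below parses and validates one, assuming the CVSS v3.1 metric abbreviations; the helper name `parse_base_vector` is hypothetical, and the example vector is one plausible reading of an unauthenticated write flaw, not a score taken from a real report.

```python
# Allowed values for each CVSS v3.1 Base metric.
BASE_METRICS = {
    "AV": {"N", "A", "L", "P"},  # Attack Vector
    "AC": {"L", "H"},            # Attack Complexity
    "PR": {"N", "L", "H"},       # Privileges Required
    "UI": {"N", "R"},            # User Interaction
    "S": {"U", "C"},             # Scope
    "C": {"H", "L", "N"},        # Confidentiality
    "I": {"H", "L", "N"},        # Integrity
    "A": {"H", "L", "N"},        # Availability
}


def parse_base_vector(vector: str) -> dict:
    """Parse a CVSS v3.1 vector string and return its Base metrics."""
    prefix, _, rest = vector.partition("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a vector starting with 'CVSS:3.1/'")
    # Each component looks like "AV:N"; malformed parts raise ValueError.
    metrics = dict(part.split(":", 1) for part in rest.split("/"))
    for name, allowed in BASE_METRICS.items():
        if metrics.get(name) not in allowed:
            raise ValueError(f"missing or invalid Base metric: {name}")
    return {name: metrics[name] for name in BASE_METRICS}


print(parse_base_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"))
```

Storing the vector alongside the score keeps the reasoning reviewable. Anyone can see which answer produced the number and challenge it.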

Temporal metrics

The Temporal group adjusts the score based on what is true now, not just what is true in theory.

This is the part that captures operational pressure. If exploitation is straightforward and public proof-of-concept code exists, the same Base score deserves more attention than it did when the finding was still obscure. If the report is low confidence, or an effective vendor fix is already available and scheduled, the urgency can shift the other way.

Security teams often skip Temporal scoring because it feels harder to maintain. In practice, it maps well to how incidents develop. A medium-severity backend issue with active exploitation chatter can outrank an older high-severity finding that is harder to reproduce and already boxed in by controls.

Practical rule: Base tells you how bad the flaw is in general. Temporal tells you how quickly it can become your problem.
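To make the adjustment concrete, here is a minimal sketch of the CVSS v3.1 Temporal equation, which multiplies the Base score by Exploit Code Maturity (E), Remediation Level (RL), and Report Confidence (RC). The weights are the v3.1 specification values; the Base score of 7.5 and the chosen metric values are assumptions for illustration.

```python
import math

# CVSS v3.1 Temporal metric weights ("X" means Not Defined).
E = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}    # Exploit Code Maturity
RL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}   # Remediation Level
RC = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}              # Report Confidence


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def temporal_score(base: float, e: str = "X", rl: str = "X", rc: str = "X") -> float:
    """Temporal score = Roundup(Base x E x RL x RC)."""
    return roundup(base * E[e] * RL[rl] * RC[rc])


# Proof-of-concept exploit exists, official fix available, report confirmed.
print(temporal_score(7.5, e="P", rl="O", rc="C"))
# Widely usable exploit, no fix available, report confirmed.
print(temporal_score(7.5, e="H", rl="U", rc="C"))
```

With these inputs the first case comes out at 6.7 and the second stays at 7.5. In v3.1 the Temporal metrics can only hold or lower the Base score, so the urgency signal comes from how little they reduce it.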

Environmental metrics

The Environmental group turns CVSS from a generic severity system into a prioritisation tool.

These metrics reflect your deployment, your assets, and your business impact. For app teams, this is often where the real decision gets made. An exposed function in a side project is one thing. The same function tied to production user records, payment events, or admin workflows is a different class of risk.

That difference gets sharper in modern stacks. A Supabase misconfiguration may look moderate at first glance if the Base score only reflects direct exploitability. In a production mobile app handling personal data, with weak tenant separation and regulatory exposure in the UK, Environmental scoring can push that issue much higher because the confidentiality and integrity requirements are stricter in that setting.

Here's the practical split:

| Metric group | What it answers | What it does not answer |
|---|---|---|
| Base | How severe is the vulnerability by its technical properties? | How exposed are we, and how much does this asset matter to us? |
| Temporal | How has exploitability or confidence changed since disclosure? | What is the business impact in our environment? |
| Environmental | How severe is this issue in our deployment, with our data and controls? | Whether the core vulnerability exists in the first place |

Why the full model matters

Using only Base scores gives a consistent starting point. Using all three groups gives a usable queue for remediation.

That distinction matters for cloud and mobile teams because the same flaw can sit in very different contexts. A backend bug affecting a low-value internal tool may wait. The same bug in a public app that stores regulated customer data, depends on shared cloud services, and has few compensating controls should move fast.

That is why experienced teams do not ask only whether a vulnerability is severe. They ask whether it is exploitable now, whether existing controls really contain it, and what the business impact looks like in this specific deployment.

Calculating the Base Score with a Real-World Example

Take a common backend finding: an unprotected Supabase RPC that allows unauthorised data modification. It isn't a theoretical issue. It's the kind of flaw that appears when a function is exposed for convenience, trusted by the frontend, and never properly constrained.


A CVSS score calculator asks you to translate that finding into a set of Base metrics. The calculator handles the maths. Your job is to classify the vulnerability accurately.

Start with exploitability

The first set of Base metrics describes how an attacker reaches and uses the flaw.

Attack Vector. If the RPC is reachable over the internet through your app backend, this is typically network-accessible.
Attack Complexity. Ask whether exploitation works reliably or depends on unusual conditions. A straightforward function call with weak checks points towards lower complexity.
Privileges Required. If anyone can hit the vulnerable endpoint without a valid account, Privileges Required is None. If the attacker needs an ordinary authenticated account but no elevated role, it is Low.
User Interaction. If the attacker can call the function directly, they don't need a victim to click, install, or approve anything.

These four inputs heavily shape the initial severity because they answer a simple question. How much friction stands between an attacker and exploitation?
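That friction can be expressed numerically. The sketch below computes the CVSS v3.1 Exploitability sub-score, which is 8.22 multiplied by the weights of the four metrics above. The weights are the v3.1 specification values; the two example calls are assumed readings of the RPC scenario.

```python
# CVSS v3.1 weights for the four exploitability metrics.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Network, Adjacent, Local, Physical
AC = {"L": 0.77, "H": 0.44}                        # Low, High
UI = {"N": 0.85, "R": 0.62}                        # None, Required
# Privileges Required weights depend on whether Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def exploitability(av: str, ac: str, pr: str, ui: str, scope_changed: bool = False) -> float:
    """Exploitability sub-score = 8.22 x AV x AC x PR x UI."""
    pr_weight = (PR_CHANGED if scope_changed else PR_UNCHANGED)[pr]
    return 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]


# Anonymous caller over the network, no special conditions, no victim needed.
anonymous = exploitability("N", "L", "N", "N")
# Same path, but the caller needs an ordinary authenticated account.
logged_in = exploitability("N", "L", "L", "N")
print(round(anonymous, 1), round(logged_in, 1))
```

Requiring even a basic account drops the sub-score from roughly 3.9 to roughly 2.8, which is why the Privileges Required answer deserves a reproduced test rather than an assumption.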

Then score impact

The second half of the Base group is about consequences.

For an unauthorised data modification flaw, the strongest impact is often Integrity. The attacker can change records they shouldn't control. Depending on the function, Confidentiality may also be affected if the same path exposes data or enables indirect disclosure. Availability depends on whether misuse can disrupt operations, block workflows, or corrupt state badly enough to cause downtime.

A practical way to think about the impact triad is this:

| Impact area | What to ask for the RPC example |
|---|---|
| Confidentiality | Can the attacker see data they shouldn't? |
| Integrity | Can they alter, insert, or delete trusted records? |
| Availability | Can they interrupt service or make data unusable? |

A lot of teams under-score integrity in app backends because data modification sounds less dramatic than data theft. That's a mistake. If an attacker can change billing status, approval flags, or user roles, the system stops being trustworthy.

Don't score the bug based on how easy it is to patch. Score it based on how the attacker experiences it and what they can achieve.

The Scope decision that causes trouble

The most error-prone Base metric is Scope.

Scope asks whether exploitation affects resources beyond the original security boundary. In plain terms, does abusing this vulnerability stay within the vulnerable component's authority, or does it let the attacker impact something outside that boundary?

Teams often overstate severity here. They see cross-table effects, internal workflow changes, or downstream business impact and mark Scope as changed, even when the technical security boundary hasn't shifted.

That mistake isn't rare. A 2024 UK Government Cyber Assessment Framework audit found 63% of entities misclassified scope in at least 15% of assessments, leading to inflated severity ratings and patching delays averaging 14 days, according to Picus Security's CVSS glossary.

For the Supabase RPC example, Scope depends on architecture. If the function lets an attacker alter data inside the same application trust boundary, Scope may remain unchanged even if the business impact is ugly. If exploitation crosses into another security authority, the answer shifts.
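The effect of that single decision is easy to see in the numbers. The sketch below implements the CVSS v3.1 Base equations and scores the same flaw twice, once with Scope Unchanged and once with Scope Changed. The constants are the v3.1 specification values; the metric choices (anonymous network access, high Integrity impact, no Confidentiality or Availability impact) are an assumed reading of the RPC example.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a) -> float:
    """CVSS v3.1 Base score from single-letter metric values."""
    changed = scope == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("N", "L", "N", "N", "U", "N", "H", "N"))  # Scope Unchanged
print(base_score("N", "L", "N", "N", "C", "N", "H", "N"))  # Scope Changed
```

Under these assumptions the score moves from 7.5 to 8.6 on the Scope answer alone. That is enough to reorder a queue, so the boundary question needs evidence from the architecture, not intuition about business impact.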

What works and what doesn't

When teams score Base metrics well, they focus on observable facts.

What works:

Trace the attack path from anonymous access or low-privilege access to the vulnerable function.
Review actual permissions instead of assumed frontend behaviour.
Map the direct technical impact before discussing compliance or reputation damage.

What doesn't work:

Scoring from the ticket title without reproducing the finding.
Letting business anxiety distort Base metrics that should stay technical.
Confusing downstream consequences with Scope.

A CVSS score calculator is excellent at consistency. It's terrible at reading your architecture. That part is still on the team.

Refining Severity with Temporal and Environmental Scores

Base scoring gives you a clean technical baseline. Real prioritisation starts when you adapt that baseline to current conditions and local context.

Figure: Four steps to refine vulnerability severity using CVSS Base, Temporal, and Environmental scores.

Temporal scoring changes urgency

Return to the unprotected Supabase RPC. The underlying flaw hasn't changed. The code is still vulnerable. But the urgency can move.

If exploitation is now easy to reproduce, your tolerance for delay should drop. If a remediation path is available and proven, your response can become more structured. Temporal scoring helps teams distinguish between “serious” and “serious right now”.

This matters in fast-moving app teams because conditions shift quicker than many ticket queues do. A finding from last sprint may deserve a different response this sprint, even if the Base score is identical.

Environmental scoring changes business priority

Environmental metrics answer the question most developers eventually ask anyway. Why is this issue a bigger deal in our app than in someone else's?

The answer is context. A backend function that updates low-value demo content is not the same as one that changes customer records, mobile subscription state, or regulated personal data. Standard calculators often stop before this point, which is where teams lose the most useful part of CVSS.

Teams using standard CVSS calculators, including the NVD calculator, often leave the Environmental metric group blank. In the UK, where UK GDPR fines can reach £17.5 million or 4% of annual global turnover, whichever is higher, a Medium Base score of 4.0 to 6.9 can become a critical operational risk once environmental factors are applied.

That's the gap most generic guides miss. The same vulnerability can have very different remediation priority depending on data sensitivity, exposure, and regulatory consequences.

A practical way to apply Environmental metrics

For cloud and mobile backends, Environmental scoring usually gets sharper when the team reviews the finding against operational reality:

Asset importance. Is the vulnerable component tied to onboarding, payments, auth, or admin actions?
Data sensitivity. Does the function touch personal data, internal records, or public content?
Compensating controls. Are there other checks that reduce exploit impact, or is the backend effectively trusting the caller?
Deployment context. Is this in production, staging, a dead feature path, or an internal-only environment?

Here's a simple decision frame:

| Scenario | Likely interpretation |
|---|---|
| Publicly reachable function with sensitive user data | Environmental score rises sharply |
| Vulnerable path exists but is buffered by strong controls | Environmental score may stay closer to Base |
| Same flaw in a low-value internal prototype | Priority often drops despite similar Base traits |

A CVSS score becomes operationally useful when the team can explain why this vulnerable component matters to this business, not just why the vulnerability is bad in general.
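A simplified sketch of the CVSS v3.1 Environmental equation shows how much the security requirements alone can move a score. It assumes Scope Unchanged, no modified Base metrics, Temporal metrics left Not Defined, and a fixed exploitability of AV:N/AC:L/PR:N/UI:N; the scenario (low Confidentiality and Integrity impact) is illustrative rather than taken from a real finding.

```python
import math

CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Security requirement weights: Not Defined, High, Medium, Low.
REQ = {"X": 1.0, "H": 1.5, "M": 1.0, "L": 0.5}
# Exploitability for AV:N/AC:L/PR:N/UI:N, held constant in this sketch.
EXPLOITABILITY = 8.22 * 0.85 * 0.77 * 0.85 * 0.85


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def environmental_score(c, i, a, cr="X", ir="X", ar="X") -> float:
    """Environmental score under the simplifying assumptions stated above."""
    miss = min(
        1 - (1 - REQ[cr] * CIA[c]) * (1 - REQ[ir] * CIA[i]) * (1 - REQ[ar] * CIA[a]),
        0.915,
    )
    modified_impact = 6.42 * miss
    if modified_impact <= 0:
        return 0.0
    # Temporal multipliers are all 1.0 here, so one Roundup is sufficient.
    return roundup(min(modified_impact + EXPLOITABILITY, 10))


print(environmental_score("L", "L", "N"))                   # requirements not defined
print(environmental_score("L", "L", "N", cr="H", ir="H"))   # regulated production data
print(environmental_score("L", "L", "N", cr="L", ir="L"))   # low-value prototype
```

The same flaw scores 6.5 with no requirements set, 7.5 when confidentiality and integrity requirements are High, and 5.3 when both are Low. The vulnerability is identical in each case; only the stated importance of the data has changed.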

If you want to make that process routine, tie scoring into your broader continuous risk assessment workflow. That keeps severity from becoming a one-time audit artefact.

Where teams go wrong

The most common failure isn't bad maths. It's skipping Environmental scoring because it feels subjective.

It is subjective, but that doesn't make it optional. Good teams make it disciplined. They document why a customer-facing backend handling regulated data deserves a higher priority than an equally exploitable flaw in a low-risk internal tool. They also document the reverse, which is just as important for avoiding fire drills.

A CVSS score calculator gives you a framework. Environmental scoring is where your security process proves whether it understands the business it is protecting.

Using a CVSS Score Calculator and Automating Analysis

Manual scoring is useful when you're learning CVSS or reviewing a small set of findings. It becomes painful once you have regular scans, repeated issue types, and multiple environments.

Screenshot from https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator

Manual calculators are good for judgement

Official calculators are still the best place to build scoring discipline. The formula is open and accessible through tools such as the NVD CVSS v3 calculator, and the three metric groups let teams contextualise severity for local deployments, as summarised in Wikipedia's CVSS overview.

That's valuable because it forces the team to justify each metric. If an engineer can't explain why Privileges Required is low or why Scope changed, the number isn't trustworthy yet.

Manual scoring is strongest when:

You're validating a high-impact finding before creating an incident or escalation.
You're training developers to think in terms of exploitability and impact.
You're resolving disagreement between scanner output and engineering reality.

Manual entry doesn't scale well

The downside is consistency over time. Repeated scoring by hand introduces drift. Different engineers interpret the same flaw slightly differently. Ticket quality varies. Environmental context gets skipped because the team is rushing.

That gets worse in stacks like Supabase and Firebase, where the issue types are often configuration-heavy rather than classic CVE-style flaws. A scanner may tell you a rule, function, or key is exposed. Turning that into CVSS inputs still requires architectural judgement.

If you work with teams outside your own region or compliance context, it also helps to look at how others frame vulnerability review. For a non-UK example of operational assessment framing, Cyber Command's guide to vulnerability assessment for Florida businesses is a useful reference point because it shows how environment changes risk interpretation.

Automation is where mature teams land

The right long-term setup is usually hybrid.

Use a CVSS score calculator manually for validation and edge cases. Automate the repetitive mapping wherever possible. That means pulling findings into a system that can classify common patterns, preserve scoring logic, and feed results into triage workflows. If you're building that process out, an automated security scanning guide is a sensible starting point for operational design.

Engineering note: Automation should remove repetitive scoring work, not remove human review from high-consequence findings.

What works well in practice is a pipeline where routine findings are pre-scored, exceptions are reviewed by someone who understands the stack, and Environmental adjustments are attached before tickets are prioritised. What doesn't work is letting a raw scanner severity become the final answer.
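One way to picture that pipeline is a small pre-scoring step. Everything in this sketch is hypothetical: the finding type names, the default vectors, and the sensitive asset tags are placeholders a team would replace with its own reviewed mappings.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical defaults: scanner finding type -> pre-agreed Base vector.
DEFAULT_VECTORS = {
    "public_rpc_write": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N",
    "rls_missing_read": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
}
# Hypothetical asset tags that always require a human reviewer.
SENSITIVE_TAGS = {"auth", "payments", "personal_data"}


@dataclass
class Triage:
    finding_type: str
    vector: Optional[str]
    needs_human_review: bool
    reason: str


def pre_score(finding_type: str, asset_tags: Iterable[str]) -> Triage:
    """Attach a default vector and decide whether a human must review it."""
    vector = DEFAULT_VECTORS.get(finding_type)
    if vector is None:
        return Triage(finding_type, None, True, "unknown finding type")
    touched = SENSITIVE_TAGS & set(asset_tags)
    if touched:
        reason = "touches sensitive assets: " + ", ".join(sorted(touched))
        return Triage(finding_type, vector, True, reason)
    return Triage(finding_type, vector, False, "routine pattern")


print(pre_score("public_rpc_write", ["payments"]))
print(pre_score("rls_missing_read", ["marketing_site"]))
print(pre_score("leaked_service_key", []))
```

The design choice is that automation never closes the loop alone. Unknown patterns and anything touching sensitive assets are routed to a person, while routine findings arrive pre-scored.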

A CVSS score calculator is precise. Your process needs to be repeatable as well.

Turning Scores into an Actionable Remediation Plan

A payment API with a Base score of 6.5 might sit in the backlog for weeks. The same issue can become an immediate fix when the vulnerable service handles regulated customer data, the exploit is public, and the affected app is live in production. That is the point of CVSS in practice. It helps teams decide what to interrupt, what to schedule, and what to accept with eyes open.

Raw severity is only the starting point. The remediation plan should reflect Temporal and Environmental context, because that is where business risk shows up. A Medium finding in a Supabase project with permissive RLS on a low-value internal tool is different from a Medium finding on a customer-facing mobile backend subject to UK GDPR expectations and payment flows. Same weakness category. Different priority, owner, and deadline.

What a workable plan looks like

Every finding needs three decisions attached to it:

Fix window. Immediate action, this sprint, planned backlog work, or accepted risk with review date.
Owner. Backend engineer, mobile engineer, platform team, product team, or security reviewer.
Closure evidence. Code change, config change, compensating control, regression test, or updated CVSS after re-scoring.
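Those three decisions can be enforced in whatever tracker a team uses. As a rough sketch, with hypothetical field names and fix-window labels, a record type can simply refuse to exist without them:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical labels for the fix windows described above.
FIX_WINDOWS = {"immediate", "this_sprint", "backlog", "accepted_risk"}


@dataclass
class RemediationDecision:
    finding: str
    fix_window: str
    owner: str
    closure_evidence: str
    review_date: Optional[date] = None

    def __post_init__(self) -> None:
        if self.fix_window not in FIX_WINDOWS:
            raise ValueError(f"unknown fix window: {self.fix_window}")
        if not self.owner.strip():
            raise ValueError("every finding needs a named owner")
        if not self.closure_evidence.strip():
            raise ValueError("every finding needs defined closure evidence")
        if self.fix_window == "accepted_risk" and self.review_date is None:
            raise ValueError("accepted risk requires a review date")


decision = RemediationDecision(
    finding="Customer status RPC accepts unauthenticated writes",
    fix_window="immediate",
    owner="backend engineer",
    closure_evidence="regression test for unauthorised access",
)
print(decision)
```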

The ticket itself should be specific enough that an engineer can act without guessing. “Fix exposed RPC” does not help. “Require auth on the customer status RPC, restrict writes by role, add a regression test for unauthorised access, then rescore Environmental metrics for the production project” gives the team a clear path.

Good remediation plans also account for trade-offs. Some fixes are low effort and should happen immediately. Others touch schema design, mobile release cycles, or customer-facing behaviour and need coordination. In those cases, a compensating control, such as narrowing network access, rotating keys, disabling a public function, or increasing monitoring, may be the right short-term move while the permanent fix is built.

Keep the process lightweight, but enforceable

Teams do not need a complicated committee to use CVSS well. They need thresholds that map to action and are enforced consistently.

A simple model is enough:

Critical findings trigger immediate review and can interrupt planned work.
High findings get a named owner and a due date in the next sprint.
Medium findings are grouped by asset and risk pattern, then fixed deliberately.
Low findings stay visible and are usually resolved during adjacent engineering work or hardening cycles.
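The model above is small enough to encode directly. This sketch assumes the score being passed in is the context-adjusted one and uses the standard CVSS severity bands; the action strings paraphrase the list and the function name is illustrative.

```python
def triage_action(score: float) -> str:
    """Map a context-adjusted CVSS score to the action in the simple model."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score >= 9.0:
        return "Critical: immediate review, may interrupt planned work"
    if score >= 7.0:
        return "High: named owner and a due date in the next sprint"
    if score >= 4.0:
        return "Medium: group by asset and risk pattern, then fix deliberately"
    if score > 0.0:
        return "Low: keep visible, resolve during adjacent work or hardening"
    return "None: no action required"


for s in (9.1, 7.5, 5.3, 2.0):
    print(s, "->", triage_action(s))
```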

That only works if scoring sits inside a real vulnerability management process. Otherwise teams score findings once, paste the number into a ticket, and never revisit the context when exploitability or business impact changes.

For smaller organisations, ownership often breaks down at the business boundary rather than the technical one. MY CYBER GUARD's guide to small business cyber protection is a useful reminder that someone has to own the operational risk, not just the Jira issue.

A CVSS score should end in one of three outcomes: a verified fix, a documented exception, or a compensating control with a review date. If none of those happens, the scoring may have been accurate, but the remediation process was weak.

AuditYour.App helps teams scan Supabase, Firebase, mobile apps, and web projects for exposed RLS rules, public RPCs, leaked keys, and hardcoded secrets, then turn findings into clear remediation work. If you want a faster path from raw security findings to prioritised fixes, try AuditYour.App.