Your team ships fast. The mobile app talks to Firebase. The web app leans on Supabase. Edge functions handle the awkward bits of business logic. Someone enabled Row Level Security, someone else added a quick RPC, and everyone assumes the platform defaults are doing more protection than they really are.
That's where many startups get caught. The stack looks modern, managed, and clean on the surface, but the weak points aren't where a traditional network-heavy test usually spends its time. They sit in policy logic, client-exposed configuration, auth edge cases, callable functions, storage rules, and the gap between “authenticated” and “authorised”.
A good red team pen test treats those gaps like an attacker would. It doesn't stop at finding a misconfiguration. It asks whether that flaw can be chained into account takeover, data access, privilege escalation, or persistent abuse in a serverless environment that changes every week.
Beyond the Checklist: Embracing the Adversary Mindset
A checklist-driven pen test often works backwards from known controls. Is storage public? Are headers set? Is an endpoint rate limited? Those checks matter, but they can miss the path an attacker takes through a Supabase or Firebase app.

The adversary mindset starts with intent. If I'm attacking your app, I don't care whether your controls look neat in a dashboard. I care whether I can read another user's profile, invoke a function without the expected role, pull a privileged token from a mobile bundle, or chain weak validation into admin access.
Why serverless changes the test
In cloud-native stacks, the attack surface is distributed:
- Policy layer flaws often live in RLS rules, Firestore rules, or storage permissions that look right until you test edge cases.
- Client trust mistakes happen when developers assume values from the app are trustworthy because the backend is managed.
- Function exposure shows up in RPCs, callable functions, webhooks, and edge functions that were built for speed, not least privilege.
- Mobile leakage appears in APKs, IPAs, JavaScript bundles, and debug artefacts where keys, endpoints, and feature flags become reconnaissance gold.
A traditional test may still find obvious issues. It often won't pressure the app hard enough to prove whether the access model holds under realistic abuse.
Practical rule: If your test doesn't try to achieve a business objective, you're probably validating controls, not attacker paths.
The business case for realism is strong. 94% of organisations conducting red-team testing experience some level of successful penetration, and organisations using these exercises reduced the average cost of a breach by $204,000, according to Pentest-Tools penetration testing statistics. That doesn't mean every system is poor. It means mature teams still miss what only objective-driven simulation exposes.
What works and what fails
What works in a startup environment is narrow, concrete, and outcome-focused. Test whether a standard user can access a restricted table. Test whether an edge function trusts a forged claim. Test whether a mobile client leaks enough context to speed up exploitation.
What fails is broad theatre. Teams ask for “a red team exercise” and get a recycled web app test with a few extra screenshots. That's not enough for a stack built on managed auth, database policies, object storage, push notifications, and mobile clients.
A red team pen test for Supabase or Firebase should feel like a determined attacker working through your actual product. If it doesn't, it won't tell you much.
Defining the Battlefield and Rules of Engagement
The fastest way to waste a red team engagement is to start attacking before you've agreed what success looks like. In startups, this usually happens because everyone is in a hurry and the app changes mid-test.
A useful scope is specific enough that engineers can recognise it and defenders can measure it. “Test our app” is weak. “Gain read access to another tenant's invoices”, “escalate a normal user to staff via an edge function”, and “extract data from a supposedly private storage bucket without admin credentials” are proper objectives.
Set objectives that map to business damage
For serverless and mobile stacks, the best objectives usually sit close to business logic:
- Cross-tenant access: Can one signed-in user read or modify another tenant's data through RLS gaps, weak document rules, or insecure query paths?
- Privilege escalation: Can a standard account become staff or admin by tampering with JWT claims, abusing a backend function, or replaying a hidden request from the client?
- Secret and token exposure: Can an attacker recover useful configuration, API keys, or tokens from bundles, build artefacts, logs, or app store releases?
- Silent persistence: Can a function, refresh token flow, webhook, or background job be abused in a way defenders won't notice quickly?
A red team pen test differs from a noisy scan. The point isn't to enumerate every possible issue. It's to prove which routes lead to impact.
Write rules that protect the company and preserve realism
Stealth matters. Red team assessments are stealthier and often last weeks or months to emulate persistent threats, with the aim of reaching a defined objective while the wider IT team remains unaware, as described in Evalian's comparison of penetration testing and red team testing. For a startup, that doesn't mean chaos. It means controlled realism.
Your Rules of Engagement should cover:
- Allowed attack surfaces such as production mobile apps, staging environments with production-like auth, storage layers, edge functions, and admin panels.
- Explicit exclusions including payment providers, third-party SaaS tenants, destructive actions against real user data, and anything that risks regulatory exposure.
- Escalation paths so a critical finding reaches a named person immediately without killing the whole exercise.
- Evidence handling so copied data is minimised, redacted where possible, and stored only as long as needed for proof.
- Emergency authorisation, often called the “get out of jail free” note, which matters if social engineering, physical access attempts, or provider-side alerts trigger questions.
The tighter the objective, the better the engagement. Vague scopes produce vague findings.
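One way to keep tooling honest about the allowed surfaces and explicit exclusions is a small guard that every scripted request passes through first. The sketch below is illustrative only: the host names and the exclusion suffix are placeholders, and the real lists would come from the signed Rules of Engagement.

```python
from urllib.parse import urlsplit

# Hypothetical scope lists for illustration; real values come from the agreed RoE.
ALLOWED_HOSTS = {"staging.example.com", "api.staging.example.com"}
EXCLUDED_HOST_SUFFIXES = (".payments.example.net",)  # e.g. a third-party payment provider


def in_scope(url: str) -> bool:
    """Return True only if the URL's host is explicitly allowed and not excluded."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    for suffix in EXCLUDED_HOST_SUFFIXES:
        # Match both the bare domain and any subdomain of an excluded provider.
        if host == suffix.lstrip(".") or host.endswith(suffix):
            return False
    return host in ALLOWED_HOSTS
```

The guard is deliberately allow-list first: anything not named in scope is treated as out of scope, which mirrors how the exclusions are meant to work on paper.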
A scoping pattern that works for startups
I've found that small teams do best with a layered scope:
| Scope layer | What to include | What to avoid |
|---|---|---|
| Core target | Production-like app flow, auth, data access, storage, functions | Massive catch-all infrastructure reviews |
| Supporting assets | Mobile binaries, JS bundles, CI artefacts if approved, logs with limited access | Open-ended internal network testing with no app tie-in |
| Defender visibility | Alerting rules, logging, incident triage contacts | Broadcasting the exercise to the whole company |
If you get the battlefield wrong, the rest of the exercise becomes a performance. If you get it right, every test result is easier to interpret, fix, and retest.
Threat Modelling for Modern Application Stacks
Organisations frequently overcomplicate threat modelling, then skip it. For Supabase, Firebase, and mobile-backed apps, a simple attacker-centred model works better. Start with what would hurt if exposed. Then trace how the app grants access. Then test the seams where trust is assumed.
Start with assets and trust boundaries
List the assets first, not the vulnerabilities. In these stacks, the recurring high-value targets are usually:
- User data in Postgres tables, Firestore collections, storage buckets, and cached client state
- Authorisation logic inside RLS policies, Firestore rules, custom claims, and edge functions
- Operational secrets in mobile builds, frontend bundles, environment handling, and CI outputs
- Privileged workflows such as admin panels, moderation tools, billing actions, and support impersonation
Then mark the trust boundaries. The important question isn't “where is the server?” It's “where does the app stop verifying and start assuming?” That often happens when a function trusts a user ID from the client, when a policy checks auth existence but not ownership, or when a mobile app contains enough metadata to reconstruct private flows.
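The difference between checking that auth exists and checking ownership is easier to see as two predicates side by side. This is a plain Python model of the idea, not real RLS or Firestore rule syntax; the `Row` type, the policy names, and the sample rows are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Row:
    id: int
    owner_id: Optional[str]  # None models a row with null ownership


def weak_policy(auth_uid: Optional[str], row: Row) -> bool:
    # Only verifies that *someone* is signed in.
    return auth_uid is not None


def ownership_policy(auth_uid: Optional[str], row: Row) -> bool:
    # Requires a signed-in user AND a non-null owner that matches the caller.
    return auth_uid is not None and row.owner_id is not None and row.owner_id == auth_uid


def visible_rows(policy: Callable[[Optional[str], Row], bool],
                 auth_uid: Optional[str], rows: List[Row]) -> List[int]:
    """Return the ids of rows the policy would let this caller read."""
    return [r.id for r in rows if policy(auth_uid, r)]


ROWS = [Row(1, "alice"), Row(2, "bob"), Row(3, None)]
```

Under the weak predicate, `visible_rows(weak_policy, "alice", ROWS)` returns every row, including bob's and the ownerless one. Under the ownership predicate, alice sees only row 1.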
If your team needs a simpler way to structure that exercise, this risk assessment framework for app teams is a practical way to move from vague concern to testable risk.
Prioritised test cases for Serverless and Mobile Stacks
| Test Case | Description | Impact | Recommended Tooling |
|---|---|---|---|
| RLS policy bypass | Fuzz queries and row ownership edge cases to test whether reads or writes leak across users or tenants | Direct unauthorised data access | Supabase client, psql, custom scripts, Burp Suite |
| Unprotected RPC or edge function | Enumerate callable functions and replay requests with altered roles, parameters, or missing checks | Privilege escalation or sensitive action abuse | Browser dev tools, Postman, curl, Burp Suite |
| Firestore or storage rules weakness | Test direct object or document access without intended ownership constraints | Data exposure and overwrite risk | Firebase Emulator, SDKs, scripted rule tests |
| JWT and claim misuse | Tamper with client-controlled assumptions and inspect how backend code evaluates roles | Auth bypass and role confusion | jwt tooling, interception proxy, custom scripts |
| Hardcoded secrets in mobile or web bundles | Decompile APK/IPA files and inspect JS bundles for keys, endpoints, and hidden feature flags | Reconnaissance, service abuse, chained compromise | jadx, apktool, MobSF, strings, source map review |
| Insecure admin workflow | Replay privileged UI calls and test hidden routes or support tools from lower-privileged accounts | Administrative action abuse | Browser dev tools, Burp Suite, custom role test accounts |
That table is deliberately narrow. It focuses on attacks that repeatedly matter in managed backends.
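To make the "JWT and claim misuse" row concrete, here is a minimal, standard-library sketch of the test: rewrite a claim in a token's payload, keep the old signature, and see which backend behaviour accepts it. It assumes HS256 tokens purely for simplicity; the function names and the secret are hypothetical, and a real project should rely on its platform's own token verification rather than code like this.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token the way a test backend might (HS256 assumed for this sketch)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"


def tamper_claims(token: str, **changes) -> str:
    """Rewrite the payload but keep the original signature, as an attacker could."""
    header, payload, sig = token.split(".")
    claims = json.loads(b64url_decode(payload))
    claims.update(changes)
    return f"{header}.{b64url(json.dumps(claims).encode())}.{sig}"


def naive_role(token: str) -> str:
    # Anti-pattern: decodes the payload and trusts it without checking the signature.
    return json.loads(b64url_decode(token.split(".")[1])).get("role", "anon")


def verified_role(token: str, secret: bytes) -> str:
    # Safer pattern: pin the algorithm and verify the signature before reading claims.
    header, payload, sig = token.split(".")
    if json.loads(b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload)).get("role", "anon")
```

If a function behaves like `naive_role`, a forged `role` claim is accepted as-is. If it behaves like `verified_role`, the tampered token is rejected before any claim is read.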
Field note: The best test cases are the ones that combine application logic with platform defaults. Most severe leaks come from the interaction between the two, not from one in isolation.
For teams that want stronger visibility at the data layer, digna's database security is worth reviewing because database-side detection often catches abuse patterns that app logs flatten or miss.
What to prioritise first
If time is tight, test in this order:
- Authorisation before authentication because most modern stacks already have decent sign-in flows, but role and ownership checks are far easier to get wrong.
- Functions before frontend polish because RPCs and edge functions often become the shortcut around otherwise sound policy design.
- Mobile artefacts early because client-side recon gives attackers naming, routes, flags, and endpoints that speed up every later step.
Threat modelling only earns its keep if it produces an attack list. If your output is a whiteboard full of categories, you're not ready to test.
Executing the Attack on Serverless and Mobile
Once the target paths are ranked, the attack should feel mechanical. You're not guessing. You're validating assumptions until one breaks, then following the chain.

Attack path one through policy logic
You start with the app as a normal user. Create two low-privilege accounts if the product allows self-registration. Populate both with distinct records. Then compare what the client shows against what the backend permits.
For Supabase, a common pattern is an RLS policy that checks whether a user is authenticated but doesn't sufficiently restrict row ownership. Another is a policy that allows updates based on a client-supplied field rather than a server-derived identity. The test isn't fancy. Change filters, remove client constraints, alter record identifiers, and try operations the UI never exposes.
A simple workflow looks like this:
- Capture a legitimate request from the browser or app traffic.
- Alter the record reference to another user's object or row.
- Replay the request with the same token and inspect whether the backend rejects, leaks, or partially succeeds.
- Probe edge cases like null ownership, soft-deleted records, optional tenant IDs, and helper views that sit outside the main table protections.
Attackers love optional fields. Teams often treat them as harmless. Policies often don't.
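The capture, alter, replay, inspect loop can be sketched without touching the network. The snippet below only builds the altered request and triages a response you have already collected; the `CapturedRequest` type, the example URL, and the marker-string approach are assumptions for illustration, and actually sending anything should go through whatever proxy or client the engagement has approved.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class CapturedRequest:
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def swap_reference(req: CapturedRequest, own_id: str, other_id: str) -> CapturedRequest:
    """Copy the request with own_id replaced by other_id; the token is left untouched."""
    return replace(
        req,
        url=req.url.replace(own_id, other_id),
        body=req.body.replace(own_id, other_id),
    )


def classify(status: int, body: str, other_marker: str) -> str:
    """Rough triage of a replayed response.

    other_marker is a string planted in the second test account's data,
    so its presence in the response proves a cross-user read.
    """
    if status in (401, 403):
        return "rejected"
    if other_marker in body:
        return "leaked"
    if 200 <= status < 300:
        # Succeeded without the marker: could be a filtered empty result
        # or a partial success, so it needs a human look.
        return "needs-review"
    return "error"
```

The "needs-review" bucket matters because row-level filtering commonly shows up as a successful but empty response rather than an explicit denial, and that looks identical to a partially successful write unless someone checks.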
For Firebase and Firestore, the equivalent test is to access documents or storage paths directly through SDK calls rather than through the app flow. If the rule assumes a path structure the client normally obeys, direct access often reveals where the rule is thinner than the UI suggests.
Attack path two through functions and hidden endpoints
RPCs and edge functions are where many otherwise careful teams cut corners. They're built to unblock a feature, so validation logic lands there fast and matures later, if ever.
You inspect the browser's network panel, mobile traffic, and bundled JavaScript for function names, routes, and payload shapes. Then you replay those calls outside the app. Remove fields. Add fields. Swap user IDs. Send parameters that a normal client would never send.
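Removing fields, adding fields, and swapping IDs is repetitive enough to generate rather than type by hand. A minimal generator might look like the following; the labels and the example field names are invented, and it only produces payload variants, leaving the sending to your approved tooling.

```python
from typing import Any, Dict, Iterator, Tuple


def mutate_payload(
    base: Dict[str, Any],
    swaps: Dict[str, Any],
    extras: Dict[str, Any],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (label, payload) variants of a captured function call.

    base   - the payload a normal client sends
    swaps  - replacement values for existing fields (e.g. another user's id)
    extras - fields a normal client would never send (e.g. a role flag)
    """
    # 1. Remove each field in turn to find missing-parameter handling gaps.
    for key in base:
        yield f"remove:{key}", {k: v for k, v in base.items() if k != key}
    # 2. Swap identifiers the server should be deriving itself.
    for key, value in swaps.items():
        if key in base:
            yield f"swap:{key}", {**base, key: value}
    # 3. Add fields the UI never exposes.
    for key, value in extras.items():
        if key not in base:
            yield f"add:{key}", {**base, key: value}
```

For a base payload of `{"user_id": "user-a", "amount": 10}`, calling `mutate_payload(base, {"user_id": "user-b"}, {"role": "admin"})` yields four variants, each labelled so the evidence trail records exactly which mutation produced which response.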
A practical red team pen test often finds issues like:
- Functions that check login state but not role
- Admin-only operations hidden in the UI but callable by any authenticated user
- Server code that trusts tenant IDs or account IDs from the request body
- Webhook-style handlers that verify structure but not caller intent
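Two of those patterns, checking login state but not role and trusting tenant IDs from the request body, are easy to model. The handlers below are a toy in-memory illustration, not Supabase or Firebase function code; `session` stands in for whatever server-verified identity your platform provides, and every name here is hypothetical.

```python
from typing import Dict, List

INVOICES = [
    {"id": 1, "tenant_id": "t1", "total": 100},
    {"id": 2, "tenant_id": "t2", "total": 250},
]


class Forbidden(Exception):
    """Raised when a caller may not perform the action."""


def require_login(session: Dict) -> None:
    if not session.get("user_id"):
        raise Forbidden("not signed in")


def list_invoices_vulnerable(session: Dict, body: Dict) -> List[Dict]:
    # Checks login state only, then trusts tenant_id from the request body.
    require_login(session)
    return [inv for inv in INVOICES if inv["tenant_id"] == body.get("tenant_id")]


def list_invoices_fixed(session: Dict, body: Dict) -> List[Dict]:
    # Tenant is derived from the server-verified session; the body value is ignored.
    require_login(session)
    tenant_id = session["tenant_id"]
    return [inv for inv in INVOICES if inv["tenant_id"] == tenant_id]


def void_invoice_fixed(session: Dict, invoice_id: int) -> Dict:
    # Privileged action: requires role AND ownership, not just a valid login.
    require_login(session)
    if session.get("role") != "staff":
        raise Forbidden("staff role required")
    for inv in INVOICES:
        if inv["id"] == invoice_id and inv["tenant_id"] == session["tenant_id"]:
            return {**inv, "status": "void"}
    raise Forbidden("no such invoice in caller's tenant")
```

A standard user in tenant `t1` who sends `{"tenant_id": "t2"}` gets tenant `t2`'s invoice from the vulnerable handler and only their own from the fixed one.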
If your team needs a stronger foundation for this layer, this guide to pen testing APIs in modern apps is useful because API abuse is often the shortest path to impact in BaaS-heavy products.
Attack path three through the mobile client
Mobile apps give attackers durable artefacts to study. They can inspect them offline, slowly, and repeatedly. That matters because even well-protected backends leak intent through the client.
Decompile the APK with jadx or apktool. Review strings, config files, embedded endpoints, feature flags, and any debugging remnants. For iOS, inspect the IPA contents and binary strings. Then compare what the app knows with what your public documentation says. The mismatches are often interesting.
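Once the APK or IPA is unpacked, a first pass over the output can be scripted. The sketch below walks a directory and flags strings that look like tokens, keys, or backend URLs. The patterns are common heuristics rather than authoritative formats, and a hit is only a lead: some client keys are designed to be public, so every match still needs manual triage.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Heuristic patterns only; expect both false positives and misses.
PATTERNS = {
    "jwt_like": re.compile(rb"eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"),
    "google_api_key": re.compile(rb"AIza[0-9A-Za-z_-]{35}"),
    "supabase_url": re.compile(rb"https://[a-z0-9]+\.supabase\.co"),
    "firebase_db_url": re.compile(rb"https://[a-z0-9-]+\.firebaseio\.com"),
}


def scan_bytes(data: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (pattern_name, matched_text) for every hit in a blob."""
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(data):
            yield name, match.group(0).decode("ascii", "replace")


def scan_tree(root: str) -> Iterator[Tuple[str, str, str]]:
    """Walk an unpacked app directory and yield (file, pattern_name, matched_text)."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            for name, value in scan_bytes(path.read_bytes()):
                yield str(path), name, value
```

Pointing `scan_tree` at the jadx or apktool output directory gives a quick list of files worth reading properly. Finding both a staging and a production URL in the same build is often the more interesting result than any single key.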
A few recurring wins:
- Environment confusion where staging and production artefacts coexist in the build
- Hidden endpoints referenced by the app but absent from public frontend flows
- Support or internal actions exposed through route names, analytics events, or disabled UI components
- Weak local auth assumptions where biometric gates protect screen access but not the backend action itself
For teams refining local protection flows, a biometric authentication developer guide is useful background because mobile identity controls often create a false sense of backend security when they aren't tied to real server-side authorisation.
Evidence matters more than drama
The best red team operators collect proof as they go. Save the request pair. Note the preconditions. Record the exact role used. Store the minimum data needed to prove impact.
That discipline is what turns “we might have a leak” into “a standard user account can read another tenant's invoices through this function when tenant_id is altered in the request body”.
That's a finding engineering can fix.
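Collecting proof as you go is easier when the capture step redacts by default. Here is one possible shape for an evidence record; the header list, the email pattern, and the truncation length are illustrative assumptions, and what counts as sensitive should follow the evidence-handling rules agreed for the engagement.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Headers whose values should never land in a report verbatim (assumed list).
SENSITIVE_HEADERS = {"authorization", "cookie", "apikey"}
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_headers(headers: Dict[str, str]) -> Dict[str, str]:
    return {
        k: ("[REDACTED]" if k.lower() in SENSITIVE_HEADERS else v)
        for k, v in headers.items()
    }


def redact_body(body: str, max_len: int = 200) -> str:
    """Mask email addresses and keep only the start of the body as proof."""
    cleaned = EMAIL.sub("[EMAIL]", body)
    return cleaned if len(cleaned) <= max_len else cleaned[:max_len] + "...[truncated]"


@dataclass
class Evidence:
    role_used: str
    preconditions: str
    request: Dict
    response: Dict


def capture(role_used: str, preconditions: str, method: str, url: str,
            headers: Dict[str, str], status: int, body: str) -> Evidence:
    """Build a minimal, redacted request/response pair for a finding."""
    return Evidence(
        role_used=role_used,
        preconditions=preconditions,
        request={"method": method, "url": url, "headers": redact_headers(headers)},
        response={"status": status, "body": redact_body(body)},
    )
```

Redacting at capture time, rather than at report time, means the unredacted token or customer record never sits in a notes file waiting to be forgotten.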
Automating Your Red Team with Continuous Scans
A one-off exercise degrades quickly in an agile stack. New functions get deployed, policies change, mobile builds go live, and someone pushes a hotfix on Friday that inadvertently undoes a careful permission check from Tuesday.
That's why manual red teaming needs an automated counterpart. Not because automation replaces human judgment. It doesn't. It handles the repetitive verification that teams otherwise skip.

What automation is good at
Automation works well when the failure mode is common and the test can be repeated consistently:
- RLS and rule regression checks after schema or policy changes
- RPC and function exposure review as new endpoints appear
- Secret discovery across updated web bundles, mobile artefacts, and release candidates
- Baseline drift detection when a previously blocked path becomes reachable
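Baseline drift detection, in particular, reduces to a small comparison once each access check has a recorded outcome. The sketch below assumes each check is stored as a `(role, path)` pair with an outcome of `"allowed"` or `"denied"`; that representation and the function name are made up for illustration.

```python
from typing import Dict, List, Tuple

Check = Tuple[str, str]  # (role, path), e.g. ("user", "/rest/v1/invoices?tenant=other")


def reopened_paths(baseline: Dict[Check, str], current: Dict[Check, str]) -> List[Check]:
    """Return checks that are allowed now but were denied, or absent, in the baseline.

    Newly seen allowed paths are included on purpose: a new endpoint that is
    reachable deserves a look even though it never had a "safe" state.
    """
    drift = []
    for check, outcome in current.items():
        if outcome == "allowed" and baseline.get(check) != "allowed":
            drift.append(check)
    return sorted(drift)
```

Run after each deploy, an empty list means nothing that used to be blocked has become reachable. Any entry is a prompt for a human to decide whether the change was intended.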
Many startups make the wrong trade-off. They either rely only on annual testing, which is too slow for modern release cycles, or they rely only on static scanners, which don't exercise business logic adequately.
Automation should carry the boring load. Humans should spend their time on chaining, judgment, and weird edge cases.
What to automate and what to keep manual
Use automation for breadth and continuity. Use manual work for stealth, chaining, and contextual abuse. That split keeps the signal high.
A practical model looks like this:
| Best automated | Best manual |
|---|---|
| Re-testing known risky paths after deploys | Multi-step privilege escalation |
| Bundle and binary inspection for leaked secrets | Business logic abuse |
| Discovery of newly exposed functions or routes | Social engineering and operational testing |
| Consistent policy regression checks | Novel attack chains across product features |
If you're building a process around continuous validation, this piece on automated red team testing for app security is a useful reference point for how teams turn episodic testing into a release guardrail.
The important shift is cultural. Teams stop asking, “Did we pass the pen test?” and start asking, “What changed since the last safe state, and did any attacker path reopen?” That's a much healthier question.
Reporting Findings and Guiding Remediation
A weak report kills momentum. Engineers don't need a dramatic narrative. They need enough evidence to reproduce the issue, enough context to rank it correctly, and enough guidance to fix it without guessing.
That matters because closure rates are often poor. Organisations remediate only about 50% of the vulnerabilities found in a penetration test on average, and large enterprises fail to resolve 45.4% of issues within 12 months, according to the Evalian source cited earlier. The lesson isn't that teams don't care. It's that unclear findings stall.

What a report should contain
A report for a startup should be split into two layers.
First, give leadership a short summary:
- What objective was achieved
- What data or function was exposed
- What part of the stack failed
- What the operational consequence is
Then give engineers the technical record:
- Preconditions: What role or account type was required? Was the issue reachable from web, mobile, or both?
- Reproduction steps: Keep them exact and brief. Include the sequence, not just a screenshot.
- Evidence: Show the minimum proof needed. Request and response pairs beat long prose.
- Root cause: Name the failing assumption. For example, “edge function trusted tenant_id from the client” is better than “authorisation issue”.
- Remediation guidance: Suggest the fix in the same technical language the team uses. For Supabase, that may mean a tighter RLS predicate or moving ownership checks server-side. For Firebase, that may mean rewriting a rule to derive access from auth context instead of path guesswork.
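If findings are kept as structured records rather than free text, it becomes trivial to check that none of those sections is missing before the report goes out. The following is a minimal sketch under that assumption; the `Finding` fields mirror the list above, while the rendering format is invented.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Finding:
    title: str
    preconditions: str
    reproduction_steps: List[str]
    evidence: str
    root_cause: str
    remediation: str


def missing_sections(finding: Finding) -> List[str]:
    """Names of any sections left empty, in declaration order."""
    return [f.name for f in fields(finding) if not getattr(finding, f.name)]


def render(finding: Finding) -> str:
    """Plain-text rendering an engineer can paste into a ticket."""
    steps = "\n".join(
        f"  {n}. {step}" for n, step in enumerate(finding.reproduction_steps, 1)
    )
    return "\n".join([
        f"Finding: {finding.title}",
        f"Preconditions: {finding.preconditions}",
        "Reproduction steps:",
        steps,
        f"Evidence: {finding.evidence}",
        f"Root cause: {finding.root_cause}",
        f"Remediation: {finding.remediation or 'TODO'}",
    ])
```

A finding with an empty `remediation` shows up in `missing_sections` straight away, which is a cheap way to stop half-written entries reaching engineering.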
The report format developers actually use
Developers respond better when each finding answers three questions immediately:
- Can I reproduce this?
- Can I see why it happened?
- Do I know what to change first?
Good reporting reduces debate. Bad reporting creates meetings.
The most effective remediation advice is concrete. If a policy is broken, show the policy pattern that should replace it. If a function trusts the client, point to the parameter that must be derived on the server. If a mobile build leaks sensitive config, name the file or build step that introduced it.
A red team pen test is only finished when the fix is verified under the same attack path that proved the weakness in the first place.
If you're building on Supabase, Firebase, or mobile backends and want a faster way to catch exposed RLS rules, risky RPCs, leaked secrets, and regression issues before release, AuditYour.App gives teams a practical way to continuously test the attack paths that matter in modern app stacks.