Penetration Testing Process: Guide for Startups & App Teams

Learn the penetration testing process. Essential guide for startups & app teams to plan, test, report, & integrate security seamlessly into CI/CD for 2026.

Published May 26, 2026 · Updated May 26, 2026

You're probably here because one of three things just happened.

A prospect asked for a pentest report before they'll sign. Your mobile app is about to launch and you don't fully trust your backend rules. Or your team is moving fast on Supabase, Firebase, edge functions, and third-party SDKs, and you know a once-a-year security review won't keep up with weekly releases.

That's where the penetration testing process gets misunderstood. Startup teams often picture a dramatic one-off event where someone “hacks the system” and dumps a giant PDF on the table. In practice, the useful version is far more disciplined. It's a scoped, authorised, evidence-based exercise that helps you answer a simple question: if an attacker targeted this app stack today, what could they do?

For modern app teams, that question rarely starts with an exposed server. It usually starts with a leaky frontend bundle, an over-permissive storage bucket, a weak Row Level Security policy, an API that trusts the wrong client signal, or a mobile app that exposes too much about the backend it talks to.

Beyond the 'Hacker' Myth: What a Pentest Really Is

A startup gets through technical due diligence, legal is nearly done, and then the buyer asks for a pentest report dated within the last 12 months. Now the team has a deadline, a production system they do not want disrupted, and a vague idea that a pentest means "hire someone to hack us."

That framing causes bad buying decisions.

A real pentest is a controlled security exercise with a defined scope, written authorisation, evidence collection, and a report tied to remediation. The goal is not to create drama. The goal is to answer a practical question: what can a capable attacker do against this app, with this architecture, under realistic conditions?

For modern stacks, that usually means testing trust boundaries that scanners only partially understand. Supabase and Firebase projects often look clean at the infrastructure layer while still exposing risk through misconfigured storage rules, weak Row Level Security, callable functions that trust client input, insecure admin flows, mobile API abuse, or data exposed through a combination of minor issues.

A pentest validates exploitability, not just exposure

Automated scanning still matters. Good teams should run it continuously. It catches known issues fast, helps keep pace with weekly releases, and gives engineers a baseline.

But scanners stop at detection. Pentesting starts where detection leaves off.

A scanner can tell you a bucket is public, an endpoint is reachable, or a key appears in a client bundle. A tester checks whether that access leads to user data, privilege escalation, account takeover, billing abuse, or cross-tenant exposure. They also test chained paths that matter in real incidents. An exposed mobile config plus weak backend rules is a very different problem from an exposed mobile config on its own.

That is the practical difference between vulnerability management and penetration testing. HGC IT Solutions explains that distinction well.

The useful version is methodical

Good pentests are disciplined. They are scoped to the assets and flows that matter, executed with agreed rules, and written so engineering can fix the issues without guessing what the tester meant.

For a startup CTO, the value is usually not "proof that we got hacked." It is proof of which weaknesses are exploitable in your environment, how serious they are for your business, and what should be fixed first. That matters more on modern app stacks because risk often sits in configuration, integration logic, and auth design rather than a single unpatched server.

A one-off pentest still has a place before a major launch, enterprise procurement review, or architecture change. It just should not be treated as the whole security programme. The stronger model is continuous scanning in CI/CD, targeted manual testing around sensitive flows, and retesting after meaningful changes. That gives fast-moving teams a security process that matches how they ship software.

Phase 1: Planning, Scoping, and Legal Agreements

Most penetration tests succeed or fail before anyone touches the target.

If the scope is vague, the results will be vague. If the legal paperwork is sloppy, nobody knows what the tester is authorised to do. If the rules of engagement are missing, you get delays, false alarms, or a report full of findings that don't match the risk you care about.

Start with risk, not assets

The UK NCSC, in its penetration testing guidance, frames penetration testing as a risk-validation exercise: a way to identify the technical risk arising from software and hardware weaknesses. It describes a typical engagement as initial engagement, scoping, testing, reporting, and follow-up, with each issue assigned a severity rating to help prioritise remediation.

That means scoping shouldn't begin with a list of systems alone. It should begin with the question: what do you need validated?

For a modern app team, that usually means choices like these:

  • Customer-facing app flows: Signup, login, password reset, billing, admin paths, invite systems.
  • Backend trust boundaries: APIs, serverless functions, storage rules, database access patterns, webhooks.
  • Sensitive data paths: Where user data is read, written, exported, cached, or processed by third parties.
  • Privileged surfaces: Admin dashboards, support tooling, internal APIs, scheduled jobs.

A narrow scope can be right if it matches business risk. A broad scope can waste time if nobody can remediate the findings.

Choose the right testing model

White-box, grey-box, and black-box testing aren't academic labels. They change what the tester can realistically cover.

When white-box makes sense

White-box works well when speed matters and you want depth. The tester gets architecture details, credentials, code snippets, or design docs. For Supabase and Firebase builds, that often helps uncover rule logic, auth assumptions, and dangerous backend exposures quickly.

When grey-box is usually the best fit

Grey-box is often the most practical startup option. The tester gets limited insider context, such as a test account, API documentation, and a high-level architecture map. That's enough to avoid wasting time on basic discovery while still testing how a realistic attacker could abuse an application.

When black-box is worth the effort

Black-box testing is useful when you want an outside-in view that simulates an external attacker with minimal prior knowledge. It's good for public attack surface validation, but it can spend more time on reconnaissance and less on deep application logic.

Practical rule: If your biggest risk is backend configuration and authorisation logic, a pure black-box exercise often misses the point. Give the testers enough context to reach the parts that matter.

Put the Rules of Engagement in writing

A solid Rules of Engagement document should answer questions before anyone asks them mid-test.

Include:

  1. In-scope targets. Domains, apps, APIs, cloud projects, mobile packages, functions, environments.
  2. Out-of-scope areas. Production payment processors, social engineering, third-party systems you don't control, denial-of-service activity.
  3. Permitted techniques. Auth testing, privilege escalation, business logic abuse, mobile reverse engineering, storage access validation.
  4. Testing windows. Business hours, evenings, release freezes, blackout periods.
  5. Named contacts. A technical contact, an escalation contact, and someone who can approve emergency decisions.
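One way to keep these rules enforceable during the test is to record the scope as data that tooling can check before sending traffic. The sketch below is illustrative only: the `RulesOfEngagement` class, its field names, the example hostnames, and the 09:00 to 17:00 window are all hypothetical, not a standard format.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from urllib.parse import urlparse


@dataclass
class RulesOfEngagement:
    """Hypothetical, minimal machine-readable scope record."""
    in_scope_hosts: set[str]
    out_of_scope_hosts: set[str] = field(default_factory=set)
    window_start: time = time(9, 0)   # assumed testing window, local time
    window_end: time = time(17, 0)

    def host_allowed(self, url: str) -> bool:
        """True only if the URL's host is explicitly in scope and not excluded."""
        host = (urlparse(url).hostname or "").lower()
        if host in self.out_of_scope_hosts:
            return False
        return host in self.in_scope_hosts  # anything unlisted is denied

    def in_window(self, when: datetime) -> bool:
        """True if the timestamp falls inside the agreed testing window."""
        return self.window_start <= when.time() <= self.window_end

    def may_test(self, url: str, when: datetime) -> bool:
        """Combine the host check and the window check."""
        return self.host_allowed(url) and self.in_window(when)


roe = RulesOfEngagement(
    in_scope_hosts={"staging.example.com", "api.staging.example.com"},
    out_of_scope_hosts={"payments.example.com"},
)
```

The deny-by-default choice is deliberate: a host that nobody listed is treated as out of scope until someone with authority adds it.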

Don't treat the contract as admin overhead

The legal agreement protects both sides. It confirms authorisation, confidentiality, ownership of findings, and the limits of acceptable testing. It also reduces the chance of procurement chaos later, especially if your customer wants to review the engagement basis.

If your team is tightening vendor and testing paperwork in parallel, this guide on how to reduce contract risk is a useful companion read because many of the same process failures show up in security engagements.

Phase 2: Discovery in Modern App Backends

Traditional discovery starts with ports, hosts, and service banners. Modern app discovery starts somewhere else entirely. It starts in the app bundle, the browser, the API traffic, the auth flow, and the backend rules that developers assume are private.

That's why the penetration testing process needs to change for mobile-first and BaaS-heavy systems. A high-level process is still useful, but it doesn't tell you how to assess a stack where exposure sits in configuration, trust assumptions, and authorisation logic rather than classic infrastructure.

As noted in a Strike Graph discussion of pen-testing phases, this is an under-covered problem for Supabase- and Firebase-style backends and mobile integrations.

What testers actually look for first

On a modern app, discovery often begins with passive analysis.

That includes reading public JavaScript bundles, unpacking mobile apps, mapping API calls, reviewing exposed configuration values, and identifying third-party integrations. A lot of useful attack surface appears before active exploitation starts.

Common targets include:

  • Frontend bundles with leaked secrets: Not every exposed key is critical, but many reveal backend structure, project identifiers, or service relationships.
  • API endpoint enumeration: Public, undocumented, versioned, internal, or deprecated routes often expose more than product teams realise.
  • Storage and object access: Public buckets, predictable file paths, and weak access controls can leak user content or internal exports.
  • Backend functions and RPCs: Developers may expose operations that were meant for trusted server-side use only.
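The first item in that list, triaging keys found in a bundle, can be sketched in a few lines. This example assumes JWT-style keys whose payload carries a `role` claim, as Supabase's JWT-based `anon` and `service_role` keys do; other key formats would need their own patterns. The function names and the "critical" / "review" labels are hypothetical, and the payload is decoded without verifying the signature, which is enough for triage but not for trust decisions.

```python
import base64
import json
import re

# Three dot-separated base64url segments, the first starting with "eyJ"
# (the base64url encoding of the opening of a JSON object).
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def decode_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT-shaped string."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    try:
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return {}
    return payload if isinstance(payload, dict) else {}


def find_tokens(bundle_text: str) -> list[dict]:
    """Return each JWT-shaped string found, with its role claim and a triage label."""
    results = []
    for token in sorted(set(JWT_PATTERN.findall(bundle_text))):
        role = decode_payload(token).get("role")
        # Assumption: a service_role key in client code is the critical case;
        # an anon key is expected to be public and guarded by backend rules.
        label = "critical" if role == "service_role" else "review"
        # Keep only a short prefix so the report does not re-leak the secret.
        results.append({"token_prefix": token[:12], "role": role, "label": label})
    return results
```

A "review" label is not a clean bill of health. It only means the key itself is not obviously privileged, so the real question moves to the rules behind it.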

Where Supabase and Firebase stacks go wrong

The biggest issues on these platforms usually aren't dramatic bugs. They're policy mistakes.

A Row Level Security rule may allow reads wider than intended. A write policy may trust a client-supplied field. A storage rule may assume authentication equals authorisation. A callable function may skip ownership checks because the developer expected the frontend to enforce them.

That's why generic infrastructure-led pentesting often misses the most serious risk on these stacks.

Misconfiguration beats complexity

A simple misconfigured rule can matter more than a complex exploit chain. If a logged-in user can read another customer's records by changing an identifier, the business impact is obvious. The same goes for writes, deletes, or metadata access that should have been tenant-scoped.
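The identifier-swap problem can be illustrated without any real backend. The sketch below models a multi-tenant table as a dictionary and compares a read path that only checks authentication with one that also checks tenant ownership. Every name here (`RECORDS`, `read_record_vulnerable`, `read_record_scoped`, `cross_tenant_leaks`) is hypothetical. On a real Supabase or Firebase project the equivalent check would live in RLS policies or security rules rather than in application code.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a multi-tenant table.
RECORDS = {
    101: {"tenant_id": "acme", "order_total": 250},
    102: {"tenant_id": "globex", "order_total": 990},
}


def read_record_vulnerable(user: Optional[dict], record_id: int) -> Optional[dict]:
    """Checks only that the caller is logged in (authentication, not authorisation)."""
    if user is None:
        return None
    return RECORDS.get(record_id)


def read_record_scoped(user: Optional[dict], record_id: int) -> Optional[dict]:
    """Also requires the record to belong to the caller's tenant."""
    if user is None:
        return None
    record = RECORDS.get(record_id)
    if record is None or record["tenant_id"] != user["tenant_id"]:
        return None
    return record


def cross_tenant_leaks(read_fn, user: dict) -> list[int]:
    """Return IDs of records the user can read that belong to another tenant."""
    leaked = []
    for record_id, record in RECORDS.items():
        if record["tenant_id"] == user["tenant_id"]:
            continue  # the user's own data is expected to be readable
        if read_fn(user, record_id) is not None:
            leaked.append(record_id)
    return leaked


alice = {"id": "u1", "tenant_id": "acme"}
print(cross_tenant_leaks(read_record_vulnerable, alice))  # [102]
print(cross_tenant_leaks(read_record_scoped, alice))      # []
```

The same shape of check, run with two real test accounts in different tenants, is what a tester does by hand when probing tenant isolation.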

Mobile apps expose backend clues

iOS and Android apps often reveal endpoints, project references, API usage patterns, feature flags, and assumptions about trusted roles. Even when credentials aren't directly usable, they help the tester build a map of the backend and identify places where authorisation may be weak.

The most valuable discovery work on modern apps often looks less like network probing and more like careful product archaeology.

Automation has a clear role here

This is one of the few parts of the penetration testing process where automation can save a lot of manual effort. Tools can rapidly identify public artefacts, leaked keys, exposed routes, insecure storage patterns, and suspicious backend configurations.

If your team is trying to understand how lightweight backend platforms change architecture decisions, this explainer on how Goptimise simplifies startup backend is useful context because simpler development models often concentrate security risk into configuration and integration choices.

For API-specific testing depth, teams should also understand how endpoint validation differs from generic app testing. This guide to API penetration testing for modern app backends is a good reference point.

Phase 3: Exploitation and Business Impact Analysis

Discovery tells you where the doors are. Exploitation tells you whether they open.

That difference is the whole reason a pentest exists. A finding without proof is often deprioritised. Teams look at it, debate edge cases, and leave it in the backlog. When a tester shows the actual path from weakness to impact, the conversation changes fast.

Proving the weakness without causing damage

Picture an office with an unsecured side door. A scanner reports that the lock is weak. A tester then checks whether that door leads to the kitchen or to the finance room.

On a modern app stack, exploitation often looks like this:

  • Logging in as a normal user and accessing another tenant's records through a weak RLS rule.
  • Modifying a request to write data into records that should be read-only.
  • Calling a backend function directly instead of through the intended client flow.
  • Pulling files from storage that should require tighter ownership checks.
  • Reusing mobile app information to reach internal or undocumented API functionality.

The tester's job isn't to break things for drama. It's to demonstrate exploitability safely and document evidence clearly.

Why post-exploitation matters

A single flaw rarely tells leadership what to fix first. Business impact does.

A good tester asks the next question after successful exploitation. If this works, what follows from it? Can an attacker extract all customer data from one table, or only a narrow slice? Can they alter account state, create fraudulent actions, or disrupt service? Can they pivot from one user role to another?

That's where severity becomes meaningful. The issue isn't “RLS policy too permissive”. The issue is “a standard authenticated user can read another customer's order history”. Or “an attacker can invoke an administrative RPC from the public app context”.

If the report can't explain what an attacker could actually read, change, or abuse, the finding isn't finished.

What works and what doesn't

What works:

  1. Controlled proof. Show the smallest successful exploit that proves impact.
  2. Realistic chaining. Combine low-level issues only when the chain is credible.
  3. Evidence tied to business context. Link the exploit to data exposure, privilege abuse, or operational risk.

What doesn't:

  • Dumping theoretical vulnerabilities with no validation.
  • Performing destructive actions when a non-destructive proof would do.
  • Reporting technical detail without explaining user or business consequences.

This phase is where a technical exercise becomes a decision-making tool. It gives the CTO a defensible answer when someone asks, “What happens if this is exploited?”

Phase 4: Effective Reporting and Remediation Guidance

A startup CTO usually reads the report after the test, not during it. If the document does not make risk, ownership, and next actions obvious in a few minutes, the engagement loses momentum.

That happens a lot with modern app stacks because the failure mode is rarely a single server bug. It is a weak RLS policy in Supabase, an overexposed Firebase rule, a callable function that trusts the client, or a mobile app secret that should never have shipped. Reporting has to translate those issues into decisions an engineering team can act on in the next sprint.

What a startup-friendly report looks like

A useful report serves two audiences at once. Leadership needs a plain-language view of exposure and urgency. Engineers need enough detail to reproduce the issue, fix it correctly, and verify the fix.

A practical structure looks like this:

Executive summary

This should answer four questions quickly:

  • What was tested
  • Which risks were confirmed
  • Which findings deserve immediate attention
  • What the team should fix first

Technical findings

Each finding should include:

  • A specific title
  • The affected asset, endpoint, function, rule, or flow
  • Clear reproduction steps
  • Evidence
  • Business impact
  • Concrete remediation guidance
  • Suggested validation steps after the fix
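If findings are tracked as structured records rather than only as prose, an incomplete finding can be caught before the report is accepted. The sketch below mirrors the fields listed above. The `Finding` class and its field names are hypothetical, and the example values are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Finding:
    """Hypothetical record mirroring the fields a technical finding should carry."""
    title: str
    affected_asset: str
    reproduction_steps: list[str]
    evidence: str
    business_impact: str
    remediation: str
    validation_steps: list[str]

    def missing_fields(self) -> list[str]:
        """Names of fields left empty, so incomplete findings can be sent back."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = Finding(
    title="Authenticated user can read another tenant's orders",
    affected_asset="orders table (RLS select policy)",
    reproduction_steps=[
        "Log in as a standard user",
        "Request an order ID owned by another tenant",
    ],
    evidence="",
    business_impact="Cross-tenant exposure of order history",
    remediation="Bind the select policy to the caller's tenant ID",
    validation_steps=[],
)
print(draft.missing_fields())  # ['evidence', 'validation_steps']
```

A finding that comes back with an empty list is not necessarily a good finding, but one with gaps is reliably not ready for engineering.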

Method and limitations

This section prevents confusion later. If the test did not include source review, mobile binary analysis, production-like data, or authenticated roles beyond a standard user, say so plainly. For Supabase and Firebase environments, scope limits matter because a lot of risk sits in configuration, client exposure, and privilege boundaries rather than in traditional perimeter assets.

Prioritise by fix effort and blast radius

Busy teams do not need a long list sorted by CVSS alone. They need to know what can expose customer data, break tenant isolation, or grant backend actions from the wrong trust boundary.

A simple prioritisation matrix works better:

| Impact / Likelihood | Low | Medium | High |
|---|---|---|---|
| Low | Monitor | Schedule fix | Prioritise soon |
| Medium | Schedule fix | Prioritise | Urgent |
| High | Prioritise | Urgent | Immediate action |

Keep the matrix simple. The core value comes from the discussion behind it.

For example, a low-complexity RLS bypass on a multi-tenant table usually outranks a noisy version disclosure. A leaked mobile API key may rank lower if it is meant to be public and properly restricted, but rank much higher if it opens direct access to privileged backend functions. Good reporting makes those trade-offs explicit.
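The matrix can also be encoded as a small lookup so triage tooling and the report agree on the same labels. This sketch assumes the rows of the matrix are impact and the columns are likelihood, which is one reading of the "Impact / Likelihood" header. The names `MATRIX`, `LEVELS`, and `priority` are hypothetical.

```python
LEVELS = ("low", "medium", "high")

# Rows are impact, columns are likelihood (assumed reading of the matrix header).
MATRIX = {
    "low":    ("Monitor", "Schedule fix", "Prioritise soon"),
    "medium": ("Schedule fix", "Prioritise", "Urgent"),
    "high":   ("Prioritise", "Urgent", "Immediate action"),
}


def priority(impact: str, likelihood: str) -> str:
    """Look up the action for an impact/likelihood pair; reject unknown levels."""
    impact, likelihood = impact.lower(), likelihood.lower()
    if impact not in MATRIX or likelihood not in LEVELS:
        raise ValueError(f"levels must be one of {LEVELS}")
    return MATRIX[impact][LEVELS.index(likelihood)]


print(priority("high", "high"))   # Immediate action
print(priority("low", "medium"))  # Schedule fix
```

The lookup only standardises the label. Deciding whether a given finding is "high impact" is still the judgement call the surrounding discussion is about.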

Remediation guidance should tell the team what to change

“Tighten access controls” is not enough. Engineers need direction that maps to the stack they run.

Useful guidance looks like this:

  • Restrict an RLS policy so reads and writes are bound to the authenticated user's tenant ID or resource ownership field.
  • Move privileged operations out of client-callable paths and enforce authorisation on the server side.
  • Rotate an exposed secret, remove it from the mobile build or public bundle, and audit logs for prior misuse.
  • Split service-role access from user-context access so background jobs and public clients cannot hit the same trust boundary.
  • Add automated checks that exercise critical authz paths after deployment, especially around RPCs, storage rules, and admin functions.

The report should also separate quick containment from proper remediation. Rotating a key contains exposure. Fixing why the key was shipped to the client prevents the same issue from coming back next month.

If you want a useful benchmark, review this guide on what makes a good penetration test report before accepting deliverables from a vendor or consultant.

What I look for in a report: can an engineer reproduce the issue, can a manager understand the risk, and can the team verify the fix without reopening the whole test?

Phase 5: Retesting and Continuous Security in CI/CD

A penetration test report is not the end of the penetration testing process. It's the midpoint.

The value shows up after remediation, when someone verifies that the fixes work and that the change didn't break something else. For fast-moving app teams, this is where the old annual pentest model starts to look weak.

A 2024 industry estimate put the average cost of a penetration test at $18,300. A 2025 statistics roundup reported that organisations remediate only about 50% of vulnerabilities found in penetration tests on average, and that large enterprises fail to resolve 45.4% of discovered issues within 12 months. These figures come from Cybersecurity Ventures' penetration testing statistics roundup. The problem isn't just finding issues. It's closing them.

Why retesting is non-negotiable

A team fixes the policy. Great. Did the tester confirm the read path is closed? Did the write path also get fixed? Did the developer accidentally widen access elsewhere while patching it?

Retesting answers those questions. Ideally, the original testers perform it because they already understand the exploit path, the evidence, and the intended risk reduction.

The one-off model breaks on modern release cycles

If you ship infrastructure changes occasionally, periodic testing may be enough. If you're changing edge functions, app bundles, auth flows, cloud rules, and third-party integrations every week, a single report goes stale quickly.

That doesn't mean manual pentesting stops mattering. It means manual testing should focus on what humans do best:

  • business logic abuse
  • exploit chaining
  • trust boundary analysis
  • product-specific authorisation flaws

Automation should cover the repeatable checks between major reviews.

What belongs in CI/CD

Automated scanning is useful for:

  • leaked secrets in bundles or app builds
  • exposed endpoints and configuration drift
  • insecure storage settings
  • suspicious RPC or function exposure
  • regressions in known risk areas
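In a pipeline, checks like these usually end in a gate: a step that fails the build when a new blocking issue appears. The sketch below assumes scan results arrive as a list of dictionaries with `id`, `severity`, and `title` keys. That shape, the `BLOCKING` set, and the `gate` function are hypothetical and not the output format of any particular scanner.

```python
import sys

# Assumed severity labels that should stop a release.
BLOCKING = {"high", "critical"}


def gate(findings: list[dict], accepted_ids: set[str]) -> int:
    """Return a process exit code: 1 if any new blocking finding exists, else 0."""
    new_blockers = [
        f for f in findings
        if f["severity"] in BLOCKING and f["id"] not in accepted_ids
    ]
    for f in new_blockers:
        # Report to stderr so the CI log shows why the step failed.
        print(f"BLOCKING {f['id']}: {f['title']}", file=sys.stderr)
    return 1 if new_blockers else 0


scan = [
    {"id": "storage-public-bucket", "severity": "high", "title": "Public storage bucket"},
    {"id": "version-disclosure", "severity": "low", "title": "Server version header"},
]
```

A CI step would typically finish with `sys.exit(gate(scan, accepted_ids))`. The `accepted_ids` set is where formally accepted risks go, which keeps the gate strict about regressions without blocking every release on a known, documented issue.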

One option for this kind of stack-specific coverage is AuditYour.App, which scans Supabase, Firebase, websites, and mobile builds for issues such as exposed RLS rules, public RPCs, leaked API keys, and hardcoded secrets. That kind of tooling fits best as a continuous check between deeper manual assessments. For a broader view of the trade-offs, this guide on automated penetration testing in CI/CD is a useful reference.

The model that works in practice

The most effective pattern for startups is usually hybrid:

  1. Manual pentest at meaningful points. Major releases, enterprise deal support, architectural changes.
  2. Automated scanning between releases. Catch regressions and obvious exposures early.
  3. Formal retesting after fixes. Verify closure of important findings.
  4. Residual risk tracking. Keep unresolved issues visible rather than pretending the report solved them.

That's a much healthier security lifecycle than buying a report once, forwarding it to engineering, and hoping the important bits get fixed.
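Residual risk tracking does not need heavy tooling to start. The sketch below separates findings that a retest has verified from those that are merely marked fixed, since only the former count as closed. The status names (`open`, `fixed`, `verified`, `accepted`), the sample findings, and the `residual_risk` function are hypothetical.

```python
from collections import Counter

# Hypothetical statuses; "verified" means a retest confirmed the fix.
findings = [
    {"id": "F-1", "severity": "high", "status": "verified"},
    {"id": "F-2", "severity": "high", "status": "fixed"},     # fixed, not yet retested
    {"id": "F-3", "severity": "medium", "status": "open"},
    {"id": "F-4", "severity": "low", "status": "accepted"},   # risk formally accepted
]


def residual_risk(items: list[dict]) -> dict:
    """Summarise how much of the report is actually closed."""
    counts = Counter(f["status"] for f in items)
    # Anything open, or fixed without a retest, still counts as unresolved.
    unresolved = [f["id"] for f in items if f["status"] in ("open", "fixed")]
    rate = counts["verified"] / len(items) if items else 0.0
    return {"verified_rate": rate, "unresolved": unresolved, "by_status": dict(counts)}


summary = residual_risk(findings)
print(summary["verified_rate"])  # 0.25
print(summary["unresolved"])     # ['F-2', 'F-3']
```

Treating "fixed" as unresolved until retested is the design choice that matters here. It keeps the summary honest about what has actually been confirmed.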

Your Actionable Penetration Test Checklist

If you're preparing for your first serious engagement, keep it simple. The biggest wins come from choosing the right scope, getting the right access ready, and making sure the report can turn into actual engineering work.

Questions to ask your testing vendor

  • Have you tested this stack before? Ask specifically about Supabase, Firebase, mobile backends, storage rules, serverless functions, and API-heavy applications.
  • How do you handle authorisation testing? You want to hear about RLS, tenant isolation, role boundaries, and business logic abuse, not just generic scanning.
  • What do you need from us? Good testers should tell you whether they need docs, test accounts, mobile builds, architecture diagrams, or staging access.
  • What does the report include? Ask for sample structure. You want executive summary, evidence, reproduction steps, business impact, and concrete remediation guidance.
  • Do you offer retesting? If they don't validate fixes, the engagement leaves too much uncertainty.

Internal preparation steps

  • Define the exact scope. List apps, APIs, mobile builds, backend projects, and environments clearly.
  • Prepare safe access. Create test users, short-lived credentials, and any required access paths in advance.
  • Notify the right people. DevOps, support, engineering leads, and incident responders should know testing is scheduled.
  • Back up what matters. Especially if any validation will occur near production-like data or operational workflows.
  • Set a communication path. Testers need a fast way to report critical findings or ask scope questions.
  • Plan remediation ownership. Decide who triages findings, who fixes them, and who signs off before the report even arrives.

Teams get more value from a modestly scoped pentest they're ready for than from a large engagement they can't operationalise.

The penetration testing process works when it becomes a routine part of shipping software, not a last-minute scramble before procurement.


If you're building on Supabase, Firebase, or a mobile backend and want a faster way to catch misconfigurations before a formal engagement, AuditYour.App gives teams a practical starting point. You can scan a project URL, website, or mobile build to identify exposed RLS rules, public RPCs, leaked keys, and hardcoded secrets, then use those findings to clean up the obvious risks before manual testing or to support a continuous security workflow.
