We Audited a Vibe-Coded SaaS Product and Found 9 Critical Issues
Most articles about vibe-coding start the same way: here is the exciting new thing, here are the risks, here is how to be careful. We are going to start somewhere different: with a real product and a client request for a full technical review.

There is a gap in software development that 2025 made much wider, and almost no one is talking honestly about it. On one side: the ability to build a working product. AI coding tools have genuinely democratized this. A founder with no engineering background can describe an application in plain language and have something functional running within days.

On the other side: the ability to run that product safely in production. Under real load and against real attackers. Inside real regulatory frameworks and with real customer data at stake.

The risk is that speed, applied to software development through AI generation tools, now moves faster than the processes that traditionally made software safe, such as code review, architecture oversight, security testing, and compliance checks. Those processes did not disappear because they were inefficient. They existed because production software has to survive things that a demo does not: real traffic, malicious users, regulatory audits, hardware failures, and edge cases that nobody thought to test.

When a citizen developer builds a product in four months using AI-driven development and calls it ready to scale, they are not wrong to say they built something. They are just missing one question: ready to scale to what?

What Is Vibe-Coding, and Why Is Everyone Doing It?

The term was coined by AI researcher Andrej Karpathy in early 2025 and spread quickly because it named something developers were already quietly doing. Vibe-coding means leaning fully into AI generation: you describe what you want, accept the output, test it manually, and iterate with more prompts. 

You don’t need a cross-functional team to build an MVP. AI tools like ChatGPT, Cursor, and GitHub Copilot have genuinely lowered the barrier to shipping software. The problem isn’t that vibe-coding produces bad code in the narrow sense; often, it produces readable code. The problem is that AI models are trained to satisfy the prompt in front of them. They write the feature you described. They do not architect the system that feature has to live in: its security model, its failure modes, its data integrity guarantees.

An Example Flow:

1. Idea Stage. Define what you want to build and who may need your solution. Describe the app, users, and core functionality.
2. Create a Prompt. Explain the product in natural language. Provide features, tech stack, and design preferences.
3. AI Code Generation. The AI generates the first version of the app, creating components, APIs, and layout automatically.
4. Review and Adjust. Evaluate the generated app and ask the AI to improve UI, logic, or features.
5. Debugging and Testing. Identify and fix issues by pasting errors or unexpected behavior back to the AI.
6. Iterate and Extend. Add new features quickly, using prompts to expand functionality.
7. Deploy the App. Launch your application; the AI helps configure hosting and CI/CD.

Not sure whether your product is production-ready?

At LaSoft, we help founders and product teams identify and fix these risks early.

Vibe-Coding vs. Traditional Development: What Actually Changes

In traditional development, programmers are responsible for every line of code they write. They choose the approach, understand the logic, and can explain why a particular decision was made. When something breaks in production, they handle it. When a security concern is raised, someone on the team knows whether the system is vulnerable. The process may seem slow, but the slowness is the point: brainstorming sessions, code review, architecture discussions, pull request comments, and documentation are where mistakes get caught.

Vibe-coding changes the whole process. The AI becomes the team that produces the implementation. The gap between “I described in a prompt what I wanted” and “I understand what was built” can be significant, and in production environments, that gap is where incidents live.

This does not mean vibe-coding is categorically worse. For prototyping, for internal tools, for early validation of a product idea, the speed advantage is real, and the stakes are low enough that the gap is manageable. The problem arises when a product built at prototype speed is treated as production-ready without closing that gap through review, testing, and deliberate engineering oversight.

The table below captures the key differences:

| Aspect | Traditional Development | Vibe-Coding |
| --- | --- | --- |
| How code gets written | The developer writes every line manually, with direct knowledge of what is being built and why | The professional or citizen developer prompts an AI in natural language; the AI generates the implementation |
| Required technical skill | Strong programming background, language fluency, framework knowledge | Low barrier; accessible to non-developers and junior teams |
| Development speed | Methodical; velocity increases with experience and planning | Fast from day one; weeks of work can compress into days |
| Code ownership | Developer fully owns and understands the code they write | Ownership is often unclear; the developer may not understand what was generated |
| Security posture | Security decisions are made explicitly at each step | Security gaps emerge by default when prompts don’t specify security requirements |
| Architecture | Designed deliberately before and during development | Emerges organically from prompts; often inconsistent across the codebase |
| Testing | Tests written alongside or before code (TDD); regressions caught early | Manual testing dominates; automated tests rarely generated unprompted |
| Debugging | Developer has full context; can trace and reason through failures | Harder when you didn’t write or fully read the code |
| Technical debt | Accumulates gradually; visible through code review | Can accumulate rapidly and invisibly; often only discovered at audit or incident |
| Documentation | Produced alongside development; explains decisions | Typically absent; AI generates code but not rationale |
| Compliance and regulation | Can be designed in from the start with explicit requirements | Rarely considered during generation unless specifically prompted |
| Best suited for | Production systems, long-lived codebases, regulated industries, large-scale platforms | Prototypes, MVPs, internal tools, early validation, hackathons |

The most important thing this table illustrates is not that one approach is superior to the other. They are optimized for different outcomes. Traditional development is optimized for systems that need to survive in production for years. Vibe-coding is optimized for quickly getting to a working prototype.

The mistake that creates the kinds of incidents we describe later in this article is not choosing vibe-coding. It is treating a vibe-coded prototype as a production system without the deliberate engineering work required to close the gap between the two columns above.

The Product We Audited

The client had built a B2B SaaS platform for workflow automation, targeting mid-size logistics companies. The founding team consisted of two people: one with a product background and one junior developer. Over four months, they built the entire product using a ChatGPT-first approach: describe a feature, review the output, paste it in, and move on.

By the time they came to us, they had roughly 18,000 lines of generated code, a working demo, and three paying pilots. The client asked us to do a software audit to confirm what they already believed: that they had an MVP ready to scale.

We spent several days on the audit. Our experts used static analysis tools, manual code review, threat modeling, and load simulation. We also reviewed the full development history: commit messages and deployment logs. We found nine critical issues, which we walk through below.

The 9 Critical Issues

1. Production API Keys Exposed in the Public JavaScript Bundle

Third-party credentials for a payment processor and an email service provider were embedded in the React frontend as environment variables prefixed with REACT_APP_. Any variable with that prefix is compiled into the public JavaScript bundle at build time and shipped to every user’s browser. Anyone who opened DevTools could extract live production credentials in under a minute, then use them to send bulk email from the company’s domain or trigger payment API calls directly.

This is a pattern that appears frequently in AI-generated frontend code because the model produces working integrations without flagging that client-side credential exposure is a separate and serious problem. The integration working and the integration being safe are two entirely different things.
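The fix is structural: secrets live only on the server, and the browser calls your own backend, which proxies the request. A minimal sketch, assuming an Express/Node backend; the route, provider URL, and EMAIL_API_KEY variable name are illustrative, not taken from the audited code:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// The browser calls this route; it never sees the credential.
app.post("/api/send-welcome-email", async (req, res) => {
  // Server-side call to the email provider; the secret stays in the
  // server's environment (never a REACT_APP_* variable).
  const response = await fetch("https://api.emailprovider.example/v1/send", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.EMAIL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ to: req.body.email, template: "welcome" }),
  });
  res.status(response.ok ? 202 : 502).end();
});

app.listen(3000);
```

The frontend keeps only genuinely public values (publishable keys, API base URLs); everything else moves behind an endpoint you control.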

Read: 

  1. “Web Security for Developers” by Malcolm McDonald;
  2. “The Tangled Web: A Guide to Securing Modern Web Applications” by Michal Zalewski.

2. No Database Transactions on Multi-Step Operations

Several core business workflows, such as user onboarding, order creation, and subscription activation, involved multiple sequential database writes. None were wrapped in transactions. If the process was interrupted at any point (a server crash, a network timeout, an unhandled exception), the database would be left in a partial state.

In three months of limited production use, this had already produced two “ghost accounts”: users who existed in one table but not another, which caused confusing behavior that the team had noticed but could not explain. At scale, with hundreds of onboarding events per day, the resulting data corruption would have required significant manual intervention to diagnose and repair.
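The fix is to make each multi-step workflow atomic. A minimal sketch, assuming a Node backend with node-postgres; table and column names are illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

async function onboardUser(email: string, orgId: number) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      "INSERT INTO users (email) VALUES ($1) RETURNING id",
      [email]
    );
    await client.query(
      "INSERT INTO memberships (user_id, org_id) VALUES ($1, $2)",
      [rows[0].id, orgId]
    );
    await client.query("COMMIT"); // all-or-nothing: no more ghost accounts
  } catch (err) {
    await client.query("ROLLBACK"); // a crash or error leaves no partial state
    throw err;
  } finally {
    client.release();
  }
}
```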

3. N+1 Database Queries on Every List View

The AI-generated ORM patterns across the data layer produced classic N+1 query chains. Loading a dashboard with 50 workflow records triggered 51 database queries: one to fetch the list, then one additional query per record to load its associated metadata.

This is invisible in development, where the dataset is small. Under real traffic with hundreds of records, each dashboard load triggers hundreds of database calls. Our load test with 200 simulated concurrent users brought the database server to 100% CPU utilization within 90 seconds. At anything resembling commercial scale, the product would have become unusable before the team understood why.
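The fix is to collapse the per-record queries into a single round trip. A sketch, again assuming node-postgres and an illustrative schema:

```typescript
import { Pool } from "pg";

async function loadDashboard(db: Pool) {
  // Before (the N+1 pattern the generated code used):
  //   const list = await db.query("SELECT * FROM workflows LIMIT 50");
  //   for (const wf of list.rows) {
  //     await db.query(
  //       "SELECT * FROM workflow_meta WHERE workflow_id = $1", [wf.id]
  //     );
  //   }

  // After: one JOIN. 51 queries collapse into 1.
  const { rows } = await db.query(`
    SELECT w.*, m.key, m.value
    FROM (SELECT * FROM workflows ORDER BY id LIMIT 50) w
    LEFT JOIN workflow_meta m ON m.workflow_id = w.id
  `);
  return rows;
}
```

Most ORMs offer the same fix as eager loading; the point is to make the query count independent of the record count.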

4. Payment Webhooks Without Idempotency

The payment provider integration processed incoming webhooks by immediately executing business logic: charging the customer, updating subscription status, and sending a confirmation email. There was no idempotency key validation and no deduplication logic.

Payment providers routinely retry webhook delivery when they do not receive a timely 200 response — a slow server, a momentary error, a deployment in progress. Without idempotency, each retry executes the business logic again. A single payment event with three retries equals three charges, three subscription updates, and three confirmation emails. This was a double-billing scenario waiting for the first production load event to trigger it.
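The standard defense is to record each event ID before acting on it, and to make that insert the gate. A sketch, assuming the provider sends a unique event ID with each webhook (most payment providers do); the table name is illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function handleWebhook(event: { id: string; type: string }) {
  // Atomically claim the event ID; a retry of the same event inserts nothing.
  const inserted = await pool.query(
    `INSERT INTO processed_webhooks (event_id) VALUES ($1)
     ON CONFLICT (event_id) DO NOTHING`,
    [event.id]
  );
  if (inserted.rowCount === 0) {
    return; // duplicate delivery: acknowledge with 200, execute nothing
  }
  // ...charge, update subscription, send the confirmation email exactly once...
}
```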

5. Single-Server Architecture with No Backup and No Recovery Plan

The entire application ran on one VPS: web server, application code, database, file storage, and background jobs. There were no automated backups. No health checks. No process restart policies. No monitoring. No alerting.

When that server fails, the product goes offline entirely with no automated recovery. Without recent backups, data loss is permanent. For a B2B SaaS product with paying customers and investor meetings on the calendar, this was not a theoretical risk. It was a scheduled incident with no mitigation.
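Closing this gap starts with basics: automated backups, a process restart policy, and a health endpoint a load balancer or uptime monitor can poll. The last of these fits in a few lines; a sketch assuming an Express/Node stack (the /healthz path is a common convention, not taken from the audited code):

```typescript
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool();

// Liveness/readiness probe: can the process respond, and can it reach the DB?
app.get("/healthz", async (_req, res) => {
  try {
    await pool.query("SELECT 1");
    res.status(200).json({ status: "ok" });
  } catch {
    res.status(503).json({ status: "degraded" });
  }
});

app.listen(3000);
```

An endpoint like this is what lets a supervisor restart a dead process and an alerting tool page a human, instead of customers discovering the outage first.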

6. Unrestricted File Upload

Users could upload files as part of the product’s workflow features. The upload endpoint accepted any file type, stored files under predictable names in a publicly accessible directory, and performed no content scanning.

The attack surface here is broad: uploading executable files, overwriting existing files using predictable naming, using the server as a malware distribution host, and denial-of-service through large file uploads. The endpoint was accessible to any authenticated user.
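Hardening an upload endpoint means an extension allowlist, randomized filenames, a size cap, and storage outside the public web root. A sketch, assuming Express with multer; the allowlist and paths are illustrative:

```typescript
import crypto from "node:crypto";
import path from "node:path";
import express from "express";
import multer from "multer";

const ALLOWED = new Set([".png", ".jpg", ".jpeg", ".pdf"]);

const upload = multer({
  storage: multer.diskStorage({
    destination: "/var/app/uploads-private", // outside the public web root
    filename: (_req, file, cb) => {
      // Random name: no overwrites, no guessable URLs.
      const ext = path.extname(file.originalname).toLowerCase();
      cb(null, `${crypto.randomUUID()}${ext}`);
    },
  }),
  limits: { fileSize: 10 * 1024 * 1024 }, // cap size to blunt upload DoS
  fileFilter: (_req, file, cb) => {
    const ext = path.extname(file.originalname).toLowerCase();
    cb(null, ALLOWED.has(ext)); // reject anything outside the allowlist
  },
});

const app = express();
app.post("/api/upload", upload.single("file"), (req, res) => {
  res.status(req.file ? 201 : 400).end();
});
```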

7. No Server-Side Input Validation

API endpoints accepted and persisted user-submitted data with minimal server-side validation. Numeric fields accepted arbitrary strings. Date fields accepted free-form text. Fields marked as required could be submitted as null and would be written to the database. The frontend performed validation, but frontend validation is trivially bypassed by anyone making a direct API call. Several endpoints also constructed database queries using unsanitized input, introducing SQL injection vectors.
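The fix is a validation layer at the API boundary plus parameterized queries everywhere. A sketch using zod for schema validation; the field and table names are illustrative:

```typescript
import { z } from "zod";
import { Pool } from "pg";

const pool = new Pool();

// Validate the untrusted payload before anything touches the database.
const OrderInput = z.object({
  quantity: z.number().int().positive(),   // rejects arbitrary strings
  deliveryDate: z.coerce.date(),           // rejects free-form text
  customerRef: z.string().min(1).max(100), // required means required
});

async function createOrder(body: unknown) {
  const input = OrderInput.parse(body); // throws on invalid payloads

  // Parameterized query: user input is never spliced into SQL.
  await pool.query(
    `INSERT INTO orders (quantity, delivery_date, customer_ref)
     VALUES ($1, $2, $3)`,
    [input.quantity, input.deliveryDate, input.customerRef]
  );
}
```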

8. GDPR Non-Compliance

The product processed personal data belonging to EU business contacts as part of its workflow automation features. There was no data export mechanism for users, no right-to-erasure implementation, no data processing agreements, and no audit log of who had accessed or modified which data.

For a product actively targeting European logistics companies, this was not just a technical gap. It was a compliance exposure that would block enterprise procurement reviews, create material legal liability, and, in the event of a complaint, result in regulatory fines. GDPR compliance is mandatory, not optional, and retrofitting it onto an existing system is significantly more costly than building it in from the start.
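Much of GDPR work is process and legal, but the audit trail is code. A minimal sketch of an append-only access log, the building block the product lacked; the schema is illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Append-only log of who did what to whose personal data.
async function audit(actorId: number, action: string, subjectId: number) {
  await pool.query(
    `INSERT INTO audit_log (actor_id, action, subject_id, at)
     VALUES ($1, $2, $3, now())`,
    [actorId, action, subjectId]
  );
}

// Example: call it around every read, export, or update of personal data.
// await audit(currentUser.id, "export_contact_data", contact.id);
```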

9. No Tests and No Documented Business Logic

The codebase contained zero automated tests: no unit tests, no integration tests, no end-to-end tests. Critical business logic (pricing calculations, permission rules, workflow state transitions) existed only as inline code with no comments and no documentation.

Changing anything in the codebase required manual verification of the entire application because there was no safety net to catch regressions. Onboarding a new developer was practically impossible. Refactoring any core module carried the risk of breaking unrelated functionality in ways that would only surface in production. This is not a matter of best practices: it is the difference between a codebase that can be maintained and grown, and one that becomes a liability the moment someone needs to change it.
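Even a handful of tests would have turned the implicit pricing rules into executable documentation. A sketch, assuming a Vitest/Jest-style runner; calculatePrice, its module path, and its rules are hypothetical stand-ins for the product’s undocumented logic:

```typescript
import { describe, it, expect } from "vitest";
import { calculatePrice } from "../src/pricing"; // hypothetical module

describe("calculatePrice", () => {
  it("applies the volume discount at 100 units", () => {
    // 100 units x 2 per unit, minus an assumed 10% volume discount.
    expect(calculatePrice({ units: 100, ratePerUnit: 2 })).toBe(180);
  });

  it("rejects negative quantities", () => {
    expect(() => calculatePrice({ units: -1, ratePerUnit: 2 })).toThrow();
  });
});
```

Tests like these are the safety net that makes refactoring possible: a regression in a core rule fails a build instead of surfacing in production.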

Discover hidden risks before they turn into costly incidents.

At LaSoft, we help founders and product teams identify and fix these risks early.

Nothing Against Vibe-Coding

The issues above are the predictable result of a development process optimized entirely for the speed of feature delivery, without any parallel processes for security, infrastructure, compliance, or system-level review.

Traditional development builds those reviews into the workflow through friction: pull requests, code review, staging environments, and deployment approvals. Vibe-coding removes that friction. It has to come back somewhere: either during development, or as a structured audit before you scale, fundraise, or sign your first enterprise deal. That is why the audit step matters more than it ever did.

When a Software Audit Is the Right Call

A technical audit is not only for products in trouble. It is a standard step at specific points in a product’s life, and understanding those moments can prevent high costs and reputational damage.

Before investor due diligence. Technical reviewers hired by investors will look at your codebase. You want to know what they will find before they do. A credible remediation plan is a far stronger position than being surprised during diligence.

Before signing an enterprise customer. Enterprise procurement involves security questionnaires, compliance reviews, and sometimes direct codebase access. Products with the issues described above will fail these reviews. Knowing this ahead of time means you can fix the blockers or set realistic timelines.

After inheriting code from a previous team or vendor. Code built by someone else, using a process you didn’t oversee, carries unknown risk. An audit surfaces that risk before you build on top of it.

After any significant period of AI-assisted development. If a meaningful portion of your codebase was generated through vibe-coding without parallel security or architecture review, this article should give you a reasonable picture of what a structured audit might find.

Final Recap: Why LaSoft

LaSoft has been conducting software audits for startups, scale-ups, and enterprises since 2014. If you are preparing for an investment, scaling your product, taking on an enterprise customer, or simply want to understand what is actually in your codebase, talk to our team about what a review would cover.

FAQ

Is vibe-coding risky?

The risks come from deploying AI-generated code to production without the review and testing that would normally catch the issues described above. Used for prototyping and validated before production deployment, it is a genuine productivity tool.

How much does a software audit cost?

This varies with codebase size and scope. A focused audit of a startup-scale product typically runs five to ten business days. LaSoft scopes engagements based on an initial conversation about your stack, what you’re preparing for, and what you need the audit to cover.

We built a product with AI assistance and had senior engineers reviewing the code. Do we still need an audit?

Probably less urgently, but a structured review still adds value at key milestones. Senior engineers catch a great deal in code review, but systematic security testing, load simulation, and compliance review require dedicated time and tooling that typical development sprints don’t include.