
Your SaaS Stack Is Training AI Models Right Now — And You Have No Visibility

Your organization’s data is likely being used in AI workflows today—whether or not you have explicitly approved it.



The Claim Most Security Teams Push Back On

“Your SaaS stack is training AI models right now.”

At first glance, that sounds exaggerated.

Most organizations assume they have this under control. Vendors say they do not train on customer data. Policies exist. Contracts are signed.

But when you move beyond assumptions and start asking precise questions, the certainty disappears quickly.

Because the issue is not whether some vendors train on your data.

It is whether you can confidently prove that none of them do.

And in most cases, the answer is no.


AI Is Already Embedded Across Your SaaS Stack

AI is no longer confined to standalone tools. It is integrated into platforms your organization already trusts and uses daily.

Applications like Notion, Slack, and Microsoft 365 have introduced AI features that operate directly on user-generated content.

These capabilities include:

  • Summarizing documents and conversations

  • Generating content based on internal data

  • Providing recommendations and insights

  • Automating workflows


From a usability standpoint, this is a natural evolution.

From a data perspective, it introduces a new layer of processing—one that is often not fully understood by the organizations using these tools.


The Real Question: What Happens to That Data?

When employees interact with AI features inside SaaS applications, they are not just using functionality.

They are submitting data into systems that may:

  • Store prompts and outputs

  • Analyze interactions to improve performance

  • Share data with underlying AI providers

  • Retain information beyond the immediate session

The challenge is that these behaviors are not always clearly documented or consistently enforced.


Some vendors explicitly state that customer data is not used for training. Others allow opt-outs. Some rely on anonymization or aggregation. Others depend on third-party providers with separate policies.


From a governance perspective, this creates uncertainty.

Guidance such as the NIST AI Risk Management Framework (https://www.nist.gov/itl/ai-risk-management-framework) emphasizes the importance of understanding how data is used throughout the AI lifecycle, including training, inference, and retention.

Yet most organizations do not have that level of visibility.


Where Visibility Breaks Down

There are three primary reasons why organizations lack clarity on this issue.


1. Vendor Language Is Conditional

Many SaaS vendors use precise but conditional language.

For example:

  • “We do not use customer data to train our models”

  • Followed by: “except for improving service quality”

Or:

  • “Data is not used for training”

  • But: “may be processed by third-party AI providers”

These distinctions matter, but they are difficult to interpret at scale.
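At small scale, some of this conditional language can be surfaced mechanically. The sketch below is illustrative only: the phrase patterns and the example statement are invented, and a real review would still need a human to interpret each match in context.

```python
import re

# Hypothetical phrase patterns that often signal carve-outs or delegation
# in vendor AI/data-use statements. Illustrative, not exhaustive.
CONDITIONAL_PATTERNS = [
    r"except (for|when|as)",                       # carve-out after a blanket denial
    r"improv(e|ing) (our )?(service|model)s?",     # "service improvement" clause
    r"third[- ]party (AI )?(provider|service)s?",  # delegation to sub-processors
    r"may (be )?(use|process|retain)",             # permissive rather than restrictive wording
    r"unless you opt[- ]out",                      # training on by default
]

def flag_conditions(statement: str) -> list[str]:
    """Return the conditional patterns found in a vendor statement."""
    return [p for p in CONDITIONAL_PATTERNS
            if re.search(p, statement, re.IGNORECASE)]

# Invented example mirroring the wording patterns above.
statement = ("We do not use customer data to train our models, "
             "except for improving service quality. Data may be "
             "processed by third-party AI providers.")
print(flag_conditions(statement))
```

Even a crude filter like this makes the point: a statement that opens with "we do not train on customer data" can still trip several conditional patterns in the same paragraph.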


2. AI Features Are Introduced Incrementally

AI capabilities are often added through product updates.

A tool that was previously evaluated as low risk may now include AI features that process data differently.

These changes rarely trigger re-evaluation within security or compliance workflows.

As a result, organizations continue to trust tools based on outdated assumptions.


3. Discovery Is Incomplete

Most organizations do not have a complete inventory of their SaaS environment.

Applications introduced through:

  • Email-based signups

  • OAuth connections

  • Freemium usage

may never be formally reviewed.

This means that even if you have strong processes for evaluating known vendors, there is an entire category of tools operating outside that process.
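The discovery gap amounts to a simple set difference between what exists and what has been reviewed. In the sketch below, the app names and discovery sources are invented; in practice the inputs would come from identity-provider OAuth grant logs and email signup signals.

```python
# Hypothetical inventories. Real data would come from OAuth grant exports
# and email-based signup detection, not hard-coded sets.
discovered_via_oauth = {"notion", "slack", "acme-ai-notes", "quickchart"}
discovered_via_email = {"slack", "freemium-transcriber"}
formally_reviewed = {"notion", "slack", "microsoft-365"}

discovered = discovered_via_oauth | discovered_via_email
unreviewed = sorted(discovered - formally_reviewed)

print(f"{len(unreviewed)} of {len(discovered)} discovered apps were never reviewed:")
for app in unreviewed:
    print(" -", app)
```

The unreviewed set is exactly the category of tools operating outside the vendor-evaluation process.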

This is where Shadow AI intersects with data risk.


The Difference Between Assumed and Verified Safety

There is a critical distinction between:

  • Assuming vendors do not train on your data

  • Verifying that they do not

Verification requires:

  • Reviewing AI-specific documentation

  • Understanding data flow and retention

  • Identifying third-party dependencies

  • Confirming available controls and opt-outs

In practice, very few organizations perform this level of validation across their entire SaaS stack.
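One lightweight way to make the assumed-versus-verified gap concrete is to track each verification step per vendor and count how many vendors clear all of them. The vendor names and statuses below are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-vendor record of the four verification steps listed above.
@dataclass
class VendorVerification:
    name: str
    ai_docs_reviewed: bool = False
    data_flow_mapped: bool = False
    third_parties_identified: bool = False
    optouts_confirmed: bool = False

    def verified(self) -> bool:
        """True only when every verification step is complete."""
        return all([self.ai_docs_reviewed, self.data_flow_mapped,
                    self.third_parties_identified, self.optouts_confirmed])

vendors = [
    VendorVerification("notion", ai_docs_reviewed=True, data_flow_mapped=True,
                       third_parties_identified=True, optouts_confirmed=True),
    VendorVerification("slack", ai_docs_reviewed=True),
    VendorVerification("acme-ai-notes"),
]
verified = [v.name for v in vendors if v.verified()]
print(f"Fully verified: {len(verified)}/{len(vendors)}")
```

Run across a real SaaS inventory, the ratio of fully verified vendors to total vendors is a direct measure of the gap between perceived and actual risk.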

This creates a gap between perceived and actual risk.


Why This Matters for Compliance

Frameworks such as SOC 2 require organizations to demonstrate control over how customer data is handled.

As AI becomes part of SaaS workflows, this includes:

  • Understanding where data is processed

  • Managing third-party risk

  • Ensuring appropriate controls are in place

  • Maintaining accurate system inventories

Without visibility into AI usage and data handling, these requirements become difficult to meet.

Guidance from CISA's AI cybersecurity guidelines (https://www.cisa.gov/resources-tools/resources/ai-cybersecurity-guidelines) reinforces the need for transparency and oversight in AI-enabled systems.


The Practical Reality

Most organizations today are in the same position:

  • They rely on SaaS platforms with embedded AI

  • They trust vendor statements at a high level

  • They lack detailed visibility into data usage

  • They do not have a complete inventory of AI-enabled tools

This does not mean that data is definitively being misused.

It means that the organization cannot prove that it is not.

And from a risk perspective, that distinction matters.


Where Waldo Security Fits

Waldo Security addresses the underlying visibility problem.

By discovering SaaS applications through email signals and OAuth connections, and identifying where AI is being used, Waldo Security provides a comprehensive view of the environment.

This allows organizations to:

  • Identify all SaaS applications interacting with organizational data

  • Detect AI usage across both known and unknown tools

  • Prioritize vendors for deeper verification

  • Align governance and compliance efforts with actual usage

Waldo Security operates with a privacy-first approach. It does not train AI models on customer data and focuses exclusively on metadata analysis.


Conclusion

The statement that your SaaS stack may be training AI models on your data is not meant to be alarmist.

It is meant to highlight a gap.

AI is already embedded across SaaS applications. Data is already flowing into these systems. Policies exist, but they are often complex, conditional, and difficult to verify at scale.


The risk is not necessarily that vendors are misusing data.

The risk is that organizations lack the visibility to know for certain.

In 2026, that uncertainty is one of the most significant challenges in SaaS security.

Because the question is no longer whether AI is being used.

It is whether you understand what it is doing with your data.

To explore how organizations are gaining visibility into SaaS and AI usage, visit:


