Transforming Data Security with AI-Powered Classification

May 22, 2024

Handling and securing sensitive data is a practice fraught with potential pitfalls such as inadvertent leaks, compliance violations, and the ever-present threat of cyberattacks.

Modern organizations must manage ever-growing data volumes while striving for regulatory compliance and protecting sensitive information. Traditional data loss prevention (DLP) methods haven’t kept pace with today’s data landscape, which encompasses new data types, formats, and sources.

Legacy solutions can’t deliver the diverse classification techniques needed to support modern data security outcomes, often leading to inaccurate identification of sensitive data, false positives, or missed data breaches. Traditional data classification methods rely on predefined patterns that lack the context needed to identify leaks accurately.

Enter Palo Alto Networks Data Security with AI-powered discovery and classification, a game-changing solution revolutionizing how businesses discover, manage, and protect their data assets.

100+ Pretrained ML Classifiers Automate Discovery and Classification

Palo Alto Networks Data Security provides 100+ deep neural network (DNN)-based classifiers out of the box to automate data discovery and classification.

These models are trained on diverse data corpora encompassing a wide range of languages to interpret semantics with contextual understanding for near-perfect accuracy. DNN classifiers for documents or text data are available across financial, healthcare, legal, and source code categories. Image-based machine learning (ML) classifiers are also available for categories such as driver’s licenses, passports, and national IDs. Analyzing images without extracting or scanning their content allows organizations to detect and protect personal information efficiently.

At Palo Alto Networks, a dedicated team of data scientists has spent the past several years researching new techniques to improve DNN and ML models. We’re proud to share that these efforts have come a long way, and we’re now at the fifth generation of DNN models.

Figure 1: Pretrained ML classifiers available as predefined document types

AI and ML Augmentation Increase Detection Accuracy

Pattern matching with only regular expressions and keywords can be prone to false positives. Our approach, however, augments these DLP data patterns with ML models.

These models are then trained on diverse datasets, leveraging large language models (LLMs) to establish ground truth. With these AI-powered improvements, we’re seeing markedly more accurate detections, with more than a 90% reduction in potential false positives.
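To make the false-positive problem concrete, here is a minimal Python sketch of how a bare regular expression over-matches and how extra validation narrows the results. It is an illustration only, not how Palo Alto Networks implements its data patterns: a Luhn checksum and a nearby-keyword check stand in for the contextual signal that an ML model would provide, and the names CARD_RE, CONTEXT_WORDS, and find_card_numbers are hypothetical.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

# Hypothetical context keywords; a real ML model would learn context instead.
CONTEXT_WORDS = {"card", "visa", "mastercard", "credit", "debit", "cvv"}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str, window: int = 40) -> list[str]:
    """Keep regex matches that pass the checksum and have a context keyword nearby."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_valid(digits):
            continue  # regex matched, but the checksum rules it out
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        if any(word in nearby for word in CONTEXT_WORDS):
            hits.append(digits)
    return hits
```

In this sketch, a 16-digit order ID matches the regular expression but is dropped by the checksum, and a checksum-valid number with no payment-related words nearby is dropped by the context check. Both filters are simple heuristics chosen for brevity; they only hint at the kind of context a trained model can capture.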

Today, over 250 out-of-the-box data patterns are augmented with ML and LLMs across personally identifiable information (PII), General Data Protection Regulation (GDPR) requirements, financial, and other predefined categories.

Lastly, in the quest to deliver industry-leading accuracy for DLP detections, Palo Alto Networks has created a customer-driven feedback loop. Within the context of a data security incident, customers can report false positives for specific detections, along with their reasoning for context. This feedback loop and reporting mechanism allow us to analyze and retrain our ML models, leading to even higher accuracy.

Figure 2: False positive reporting in an incident

Customer-Trainable AI Models Protect Unique Intellectual Property

You almost assuredly put considerable time and resources behind protecting your organization’s “crown jewels”: data. Today, high-value data, including trade secrets, financial projections, client lists, and proprietary source code, is considered digital currency.

Palo Alto Networks enables customers to train their own ML models with sample intellectual property (IP) to find exact matches or similar datasets based on custom thresholds. These models can be tuned and retrained to ensure high accuracy and expanded over time to cover new types of IP.
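The idea of exact matches versus similar documents under a custom threshold can be sketched in a few lines of Python. This is a simplified stand-in, not the product's trainable model: it substitutes a SHA-256 fingerprint for exact matching and Jaccard similarity over word shingles for similarity scoring. The function names and the 0.5 default threshold are assumptions made for illustration.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping k-word tuples (word shingles)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fingerprint(text: str) -> str:
    """Hash of the case- and whitespace-normalized text, used for exact matching."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(candidate: str, samples: list[str], threshold: float = 0.5) -> str:
    """Return 'exact', 'similar', or 'no_match' for a candidate against sample IP."""
    cand_fp = fingerprint(candidate)
    cand_sh = shingles(candidate)
    best = 0.0
    for sample in samples:
        if fingerprint(sample) == cand_fp:
            return "exact"
        best = max(best, jaccard(cand_sh, shingles(sample)))
    return "similar" if best >= threshold else "no_match"
```

Raising the threshold in this sketch makes the "similar" verdict stricter, while lowering it catches looser paraphrases at the cost of more false positives. That is the trade-off a custom threshold is meant to control.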

Figure 3: Creating your own trainable AI models

Palo Alto Networks Data Security revolutionizes data classification with context-aware, AI-powered discovery for high accuracy. It delivers a comprehensive view of data across SaaS, cloud, email, browser, and network sources with a unified data map.

AI-powered data classification is currently available on Enterprise DLP, our data security solution, and will also prevent sensitive data loss in AI Access Security. Prisma SASE and Next-Generation Firewall (NGFW) customers with Next-Generation CASB and DLP entitlements will also benefit.

Start your 90-day free trial of Enterprise DLP to discover the power of AI data security for yourself.

This blog contains forward-looking statements that involve risks, uncertainties and assumptions, including, without limitation, statements regarding the benefits, impact, or performance or potential benefits, impact, or performance of our products and technologies. These forward-looking statements are not guarantees of future performance, and there are a significant number of factors that could cause actual results to differ materially from statements made in this blog. We identify certain important risks and uncertainties that could affect our results and performance in our most recent Annual Report on Form 10-K, our most recent Quarterly Report on Form 10-Q, and our other filings with the U.S. Securities and Exchange Commission from time-to-time, each of which are available on our website at and on the SEC's website at All forward-looking statements in this blog are based on information available to us as of the date hereof, and we do not assume any obligation to update the forward-looking statements provided to reflect events that occur or circumstances that exist after the date on which they were made.

