Mediocre DLP Solutions Fall Short on Data Detection–And Here’s Why

Feb 15, 2022

Today’s Data Loss Prevention (DLP) customers need both the ability to tune policies and the flexibility to adapt configurations and deployments to their specific enterprise needs. Yet no single DLP solution fits the needs of every enterprise.

The complexity and resource demands that usually come with legacy DLP solutions make them viable only for enterprises that can afford significant investments in time, people, and money. For the majority of businesses, legacy DLP solutions are either not applicable or provide minimal data leakage protection because their extensive capabilities are too complex to configure, scale, and maintain.

Beyond these limitations, legacy DLP solutions have another problem: mediocre data detection. A DLP solution that offers only mediocre data detection creates too many false positives and is therefore not worth the investment.

What is a Best-in-Class Data Loss Prevention Solution?

One of the things that defines a best-of-breed DLP solution is its ability to detect sensitive data accurately. To be highly efficient, data detection must combine a variety of detection techniques that identify both forms of data: structured and unstructured.

A superior DLP solution must rely on out-of-the-box yet granularly customizable policies based on several hundred, if not thousands, of predefined data patterns. These allow the identification of standard data formats such as country-specific national IDs, banking numbers, passport numbers, tax IDs, localized address constructs, and other standard PII formats, as well as source code, secret keys, and even common blasphemous, homophobic, racist, or sexual language and many other types of commonly identifiable descriptive data.

It would also allow the use of compliance-related policies for regulations such as GDPR, CCPA, HIPAA, and many others. Descriptive content detection works well only if it’s context-aware: only the text in and around a pattern makes it possible to accurately distinguish, for instance, an actual Social Security number from any generic nine-digit number.
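To make the idea concrete, here is a minimal sketch of context-aware matching, not a description of any vendor’s engine. It accepts a nine-digit number as a Social Security number candidate only when a related keyword appears nearby. The keyword list, the 40-character window, and the function name `find_ssn_candidates` are all assumptions made for illustration.

```python
import re

# Hypothetical context keywords; a real policy would ship a much larger, localized list.
SSN_CONTEXT = ("ssn", "social security", "soc sec")

# Nine digits, optionally grouped 3-2-4 with hyphens or spaces.
NINE_DIGITS = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def find_ssn_candidates(text, window=40):
    """Return nine-digit matches that have an SSN keyword within `window` characters."""
    hits = []
    for match in NINE_DIGITS.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        # Look only at the text surrounding the match, case-insensitively.
        context = text[start:end].lower()
        if any(keyword in context for keyword in SSN_CONTEXT):
            hits.append(match.group())
    return hits
```

With this sketch, "Employee SSN: 123-45-6789 on file" yields a hit, while "Order number 123456789 shipped today" does not, even though both contain nine digits.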

False positives can be dramatically reduced if a DLP solution can truly grasp the context around the content. This is a very important capability that many data protection vendors fail to provide because it is by far the most challenging: developing highly sophisticated context-aware techniques requires significant engineering investment.

Additionally, the type of exposure (e.g., public or internal), confidence levels, and precise context criteria (e.g., number of occurrences and pattern logic) are very important in order to reduce incidents and inaccurate detection.
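One possible way to express these criteria in code is sketched below. The `Match` and `Rule` classes, the AND logic across rules, and the default of watching only public exposure are illustrative assumptions, not the configuration model of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Match:
    pattern: str       # name of the data pattern that fired
    confidence: float  # 0.0 to 1.0, as scored by a (hypothetical) detector


@dataclass
class Rule:
    pattern: str
    min_confidence: float
    min_occurrences: int


def rule_triggers(rule, matches):
    """True when enough sufficiently confident matches of the rule's pattern exist."""
    qualifying = [
        m for m in matches
        if m.pattern == rule.pattern and m.confidence >= rule.min_confidence
    ]
    return len(qualifying) >= rule.min_occurrences


def policy_triggers(rules, matches, exposure, watched_exposures=("public",)):
    """AND logic across rules, evaluated only for watched exposure types."""
    if exposure not in watched_exposures:
        return False
    # An empty rule list never raises an incident.
    return bool(rules) and all(rule_triggers(rule, matches) for rule in rules)
```

Requiring, say, two credit card matches at 0.8 confidence or higher on publicly exposed content raises fewer incidents than firing on any single low-confidence match.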

As for exact data matching (EDM), it’s very important for a DLP solution to be able to fingerprint large structured data sources with thousands of records, to detect matches quickly, and to make it easy to build detection policies that rely on combinations of fields, such as a name plus a credit card number. This mechanism enables accurate detection and monitoring of specific sensitive data, and protects it from malicious exfiltration or loss.

Designed to scale to very large data sets, EDM fingerprints known personally identifiable information (PII), like bank account numbers, credit card numbers, addresses, medical record numbers, and other personal information stored in a structured data source, such as a database or a structured data file like a spreadsheet. This data is then detected across the entire enterprise, whether it traverses the network edge, is transferred by employees from remote locations, or is stored and shared in SaaS applications.
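The fingerprint-and-match idea can be sketched as follows. This is a simplified illustration under stated assumptions: salted SHA-256 hashes stand in for whatever fingerprinting scheme a real product uses, values are normalized to ASCII letters and digits only, and the helper names (`build_index`, `matched_records`) are hypothetical. A record counts as detected only when at least `min_fields` of its own fields appear in the same text.

```python
import hashlib
import re


def _normalize(value):
    # Lowercase and keep only ASCII letters and digits, so that
    # "4111-1111 1111 1111" and "4111111111111111" fingerprint identically.
    return re.sub(r"[^a-z0-9]", "", value.lower())


def _fingerprint(value, salt):
    return hashlib.sha256((salt + _normalize(value)).encode("utf-8")).hexdigest()


def build_index(records, salt):
    """Map each field fingerprint to the ids of the records that contain it."""
    index = {}
    for record_id, record in enumerate(records):
        for value in record.values():
            index.setdefault(_fingerprint(value, salt), set()).add(record_id)
    return index


def _candidates(text, max_words=4):
    # Single tokens plus short word n-grams, so multi-word values can match.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", text)
    for size in range(1, max_words + 1):
        for i in range(len(tokens) - size + 1):
            yield " ".join(tokens[i:i + size])


def matched_records(text, index, salt, min_fields=2):
    """Return ids of records with at least `min_fields` distinct fields found in `text`."""
    seen = {}
    for candidate in _candidates(text):
        fp = _fingerprint(candidate, salt)
        for record_id in index.get(fp, ()):
            seen.setdefault(record_id, set()).add(fp)
    return sorted(rid for rid, fps in seen.items() if len(fps) >= min_fields)
```

Because only hashes are indexed, the detection side never needs the raw records. Requiring two fields from the same record is what separates "Jane Doe plus her card number" from an unrelated mention of either value alone.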

A best-in-class DLP should also scan many documents and file types, and even extract information from graphic formats like images of picture IDs, passports, credit cards et cetera, even when the content is not perfectly legible, via advanced Optical Character Recognition (OCR) algorithms.

User-based document tagging and manual data classification are also important factors. When such classification is available, the DLP solution needs to be able to detect it, read the document properties, and apply the corresponding protective actions based on corporate policy.
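As a rough sketch of that last step, assume the document properties have already been read into a dictionary. The property name `classification`, the labels, and the actions below are hypothetical placeholders for whatever a corporate policy actually defines.

```python
# Hypothetical label-to-action table; real labels and actions come from corporate policy.
ACTIONS_BY_LABEL = {
    "public": "allow",
    "internal": "alert",
    "confidential": "block",
}


def action_for(properties, default="alert"):
    """Pick the protective action for a document from its classification property."""
    label = str(properties.get("classification", "")).strip().lower()
    # Untagged or unknown labels fall back to the default action.
    return ACTIONS_BY_LABEL.get(label, default)
```

Falling back to an alert for untagged documents is only one possible choice; a stricter policy might block them instead.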

And lastly, let’s not forget the hybrid workforce model and the vast adoption of SaaS services that have made organizations dependent on a host of mission-critical collaboration applications like Slack, Teams, Zoom, Jira and Confluence. Today these collaboration apps are driving business agility because they keep employees connected wherever they are, all day, every day, fundamentally changing the way business is conducted.

Employees today send shorter, more frequent messages via these apps and use more screenshots than actual file attachments, and their conversations consist of multiple posts, sometimes among more than two users. As one would expect, confidential information exchanged via these apps has become more unstructured and fragmented, and therefore more difficult to protect with legacy data protection tools.

A new-age DLP solution adapts to these changes to protect unstructured sensitive data shared over collaboration apps. Through deep learning, natural language processing (NLP), and artificial intelligence models, it automatically identifies sensitive information within the context of users’ unstructured conversations. To ensure high accuracy and fewer false positives, it automatically interprets the context and true meaning of a written conversation, accounting for likelihood and misspellings.

It’s Time to Stop Spinning the Wheel

Security analysts get fed up with having to manually chase large numbers of false-positive incidents that require deep and time-consuming investigations. To combat this situation, advanced machine learning is the present and the future of data protection because it makes data identification more accurate and simplifies detection.

So shouldn’t a DLP solution be able to automatically understand the context and the true meaning of a written conversation, including likelihood and misspelling? Shouldn’t it also leverage user feedback to reliably detect true positives and learn continuously? 

Consistent protection is extremely important. Once detection policies are in place, a best-in-class DLP solution should apply the same rules to detect sensitive data wherever it resides and flows, so data security teams don’t have to reinvent the wheel every time a new corporate environment (such as SaaS apps, IaaS, or additional networks and users) is added to the DLP solution.

For this to happen, the DLP solution must deliver consistent policy from a single cloud-based engine, making it easy to define data protection policies and configurations once and apply them automatically and instantly to every location.

An organization’s data is its most valuable asset. To successfully overcome data protection challenges, it’s crucial for companies to put a strategy in place. Palo Alto Networks is here to help you do just that. Request a free trial of our DLP solution here.
