Why Use Custom Classifications

Purpose

As you explore the Aparavi Platform and set it up to fit your organization, you may notice that the platform’s classification system is extensive. You may already have read the predefined classifications and custom classification documents on the support website. Together they describe the predefined classifications that are already available and cover the basics of creating your own. This document outlines a few reasons why Aparavi users may want to create their own classifications, and how to do so.

Custom Classification Creator

Overview

  • Complexity of Existing Classifications
  • Custom Rule Needs
  • Infinite Classification Options

Complexity of Existing Classifications

When you start selecting classifications that fit your organization’s requirements for classifying unstructured data, the results may not be what you expect. This is because each classification is built from several rules combined with logical operators. Consider the U.S. Personal Data Policy, for example:

Logical operators for U.S. Sensitive Data

As you can see, the classification logic has two “OR” components. The first OR statement requires any one of the following:

  • Date of Birth
  • E-mail Address
  • U.S. National Provider Identifier (NPI) number
  • North American Telephone Number
  • U.S. Postal Address – Form
  • U.S. Postal Mailing Address

And any one of the following from the second OR:

  • Credit/Debit Card Number
  • U.S. Medical Record Number (Generic)
  • U.S. Bank Account Number
  • U.S. Driver’s License Number
  • U.S. Individual Taxpayer Identification Number (ITIN)
  • U.S. Passport Card Number
  • U.S. Passport Number
  • U.S. Social Security Number (SSN)
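Read together, the two groups above mean that a file is classified only when at least one rule from each group matches. The following is a minimal sketch of that logic in Python, not the platform's implementation. The set names and the `matches_us_personal_data` function are hypothetical and exist only for illustration; the rule names are copied from the lists above.

```python
# Hypothetical names for the two OR groups; the rule names come from the lists above.
FIRST_OR_GROUP = {
    "Date of Birth",
    "E-mail Address",
    "U.S. National Provider Identifier (NPI) number",
    "North American Telephone Number",
    "U.S. Postal Address – Form",
    "U.S. Postal Mailing Address",
}

SECOND_OR_GROUP = {
    "Credit/Debit Card Number",
    "U.S. Medical Record Number (Generic)",
    "U.S. Bank Account Number",
    "U.S. Driver’s License Number",
    "U.S. Individual Taxpayer Identification Number (ITIN)",
    "U.S. Passport Card Number",
    "U.S. Passport Number",
    "U.S. Social Security Number (SSN)",
}


def matches_us_personal_data(rule_hits):
    """Return True when (any first-group rule) AND (any second-group rule) matched.

    `rule_hits` is the set of rule names that matched somewhere in a file.
    """
    first_group_hit = bool(rule_hits & FIRST_OR_GROUP)
    second_group_hit = bool(rule_hits & SECOND_OR_GROUP)
    return first_group_hit and second_group_hit
```

With this sketch, `matches_us_personal_data({"E-mail Address", "U.S. Passport Number"})` is true because each group contributes a hit, while `matches_us_personal_data({"E-mail Address", "Date of Birth"})` is false because both hits come from the first group.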

The AND ties the two OR statements together to complete the U.S. Personal Data classification. With the logical operators set up this way, many different combinations of rules can trigger a classification hit. Take a search for Social Security numbers as an example. The U.S. Personal Data classification will find them, but the U.S. Social Security Number (SSN) and Taxpayer ID Policy is a better option because it has only one OR statement:

Logic for U.S. SSN and Taxpayer ID

It is simpler than the U.S. Personal Data Policy, but rules other than the SSN rule can still trigger a classification hit on a file. If you want to find data using only the SSN rule, you can create that classification in the Add Custom Classification form, as shown below:

SSN only rule in creator

The new classification then appears under the View selection in the Classification list:

View of classification logic

This is a much simpler classification that specifies exactly what to look for. Without the other rules that are part of the two predefined classifications, only an SSN produces a classification hit, which narrows the results. Compare the two search results below:

Result of U.S. Personal Data search
Result of Only SSN search

The screenshots show that the number of results differs significantly between the custom and predefined classifications. The custom classification narrows the results to what is actually wanted by leaving out hits that come from the additional rules.

Custom Rule Needs

Predefined classifications can be complex, and you may also want something more specific than an existing rule provides. We will use an account number as the example, but the same approach applies to a customer ID, patient number, student number, or any other unique value assigned to records. Consider how many ways “account number” could be written:

  • Account Number
  • Acct. Number
  • Account #
  • Acct. #
  • a/c number
  • a/c #

Those are just some of the possibilities, and an organization may also have its own way of indicating an account number. For custom data needs like this, you can use a keyword rule when creating a custom classification. The rule accepts not just a single keyword but multiple keywords, which suits cases like account number that can be written many ways. Here is how it looks in the custom classification creator:

Keyword rule in custom classification creator

As you can see, the rule takes a list of keywords, which covers the multiple ways of indicating an account number. This simplifies the classification because you do not have to create a separate rule for each variation.
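To illustrate why one keyword list can stand in for several separate rules, here is a minimal sketch in Python. It assumes case-insensitive substring matching, which may differ from how the platform actually evaluates keyword rules. The names `ACCOUNT_KEYWORDS` and `contains_keyword` are hypothetical; the keywords are the variations listed earlier in this section.

```python
# Keyword variations taken from the list earlier in this section.
ACCOUNT_KEYWORDS = [
    "Account Number",
    "Acct. Number",
    "Account #",
    "Acct. #",
    "a/c number",
    "a/c #",
]


def contains_keyword(text, keywords=ACCOUNT_KEYWORDS):
    """Return True if any keyword appears in the text.

    Assumption: matching is a case-insensitive substring check.
    """
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

A single call such as `contains_keyword("Please quote your ACCT. # on all invoices")` checks every variation at once, which is the same convenience the keyword list gives you in the form.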

Infinite Classification Options

Continuing with the account number example, not only can the label be written in many ways, but the format of the account number itself is entirely up to the organization. Formats differ across number types such as:

  • SSN
  • Bank Account
  • Credit Cards
  • Driver’s License
  • Passport Numbers

Across 10 different organizations, there is a good chance you will find 10 different account number formats. In most cases an account number consists only of the digits 0-9. It could be a single digit, but more likely it is many digits long, possibly more than 10. To make account numbers even more distinctive, an organization might add letters to the mix. Because account numbers can be any mix of numbers and letters, no single predefined rule can handle every account number format.

The custom classification form provides one rule type for specifying a particular string of numbers, or of numbers and letters: Regex. Here are a couple of Regex examples:

  • \b\d{9,16}\b – a standalone string of 9 to 16 digits
  • \b[A-Z]\d{9}\b – a single uppercase letter followed by exactly 9 digits
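Before entering a pattern in the form, it can help to try it against sample values. The sketch below uses Python's standard `re` module to exercise the two patterns above. It assumes the platform's Regex flavor treats `\b`, `\d`, and `{m,n}` the same way Python does, which is worth confirming with the platform's own testing feature. The constant names are hypothetical.

```python
import re

# The two example patterns from the list above.
DIGITS_9_TO_16 = re.compile(r"\b\d{9,16}\b")
LETTER_THEN_9_DIGITS = re.compile(r"\b[A-Z]\d{9}\b")

# An 8-digit value is too short, so only the 9-digit value is found.
short_and_valid = DIGITS_9_TO_16.findall("ref 123456789 and 12345678")

# A 17-digit value is too long; \b prevents matching only part of it.
too_long = DIGITS_9_TO_16.findall("12345678901234567")

# The letter must be uppercase, so the lowercase variant is skipped.
letter_values = LETTER_THEN_9_DIGITS.findall("ID A123456789, not a123456789")

print(short_and_valid)  # ['123456789']
print(too_long)         # []
print(letter_values)    # ['A123456789']
```

The `\b` word boundaries are what keep a pattern from matching a fragment of a longer value, so they are worth keeping in any pattern adapted from these examples.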

Here is an example of how it looks to create one such rule:

Regex rule

We can now combine the keyword rule with the Regex rule to find data that contains both an account number label and a value in the organization’s account number format. The complete custom classification consists of an AND statement that requires both rules, like this:

Keyword and regex to create classification

This is how it looks in the View selection from the Classification list:

View of rule
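The combined keyword AND Regex logic can be sketched in Python as follows. This is an illustration under stated assumptions rather than the platform's behavior: keywords are matched as case-insensitive substrings, the account number format is assumed to be one uppercase letter followed by 9 digits, and the function name `classify_account_data` is hypothetical.

```python
import re

# Keyword variations from the Custom Rule Needs section.
ACCOUNT_KEYWORDS = (
    "Account Number",
    "Acct. Number",
    "Account #",
    "Acct. #",
    "a/c number",
    "a/c #",
)

# Assumed account number format: one uppercase letter, then exactly 9 digits.
ACCOUNT_NUMBER_PATTERN = re.compile(r"\b[A-Z]\d{9}\b")


def classify_account_data(text):
    """Return True only when a keyword AND a matching account number are present."""
    lowered = text.lower()
    has_keyword = any(keyword.lower() in lowered for keyword in ACCOUNT_KEYWORDS)
    has_number = ACCOUNT_NUMBER_PATTERN.search(text) is not None
    return has_keyword and has_number
```

Text such as "Account #: A123456789" satisfies both conditions. "Tracking code A123456789" has the number but no keyword, and "Account # pending" has the keyword but no number, so neither is classified. Requiring both is what keeps look-alike values from producing hits.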

Conclusion

The predefined classifications offer many good options for classifying your data. Their main purpose is to meet certain standards so that data is found and classified consistently. However, they are complex, so a custom classification lets you create your own simpler classification to limit classification hits. When a predefined rule is not enough, rules such as keyword and Regex allow fine-tuning to exactly what you need, providing full customization based on your organization’s standards. Lastly, as a reminder from the how-to article on creating custom classifications, always use the testing feature to confirm the expected classification hit on a document known to contain the qualifying data.
