By Alkis Papadopoullos, CEO and CSO at Coginov
As privacy laws are enacted in more and more countries and regions, the ability to uncover relevant tags indicative of personal or sensitive data (often referred to as PII, or personally identifiable information, and SII, or sensitive identifiable information) is becoming increasingly important. Achieving this requires tools and algorithms that can put text-based keywords and expressions in context, so as to minimize false positives and false negatives when discovering PII or SII data. Typical examples of such data include person or organization names, phone numbers, addresses, financial information or records, and citizen identification documents.
To address this need, we present some ideas on the mechanics of semantic contextualization. Platforms that attempt to discover and extract such information face very tangible expectations, such as producing a list of possible PII or SII candidates for review and for subsequent action regarding the disposal, storage, or protection of this data. However, this proves very difficult to accomplish if analysis is based solely on brute-force extraction without any context. For example, is what seems to be a person’s name actually part of a street name, and hence part of an address? Is a sixteen-digit number actually a credit card number?
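The sixteen-digit question can be made concrete with a small example. The Python sketch below is a minimal illustration under stated assumptions, not how QoreAudit works: it first checks each candidate number with the standard Luhn checksum used by payment card numbers, then inspects the words just before the number for cue words. The cue-word sets, the 40-character window, and the function names are hypothetical; a machine learning system would learn such context signals instead of relying on fixed lists.

```python
import re

# Hypothetical cue words (an assumption for illustration, not from the article).
CARD_CUES = {"card", "visa", "mastercard", "amex", "credit", "expiry", "cvv"}
NON_CARD_CUES = {"invoice", "order", "tracking", "account", "reference", "serial"}

# Sixteen digits, optionally separated by single spaces or dashes.
NUMBER_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_candidates(text: str, window: int = 40) -> list[tuple[str, str]]:
    """Label each sixteen-digit number using its checksum and preceding context."""
    results = []
    for m in NUMBER_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        before = text[max(0, m.start() - window):m.start()].lower()
        context = set(re.findall(r"[a-z]+", before))
        if not luhn_valid(digits):
            label = "not_card"        # cannot be a valid card number
        elif context & NON_CARD_CUES:
            label = "not_card"        # checksum passes, but context says otherwise
        elif context & CARD_CUES:
            label = "likely_card"     # checksum and context agree
        else:
            label = "needs_review"    # no decisive context either way
        results.append((digits, label))
    return results


print(classify_candidates("Please charge my Visa card 4111 1111 1111 1111 today."))
# prints [('4111111111111111', 'likely_card')]
```

The checksum alone is not sufficient: roughly one in ten arbitrary sixteen-digit identifiers passes the Luhn test by chance, which is why the surrounding context carries so much of the decision.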
Machine learning based semantic analysis can help achieve these goals, primarily because it associates each potentially sensitive piece of data with the meaning of that data; it is thus equivalent to extracting and storing concepts rather than keywords. By identifying concepts along with named entities (PII and SII data), it is possible to achieve three important goals that are the cornerstones of reliably identifying personal information:
Coginov’s QoreAudit product is designed to help achieve precisely these goals. Using a natural language processing approach that combines semantic analysis with proprietary machine learning algorithms, we strive to help users reliably identify meaning in content and relate it to potential PII and SII data. This improves customers’ ability to mine data for actionable information while reducing the time needed to analyze that data and draw reliable conclusions. In turn, it means we can identify whether a concept evoked in a comment clearly refers to a snippet of personal or sensitive data.
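The idea of attaching a meaning to each candidate, rather than keeping a bare keyword, can be sketched with a simple data structure. In the hypothetical Python example below, each candidate span is assumed to come from an upstream name detector (not shown) and receives a concept label based on its neighbouring tokens. The street-suffix list, the two labels, and the function names are illustrative assumptions; the hand-written rules merely stand in for the learned, proprietary models described above.

```python
from dataclasses import dataclass

# Hypothetical cue list: tokens suggesting a street name rather than a person.
STREET_SUFFIXES = {"street", "st", "avenue", "ave", "road", "rd", "boulevard", "blvd"}


@dataclass
class Candidate:
    text: str
    start: int  # index of the first token of the span
    end: int    # index one past the last token of the span
    concept: str = "unknown"


def contextualize(tokens: list[str], candidates: list[Candidate]) -> list[Candidate]:
    """Attach a concept label to each candidate name using its neighbouring tokens."""
    for c in candidates:
        following = tokens[c.end].lower().strip(".,") if c.end < len(tokens) else ""
        preceding = tokens[c.start - 1] if c.start > 0 else ""
        # A house number before the span or a street suffix after it suggests
        # the apparent name is really part of an address.
        if following in STREET_SUFFIXES or preceding.isdigit():
            c.concept = "address_component"
        else:
            c.concept = "person_name"
    return candidates


tokens = "Meet me at 55 Laurier Avenue.".split()
print([c.concept for c in contextualize(tokens, [Candidate("Laurier", 4, 5)])])
# prints ['address_component']
```

The same surname in "Wilfrid Laurier signed the letter." has no address cues around it and would be labeled `person_name`, so the stored result is a concept attached to the span rather than the keyword alone.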
Another significant advantage of semantic contextualization is the ability to compute a document’s “semantic profile” based on the types of PII and SII data extracted. In so doing, it is then possible to assess the level of “sensitivity” of a given document or set of documents, and to determine much more accurately whether the risk of identity theft, intellectual property theft, acquisition of sensitive financial data, etc., is elevated. By mapping the most relevant concepts to potential PII and SII data, we can determine several very interesting things:
In summary, through semantic contextualization, Coginov’s QoreAudit product allows customers to gain actionable insights more rapidly from most or all of their data repositories, understand why specific documents or sets of documents are riskier than others, and take tangible action to protect the PII and SII data they hold. Please feel free to contact us at sales@coginov.com if you would like further information or a demo of our product.
We create innovative solutions
COGINOV is recognized as a world leader in semantic technologies and information management. We are a Canadian software company offering our customers innovative solutions for managing structured and unstructured information. Our head office is based in Montreal.
Coginov’s Qore platform technology enhances the information value chain, transforming unstructured content into highly contextualized, accessible and valuable information. Coginov’s solutions enable you to capture, analyze, engage, automate and manage your information assets, with unrivalled accuracy and efficiency.