Oh, the intricacies of written language, where, depending on context, ‘ripped’ can be related to abs but not paper. And abs is related to abdominals and not anti-lock brake systems, again depending on context. But ABS might be deployed in the presence of abs! It all comes down to context. And what text analytics company understands context better than Attensity? (A rhetorical question, but the answer is: none!)
Recently, I helped answer an RFP for an Attensity customer. I thought that the information about context would be something that interested readers would also appreciate.
Consider the following 3 sentences:
1. While it was a smoking room, I couldn’t smell anything.
2. The room was clean, on a smoking floor, and smelled fresh.
3. The room was nice, but the hallway did smell of smoke.
A keyword-based text analytics system would typically assign all three sentences to a category defined as a smelly hotel room. But these are precision errors: the items are coded as belonging to a specific category (smelly room) to which they do not belong. This kind of inaccuracy can cause great problems downstream when attempting to make business decisions and take action on the data. To meet the needs of large organizations, an enterprise-class text analytics solution must deliver highly accurate results without manual intervention.
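To make the precision problem concrete, here is a minimal sketch of a keyword matcher in Python. The keyword pattern and the function name `keyword_match` are assumptions made up for this illustration; they are not taken from any particular product.

```python
import re

# The three example sentences from above.
SENTENCES = [
    "While it was a smoking room, I couldn't smell anything.",
    "The room was clean, on a smoking floor, and smelled fresh.",
    "The room was nice, but the hallway did smell of smoke.",
]

# Hypothetical keyword pattern for a "smelly room" category:
# any word starting with "smell" or "smok".
SMELLY_ROOM_KEYWORDS = re.compile(r"\b(smell\w*|smok\w*)\b", re.IGNORECASE)


def keyword_match(sentence):
    """Return True if the sentence contains any category keyword."""
    return SMELLY_ROOM_KEYWORDS.search(sentence) is not None


# Every sentence contains a keyword, so every sentence is flagged,
# even though none of them actually describes a smelly room.
flagged = [s for s in SENTENCES if keyword_match(s)]

for sentence in flagged:
    print("flagged as smelly room:", sentence)
```

All three sentences land in the category, yet none of them reports a smelly room, so every match here is a false positive.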
Attensity’s Exhaustive Extraction uses the linguistic structure of the sentence to automatically extract both entities (such as the location “a smoking room” and the person “I”) and the relationships or events that connect them. Each relationship or event is interpreted with unmatched contextual accuracy and expressed in what is called a triple. The extracted triple from the first sentence would be I:smell[not]:anything.
Likewise, “the room was clean, on a smoking floor, and smelled fresh” is properly coded as a positive event: the room:smell:fresh.
Finally, in the example “the room was nice, but the hallway did smell of smoke”, the contextually correct extractions are made: the room:be:nice and the hallway:smell of:smoke.
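As a rough illustration of why triples help, the sketch below writes the four triples from the examples out by hand and applies a simple category rule to them. It does not perform any linguistic extraction and is not Attensity’s API or output format; the `Triple` class, the `BAD_SMELLS` set, and the `is_smelly_room` rule are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical subject:predicate:object record with a negation flag."""
    subject: str
    predicate: str
    obj: str
    negated: bool = False


# The triples described above, transcribed by hand (no extraction is done here).
TRIPLES = [
    Triple("I", "smell", "anything", negated=True),  # I:smell[not]:anything
    Triple("the room", "smell", "fresh"),            # the room:smell:fresh
    Triple("the room", "be", "nice"),                # the room:be:nice
    Triple("the hallway", "smell of", "smoke"),      # the hallway:smell of:smoke
]

# Assumed list of unpleasant smells, for illustration only.
BAD_SMELLS = {"smoke", "mildew", "mold"}


def is_smelly_room(triple):
    """Flag only an un-negated bad smell whose subject is the room itself."""
    return (
        "room" in triple.subject.lower()
        and triple.predicate.startswith("smell")
        and not triple.negated
        and triple.obj.lower() in BAD_SMELLS
    )


smelly_room_hits = [t for t in TRIPLES if is_smelly_room(t)]

# The smoke complaint is still available, attached to the right entity.
hallway_hits = [
    t for t in TRIPLES
    if "hallway" in t.subject.lower() and t.obj.lower() in BAD_SMELLS
]

print("smelly room:", smelly_room_hits)
print("smelly hallway:", hallway_hits)
```

Because the subject, the negation, and the object are kept separate, the rule files nothing under smelly room, while the hallway complaint remains available for its own category.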
Additionally, for customers who feed Attensity output (terms, facts, triples, categories, voices, etc.) into third-party predictive models, model accuracy can be determined by metrics such as correct classification, minimized cost, or maximized profit, while model explanation can be much improved using triples, facts, events, categories, sentiment, voices, and so on. If your system doesn’t account for context, then you could be making decisions based on misleading data.
Photo credits: Abs of steel by Eyesplash & Truck mechanic by Robert Couse-Baker









