Terminology Matters in Text Analytics
After recently listening to another text analytics vendor describe what they do, I can’t hold off on saying this any longer: words matter, especially in the field of text.
SAS has offered Text Mining, one aspect of SAS Text Analytics, since 2002. However, with the popularity of social media spawning a swath of new “text mining” vendors, the true definition of mining has become muddled.
Text Analytics is, literally, the analysis of text.
Text Mining is one method to analyze text. There are other methods too. Which one, or which combination of methods, and the order in which you choose to use them, depends on your specific needs.
The most critical aspect of text mining that distinguishes it from all other types of text analysis is that, as with numeric data mining, it is used to find previously unknown information.
Some suggest text mining is about identifying words and phrases within a text collection. Actually, this is extraction and search, and could maybe even be considered discovery. However, in each of these (extraction, search or discovery) you already know what you are looking for; you just aren’t sure where it is. This is a location issue. It is not mining for previously unknown information.
Others say text mining is about visually displaying words and phrases within text, sometimes in the form of a word cloud, other times by highlighting phrases in a document. No, this is visual business intelligence. You already know the words and phrases you are looking for; simply color-coding where they appear in the text is not mining.
When you don’t know what to look for, as is the case with mining data, you can’t find it – visually, or otherwise. You first need to learn what is there.
Text mining is how you do that. It finds things you don’t already know. Text mining discovers (literally) words and topics for you.
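To make the distinction concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how SAS Text Mining or any other product works: the sample notes, the stopword list and the function names are all invented for illustration, and real text mining uses far richer methods than counting. The point is only the difference in inputs: `search` needs you to supply the term, while `discover_terms` is given nothing but the documents and reports what is in them.

```python
import re
from collections import Counter

# Hypothetical call center notes, invented for illustration only.
DOCS = [
    "Customer reports the battery drains quickly after the update.",
    "Battery overheating when charging overnight.",
    "Billing error: customer was charged twice this month.",
    "Refund requested after customer was charged twice.",
]

# A tiny, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "was", "after", "when", "this", "is", "and"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(docs, term):
    """Search: the term is known in advance; return indexes of documents containing it."""
    return [i for i, doc in enumerate(docs) if term in tokenize(doc)]


def discover_terms(docs, top_n=3):
    """Discovery: no term is supplied; rank terms by how many documents contain them."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, ignoring stopwords.
        doc_freq.update(set(tokenize(doc)) - STOPWORDS)
    # Sort by descending document count, then alphabetically for stable ties.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    print(search(DOCS, "battery"))   # you had to know to ask about "battery"
    print(discover_terms(DOCS))      # the collection tells you what it is about
```

Document frequency is used here only because it is the simplest possible stand-in for "learning what is there"; it is not a claim about any particular text mining algorithm.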
Text mining is great for discovering what key topics are in a collection of documents, call center transcripts, patient records, claim submissions, and social network conversations. Text mining can be used prior to a sentiment analysis – if you don’t already know what you want to evaluate sentiment on. Text mining can also be used prior to content categorization – to learn the topics contained in the training set, and to define the initial taxonomy based on what you uncover.
Text mining can also be used to represent text data numerically, so you can then apply mining, forecasting or other kinds of predictive analysis to your text and your structured data together.
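As a rough illustration of what "representing text numerically" can look like, the sketch below turns each record's text into term counts and appends a structured field to the same row. It assumes the simplest possible representation (raw term counts); the records, the `minutes` field and the function names are hypothetical, and production tools typically use weighted or reduced representations rather than raw counts.

```python
import re

# Hypothetical records: free text plus one structured field (call length in minutes).
RECORDS = [
    {"text": "battery drains fast", "minutes": 12},
    {"text": "charged twice on bill", "minutes": 7},
    {"text": "battery battery overheating", "minutes": 20},
]


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(records):
    """Sorted list of every distinct term, so each term gets a fixed column position."""
    return sorted({tok for rec in records for tok in tokenize(rec["text"])})


def to_feature_rows(records):
    """Return (column_names, rows): one count per vocabulary term, then the structured field."""
    vocab = build_vocabulary(records)
    rows = []
    for rec in records:
        tokens = tokenize(rec["text"])
        counts = [tokens.count(term) for term in vocab]
        # Text-derived columns and the structured column now sit side by side.
        rows.append(counts + [rec["minutes"]])
    return vocab + ["minutes"], rows


if __name__ == "__main__":
    columns, rows = to_feature_rows(RECORDS)
    print(columns)
    for row in rows:
        print(row)
```

Once every record is a row of numbers like this, the text columns can be passed to the same predictive methods as any other numeric input.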
It is confusing when the same words are used to describe different things – and it can make it even harder to know where to start. If you’re just starting to research these topics, I hope this has helped clarify some of the terms.
Posted by Fiona McNeill, Technology Product Marketing Manager in Fiona McNeill at 16:21
Defined tags for this entry: categorization, sentiment analysis, terminology, text analytics, text data, text mining
Related entries by tags:
Text Analytics at the M2010 Data Mining Conference
SAS Text Analytics takes a bow
Text Analytics 101: Survey "Says"
TOTS: Taxonomy, Ontology, Thesauri and Semantics
Corpus Callosum .. Where Right and Left Brain Meet
2010 Applying Business Analytics Webinar series goes "101"
The SGF Text Analytics Conversation -- SAS Global Forum and Text Analytics
Adam Smith, sentiment and business analytics
On Text Data Quality
Text Analytics Summit Review
The Text Frontier - Text mining, voice mining and unstructured data analysis