Text mining models tend to be very large. A model that classifies news stories, for instance, using Support Vector Machines or the Naïve Bayes algorithm can run to megabytes in size, which makes it slow to load and evaluate. Concept mining models can be tiny in comparison: hundreds of bytes.
For some applications, such as plagiarism detection, concept mining offers new possibilities. Where the plagiariser has been cunning enough to perform a thesaurus-based substitution that fools text comparison algorithms, the concepts in the document remain relatively unchanged. So 'the cat sat on the mat' and 'the feline squatted on the rug' look very different to text mining algorithms but nearly identical to concept mining algorithms.
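The contrast can be illustrated with a minimal sketch. It assumes a tiny hand-written concept dictionary (a hypothetical stand-in for a real thesaurus such as WordNet), drops a couple of stop words, and uses Jaccard set overlap as the similarity measure. None of these choices is prescribed by concept mining itself. The sketch compares the two sentences first as bags of words and then as sets of concepts.

```python
# Hypothetical toy concept dictionary: each surface word maps to a concept label.
# A real system would derive this from a thesaurus and would have to
# disambiguate words with several senses.
CONCEPTS = {
    "cat": "FELINE", "feline": "FELINE",
    "sat": "SIT", "squatted": "SIT",
    "mat": "FLOOR_COVERING", "rug": "FLOOR_COVERING",
}

# Assumed minimal stop-word list for this example only.
STOPWORDS = {"the", "on"}


def words(sentence: str) -> set[str]:
    """Return the lower-cased content words of a sentence, minus stop words."""
    return {w for w in sentence.lower().split() if w not in STOPWORDS}


def concepts(sentence: str) -> set[str]:
    """Map each content word to its concept; unknown words map to themselves."""
    return {CONCEPTS.get(w, w) for w in words(sentence)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "the cat sat on the mat"
disguised = "the feline squatted on the rug"

print(f"word overlap:    {jaccard(words(original), words(disguised)):.2f}")
print(f"concept overlap: {jaccard(concepts(original), concepts(disguised)):.2f}")
```

Run as written, the word-level overlap is 0.00 and the concept-level overlap is 1.00. The perfect score comes from the idealized dictionary. With a real thesaurus, ambiguous words map to several candidate concepts, so the two sentences would score as nearly identical rather than exactly identical.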