6713 Blood Cancer Online Journals | Open Access Journals
Accounting & Marketing

ISSN: 2168-9601

Open Access

Adequate representation of natural language semantics requires access to vast amounts of common-sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, because it uses natural concepts, the ESA model is easy to explain to human users.

The recent proliferation of the World Wide Web, together with the common availability of inexpensive storage media that let enormous amounts of digital data accumulate over time, has made intelligent access to this data increasingly important. It is the sheer amount of available data that emphasizes the intelligent aspect of access: no one is willing or able to browse through more than a very small subset of the data collection, carefully selected to satisfy one's precise information need.

Research in artificial intelligence has long aimed at endowing machines with the ability to understand natural language. One of the core issues of this challenge is how to represent language semantics in a way that can be manipulated by computers. Prior work on semantic representation was based on purely statistical techniques, lexicographic knowledge, or elaborate endeavors to manually encode large amounts of knowledge.

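The idea of representing a text in a space of Wikipedia-derived concepts can be illustrated with a toy sketch. It rests on several assumptions that are not taken from the text: three hand-written mini "articles" stand in for Wikipedia, words are weighted by plain TF-IDF, and relatedness is measured with cosine similarity. The names `CONCEPTS`, `build_index`, `interpret`, and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles; each title is one concept.
CONCEPTS = {
    "Bank (finance)": "bank money loan deposit interest account money",
    "River": "river water bank stream flow water",
    "Computer": "computer software hardware program data",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def build_index(concepts):
    """Map each word to {concept: TF-IDF weight} (an inverted index)."""
    n = len(concepts)
    counts = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    df = Counter(w for tf in counts.values() for w in tf)
    index = {}
    for concept, tf in counts.items():
        for word, freq in tf.items():
            index.setdefault(word, {})[concept] = freq * math.log(n / df[word])
    return index


def interpret(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


index = build_index(CONCEPTS)
print(cosine(interpret("loan interest", index), interpret("money deposit", index)))
print(cosine(interpret("loan interest", index), interpret("software program", index)))
```

In this sketch, "loan interest" and "money deposit" share no words, yet both map onto the same finance concept, so they come out as related. A real system would need a full encyclopedia dump and more careful weighting.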
The simplest approach to representing the semantics of a text is to treat the text as an unordered bag of words, where the words themselves (possibly stemmed) become features of the textual object. The sheer ease of this approach makes it a reasonable candidate for many information retrieval tasks.
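A minimal sketch of the bag-of-words representation follows. It assumes whitespace tokenization, lowercasing, and stripping of common punctuation, and it applies no stemming. The function names `bag_of_words` and `overlap` are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, ignoring word order."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def overlap(a, b):
    """Count the words two bags share (multiset intersection size)."""
    return sum((a & b).values())


doc1 = bag_of_words("The cat sat on the mat.")
doc2 = bag_of_words("A cat lay on a mat")
print(doc1)
print(overlap(doc1, doc2))
```

Because word order is discarded, "dog bites man" and "man bites dog" get identical representations, which shows both the simplicity and the main limitation of the approach.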

Google Scholar citation report
Citations: 412

Accounting & Marketing received 412 citations as per Google Scholar report

Accounting & Marketing peer review process verified at publons
