What is Latent Semantic Indexing (LSI)?

Latent semantic indexing was initially used in AdSense so that adverts matched to the theme of a webpage could appear on that page. The algorithm checks the wording on the page and determines the theme of the page. It was only later that Google applied the algorithm to search engine placement, and it is now used by search engines other than Google as well. It involves analysis of the words used in natural language: the synonyms and closely related words that occur when discussing the general theme of a page. It complements, rather than replaces, keyword analysis.

Since it is based on a mathematical set of rules, or algorithm, it is not perfect and can produce results which are justifiable mathematically but have no meaning in natural language. Google purchased a company, Applied Semantics, in 2003 to develop it.

So what is latent semantic indexing? How does it work in layman’s terms? Let’s look at the three words separately:


The word ‘latent’ means something which is present, but not obviously visible. For example, the latent heat of vaporization, the heat absorbed when water evaporates, is carried unseen by water vapour and is only released when the vapour condenses again. In terms of latent semantic indexing, it means that a word such as ‘lock’ can be present in a text while its meaning stays hidden until some other factor reveals it. ‘Lock’ can mean, among other things, a piece of hair, a security device or a means of conveying a barge between different heights in a canal. It is only the rest of the text which makes its meaning clear.


The word ‘semantic’ refers to the meaning of language or words, as opposed to the form in which they are said or written. In ‘a lock of hair’, the semantics of ‘lock’ is the sense the word carries, which the rest of the expression makes obvious.


In latent semantic indexing, ‘indexing’ is the identification of the meaning of a document from its subject content, and its listing in a form suitable for use by a search system.

Let’s make it clearer by giving an example. Software-generated pages used for AdSense tend to be very general and can be used as templates for any keyword or phrase. Here is how a typical piece of such text could read, using ‘the history of locks’ as the key-phrase.

“If you are seeking information on the history of locks, there is no better place than the internet. The information superhighway is full of sites specializing in the history of locks, and the history of locks is a very popular subject. We recommend that you check out the other pages on this site for information on other topics associated with the history of locks.”

This is very common with AdSense sites. You can replace the key-phrase ‘the history of locks’ with any keyword whatsoever, and the same text can be reused countless times, with the software automatically substituting each keyword from a given list. It does not give the reader any information whatsoever. In fact, it does not even say what type of lock is being referred to. It could apply equally to a security lock and a canal lock. This is why latent semantic indexing was introduced to search engine ranking analysis.
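The keyword substitution described above can be sketched in a few lines of Python. This is only an illustration: the template paraphrases the sample text, and the names TEMPLATE, generate_page and the keyword list are made up for the example, not taken from any real page-generation software.

```python
# Hypothetical template, paraphrasing the sample text above.
TEMPLATE = (
    "If you are seeking information on {phrase}, there is no better place "
    "than the internet. The information superhighway is full of sites "
    "specializing in {phrase}, and {phrase} is a very popular subject."
)


def generate_page(phrase: str) -> str:
    """Fill the generic template with a single key-phrase."""
    return TEMPLATE.format(phrase=phrase)


# Illustrative keyword list; any phrase at all would work.
keywords = ["the history of locks", "the history of clocks", "cheap car insurance"]
pages = {kw: generate_page(kw) for kw in keywords}

for kw, text in pages.items():
    # Masking the key-phrase leaves the same text on every page, which is
    # why such pages say nothing about their supposed topic.
    print(kw, "->", text.replace(kw, "<KEY>")[:70])
```

Once the key-phrase is masked, every generated page is word-for-word identical, so nothing on the page tells a reader or a search engine which kind of lock is meant.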

Now here is the same text including some qualifying wording:

“If you are seeking information on the history of locks, there is no better place than the internet. The information superhighway is full of sites specializing in the history of locks of all types, from the massive Roman door locks to sophisticated encrypted password security systems. From long-keyed safe locks to the history of combination locks that have given safe crackers so much trouble over the ages. The history of locks is a very popular subject and we recommend that you check out the other pages on this website dealing with topics such as general security, the history of cylinder and lever locks, and padlocks of various kinds.”

The ‘latency’ referred to in the term ‘latent semantic indexing’ is the hidden meaning of the word ‘lock’, which stays hidden in the first version until the semantics of the second reveal it. By use of the algorithm, website content with similar keywords, but different meanings for those keywords, can be told apart and, more importantly for the webmaster, the relevance of the site can be properly determined and indexed.
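To see how surrounding words can reveal the sense of ‘lock’, here is a deliberately simplified Python sketch. It is not latent semantic indexing itself, which derives word associations statistically (classically through a singular value decomposition of a term-document matrix), and it is not Google's algorithm. The two word lists in SENSE_TERMS are hand-picked assumptions, and the function names are invented for the illustration.

```python
import re

# Hypothetical, hand-picked context words for two senses of 'lock'.
SENSE_TERMS = {
    "security lock": {"door", "key", "keyed", "safe", "padlocks", "combination",
                      "cylinder", "lever", "security", "password"},
    "canal lock": {"canal", "barge", "water", "gate", "boat", "towpath"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def guess_sense(text):
    """Return (best sense or None, score per sense) from context-word counts."""
    words = tokenize(text)
    scores = {sense: sum(w in terms for w in words)
              for sense, terms in SENSE_TERMS.items()}
    best = max(scores, key=scores.get)
    # With no supporting context words at all, the sense stays hidden.
    return (best if scores[best] > 0 else None), scores


version_one = ("The information superhighway is full of sites specializing in "
               "the history of locks, and the history of locks is a very "
               "popular subject.")
version_two = ("Sites specializing in the history of locks of all types, from "
               "the massive Roman door locks to sophisticated encrypted "
               "password security systems, long-keyed safe locks and "
               "combination locks.")

print(guess_sense(version_one))
print(guess_sense(version_two))
```

The first version contains no word from either list, so the sketch cannot pick a sense. The second contains several security-related words, so ‘security lock’ wins.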

No longer will sites with ambiguous keywords, as in version one above, be acceptable to search engines. The semantics of the page must make the meaning and topic of the page clear.

So what does that mean for you? It means that you must still maintain a reasonable density of the specific keyword being targeted, since that is the term the searcher types in, but you must also use related words and terms to define the overall theme of the page. Prior to latent semantic indexing, a search for the term ‘the history of locks and canals’ would have been directed to both of the above texts. Since its introduction, such a search will be directed to neither. It will be directed to a page where it is obvious that the theme of the site is canal locks.
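Both checks mentioned above, keyword density and the presence of related terms, are easy to approximate. The Python sketch below uses one common definition of density (words belonging to occurrences of the phrase, divided by total words). That definition, the sample page and the RELATED list are assumptions made for illustration, and no search engine is known to use exactly this calculation.

```python
import re


def words_of(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_density(text, phrase):
    """Share of the page's words taken up by occurrences of the phrase."""
    words = words_of(text)
    target = words_of(phrase)
    n = len(target)
    if not words or n == 0:
        return 0.0
    # Count every position where the phrase occurs as a run of whole words.
    hits = sum(words[i:i + n] == target for i in range(len(words) - n + 1))
    return hits * n / len(words)


def related_terms_found(text, related):
    """Related terms that occur on the page as whole words (no stemming)."""
    present = set(words_of(text))
    return sorted(term for term in related if term in present)


# Hypothetical page text and related-term list for a canal-lock theme.
page = "The history of locks covers canal locks, barges and water gates."
RELATED = {"canal", "barge", "water", "towpath"}

print(round(keyword_density(page, "history of locks"), 3))
print(related_terms_found(page, RELATED))
```

Note that the matching is on exact words only, so ‘barges’ does not count as ‘barge’. A fuller version would need stemming or a list of word variants.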

This can only be good for the visitor to Google.

