Understanding Latent Semantic Indexing or LSI

UPDATE: I joined forces with a partner whose background is in the field of Information Retrieval. He took the time to explain to me that my theory is wrong. I am currently researching this topic in depth and I will update this post once my research is finished.

LSI, or Latent Semantic Indexing, is a mathematical way of scoring elements of a page against other elements on that page, and comparing them to other sites in order to determine the ranking in the search engine. It works with themes. Google knows how relevant the word 'cat' is to the word 'dog', for example. It knows this because of the way these two words are used on different sites.

It compares nearly everything about a page: links; anchor text; meta tags; punctuation; sentence structure; language; img tags; page size - and a lot more. It compares this to sites that have proven themselves to be 'authority' sites. So Google knows, based on authority sites, that 'cat' and 'dog' are animals - because sites that use these keywords will often mention the keyword 'animal' when talking about cats or dogs.
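To make the 'cat' and 'dog' idea a bit more concrete, here is a minimal sketch of how relatedness between two words can be estimated from the pages they appear on. It is not Google's method, and it is not full LSI either (textbook LSI goes a step further and applies a truncated SVD to the term-by-document matrix). It simply counts which pages each word appears on and compares those counts with cosine similarity. The sample pages are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy "pages"; a real corpus would be vastly larger.
PAGES = [
    "my cat is a small animal that sleeps all day",
    "a dog is a loyal animal and a good pet",
    "the cat and the dog are both popular pet choices",
    "tomato soup recipe with fresh basil",
    "serve the soup hot with bread",
]

def term_vectors(pages):
    """Map each term to a vector holding its count on every page."""
    vectors = {}
    for i, page in enumerate(pages):
        for word, count in Counter(page.split()).items():
            vectors.setdefault(word, [0] * len(pages))[i] = count
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = term_vectors(PAGES)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_soup = cosine(vectors["cat"], vectors["soup"])
print("cat vs dog: ", cat_dog)
print("cat vs soup:", cat_soup)
```

In this toy data 'cat' and 'dog' share a page, so they score higher than 'cat' and 'soup', which never appear together.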

You can try it for yourself. Type this into Google: ~dog (That's the tilde key : Shift+@). Google will display the search results as normal, but it will bold all the terms that it sees as thematically related to the word 'dog', based on its examination of authority sites.

So, how do you make use of this? You have to make sure that your site is very tightly 'themed'. Say you've got a page on your site about Tomato Soup. You would have to make sure that everything on that page only related specifically to tomato soup. By this I mean:


Take some time to go to Google and enter your main keywords. Look at the top 5 sites that come up. Read a few pages of each - these are the pages that Google likes. Notice the words they use, any consistent styles, html, sentence structure, img tags, title etc. When writing your content, forget about keyword density. It's dead. Use that theme search (the one with the ~ key) to grab a bunch of related terms and use them all in your content (only if they are relevant, of course). There's more to it than this, but by varying your keywords and building a strong theme base you will be seen as much more of an expert than a site that stuffs in keywords to achieve an x% keyword density.
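If you want to check your own copy against this advice, something like the following rough sketch could help. It reports the old-style keyword density next to the list of related terms you have actually used. The sample text and the list of related terms are hypothetical (in practice you would collect the terms yourself, for example from the theme search), and the density function only handles single-word keywords.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_density(text, keyword):
    """Percentage of words in the text that are exactly the keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

def related_terms_used(text, related_terms):
    """Return the related terms that appear in the text, sorted."""
    words = set(tokenize(text))
    return sorted(t for t in related_terms if t.lower() in words)

# Hypothetical page copy and hypothetical related terms.
page = "Tomato soup is a warm soup. Simmer ripe tomatoes with basil and stock."
related = ["basil", "stock", "simmer", "cream", "croutons"]

print("density of 'soup': %.1f%%" % keyword_density(page, "soup"))
print("related terms used:", related_terms_used(page, related))
```

The point of printing both is to shift attention from the single percentage to how many genuinely relevant terms the page covers.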

Meta Tags

They have been devalued, but not nearly as much as most people think. The title tag, description and keywords are still very important to have.


Anchor text, the arrangement of words in the anchor text, and the destination page you are pointing to all have to be very specifically about tomato soup. All of your outgoing links must also point only to pages exclusively talking about tomato soup. It's definitely not a bad thing to have a few outbound links, but make them relevant.

It is OK to link to other pages in your site, but do it carefully. If you are on a page about tomato soup, don't go linking to your page about pea soup - it will not do you any good. A strong internal linking structure is important, but to maximise your chances of ranking highly you must only link to other pages within the same 'silo' on the website, or the parent silo.

What's a silo? A silo is basically a folder on your site. So, your domain might be www.cats.com, and it might have a silo called 'pink-cats' located at www.cats.com/pink-cats/. You should keep relevant, similar content within the same silo, and only link to information within the same silo, or the silo above it. So, for this page: www.cats.com/bigcats/blue/blue-cat.html you would either link to another relevant page in the silo 'blue', or you would link to the category page for 'blue' itself. Don't go linking to www.cats.com/bigcats/yellow/cat.html - it's in a different silo.

Silos don't have to be in folders - you can have a virtual silo system. Sites that run from templates and are database driven, like WordPress, usually use a virtual linking system. The folder structure is imitated, and can be seen by Google from the way you link to certain pages.
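The silo rule for internal links can be written down as a small check. This is only a sketch of the rule as described here, for folder-based silos: a link is fine if the target sits in the same folder as the source page, or in the folder directly above it. The function names are my own, and I have added http:// to the example URLs so they parse properly.

```python
from posixpath import dirname
from urllib.parse import urlparse

def silo_of(url):
    """Return the folder ('silo') that contains the page at this URL."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        # A URL ending in '/' is treated as the category page of that folder.
        return path.rstrip("/") or "/"
    return dirname(path)

def link_stays_in_silo(source_url, target_url):
    """True if the target is in the source's silo or in its parent silo."""
    if urlparse(source_url).netloc != urlparse(target_url).netloc:
        # Links to other domains are outside the scope of this sketch.
        return False
    source_silo = silo_of(source_url)
    return silo_of(target_url) in (source_silo, dirname(source_silo))

page = "http://www.cats.com/bigcats/blue/blue-cat.html"
print(link_stays_in_silo(page, "http://www.cats.com/bigcats/blue/"))
print(link_stays_in_silo(page, "http://www.cats.com/bigcats/"))
print(link_stays_in_silo(page, "http://www.cats.com/bigcats/yellow/cat.html"))
```

The first two links pass (same silo, then parent silo) and the third fails because 'yellow' is a sibling silo. A virtual silo system would need a different lookup, for instance a mapping from each page to its category, since the URL path alone would not show the structure.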

Expert Verbiage Terms

These are terms that are unlikely to be mentioned on any page other than one about the subject in question. For example, the phrase 'malice aforethought' is very unlikely to appear on any page other than one talking about the law. A good way to spark ideas and find these expert verbiage terms is to use Amazon. Shoot over to Amazon and search for your keyword in the books section. Scroll down to 'Statistically Improbable Phrases' - you can find a bunch of them there.

So what's the bottom line?

Write for humans - not machines. Think about it... what does Google want (apart from world domination)? It wants to give people the information they want via search - if it doesn't do this, and do it well, it goes out of business.

By writing for humans, you are giving people good content that they want. Google will notice and appreciate this... and Google are only going to give you better rankings over time as they get better at filtering the treasure from the trash. Hope this helps...

A lot of thanks to Dave from Elitemarketingraphics for this article.