Unless you have been living under a rock, you have surely realized that simply shoving keywords into a website's content no longer does the trick. Yes, that 'golden age' of web spamming is over. There is no longer any need to churn out hundreds of 'SEO-friendly', keyword-stuffed articles in order to rank for related terms. After the RankBrain shake-up, digital marketers have finally come to their senses. They have realized that they need to go back to the basics of Search Engine Optimization to understand which tactics are working and which are not.
Things may unravel slightly differently when you start revisiting the basics of on-page optimization. Aside from the obvious factors like keyword density and keyword proximity, you might come across a term like TF-IDF. Don't let it leave you flabbergasted. Allow us to break it down for you:
Yes, we understand that technical jargon like this looks unfamiliar and somewhat intimidating, but believe us, it is not. TF-IDF is a text analysis technique, used prominently by Google, to determine the quality of a webpage with respect to a given search query. It measures the importance of a word or phrase within a web page.
It does this by first figuring out how often a certain word appears on a web page, and then using inverse document frequency to scale down the importance of words that also appear frequently on other pages.
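The two-step scoring described above can be sketched in a few lines of Python. This is a minimal toy illustration of the standard TF-IDF formula, not Google's actual implementation; the corpus and the smoothing constant in the IDF term are our own assumptions for the example.

```python
import math

def tf_idf(term, doc, corpus):
    """Score a term in one document against a corpus of documents.

    Term frequency: how often the term appears in this document.
    Inverse document frequency: scales the score down when the term
    also appears in many other documents in the corpus.
    """
    tf = doc.count(term) / len(doc)
    docs_with_term = sum(1 for d in corpus if term in d)
    # +1 smoothing avoids division by zero for unseen terms
    idf = math.log((1 + len(corpus)) / (1 + docs_with_term))
    return tf * idf

# Three tiny "web pages", tokenized into word lists
corpus = [
    "content marketing is the future of seo".split(),
    "the future of web design".split(),
    "content is king on the web".split(),
]

page = corpus[0]
# "the" appears in every document, so its score is scaled to zero;
# "marketing" appears only here, so it keeps a positive score.
print(tf_idf("the", page, corpus))
print(tf_idf("marketing", page, corpus))
```

Running this shows the common word scoring zero while the distinctive word scores above it, which is exactly the down-scaling the paragraph describes.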
Google has to rely on links and keywords to figure out the relationship between a search query and a web page. There is no doubt that over the years Google has evolved and now uses machine learning algorithms to understand searcher intent and to evaluate the quality of a webpage. Still, Google's algorithm relies heavily on PageRank and keywords.
Digital marketers have long tried to understand how search engines interpret a webpage, and they came up with a strange concept, 'keyword density', which Google has never actually used. Buoyed by this 'discovery', digital marketers started stuffing keywords into content in order to manipulate Google's search algorithm. Thankfully, neither Google nor any other search engine has ever relied solely on keywords to determine whether a given web page deserves to rank high for a related search query, so this concerted attempt by search marketers ultimately failed. Little did they know that Google was using TF-IDF to retrieve information and index content. Engineers at Google were well aware that a metric like keyword density, which can be easily calculated and manipulated, would become an easy target for spammers, which is why they ignored it altogether and instead invested their energy and resources in TF-IDF.
Unlike keyword density or keyword proximity, which focus narrowly on the keywords themselves, TF-IDF tries to determine how often a keyword is likely to appear in a page based on a larger set of documents. In other words, TF-IDF weighs a keyword's appearances in a given page against its appearances across the rest of the document collection.
Wondering how this actually works? It is quite straightforward. Allow us to explain:
Imagine there is a document titled 'The Importance of Content Marketing'. TF-IDF will first look at unimportant words like 'is', 'are' and 'the' and scale down the weight of those terms in the document. Next, TF-IDF takes a holistic look at the other documents in the collection, identifies the terms and phrases that are comparatively unique to 'The Importance of Content Marketing', and scales up their importance precisely because they rarely appear in the other documents. The rarer a word is, the more important it is.
Well, we all know that Google has evolved a lot in the past few years. A slew of updates, in the form of Penguin, Panda, Hummingbird and RankBrain, has changed the very concept of search engine optimization. There is no way you can trick Google by churning out mediocre content (a 500-word article with the targeted keyword repeated 10 times). Google, now powered by machine learning algorithms, can differentiate between spammy, keyword-abused content and well-written, well-researched content. Its algorithm uses TF-IDF-like techniques to evaluate a web page, and once that evaluation is complete, it analyzes the content and uses other signals to determine whether the page deserves to rank high for related search queries.
This information is of vital importance for all SEO professionals out there. It is obvious that Google has become smarter at identifying spammy, low-quality articles thanks to the recent rejig of its core algorithm, but that does not mean it has stopped using metrics like TF-IDF to determine the quality of a web page. So, if you are planning to invest more time and resources in your website's content to drive more traffic and win more customers, you need to harness the power of TF-IDF.
According to some SEO experts, one of the main purposes of TF-IDF is to help Google understand, identify and differentiate homonyms (definition: each of two or more words having the same spelling or pronunciation but different meanings and origins).
Let's take an example to see how this could work in a real-life scenario. Imagine someone searching Google for upcoming Windows updates. Now, there are hundreds of articles about doors and windows, but the searcher is actually looking for a Microsoft product. So how on earth is Google going to understand this and present websites that talk about the Microsoft product rather than doors and windows? Well, it is fairly easy once you take TF-IDF into account: Google simply looks at the associated keywords included in those articles.
Articles on Windows (the Microsoft product) will include terms like 'Excel', 'Word', 'OS', 'operating system', 'Bill Gates' and 'Microsoft', whereas an article on windows (the household fixture) will include terms like 'wood', 'tree' and 'carpenter'. You get the idea, right?
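The disambiguation idea can be sketched by representing each page and the query as a TF-IDF vector and comparing them with cosine similarity. This is our own toy illustration, with made-up page texts, not a description of Google's ranking pipeline.

```python
import math

def tfidf_vector(doc, corpus, vocab):
    """Build a TF-IDF weight vector for `doc` over a shared vocabulary."""
    vec = []
    for term in vocab:
        tf = doc.count(term) / len(doc)
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + len(corpus)) / (1 + df))
        vec.append(tf * idf)
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

microsoft = "windows update operating system microsoft excel".split()
carpentry = "windows wood frame carpenter glass repair".split()
query = "windows operating system update".split()

corpus = [microsoft, carpentry, query]
vocab = sorted({t for d in corpus for t in d})

q = tfidf_vector(query, corpus, vocab)
sim_ms = cosine(tfidf_vector(microsoft, corpus, vocab), q)
sim_ca = cosine(tfidf_vector(carpentry, corpus, vocab), q)
# The query's co-occurring terms ("operating", "system", "update")
# pull it toward the Microsoft page, not the carpentry page.
print(sim_ms, sim_ca)
```

The ambiguous word 'windows' appears in every document, so its IDF drops to zero and it contributes nothing; the surrounding terms alone decide which page matches the query.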
By embracing the power of TF-IDF, you can increase your website's chances of ranking high for search queries you might not even have thought of earlier. Moreover, we simply can't ignore the rise of voice search, where conversational search terms are being used on a massive scale. So what I am basically trying to say is that the search industry is evolving, and we have no option but to evolve with it.
LSI, or Latent Semantic Indexing, means identifying the hidden [latent] associations [semantics] between words and phrases to enhance the understanding of the information available on a web page. LSI is used by almost all search engines to understand how different terms work together in content and to better grasp the relationships between the concepts that content explains. So when you do keyword research based on LSI, you will get a list of keywords that are strictly related to the main topic. For example, if you are doing keyword research for the term 'web design', LSI-based keyword research tools might generate these –
But when you do keyword research with a focus on TF-IDF, you will get keywords like –
The above set of keywords is not strictly related to the main term, but it helps you understand the relationships between the different segments of web design, so you can ask your content marketing team to write some well-researched content on these topics.
So it looks like you have another powerful weapon to add to your on-page SEO arsenal in the form of TF-IDF.