Graph Theory based Internal Linking

How a LinkPie based Internal Linking solution helped to bring the orphan pages down from 50% to 1% for a travel brand

Introduction

Internal linking is one of the proven technical SEO weapons, especially for websites with a huge number of landing pages.

In this case study, I’m talking about the travel brand Omio, where we planned to increase the number of SEO landing pages from 200 thousand to 1 million across 28 domains. While I was completely convinced about the opportunity and the approach we had chosen for this massive inventory scale-up, I was also a bit skeptical, because 50% of our existing landing pages were orphan pages.

 Orphan pages are pages that can’t be reached through navigation on the website, because no other page on the website links to them.
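To make the definition concrete, here is a minimal sketch of how orphan pages could be detected from crawl data. The function name, the `entry_points` parameter and the four sample URLs are all hypothetical and only illustrate the idea; this is not the tooling we used.

```python
from typing import Dict, Iterable, Set


def find_orphan_pages(
    all_pages: Iterable[str],
    links: Dict[str, Set[str]],
    entry_points: Iterable[str] = ("/",),
) -> Set[str]:
    """Return pages that no other page links to (entry points excluded)."""
    linked_to: Set[str] = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a self-link does not make a page reachable
                linked_to.add(target)
    return set(all_pages) - linked_to - set(entry_points)


# Hypothetical four-page site: "/rome-milan" has no inbound link.
pages = ["/", "/london-paris", "/berlin-munich", "/rome-milan"]
links = {
    "/": {"/london-paris"},
    "/london-paris": {"/berlin-munich", "/london-paris"},
}
orphans = find_orphan_pages(pages, links)
orphan_share = 100.0 * len(orphans) / len(pages)
print(sorted(orphans), orphan_share)  # ['/rome-milan'] 25.0
```

The list of all pages would typically come from a sitemap or the landing page inventory, and the links from a crawl. Note that this only checks for inbound links; a page linked solely from another orphan would still be unreachable from the homepage.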

 So, the plan was to fix the problem of orphan pages first, and then talk about inventory expansion.

What’s next?

The good part about the situation was that we knew the problem, so finding the solution was the only thing pending. So, I put on my SEO hat, grabbed a coffee and ran to a meeting room with a whiteboard. I drew some diagrams to lay out everything I knew about the problem and then started to collect my thoughts on potential solutions. After a few hours, I had a rough sketch in front of me, which I converted to a Jira ticket the same night. This was a simple rule-based solution.

I discussed the first iteration with the developers and added some numbers to justify its priority. In a couple of weeks, we were live with the first iteration, which brought our orphan page percentage down to 42%. Were we happy with the results? Of course not. The goal was to touch the 3-digit magic number: 100% coverage!

After trying a couple more iterations, we managed to bring it down to 38% (i.e. 62% coverage). But we were still missing a lot of common and edge cases, and we were definitely not in favour of patching more tweaks on top of what we had implemented.

We realised that we were missing something core: a travel brand like ours needs a different approach than a set of rules to add contextual links to its landing pages.

The magic solution was born – graph theory based internal linking

That discussion with the Engineering team (the architects behind the solution) was the best one I’ve ever had. This time we drew a world map and started plotting some destinations to cover some of the common and edge cases.

Then we listed the end goals, which were:

  • Keep everything relevant. The user is king, and search engine bots come second
  • Cover ~100% of the pages, i.e. leave ~0% orphan pages
  • Make pages more accessible through the navigation flow: we aimed to cover at least 90% of the pages within a depth level of 5
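The depth goal can be checked with a breadth-first search from the homepage. The sketch below is only an illustration under assumptions: the function names and the tiny chain-shaped site are hypothetical, and depth is taken to mean the minimum number of clicks from the start page.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_depths(links: Dict[str, List[str]], start: str = "/") -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def share_within_depth(
    all_pages: Iterable[str],
    links: Dict[str, List[str]],
    max_depth: int = 5,
    start: str = "/",
) -> float:
    """Percentage of pages reachable from `start` within `max_depth` clicks."""
    pages = set(all_pages)
    if not pages:
        return 0.0
    depths = click_depths(links, start)
    within = sum(1 for p in pages if p in depths and depths[p] <= max_depth)
    return 100.0 * within / len(pages)


# Hypothetical site: a chain "/" -> a -> b -> c -> d -> e -> f, plus an orphan g.
pages = ["/", "a", "b", "c", "d", "e", "f", "g"]
links = {"/": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
print(share_within_depth(pages, links))  # 75.0 ("f" is at depth 6, "g" is unreachable)
```

Pages that never show up in the search are unreachable from the start page, so the same pass also gives the coverage figure.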

The plan was to have a brainstorming session, but we were sitting in different corners and scratching our heads. Suddenly, a nerd in the room turned around and mentioned “multi directed graph” (a directed graph that allows several edges between the same two nodes). Well, that’s something we had studied during the high school days but never thought of applying in this case. He explained the concept of triangle count (the number of triangles, i.e. sets of three mutually connected nodes, that a node belongs to), and that’s when this graph theory based internal linking solution was born.
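For readers who haven’t met triangle counts before, here is a small sketch of the concept on a route graph. It is only one plausible reading, not the actual implementation: the routes, the function names and the idea of using the city that closes a triangle as a related-link candidate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical routes as (origin, destination, mode). Two London -> Paris
# entries are parallel edges, which is what makes this a multigraph.
routes = [
    ("London", "Paris", "train"),
    ("London", "Paris", "flight"),
    ("Paris", "Brussels", "train"),
    ("Brussels", "London", "train"),
    ("Paris", "Lyon", "train"),
    ("Berlin", "Brandenburg", "train"),
]


def neighbours(routes: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Collapse the directed multigraph into undirected neighbour sets."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for origin, destination, _mode in routes:
        if origin != destination:
            adj[origin].add(destination)
            adj[destination].add(origin)
    return dict(adj)


def triangle_counts(routes: List[Tuple[str, str, str]]) -> Dict[str, int]:
    """Number of triangles each city belongs to."""
    adj = neighbours(routes)
    counts = {city: 0 for city in adj}
    for city, nbrs in adj.items():
        # Two neighbours that are connected to each other close a triangle.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                counts[city] += 1
    return counts


def triangle_closers(routes: List[Tuple[str, str, str]], origin: str, destination: str) -> List[str]:
    """Cities connected to both ends of a route, i.e. the third corner of a triangle."""
    adj = neighbours(routes)
    return sorted(adj.get(origin, set()) & adj.get(destination, set()))


print(triangle_counts(routes)["Paris"])               # 1
print(triangle_closers(routes, "London", "Paris"))    # ['Brussels']
```

Under this reading, the London to Paris page could link to the London to Brussels and Paris to Brussels pages, because Brussels closes a triangle with both cities, while Berlin to Brandenburg has no such candidate and would need a fallback.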

We added some tweaks to ensure that our important pages (based on demand and supply parameters) got the importance they deserve. For example, London to Paris was one of our most popular routes, whereas Berlin to Brandenburg hardly attracted any traffic, so we didn’t want the algorithm to treat both pages at the same level of importance. So we added a custom rule: more important pages get more links, while still respecting the relevance rule.
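A rule like this could be sketched as a link quota that grows with a page’s importance, filled only from pages already judged relevant. Everything here is an assumption for illustration: the importance scores, the linear scaling, the `min_links` and `max_links` bounds and the function names are hypothetical and do not describe the actual demand and supply parameters.

```python
from typing import Dict, List


def inbound_link_quota(
    importance: Dict[str, float], min_links: int = 1, max_links: int = 10
) -> Dict[str, int]:
    """Scale each page's inbound-link quota linearly between min_links and max_links."""
    top = max(importance.values(), default=0)
    if top <= 0:
        return {page: min_links for page in importance}
    span = max_links - min_links
    return {
        page: min_links + round(span * score / top)
        for page, score in importance.items()
    }


def assign_inbound_links(
    importance: Dict[str, float],
    relevant_sources: Dict[str, List[str]],
    min_links: int = 1,
    max_links: int = 10,
) -> Dict[str, List[str]]:
    """Pick linking pages for each target from its relevant candidates only.

    `relevant_sources[page]` is assumed to be ordered from most to least relevant.
    """
    quota = inbound_link_quota(importance, min_links, max_links)
    return {
        page: relevant_sources.get(page, [])[: quota[page]]
        for page in importance
    }


# Hypothetical importance scores (e.g. derived from search demand).
importance = {"london-paris": 1000, "paris-lyon": 400, "berlin-brandenburg": 10}
relevant_sources = {
    "london-paris": ["london-brussels", "paris-brussels"],
    "berlin-brandenburg": ["berlin-potsdam", "berlin-leipzig"],
}
print(inbound_link_quota(importance))
# {'london-paris': 10, 'paris-lyon': 5, 'berlin-brandenburg': 1}
print(assign_inbound_links(importance, relevant_sources)["berlin-brandenburg"])
# ['berlin-potsdam']
```

Keeping `min_links` at 1 or above means even the least important page is meant to receive a link, which lines up with the coverage goal, and the quota is only ever filled from relevant candidates, so a popular page never collects irrelevant links just to hit its number.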

We took one of our domains for a deep analysis to see how this great-in-theory solution works in reality. The results were great, and we implemented the solution on a couple of test domains. Right after the release, we initiated a fresh crawl to gather the new stats. As soon as the crawl finished, we all had smiles on our faces seeing 98% coverage. This was beyond imagination, to be honest.

Celebrations? Too early maybe!

The team started to celebrate, but wait! “What are those 2% of pages that still aren’t covered?” I asked, and spoiled the party. We looked into the data: some of the pages were okay to ignore, and we moved them out of the list. For the remaining ones, we found a slight tweak to fix an edge case we had missed. Wohaaa! The result was 99% coverage, and we were left with only ~1% orphan pages.

Interested in the technical nitty-gritty of the solution? You can find it here.

Results

We started to observe our access logs to see how search engines perceived this, and it all looked promising. We witnessed a gradual increase in rankings and impressions, especially for the pages which weren’t performing before. We rolled it out everywhere and witnessed similar (great) results.

The project was marked as one of the successful ones, and I’m so proud of solving the coverage problem through a graph theory based solution. Massive credit goes to the brilliant brains in the engineering team for seeding the amazing idea in our brains. Here’s the quick summary of the end results for you:

  • Coverage (%)
  • Orphan pages (%)
  • Increase in landing pages contributing to SEO traffic (%)
  • Pages accessible within the depth level of 5 (%)
