Wednesday 24 July 2013

[Build Backlinks Online] Competitive Link Analysis: Link Intersect in Excel

Build Backlinks Online has posted a new item, 'Competitive Link Analysis: Link
Intersect in Excel'


Posted by mihai.aperghis
This post was originally in YouMoz, and was promoted to the main blog because it
provides great value and interest to our community. The author's views are
entirely his or her own and may not reflect the views of Moz, Inc.

Without a doubt, one of the main steps in creating an SEO strategy is the
competitive analysis. Competitor backlinks can offer information on their link
building strategies as well as giving you opportunities to strengthen your own
link profile.


These opportunities are hard to identify manually, especially in competitive
niches, where websites tend to have a significant amount of backlinks. Although
some tools do exist that can ease up this process, like the Moz Link Intersect
tool, I chose to build my own tool using Excel that would offer greater
flexibility in handling data.


I wrote this guide to explain how you can build your own competitive link
analysis in Excel, including a template to help you start right away.


What can you find in this guide:


What Is Link Intersect Exactly?
Why Would I Need the Excel Version?

Got It, Now Show Me the Magic!

Tools of the Trade
Step 1: Export Your Backlink Data (skip if you already know/have this)
Step 2: Import It Into Excel (skip if you've already done this and removed the
errors)
Step 3: Apply the Necessary Formulas
Step 4: Build the Pivot Table
Step 5: Sort It Out and You're Done


Future Improvements
Template and Instructions


What Is Link Intersect Exactly?

As you may know, the Moz Link Intersect tool (also known as the Competitive
Link Research Tool or Competitive Link Finder), along with other tools of its
kind, allowed you to find domains that link to your competitors but aren't
linking to you. You can thus find lots of link opportunities, especially on the
domains that link to more than one of your competitors, since there's a higher
chance they might link to you as well.


The Moz tool is currently unavailable, but I have it on good authority it will
be back up down the road.


The Excel version is somewhat the same thing, allowing you to view these
domains, the competitors they are linking to, and from exactly which URL they
link to them, in addition to other metrics that will help you decide which ones
to contact.


Why Would I Need the Excel Version?

Here are the advantages of using the Excel version over other tools:


First of all, most tools that include finding link opportunities from your
competitors are part of a bigger platform that usually requires a monthly
subscription. Excel is a one-time thing (though the backlink source is usually a
monthly subscription platform itself), and chances are you might have it
already.


Second, if you have a small SEO business like I do (or are a freelancer) and
can't really afford a developer to build your own tools yet, Excel might be one
of the most useful software at your disposal. It's great for data analysis and
visualization and has lots of nifty plugins that aid you in your day-to-day SEO
job. Even more-so, almost every major online-marketing-related platform out
there has the ability to export to Excel, giving you a centralized location of
all the data.


Third, the Excel version will allow you to:


Use backlink data from ANY provider, not just OSE, as long as it has a Source
URL (where the link is posted) and a Target URL (where the link points to); of
course, any metrics can help you, but are optional for the functionality of the
tool
Sort the data the way you need it, either by the number of competitors the
domains links to, or by one of the metrics that came with your data
Analyze as many competitors as you want (as long as your computer can handle
it)


Got It, Now Show Me the Magic!!

OK, if you're still with me, I assume you're interested in this tool, so I'll
take a step-by-step approach to explain how to create it. It does include a
pivot table, but it's really not that hard to use and I'll use screenshots to
show how to implement it.


If you want to skip to the end result, the last chapter includes a template
and instructions on how to use it.


Tools of the Trade

Before we start the Excel-fu, here's a list of what you need to have at your
disposal:


You might be shocked by this one, but you will need Excel to make this work :) I
used the 2010 32bit version in my example, but other versions should work just
as well.
Backlink data. You can use the Moz OSE, Ahrefs' Site Explorer, MajesticSEO's
Site Explorer, basically anything that meets the requirements I mentioned above
and has the ability to export to Excel. For this example, I've used OSE exports.
An alternative would be using an API to get the data, that's up to you.
6 to 8 hours of your time. I'm joking :)


Step 1: Export Your Backlink Data (skip if you already know/have this)

This is fairly basic. If you've used Excel for backlink analysis before, you
probably already know how to do this. Personally I have a Moz PRO account, so
I'll be using OSE for this step.


Since I've just recently launched my company website, I won't be using it as
an example. Also, since I am too lazy to pick a random website, I'm going to use
seomoz.org as my primary domain.


I'll choose 3 competitors (I mentioned you can choose as many as you want, but
since these are fairly big websites, three should be enough for this example).
These are: distilled.net, seerinteractive.com and seogadget.co.uk.


Getting backlink data from each of those sites (including the primary one) is
straightforward. Go to OSE, enter the domain and click search. Next, you will
want to filter the results to include only external links to pages on the root
domain or subdomain (the latter if the site's hosted on a subdomain that is
fairly separate from the root domain, like a blogspot.com blog).




IMPORTANT NOTE: Getting links to the root domain will usually get you more
data, but will require the need of two additional formulas in step 3.


Optionally, you can filter this more to only include dofollow links. Be sure
to click the Filter button once you're done.


Next, you'll want to download these links. Now, OSE gives you two options.
Either use the "Download CSV" button and get up to 10k links, or use the
Advanced Reports module where you have a daily credit limit and can export up to
100k links.


In case you use that, you'll need to choose the "External linking page" and
"Any page on this root domain" (or subdomain, accordingly) options. Everything
else can be left as is, though you can choose to filter links with DA/PA higher
than a certain value, to reduce the total number of results.




Note that you can queue exports, so you don't have to wait for one to finish
until you start the next one. You'll get emailed when they're done.


Repeat this for all your competitors as well. In the end, we should have four
different CSV files (one for our backlink data, three for the data of our
competitors).


Step 2: Import It Into Excel (skip if you've already done this and removed the
errors)

It's time now to open the magical software that our people refer to as Excel.


To get the data from CSV files, we have two options:


Either open the CSV files directly, copy the columns we're interested in (this
would be the URL, Target URL and any metrics you need) and then paste them into
a new worksheet
Or use the Excel Text Import wizard to import the data into an empty worksheet
without opening the CSVs


Both options are fairly simple, though the first one is easiest to do (won't
even do screenshots for this). The problem is that the first option doesn't work
if your Windows installation is set to a European country.


That's because a CSV contains Comma-Separated Values, the comma being the
default list delimiter in the US. For European countries, the default delimiter
is usually the semicolon (";"), which means Excel won't read the CSV files
correctly.


To resolve this issue, you need to open the Regional and Language Options from
the Control Panel in your Windows installation and either set it to English
(United States), or keep your current country and, in the Advanced Settings, set
the decimal symbol to dot (".") instead of comma, and the list delimiter to
comma instead of the semicolon. You can view the exact process here (Solution
#3):


Alternatively, you can use the second option. The problem is, due to the way
Excel imports data, some of it may be displayed erroneously, which would lead to
some extra steps to clean up the data by removing all the errors. Due to this
issue I decided not to include a tutorial on how to do this (but you can do it
regardless if you prefer not to change your list delimiter).


Regardless of your choice, after including data from the first domain, copy
the data for the other domains underneath, without including the header row
again. This way you'll have a continuous list of data from all the domains with
just one header row (the first one).


IMPORTANT NOTE: If you'll be analyzing a large number of backlinks (over 50k),
enter only a limited number at start (10-20k), and add the rest (also in batches
of 10-20k) after inserting the formula columns from the next step. This is
necessary depending on your Excel version and your resources to avoid error
warnings.


Right, you should now have all the data imported into Excel. This is optional,
but I find it much easier to work if this data is in a table. To do that, select
all the data so far (click on one of the cells containing data, like A1, and hit
CTRL-A), then transform it into a table (hit CTRL-L).


Remember that without a table, you'll have to edit the formulas to include
exact cell references (e.g. $A2 instead of [@URL]).




Step 3: Apply the Necessary Formulas

Now that we have all our data in Excel, we need to apply the formulas
necessary for the next step.


Our first two formulas will simply take the (source) URL and Target URL column
data and strip everything but the subdomains. These nifty formulas are also part
of the excellent "Excel for SEO" guide from Distilled.


We'll need two create two new columns to hold this data. We'll name the first
one "Source Subdomain", and the second one will be "Target Subdomain". Since we
have a table, we just need to enter the names in the first two adjacent columns,
and Excel will attach them to the table automatically.




The first formula is


Source Subdomain


=MID([@URL],FIND("://",[@URL])+3,IFERROR(FIND("/",[@URL],9),LEN([@URL])+1)-(FIND("://",
[@URL])+3))


(where [URL] is the column that contains the Source URL, might be named
differently if you don't use OSE; thanks to GerardGallegos for pointing out a
typo!)


and the second formula is:


Target Subdomain


=MID([@[Target URL]],FIND("://",[@[Target URL]])+3,IFERROR(FIND("/",[@[Target
URL]],9),LEN([@[Target URL]])+1)-(FIND("://",[@[Target URL]])+3))


(basically the same, just for the Target URL column).


The formulas basically get and display what's after the "://" part and before
the first "/" of the URLs (this allows it to also get links from and to secure
locations with "https"). The IFERROR part ensures you get the right result for
the case where the URL doesn't have an ending slash, like the homepage URL (OSE
always adds this slash, but Majestic SEO doesn't).


You just need to enter these inside the first cell of each of the columns, and
Excel will auto-populate them for the whole column.


IMPORTANT NOTE: If you selected the "pages to subdomain" instead of "to root
domain" option when getting your data, you won't need to include the next two
formulas, since you only have one subdomain for each site (e.g. www.seomoz.org
for Moz, seogadget.co.uk for SEO Gadget, etc.). If that's the case, skip to the
Unique Domains formula.


Now, you might be wondering on the fact that we only got the subdomain of the
Target URL, and that's an actual issue. This means that seomoz.org and
www.seomoz.org will be counted as different sites, which may be a problem
further down the line (you would see domains that link to 4 or more competitors,
even if you only have 3 competitors in your data).




To fix this, we need to pull the actual root domain from the subdomain.
Unfortunately this will be a tad complicated since we have to differentiate
between TLDs (Top-Level Domains) and SLDs (Second-Level Domains), because one of
our competitors is on a SLD (SEO Gadget), and we don't want to end up with the
domain "co.uk" instead of "seogadget.co.uk" (so you can't use the "just grab
whatever is after the last dot as TLD" routine).


IMPORTANT NOTE: If you have the SeoTools for Excel plugin by Niels Bosma, you
can skip the TLD and Target Root Domain formulas, and use just one formula to
get the root domain. In this case the formula would be =UrlProperty([@[Target
Subdomain]]; âdomainâ). However, if you intend to use your own
custom SLDs (e.g. "blogspot.com" to avoid clumping different blogs from the same
domain together), you'll need to use the functions below. Thanks to Roald for
reminding me of this function!


First, we need to list all the TLDs and SLDs we expect to encounter in the
Target URL column somewhere separated from the table. I chose the Z column for
this. Our list will be the following:


.com
.net
.org
.co.uk




Always place the SLDs under the TLDs (as a point of principle), so they get
detected last. Think of it like a set of rules, the formula will check for all
rules and return the last match it found. So for example, if one of your
competitors is a blog hosted on ".blogspot.com" (which is not really a SLD, but
you would consider it as such for your analysis, since you're not interested in
"blogspot.com" as a competitor), you would want to place that under the ".com"
TLD so it gets matched correctly.


With the list set in place, our next formula will retrieve the TLD/SLD (I will
just refer to them as TLD from now on) from the Target Subdomain column. Use it
in the next adjacent column to the table, and name the column "TLD". The formula
is:


TLD


=LOOKUP(2^15,SEARCH($Z$1:$Z$4,[@[Target Subdomain]]),$Z$1:$Z$4)


The 2^15 value inside the LOOKUP tells the formula to always look for the last
occurrence of the TLD in the Target Subdomain. If you're curious in what case
would this be useful, imagine the subdomain "test.comparison.org". You would
want to retrieve the ".org" part, as that is clearly the TLD. However, without
the 2^15 part, Excel would first encounter ".comparison" and stop, so it would
then match it to ".com", which would be a mistake.


The $Z$1:$Z$4 range references the cells that contain the TLDs.




Now that we have the TLD, let's get the actual root domain. To do this, we
basically get the Target Subdomain, strip the TLD, get everything that's after
the last dot, and then apply the TLD back on it. That means that if we have
"some.thing.example.com", we'll strip the TLD and get "some.thing.example",
retrieve everything after the last dot which gets us "example", then finally
apply the TLD to get the root domain "example.com".


All of the above is done in one formula, which you'll place in the next column
to be named "Target Root Domain":


Target Root Domain


=IFERROR(RIGHT([@[Target Subdomain]],LEN([@[Target
Subdomain]])-FIND("|",SUBSTITUTE(LEFT([@[Target Subdomain]],LEN([@[Target
Subdomain]])-LEN([@TLD])),".","|",LEN(LEFT([@[Target Subdomain]],LEN([@[Target
Subdomain]])-LEN([@TLD])))-LEN(SUBSTITUTE(LEFT([@[Target
Subdomain]],LEN([@[Target Subdomain]])-LEN([@TLD])),".",""))))),[@[Target
Subdomain]])


Yeah, bit of a long one, I know. I wanted however to get it inside one formula
to avoid creating unnecessary columns and get the root domain in one go. The
IFERROR portion at the beginning is for the case where the Target Subdomain is
actually the Root Domain, so it just returns that instead. The rest of the
formula does exactly what I described above.




We now have our root domains!


The fifth (or third, depending if you used the last two) formula has the
purpose to check if a source domain links to a target one at least once, so you
can later see how many of your competitors get links from that source.


In the pivot table that we build in the next step, the formula will have the
role of doing somewhat of a "distinct count" of target root domains for each
source URL. Unfortunately, there is no way to do this without the formula unless
you are using Excel 2013.


I'm saying this because, in order to create this "distinct count", we can
actually use one of three formulas. Two of them might be faster than the third,
but you also might get error warnings from Excel (at least the 2010 32 bit
version I'm using) on large amount of links (like 30k+). I've decided to use the
third formula which, albeit it might be slower, seems to work fine with a lot of
links and yield the same result.


Create a new adjacent column called "Unique Domains", and add the following
formula:


Unique Domains


=IF(COUNTIFS(INDIRECT(ADDRESS(ROW(Table2[#Headers])+1,COLUMN([Source
Subdomain]))&":"&ADDRESS(ROW([@[Source Subdomain]]),COLUMN([Source
Subdomain]))), [@[Source
Subdomain]],INDIRECT(ADDRESS(ROW(Table2[#Headers])+1,COLUMN([Target Root
Domain]))&":"&ADDRESS(ROW([@[Target Root Domain]]),COLUMN([Target Root
Domain]))), [@[Target Root Domain]])=1,1,0)


The COUNTIFS function counts how many times a Source Subdomain has the same
Target Root Domain associated with it. The IF identifies the first association
of this kind and returns the value 1 for it and the value 0 for subsequent
associations (kind of like saying "Yes, this source links to this target at
least once").


The formula might seem long, but it's actually equivalent to this:


=IF(COUNTIFS($O$2:$O2,$O2,$R$2:$R2,$R2)=1,1,0)


In this case, the O column is the Source Subdomain, while the R column is the
Target Root Domain. 2 is the row number where you first introduce the formula
(the row right beneath the header row).


I chose the longer version so it can be applied without identifying the
columns and rows needed to make it work. This is why I used the INDIRECT and
ADDRESS functions, which have the purpose of automatically identifying the
necessary references for the formula, regardless of where the table is
positioned or how many columns it has.


Remember, if you haven't used the TLD/Root Domain formulas, you will use the
Target Subdomain reference instead of the Target Root Domain.


Unfortunately, the COUNTIFS function won't work for Excel 2003, so you need to
use a different formula for this issue, which you can find here (the SUMPRODUCT
version).


The final formula is rather simple, where we need to check if the link points
to us (our primary domain) or not, so we can later filter it. Create an adjacent
column named "Link To Us", and enter the following formula (where you replace
the "seomoz.org" with your root domain):


Link To Us


=IF([@[Target Root Domain]]="seomoz.org",1,0)


The formula is a basic if conditional, which returns 1 if it's our primary
root domain, 0 if it's a competitor.




Remember, if you haven't used the TLD/Root Domain formulas, then you're gonna
use Target Subdomain instead of the Target Root Domain (in which case, for this
example, you would have [@[Target Subdomain]]="www.seomoz.org").


Step 4: Build the Pivot Table

Okay, now that we have our backlink data table set up, the last two steps will
be quite simple.


Create a new sheet (or rename one of the other default ones) named "Pivot
Table". It is here where the data visualization will occur, and where you will
be spending the time to analyze it.




So let's create our pivot table. The process goes something like this:


Click the Insert tab -> click the Pivot Table button -> enter the name
of our table containing backlink data (usually Table1, unless you renamed it or
created multiple tables) -> hit OK.




You will now have an empty pivot table with a Field List sidebar. Here's how
we configure it:


Drag the Source Subdomain, Target Root Domain (or Target Subdomain accordingly)
and URL fields into the Row Labels box (in that order)
Drag Unique Domains, Link To Us and any metrics you want to have (like Domain
Authority) to the Values box (order doesn't matter). All of the fields should be
added automatically as a sum ("Sum of..."). For the metrics we actually need
averages, so repeat this process for each metric: click on "Sum of Domain
Authority (or whatever metric you have)" -> click Value Field Settings ->
choose Average instead of Sum, under "Summarize value field by" -> hit OK.




The field configuration should look like this (might differ somewhat depending
on your backlink data provider and the metrics you use):




Next, since the data is expanded and we can't really see anything, we need to
collapse it under the Source Subdomain fields. To do that, click one of the
source subdomains in the pivot table -> make sure you're in the Options tab
menu of the Pivot Table -> click Collapse Entire Field.




You can repeat the collapse process for a Target Root Domain as well.


Step 5: Sort It Out and You're Done

Now that we have everything we need inside the pivot table, we only need to
sort the data. First of all, since we're trying to get competitor links, we need
to filter out the Source Subdomains that already link to us.


To do this, click on the Row Labels dropdown -> Value Filters ->
Equals... -> in the new window that opens select "Sum of Link To Us" and
enter "0" in the value field -> hit OK.






Basically, you want to see the sites that link to most of your competitors,
since, as I mentioned at the beginning of this guide, these are the sites most
likely to link to you as well.


To sort it, click the Row Labels dropdown again -> More Sort Options ->
choose Descending (Z to A) by -> choose Sum of Unique Domains -> hit OK.






There, we now have a pivot table with domains that link to our competitors but
not us, sorted by the amount of competitors they link to.


You can of course sort it by Domain Authority (or any other metric you chose
to include). I've yet to find a proper way to sort by multiple columns (as in
sort by Sum of Unique Domains first, then by Domain Authority).


The final version looks something like this:




IMPORTANT NOTE: Before saving the file, to reduce the time it takes to open it
as well as reducing its size, we can replace all the formulas with their value
so Excel won't recalculate them. This is done by selecting all table data
(CTRL-A) -> copying it (CTRL-C) -> then pasting just the values (Paste
Values).


Well, that's about it. If you've made it his far, my hat's off to you, and I
really hope this guide helped improve your Excel-fu while also giving you a nice
tool to play with.


Future Improvements

There are a few things I had in mind that could be added to this tool:


Ability to sort by Unique Domains first, then by one or more metrics.
Ability to filter by metric value greater/smaller than x (e.g. Domain Authority
greater than 30); you can partially do that by moving the metric to the Report
Filter box and then ticking which values you need (gets boring really fast).
Ability to remake the pivot table to see Co-Citation opportunities; what I mean
by that is, instead of having Source Subdomain as your primary rows, you would
have the URL field, so you can see exactly which pages (as opposed to which
site) link to more than one competitor.


If you have any idea on how any of these can be implemented, write it in a
comment below.


Template and Instructions

As I promised, I'll include a template that contains all the necessary
formulas and the pivot table, to which you only need to add your backlink data.

I've decided to create two separate files:


vertify-link-intersect-sample-SUBDOMAIN.xlsx â this can be used when you
have backlink data from only one subdomain for each individual website (both
yours and your competitor's), and the backlinks point to pages of that subdomain
(in this case, domain.com and www.domain.com are considered different
sites/subdomains, so you'll have to choose which one you want the data for)

vertify-link-intersect-sample-ROOTDOMAIN.xlsx â this can be used for the
case where you use backlink data to pages of the root domain for at least one
website; in this case there will be two extra columns to extract the root
domain, which means you'll have to configure the TLD list in the Configuration
sheet





If you have any problems with the files or there's anything that you have
trouble understanding, please let me know in a comment below. Enjoy my carefully
crafted meme:




Editor's note: this post is available in Romanian on the author's own site at
http://www.vertify.ro/analiza-link-urilor-competitiei-link-intersect-in-excel/

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten
hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think
of it as your exclusive digest of stuff you don't have time to hunt down but
want to read!






You may view the latest post at
http://feedproxy.google.com/~r/seomoz/~3/x2DSJ0JqTOE/competitive-link-analysis-link-intersect-in-excel

You received this e-mail because you asked to be notified when new updates are
posted.
Best regards,
Build Backlinks Online
peter.clarke@designed-for-success.com

No comments:

Post a Comment