Posted by Sam Crocker
Having access to data and large data sets is something any SEO worth his salt craves. Sure, managing a massive dataset or database can be a bit of a hassle, but having good information is key and there are a handful of uses for other people’s/sites’ data sets that are readily available for purchase online. Big budget linkbuilding isn’t the only way to spend your SEO budget these days!
Let’s take a look at five examples of datasets that you can easily and readily purchase and how you might go about using them.
The true motivation for this article was a chat with Tom… and the fact that you can now get Geocities (yes, seriously, the whole thing) in the form of a 1 TB torrent (thanks, Hacker News). How much is it going to cost you? Just your email address.
Screenshot of GeoCities–izer version of SEOmoz
What would you do with Geocities, you ask? The sky’s the limit with this one, really! I’m not saying you want to use any of the great tickers and beautiful layouts/seizure-inducing colours for which Geocities is now famous.
However, for a start, you may very well want to use the huge volume of content that could quite easily be respun for your own purposes. Or use the epic designs for mapping out your new site. Up to you!
Why pay for keywords? Well, for starters, because sometimes you may find you have a client that has exhausted the entire set of data available through the AdWords API (yes, this has seriously happened before). If the site is strong enough and you find you’re still able to rank reasonably for long-tail terms post-MayDay, there’s no harm in creating some new content to target the long tail. This isn’t to suggest that you should buy keyphrases and not do the research yourself but, as discussed, more data is almost always better than less.
Out of Words? – Stock Image Provided by Shutterstock
And, most importantly, just because the data isn’t in the API doesn’t mean there isn’t any search volume for it!
Some of the outfits out there selling keywords and keyphrases are:
This sort of thing won’t come cheap, but it can be extremely valuable to the larger sites.
Some of you may be familiar with using 80 legs as a tool to crawl and scrape your way through the interwebs. It’s a tool that I’ve not spent nearly enough time with as I didn’t find it quite as intuitive to use as Mozenda. However, the nice thing about 80 legs is that they have compensated for this a bit by offering packaged-up crawls.
The vast majority of the packages cost 0 per month (with the exception of the eBay Motors crawl for 0/month), though the data you could pull off these is extremely valuable and saves you the trouble of doing any of the crawling yourself (or gets you around a banned IP, you naughty SEOs).
Again, these sets could be used for anything from price-comparison to market analysis and right on down to content creation and keyphrase research. If you’re one of the fortunate few working in the space for which these are offered you should definitely have a look.
So, the Twitter Census dataset is just an example of the variety of datasets you can buy from InfoChimps, though the general concept of owning one year’s worth of URLs, hashtags, and smiley usage seems like it could be used in a number of ways. You could create an infographic worthy of a link from the likes of Mashable, TechCrunch, etc.
Or you could use the data to monitor keyphrase usage, common abbreviations, or any other sort of trend in social interaction (it could be a great source of keyphrases as well, as the search engines begin to take signals from social and include it directly in the SERPs). This set is currently priced at 0.
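As a rough idea of what monitoring hashtag usage might look like in practice, here is a minimal Python sketch. It assumes you have already pulled the tweet text out of whatever format the dataset ships in; the `top_hashtags` function and the sample tweets are made up for illustration and are not part of the InfoChimps set.

```python
import re
from collections import Counter

# Matches a "#" followed by word characters, capturing the tag without the "#".
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most common hashtags (lowercased) across the tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts.most_common(n)


# Hypothetical sample data standing in for rows from a purchased dataset.
sample = [
    "Loving the new #SEO tools #linkbuilding",
    "Is #seo dead? Again?",
    "Fresh #infographic on #SEO trends",
]

if __name__ == "__main__":
    for tag, count in top_hashtags(sample):
        print(tag, count)
```

The same counting approach should work for abbreviations or keyphrases by swapping the regular expression, though a year of tweets would likely need to be streamed from disk rather than held in a list.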
Rand was being a bit coy about this one and at time of press I wasn’t able to get a serious price out of them but there’s a price for everything right? Any serious bidders should probably get in touch with the SEOmoz team directly…
Along these lines, there are a number of other datasets that do not have a price set but I’m sure you could get your hands on with enough money and asking the right people. These would include: Backtype API data, Wordtracker, or Amazon’s entire product catalogue. It all comes down to asking the right people, but ultimately anyone with a brain for business and a load of data would sell you their info if you know how to ask for it.
Don’t you just love it when you can get your hands on some awesome free stuff that you never knew you wanted in the first place? Well, thankfully, there are a few datasets that I came across that I thought were worth sharing and could give you some value for free.
Free Stuff! Stock Image from Shutterstock
Feel free to take a gander at these datasets and try to make use of the data! Can you say "infographic ammunition"?
The entire dataset from the New York Stock Exchange from 1970-current (Open, Close, High, Low, Volume).
Massive sets of US Census Data.
And for those of us based over in the UK – huge volumes of UK Government data right at your fingertips.
Other Huge Datasets to get stuck into: Project Gutenberg for over 6,000 full books available online. These book lists at the very least could be of
One thing that you may have noticed is that large datasets, once you provide them to people, tend to be solid gold for linkbait. We could focus an entire post on this, but if you’ve got access to great data and you’re not offering it out to your users/curious SEOs, what are you thinking?! Publish the data, make it free to download, and require a link back for attribution from anyone who wants to use it. Simples!
Mike Mindel from Wordtracker has put together some nice free SEO overview videos for beginners in the industry. Check them out:
Both Yahoo! and Microsoft have confirmed that they will start testing the Bing algorithm live on some Yahoo! traffic this month. One of the big questions from the SEO perspective is what happens to Yahoo! Site Explorer? If it goes away then webmasters will need to get link data from web indexes built by SEO companies, perhaps either Open Site Explorer and/or Majestic SEO.
Yahoo! also offers a link: search in their BOSS program. While they have stated that the BOSS program will live on, there is little chance of the link: operator working in it over the long run, as Bing has disabled inbound link search on Bing.
Blekko, which is a soon-to-launch search start-up, doesn’t have much to lose in sharing data. Anything that gains them awareness in the short run will likely make them money in the long run. And so they are doing just that:
Blekko is also showing just about all the behind the scenes data that they have to determine rank and relevancy. You can see inbound links, duplicated content and associated metadata for any domain in their index.
Blekko will also come with custom slashtags which users can use to personalize search. An end user feature for average users? Not sure. But it will be interesting to web developers & power searchers. There are already heated debates in the comments on TechCrunch over whether people will use that feature. IMHO the point isn’t for it to be an end user service for average searchers, but to be one which generates discussion & builds loyalty amongst power users. And clearly it is working.
They are also following the Jason Callus-Anus strategy of anti-SEO marketing (while giving SEOs tons of free data)
The SEO gamers, content farmers and link shoppers are not going to be happy. These guys are flooding the web with content designed to turn a profit, not inform, and the searcher pays the price. One company alone generates literally tens of thousands of pages every day that are solely designed to make money from SEO traffic. Slashtags are the perfect way to bypass them and search only the sites you like.
One more reason the content farmers aren’t going to be happy: we’re opening up all the data that is the core foundation of their business. Link data, site data, rank data – all there for everyone to see. In one fell swoop the playing field just got leveled.
I think a core concept which many search engines have forgotten (in an attempt to chase Google) is that if you have a place in the hearts and minds of webmasters & web developers then they will lead other users to your service.
Money is one way to buy loyalty. And Google will pay anyone to syndicate their ads, no matter what sort of externalities that leads to. But now the web is polluted with content mills. Which is an opportunity for Blekko to differentiate.
Since Yahoo! is a big publisher they had mixed incentives on this front. They do share a lot of cool stuff, but they are also the same company which just disappeared the default online keyword research tool and replaced it with nothing, and they recently purchased a content mill. This was a big area where Bing could have won. They created a great SEO guide & are generally more receptive to webmaster communications, but they have fumbled following redirects & have pulled back on the data they share. Further, if you look at Bing’s updated PPC guidelines, you will see that they are pushing out affiliates and chasing the same brand ad dollars which Google wants. Bing will be anything but desperate for market share after they get the Yahoo! deal in place.
Blekko goes one further than the traditional sense of “open” for their launch. They not only give you the traditional open strategy:
Furthermore, we intend to be fully open about our crawl and rank data for the web. We don’t believe security through obscurity is the best way to drive search ranking quality forward. So we have a set of tools on blekko.com which let you understand what factors are driving our rankings, and let you dive behind any url or site to see what their web search footprint looks like.
but they also offer a “Search Bill of Rights” which by default other search companies can’t follow (based on their current business models):
1. Search shall be open
2. Search results shall involve people
3. Ranking data shall not be kept secret
4. Web data shall be readily available
5. There is no one-size-fits-all for search
6. Advanced search shall be accessible
7. Search engine tools shall be open to all
8. Search & community go hand-in-hand
9. Spam does not belong in search results
10. Privacy of searchers shall not be violated
And so based on the above they appeal to…
From a marketing perspective, their site hasn’t even launched yet and there are *at least* a half-dozen different reasons to talk about them! Pretty savvy marketing.
The other day a person contacted me about wanting to help me with ad retargeting on one of my sites, but in order to do so they would have had to have tracked my site. That would have given them tons of great information about how they could retarget all my site’s visitors around the web. And they wanted me to give that up for free in an offer which was made to sound compelling, but lacked substance. And so they never got a response.
Given that we live in “the information age” it is surprising how little people value data & how little they expect you to value it. But there are still a lot of naive folks online! Google has a patent for finding under-served markets. And they own the leading search engine + the leading online ad network.
At any point in time they can change who they are voting for, and why they are voting that way.
They acquired YouTube and then universal search was all the rage.
Yes they have been pretty good at taking the longterm view, but that is *exactly* why so many businesses are afraid of them. Google throws off so much cash and collects so much data that they can go into just about any information market and practice price dumping to kill external innovation & lock up the market.
Once they own the market they have the data. From there a near infinite number of business models & opportunities appear.
Google recently became the #1 shopping search engine. How did they respond? More promotion of their shopping search feature.
All those star ratings near the ads go to a thin affiliate / Google value add shopping search engine experience. Featured placement for those who are willing to share more data in exchange for promotion, and then over time Google will start collecting data directly and drive the (non-Google) duplication out of the marketplace.
You can tell where Google aims to position Google in the long run by what they consider to be spam. Early remote quality rater guidelines have highlighted how spammy the travel vertical is with hotel sites. Since then Google has added hotel prices to their search results, added hotels to some of their maps, and they just acquired ITA software – the company which powers many airline search sites.
Against this sort of backdrop there was an article in the NYT about small book shops partnering up with Google. The title of the article reads like it is straight out of a press release: Small Stores See Google as Ally in E-Book Market. And it includes the following quote:
Mr. Sennett acknowledged that Google would also be a competitor, since it would also sell books from its Web site. But he seemed to believe that Google would favor its smaller partners.
“I don’t see Google directly working to undermine or outsell their retail partners,” he said. “I doubt they are going to be editorially recommending books and making choices about what people should read, which is what bookstores do.”
He added, “I wonder how naïve that is at this point. We’ll have to see.”
If they have all the sales data they don’t need to make recommendations. They let you and your customers do that. All they have to do to provide a better service than you can is aggregate the data.
The long view is this: if Google can cheaply duplicate your efforts you are unneeded duplication in the marketplace.
3 out of 4 ain’t bad. But even on the one they missed, they still have an AdSense category for it.
SEO Book.com – Learn. Rank. Dominate.
Has a competitor launched a new feature that concerns you? If so, how do you react?
Google, well known for their public relations expertise, does not like the idea of Facebook creating an (eventual) distributed ad network based on demographics data. In spite of Google personalizing search by default (without asking), Google opting you into behavioral targeting (without asking), & automatically opting you into Google Buzz (without asking), suddenly they are a company concerned with the privacy of people on *other* networks.
An effective attack typically should not look like it comes from corporate, but sound more like a list of alarmed concerns issued by individuals just like you. And so we get alarmed stories from the likes of Ka-Ping Yee, a software engineer for the charitable arm of Google:
Facebook’s new system for connecting together the web seems to have a serious privacy hole, a web developer has discovered.
“It seemed that anyone could get this list. Today, I spent a while checking to make sure I wasn’t crazy,” he wrote on his blog. “I didn’t opt in for this. I even tried setting all my privacy settings for maximum privacy. But Facebook is still exposing the list of events I’ve attended, and maybe your event.”