Add Vimeo Hosted Videos to Your Video Sitemap

If you have videos hosted over on Vimeo and have tried adding them to your video sitemap and submitting that sitemap in Google Webmaster Tools, you may have run into a problem where the video URL you are trying to add is blocked by robots.txt. Fortunately, this is easily resolved.

Error Messages

You will usually get an error message along the lines of the following:

URL blocked by robots.txt
Sitemap contains URLs which are blocked by robots.txt

A Quick & Easy Fix

The fix for this is quick and simple (for a change): you just need to update the paths you use to point to the video file.

The standard link you will have looks something like this:
http://player.vimeo.com/video/12345678

You want to change these so they look like this:
http://www.vimeo.com/moogaloop.swf?clip_id=12345678
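
If you have more than a couple of videos to update, you can script the rewrite. Here is a quick sketch in Python (the to_moogaloop_url helper is just an illustrative name, not part of any Vimeo library) that pulls the numeric clip ID out of the standard player link and builds the crawlable version:

import re

def to_moogaloop_url(player_url):
    # Grab the numeric clip ID from a player.vimeo.com/video/... link
    match = re.search(r"player\.vimeo\.com/video/(\d+)", player_url)
    if not match:
        raise ValueError("Not a recognised Vimeo player URL: " + player_url)
    return "http://www.vimeo.com/moogaloop.swf?clip_id=" + match.group(1)

print(to_moogaloop_url("http://player.vimeo.com/video/12345678"))
# -> http://www.vimeo.com/moogaloop.swf?clip_id=12345678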

So, you should end up with your video:player_loc tag looking something like this:

<video:player_loc allow_embed="yes">http://www.vimeo.com/moogaloop.swf?clip_id=VIDEO_ID</video:player_loc>

To state the obvious, just in case: replace the VIDEO_ID value with the numeric ID of your video.
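
For context, a complete entry in your video sitemap might end up looking roughly like the following. The page URL, thumbnail, title and description are placeholders for your own values; the video:player_loc line is the only bit this post is concerned with:

<url>
  <loc>http://www.example.com/my-video-page/</loc>
  <video:video>
    <video:thumbnail_loc>http://www.example.com/thumbs/VIDEO_ID.jpg</video:thumbnail_loc>
    <video:title>My video title</video:title>
    <video:description>A short description of the video.</video:description>
    <video:player_loc allow_embed="yes">http://www.vimeo.com/moogaloop.swf?clip_id=VIDEO_ID</video:player_loc>
  </video:video>
</url>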

Why This Happens

Ah, the inquisitive type. Congratulations, you are part of the 1% that reads past the quick fix and wants to know why you can’t add the Vimeo videos to the video sitemap in the first place.

RTFEM – Read The F***ing Error Message

As is often the case, any kind of computing error message comes with some signalling that turns off the brains of around 90% of people. If we can dig past the harsh, cryptic exterior of the message and this obvious mental trickery, we can get to the heart of the problem.

URL blocked by robots.txt

This is exactly what it says on the tin. The URL you are trying to submit is blocked by a robots.txt file. The URL is not on your site, so the player.vimeo.com domain must have a rule preventing access by search engine spiders.

Sitemap contains URLs which are blocked by robots.txt

This is more of the same and again points us to the robots.txt file on the player.vimeo.com site. So, if we take a look at the robots.txt on the domain for the original link:

http://player.vimeo.com/robots.txt

Then you can see it issues a brief and comprehensive “go away” to all and sundry:

User-agent: *
Disallow: /

If we look at the robots.txt file on the alternative address, we can see there are no such restrictions:
http://vimeo.com/robots.txt

So, with our entry unbarred, our friendly neighbourhood web spiders from Google, Bing, etc. can crawl and index the URL happily.
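
If you want to double check this yourself, a short Python sketch along the following lines (the clip ID and the Googlebot user agent are just example values) will ask each robots.txt whether the relevant URL may be crawled. Bear in mind the rules in those files can and do change over time, so it is worth re-running a check like this now and again:

import urllib.robotparser

checks = [
    ("http://player.vimeo.com/robots.txt", "http://player.vimeo.com/video/12345678"),
    ("http://vimeo.com/robots.txt", "http://www.vimeo.com/moogaloop.swf?clip_id=12345678"),
]

for robots_url, video_url in checks:
    parser = urllib.robotparser.RobotFileParser(robots_url)
    parser.read()  # fetch and parse the live robots.txt
    allowed = parser.can_fetch("Googlebot", video_url)
    print(video_url, "->", "allowed" if allowed else "blocked")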

That’s a wrap!

If you have any video indexing questions, drop a comment below or give me a shout on Twitter, and please remember to be sociable and share this on your favourite social network via the sharing icons below. 🙂

8 Responses

  1. That was a great, concise and helpful description. It helped me keep my stuff clean after hours of pulling out my hair. Never realized it was the robots.txt from player.vimeo.com that was causing all the pain. Thanks for this useful article.

  2. I’ve been stuck on this for two days! I saw that the robots.txt was disallowing everything. Just couldn’t figure out if/how I could work around it to include my video in a sitemap. Your quick fix worked perfectly! Thanks for writing up this awesome solution and explanation.

  3. Hey Marcus,

    Good answer. I was wondering, since they will be doing away with the old embed code (which I think is where the http://www.vimeo.com/moogaloop.swf?clip_id=1245678 comes from), will this still work once they trash it? Or, are we screwed? Also, if I was to create a CNAME for a subdomain name, let’s say video.theurl.com that points to http://player.vimeo.com/external/ (which is the url I’m getting the error with) would that override the robots.txt file? Or would it still be the same problem, just a different path?

  4. I just set up a subdomain using Vimeo Portfolio. I am not a computer whiz kid. I am getting a message from Google that there is a 100% error for Googlebot to crawl the site. Of course I want the spider to come to my new subdomain. How can I fix this?

    1. Hey Emily. It should just be a case of using the correct link structure as detailed in the article. Happy to take a quick look though and feedback if you want to drop me an email via the contact form. Happy to help! 🙂

  5. Hi Marcus,
    Great article! Very clear and concise. I’m in the process of implementing video sitemaps for my site. I was wondering has https://vimeo.com/robots.txt changed since the writing of this article?

    It appears that vimeo.com allows the Google Adwords bot to spider:

    User-agent: Mediapartners-Google
    Disallow:

    But for everything else, including the Googlebot for search, it disallows to /moogaloop/ which is where the player location is stored:

    User-agent: *
    Disallow: */format:thumbnail
    Disallow: /download/
    Disallow: /*/download?
    Disallow: /moogaloop/ <——— blocks it here

    Am I missing something? If not, any ideas on how to get around this?

    Also, what is your opinion on storing the video thumbnail on the server vs just pointing the sitemap to the thumbnail url stored on vimeo?

    Thanks in advance for your help. Cheers.
