Tallan's Technology Blog

Tallan's Top Technologists Share Their Thoughts on Today's Technology Challenges

Resolving “Invalid RSS MIME type: text/xml” When Crawling RSS Data with the FAST Web Crawler

Overview

While crawling an RSS feed for a client with the FAST Web Crawler, we found the following error in the RSS crawl logs (located at \FASTSearch\var\log\crawler\node\fetch\<content collection name>):

“Invalid RSS MIME type: text/xml”
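If you crawl several feeds, a small script can pull every occurrence of this error out of the fetch logs. The sketch below is a hypothetical helper (find_mime_errors is our own name, not part of FAST). It assumes the logs are plain-text files somewhere under the directory you pass in; we have not verified FAST's exact log file names or encoding.

```python
import sys
from pathlib import Path

ERROR_TEXT = "Invalid RSS MIME type"


def find_mime_errors(log_dir):
    """Return (file, line_number, line) for every log line mentioning the RSS MIME error."""
    hits = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: crawler logs are text; undecodable bytes are replaced instead of raising.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if ERROR_TEXT in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Each command-line argument is a log directory to scan; no arguments means nothing to do.
    for log_dir in sys.argv[1:]:
        for path, number, line in find_mime_errors(log_dir):
            print(f"{path}:{number}: {line}")
```

Pass it the fetch log folder for your content collection, for example python find_mime_errors.py "\FASTSearch\var\log\crawler\node\fetch\<content collection name>", adjusted to your own install location.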


The server was indeed returning the MIME type “text/xml”. However, we were able to crawl another feed that was also served as “text/xml”, so the MIME type alone did not seem to explain the failure.
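One way to confirm which MIME type a server reports is to read the Content-Type response header yourself. This sketch uses only the Python standard library; the function names are illustrative, and media_type deliberately drops parameters such as charset so that “text/xml; charset=utf-8” compares equal to “text/xml”.

```python
from email.message import Message
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Reduce a Content-Type header value to its lowercase media type, dropping parameters."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def served_media_type(feed_url, timeout=10):
    """Fetch feed_url and return the media type the server reports, or None if no header is sent."""
    request = Request(feed_url, headers={"User-Agent": "feed-mime-check"})
    with urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Content-Type")
        return media_type(header) if header else None
```

For example, served_media_type("https://example.com/feed.xml") (a placeholder URL) returns whatever media type that server sends for the feed, such as "text/xml" or "application/rss+xml".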

The screenshot below compares the two feeds: the feed on the left crawled properly, while the feed on the right did not.

[Screenshot: the working feed (left) and the non-working feed (right)]

The only difference we could find was that the working feed began with an XML declaration (such as <?xml version="1.0"?>), while the non-working feed did not.
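To check a feed for the same difference, look at its first bytes. This hypothetical helper reports whether a document starts with an XML declaration. It skips a UTF-8 byte order mark but nothing else, since the XML 1.0 specification does not allow whitespace before the declaration, and it does not count processing instructions like <?xml-stylesheet ...?>. It only detects the difference we observed; it says nothing about how FAST itself treats the declaration.

```python
import codecs
import re

# An XML declaration is "<?xml" followed by whitespace; "<?xml-stylesheet" does not match.
_XML_DECL = re.compile(rb"<\?xml[ \t\r\n]")


def has_xml_declaration(body):
    """Return True if the raw document bytes begin with an XML declaration."""
    # Assumption: feeds are UTF-8, so only the UTF-8 byte order mark is skipped.
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    return _XML_DECL.match(body) is not None
```

The bytes can come from the same urlopen call used to read the headers, via response.read().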

Solution

To make sure FAST can crawl your RSS feeds, serve them with the MIME type application/rss+xml.

We had our client change the MIME type of the RSS feed to application/rss+xml. Although the feed content was formatted exactly as it was when the crawl failed, FAST crawled it properly once the new MIME type was in place.
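How you change the MIME type depends on the server hosting the feed, and our client's server setup is not covered here. As an illustration only, here is a minimal Python WSGI app that serves an RSS document with a Content-Type of exactly application/rss+xml, matching what worked for us. The feed content and the feed_app name are placeholders.

```python
RSS_MIME_TYPE = "application/rss+xml"

# Hypothetical placeholder feed; a real site would generate or read its own RSS document.
FEED_XML = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <description>Placeholder feed for illustration</description>
  </channel>
</rss>
"""


def feed_app(environ, start_response):
    """Minimal WSGI app that returns the feed with an RSS-specific Content-Type."""
    body = FEED_XML.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", RSS_MIME_TYPE),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, feed_app).serve_forever() serves it on port 8000. On a production web server the equivalent change is usually a MIME-type mapping for the feed's file extension or URL.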

