Resolving “Invalid RSS MIME type: text/xml” When Crawling RSS Data with the FAST Web Crawler
While crawling an RSS feed for a client with the FAST Web Crawler, we found the following error in our RSS crawl logs (located at \FASTSearch\var\log\crawler\node\fetch\<content collection name>):
“Invalid RSS MIME type: text/xml”
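If you have several collections or nodes to check, it can be quicker to search the fetch logs than to open them by hand. Below is a minimal Python sketch. It assumes the logs are plain text files somewhere under the fetch log directory for your collection. The function name find_mime_errors is our own and is not part of FAST.

```python
import os


def find_mime_errors(log_dir: str, needle: str = "Invalid RSS MIME type"):
    """Yield (file path, line number, line text) for every log line containing needle."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Read as text; undecodable bytes are replaced instead of raising an error.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        yield path, lineno, line.rstrip("\n")
```

Pass it the fetch log directory for your content collection and print each hit as path:line: text.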
We confirmed that the server was indeed returning the feed with a MIME type of “text/xml”. However, we were able to crawl another feed that was also served as “text/xml”.
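To see which MIME type a server actually declares for a feed, you can read the Content-Type header of the response. The following is a small sketch using only the Python standard library. The helper names are hypothetical, and application/rss+xml is simply the type that worked for us, not a documented FAST requirement.

```python
import urllib.request

# The type that, in our case, FAST accepted for RSS feeds.
EXPECTED_RSS_TYPE = "application/rss+xml"


def media_type(content_type_header: str) -> str:
    """Strip parameters from a Content-Type value: 'text/xml; charset=utf-8' -> 'text/xml'."""
    return content_type_header.split(";", 1)[0].strip().lower()


def served_as_rss(content_type_header: str) -> bool:
    """True if the header declares application/rss+xml (parameters ignored)."""
    return media_type(content_type_header) == EXPECTED_RSS_TYPE


def feed_media_type(url: str, timeout: float = 10.0) -> str:
    """Fetch url and return the bare media type the server declares for it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return media_type(resp.headers.get("Content-Type", ""))
```

Running feed_media_type against each feed URL lets you compare a working feed with a failing one before looking at the feed content itself.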
See the screenshot below. The feed on the left crawled properly; the feed on the right did not.
The only difference we could find was that the working feed began with an XML declaration (a line such as <?xml version="1.0"?>), while the non-working feed did not.
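If you want to compare two feeds the same way, you can check whether each one begins with an XML declaration. Here is a minimal Python sketch. It assumes the feed is UTF-8 or another ASCII-compatible encoding (UTF-16 feeds are not handled), and has_xml_declaration is a hypothetical helper name. In our case this difference turned out not to matter.

```python
import codecs


def has_xml_declaration(data: bytes) -> bool:
    """Return True if the document starts with an XML declaration such as <?xml version="1.0"?>."""
    # A UTF-8 byte order mark may legally precede the declaration; skip it.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # The declaration must come first, and "<?xml" must be followed by whitespace,
    # so a processing instruction like <?xml-stylesheet ...?> does not count.
    return data[:5] == b"<?xml" and data[5:6] in (b" ", b"\t", b"\r", b"\n")
```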
To make sure FAST can crawl your RSS feeds, serve them with the MIME type application/rss+xml.
Our client changed the MIME type of the RSS feed to application/rss+xml. Although the feed was formatted exactly as it had been when it failed to crawl, FAST crawled it properly once the new MIME type was in place.
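How you change the MIME type depends on the web server or application generating the feed, and we are not describing our client's actual setup here. As one illustration, the following minimal Python standard-library server returns a feed with a Content-Type of application/rss+xml instead of text/xml. The feed body and the names RssHandler and start_server are hypothetical.

```python
import http.server
import threading

# Hypothetical feed body; in practice this is whatever your site generates.
FEED = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'


class RssHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Declare the feed as application/rss+xml rather than text/xml.
        self.send_header("Content-Type", "application/rss+xml; charset=utf-8")
        self.send_header("Content-Length", str(len(FEED)))
        self.end_headers()
        self.wfile.write(FEED)


def start_server(port: int = 0) -> http.server.HTTPServer:
    """Serve RssHandler on localhost in a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), RssHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

On a production web server, the equivalent change is usually a MIME mapping for the feed's file extension or an explicit Content-Type set by the page that generates the feed. Check your server's documentation for the exact setting.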