Need to Get A HEAD?

December 07, 2004

This nice little piece was recently posted by David Kline on his blog:

Have you ever wanted to know what type of file was being pointed to by a given url before clicking the link? Maybe you are writing an application that needs to filter out certain types of links. A web crawler is a good example of an application which needs to do such link filtering (skip links to graphics, audio, zip files, etc).

In order to check the type of data pointed to by a url, you are going to need to issue a request to the server. Normally, this involves receiving the entire page or file at that location. This can be a time-consuming proposition, especially over slow network connections, and defeats the purpose of allowing your application to filter out undesired links.

The solution is to issue a request to the server, asking only for the HTTP headers. The server answers this "HEAD" request with the headers it would send for a normal "GET" request, but without the body. The exchange is small, fast (does not transfer file contents) and provides you with exactly the data your application needs. While there are plenty of interesting headers, the header we are interested in today is "Content-type".

Below is a simple console application that takes a url path and displays the value of the content-type header. Please note: to keep this example as small as possible, only minimal error checking is performed; any real-world implementation would need to do much more than what I show here.

using System;
using System.Net;

class ContentType
{
    public static void Main(String[] args)
    {
        if(args.Length != 1)
        {
            Console.WriteLine("Please specify a url path.");
            return;
        }

        // display the content type for the url
        String url = args[0];
        Console.WriteLine(String.Format("Url : {0}", url));
        Console.WriteLine(String.Format("Type: {0}", GetContentType(url)));
    }

    private static String GetContentType(String url)
    {
        HttpWebResponse response = null;
        String contentType = "";

        try
        {
            // create the request
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

            // WebRequest.Create returns a non-HTTP request for schemes
            // such as file:, in which case the cast above yields null
            if(request == null)
            {
                return "Unsupported Uri";
            }

            // instruct the server to return headers only
            request.Method = "HEAD";

            // make the connection
            response = request.GetResponse() as HttpWebResponse;

            // read the headers
            WebHeaderCollection headers = response.Headers;

            // get the content type
            contentType = headers["Content-type"];
        }
        catch(WebException e)
        {
            // we encountered a problem making the request
            // (not found (404), unauthorized (401), etc)
            response = e.Response as HttpWebResponse;

            // return the message from the exception
            contentType = e.Message;
        }
        catch(NotSupportedException)
        {
            // this will be caught if WebRequest.Create encounters a uri
            // that it does not support (ex: mailto)

            // return a friendly error message
            contentType = "Unsupported Uri";
        }
        catch(UriFormatException)
        {
            // the url is not a valid uri

            // return a friendly error message
            contentType = "Malformed Uri";
        }
        finally
        {
            // make sure the response gets closed
            // this avoids leaking connections
            if(response != null)
            {
                response.Close();
            }
        }

        return contentType;
    }
}

The above code can be compiled and run on either the .NET Framework or the .NET Compact Framework.

Here's a sampling of the content types I received when running the above application against a handful of urls:

text/html
text/html; charset=utf-8
image/gif
application/octet-stream
text/plain
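
As the sample values show, a content type can carry parameters such as "; charset=utf-8", so a crawler probably wants to compare only the media type portion. Below is a minimal sketch of how that filtering might look. The class name ContentTypeFilter, the helper GetMediaType and the "follow only text/html" policy in ShouldFollow are hypothetical illustrations, not part of the original post.

```csharp
using System;

class ContentTypeFilter
{
    // Returns the media type portion of a content-type value, lowercased,
    // with any parameters (such as "; charset=utf-8") removed.
    public static string GetMediaType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return "";
        }

        int semicolon = contentType.IndexOf(';');
        string mediaType = semicolon >= 0
            ? contentType.Substring(0, semicolon)
            : contentType;

        return mediaType.Trim().ToLowerInvariant();
    }

    // Hypothetical crawler policy (an assumption for this sketch):
    // follow only links that serve HTML.
    public static bool ShouldFollow(string contentType)
    {
        return GetMediaType(contentType) == "text/html";
    }

    public static void Main()
    {
        // the content types listed in the post
        string[] samples =
        {
            "text/html",
            "text/html; charset=utf-8",
            "image/gif",
            "application/octet-stream",
            "text/plain"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0,-26} follow: {1}", sample, ShouldFollow(sample));
        }
    }
}
```

In a real crawler the string passed to ShouldFollow would be the value returned by GetContentType above, after first checking that the request actually succeeded, since that method returns an error message in place of a content type when it fails.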

Enjoy!
-- DK

Disclaimer(s):
This posting is provided "AS IS" with no warranties, and confers no rights.
[Microsoft WebBlogs]

Peter Bromberg's UnBlog