Need to Get A HEAD?

This nice little piece was recently posted by David Kline on his blog:

Have you ever wanted to know what type of file a given URL points to before clicking the link? Maybe you are writing an application that needs to filter out certain types of links. A web crawler is a good example of such an application: it needs to skip links to graphics, audio, zip files, and so on.
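As a minimal sketch of that kind of filtering, here is a hypothetical LinkFilter.ShouldSkip helper that takes a Content-Type value and decides whether a crawler should skip the link. The skip list (all image, audio and video types, plus zip and generic binary files) is an assumption for illustration; a real crawler would tune it to its own needs.

```csharp
using System;

// Hypothetical helper: decides from a Content-Type value whether a crawler should skip a link.
static class LinkFilter
{
    // Assumed skip list, for illustration only.
    private static readonly string[] SkippedTopLevelTypes = { "image", "audio", "video" };
    private static readonly string[] SkippedMediaTypes = { "application/zip", "application/octet-stream" };

    public static bool ShouldSkip(string contentType)
    {
        // No Content-Type header at all: leave the decision to the caller.
        if (string.IsNullOrEmpty(contentType))
            return false;

        // Drop parameters such as "; charset=utf-8" to get the bare media type.
        string mediaType = contentType.Split(';')[0].Trim();

        // The top-level type is the part before the slash ("image" in "image/gif").
        int slash = mediaType.IndexOf('/');
        string topLevel = slash >= 0 ? mediaType.Substring(0, slash) : mediaType;

        // Media types are case-insensitive, so compare ignoring case.
        if (Array.Exists(SkippedTopLevelTypes,
                t => string.Equals(t, topLevel, StringComparison.OrdinalIgnoreCase)))
            return true;

        return Array.Exists(SkippedMediaTypes,
            m => string.Equals(m, mediaType, StringComparison.OrdinalIgnoreCase));
    }
}
```

With this list, "image/gif" and "application/octet-stream" are skipped, while "text/html; charset=utf-8" and "text/plain" are kept.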

To check the type of data a URL points to, you need to send a request to the server. Normally, that request retrieves the entire page or file at that location. This can be time-consuming, especially over slow network connections, and it defeats the purpose of filtering out undesired links in the first place.

The solution is to ask the server for the HTTP headers only. This "HEAD" request is small and fast: the server is expected to answer with the same headers it would send for a GET request, but without transferring the file contents, which gives your application exactly the data it needs. While there are plenty of interesting headers, the one we are interested in today is "Content-Type".
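To see what actually goes over the wire, here is a minimal sketch that builds a raw HTTP/1.1 HEAD request and, for plain http URLs only, sends it over a TCP socket and reads back the status line and headers. The RawHead class is hypothetical and deliberately ignores HTTPS, redirects and proxies; it only illustrates the protocol exchange.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

// Hypothetical illustration of a HEAD request at the socket level.
static class RawHead
{
    // Builds the request text. HTTP/1.1 requires a Host header;
    // "Connection: close" asks the server to close the socket after responding.
    public static string BuildRequest(Uri uri)
    {
        var sb = new StringBuilder();
        sb.Append("HEAD ").Append(uri.PathAndQuery).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(uri.Authority).Append("\r\n"); // Authority omits the default port
        sb.Append("Connection: close\r\n");
        sb.Append("\r\n"); // blank line ends the headers; a HEAD request has no body
        return sb.ToString();
    }

    // Sends the request over plain TCP (no TLS) and returns the status line and headers.
    public static string Send(Uri uri)
    {
        if (uri.Scheme != Uri.UriSchemeHttp)
            throw new NotSupportedException("This sketch handles plain http only.");

        using (var client = new TcpClient(uri.DnsSafeHost, uri.Port))
        {
            NetworkStream stream = client.GetStream();
            byte[] request = Encoding.ASCII.GetBytes(BuildRequest(uri));
            stream.Write(request, 0, request.Length);

            var headers = new StringBuilder();
            using (var reader = new StreamReader(stream, Encoding.ASCII))
            {
                string line;
                // The header block ends at the first empty line; a HEAD response carries no body.
                while ((line = reader.ReadLine()) != null && line.Length > 0)
                    headers.AppendLine(line);
            }
            return headers.ToString();
        }
    }
}
```

For http://www.example.com/images/logo.gif, BuildRequest produces a request line of "HEAD /images/logo.gif HTTP/1.1" followed by "Host: www.example.com" and "Connection: close".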

Below is a simple console application that takes a URL and displays the value of its Content-Type header. Please note: to keep this example as small as possible, only minimal error checking is performed; any real-world implementation would need to do much more than what I show here.

using System;
using System.Net;

class ContentType
{
    public static void Main(String[] args)
    {
        if(args.Length != 1)
        {
            Console.WriteLine("Please specify a url path.");
            return;
        }

        // display the content type for the url
        String url = args[0]; 
        Console.WriteLine("Url : {0}", url);
        Console.WriteLine("Type: {0}", GetContentType(url));
    }

    private static String GetContentType(String url)
    {
        HttpWebResponse response = null;
        String contentType = "";

        try
        {
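            // create the request
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

            // for schemes such as ftp: or file:, WebRequest.Create returns a
            //  non-HTTP request (FtpWebRequest, FileWebRequest), so "as" yields null
            if(request == null)
            {
                return "Unsupported Uri";
            }

            // instruct the server to return headers only
            request.Method = "HEAD";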

            // make the connection
            response = request.GetResponse() as HttpWebResponse;

            // read the headers
            WebHeaderCollection headers = response.Headers;

            // get the content type
            contentType = headers["Content-type"];
        }
        catch(WebException e)
        {
            // we encountered a problem making the request
            //  (not found (404), unauthorized (401), server unavailable (503), etc)
            response = e.Response as HttpWebResponse;

            // return the message from the exception
            contentType = e.Message;
        }
        catch(NotSupportedException)
        {
            // this will be caught if WebRequest.Create encounters a uri
            //  that it does not support (ex: mailto)

            // return a friendly error message
            contentType = "Unsupported Uri";
        }
        catch(UriFormatException)
        {
            // the url is not a valid uri
           
            // return a friendly error message
            contentType = "Malformed Uri";
        }
        finally
        {
            // make sure the response gets closed
            //  this avoids leaking connections
            if(response != null)
            {
                response.Close();
            }
        }

        return contentType;
    }
}

The above code can be compiled and run on either the .NET Framework or the .NET Compact Framework.
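WebRequest and HttpWebRequest still work on the .NET Framework, but in .NET 6 and later they are marked obsolete (warning SYSLIB0014) in favor of HttpClient. A rough HttpClient equivalent of GetContentType might look like the sketch below; the ContentTypeProbe name is hypothetical, and the error strings simply mirror the ones in the original program.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical HttpClient-based version of GetContentType.
static class ContentTypeProbe
{
    // HttpClient is meant to be created once and reused.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> GetContentTypeAsync(string url)
    {
        // Validate up front instead of relying on exceptions.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return "Malformed Uri";
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return "Unsupported Uri";

        using (var request = new HttpRequestMessage(HttpMethod.Head, uri))
        {
            try
            {
                using (HttpResponseMessage response = await Client.SendAsync(request))
                {
                    if (!response.IsSuccessStatusCode)
                        return $"{(int)response.StatusCode} {response.ReasonPhrase}";

                    // HttpClient models Content-Type as a content header, not a response header.
                    return response.Content?.Headers.ContentType?.ToString() ?? "";
                }
            }
            catch (HttpRequestException e)
            {
                // DNS failure, refused connection, etc.
                return e.Message;
            }
            catch (TaskCanceledException)
            {
                // HttpClient.Timeout elapsed (100 seconds by default).
                return "Timed out";
            }
        }
    }
}
```

Some servers answer HEAD with 405 Method Not Allowed; this sketch just reports that status rather than falling back to a GET request.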

Here's a sampling of the content types I received when running the above application against a handful of URLs:

text/html
text/html; charset=utf-8
image/gif
application/octet-stream
text/plain
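The second value shows that a Content-Type header can carry parameters after the media type. If you need the bare media type or the charset, System.Net.Mime.ContentType can parse the value for you; it was designed for MIME mail headers, but the "type/subtype; name=value" syntax is the same. A minimal sketch, where the MediaTypes wrapper is hypothetical:

```csharp
using System;
using System.Net.Mime;

// Hypothetical wrapper around System.Net.Mime.ContentType.
static class MediaTypes
{
    // Splits a value such as "text/html; charset=utf-8" into its media type
    // and its charset parameter (null when no charset is present).
    public static (string MediaType, string CharSet) Parse(string headerValue)
    {
        // Throws FormatException if the value is not a well-formed content type.
        var ct = new ContentType(headerValue);
        return (ct.MediaType, ct.CharSet);
    }
}
```

Parsing "text/html; charset=utf-8" gives a media type of "text/html" and a charset of "utf-8", while "image/gif" gives a null charset. Since the constructor throws on malformed input, a crawler would want to catch FormatException around it.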

Enjoy!
-- DK

Disclaimer(s):
This posting is provided "AS IS" with no warranties, and confers no rights.

[Microsoft WebBlogs]

