Parallel Extensions in .NET 4.0 introduced Tasks as a very useful abstraction over threads. However, uncaught exceptions in Tasks can take down your web server.
Adding a global event handler for uncaught exceptions in Tasks, so-called unobserved exceptions, solves the problem.
Code to solve the problem
There was quite a bit of backstory, but to cut a long story short: a WCF service created .NET 4.0 Tasks, one of which later threw an exception. This caused IIS to crash, which turned out to be by-design behaviour rather than a bug.
Off the top of my head I can think of two common patterns for multithreaded applications. For fun I call these two patterns “Fire and Wait” and “Fire and Forget”. The solution to the problem was very straightforward: just add a special event handler to the hosting application. This is shown in the “Fire and Forget” code below.
Fire and Wait
The gist of Fire and Wait is to create all the Task objects, start them, and store them in a collection. The thread that created the Tasks then blocks on one of the static “Wait” methods until all the Tasks have completed or faulted.
Failed Tasks will trigger an AggregateException, which encapsulates all the exceptions that occurred in the Tasks. These can then be handled one by one.
The example code here uses simple Tasks which don’t return values:
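using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public class FireAndWaitExample
{
    // Trace source used for logging; the name is only an example
    private readonly TraceSource errorTrace = new TraceSource("FireAndWaitTrace");

    public void RunTasks()
    {
        try
        {
            List<Task> tasks = new List<Task>();
            var taskFactory = new TaskFactory();
            tasks.Add(taskFactory.StartNew(() =>
            {
                throw new Exception("Thread failed!");
            }));
            tasks.Add(taskFactory.StartNew(() =>
            {
                throw new Exception("Yet another thread failed!");
            }));
            // Blocks until both Tasks are done, then throws an AggregateException:
            Task.WaitAll(tasks.ToArray());
        }
        catch (AggregateException aggregateEx) // The most important part
        {
            // Calls the handler once for each inner exception
            aggregateEx.Handle(HandleExceptionsInAggregateException);
        }
    }

    public bool HandleExceptionsInAggregateException(Exception ex)
    {
        errorTrace.TraceData(TraceEventType.Error, 0, ex); // Do tracing and so on here
        return true; // true marks the exception as handled, so Handle() does not rethrow it
    }
}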
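After handling the AggregateException it is possible to read return values from Tasks which were set up for that. Be careful with faulted Tasks though: they have no return value, and reading their Result property throws an AggregateException again, so check the Task's Status (or IsFaulted) before reading the result.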
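To illustrate reading return values after a Fire and Wait, here is a minimal sketch. The class and method names (FireAndWaitWithResults, StartTasks, SumOfSuccessfulResults) are made up for this example, and real code would log the exceptions instead of just swallowing them.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FireAndWaitWithResults
{
    // Starts two Tasks that return values; the second one always fails.
    public static List<Task<int>> StartTasks()
    {
        var tasks = new List<Task<int>>();
        tasks.Add(Task.Factory.StartNew(() => 42));
        tasks.Add(Task.Factory.StartNew<int>(() =>
        {
            throw new InvalidOperationException("Task failed!");
        }));
        return tasks;
    }

    // Waits for all Tasks, then sums the results of those that ran to completion.
    public static int SumOfSuccessfulResults(List<Task<int>> tasks)
    {
        try
        {
            Task.WaitAll(tasks.ToArray());
        }
        catch (AggregateException aggregateEx)
        {
            // Mark every inner exception as handled; real code would log them here.
            aggregateEx.Handle(ex => true);
        }

        int sum = 0;
        foreach (Task<int> task in tasks)
        {
            // Reading Result on a faulted Task would throw again, so check the status first.
            if (task.Status == TaskStatus.RanToCompletion)
            {
                sum += task.Result;
            }
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSuccessfulResults(StartTasks())); // prints 42
    }
}
```

Only the first Task contributes to the sum, because the second one faulted and its Result is never read.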
Fire and Forget
In the Fire and Forget pattern your code cannot stick around waiting for the Tasks to complete. So catching an AggregateException as in the previous pattern will not work.
However, there is still a simple solution to the problem of Tasks throwing unobserved or unhandled exceptions. Registering a global event handler for the TaskScheduler.UnobservedTaskException event is all you need to do. For ASP.NET applications (4.0, MVC 3 or any other for that matter) the best place to do this is in Global.asax.cs:
protected void Application_Start()
{
    // Catch unobserved exceptions from threads before they cause IIS to crash:
    TaskScheduler.UnobservedTaskException += (object sender, UnobservedTaskExceptionEventArgs excArgs) =>
    {
        TraceSource trace = new TraceSource("UnhandledExceptionTrace"); // Example of tracing the exception
        trace.TraceData(TraceEventType.Error, 1, excArgs.Exception);
        excArgs.SetObserved();
    };
}
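You can then let Tasks fault as you please without any danger of taking down IIS. And if exceptions should occur in production, they will be logged, which makes diagnostics so much easier. Keep in mind that the handler only runs once the faulted Task is garbage collected, so the log entry can appear some time after the exception was thrown.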
public class HomeController : Controller
{
    public ActionResult Index()
    {
        Task t1 = new Task(() =>
        {
            throw new Exception("Fire-and-forget thread faulted!");
        });
        t1.Start();
        return View();
    }
}
Taken together, the Fire and Wait and Fire and Forget patterns make it quite easy to write safer threaded applications even if the threads throw exceptions.
The Backstory: Random crashes
Recently I came across a problem with a WCF service hosted on Azure. The IIS process w3wp.exe would crash, and the service instance would then be unavailable for a short while. Usually errors like these would be reported in the application’s exception log, but the log revealed nothing. Some information could be gleaned from the Windows Application and System logs though.
Most of the service’s operations were straightforward CRUD operations, but a few operations were different. For business reasons they would start threads which continued doing work after the service operation returned. A clue pointing to these threads as the likely cause of the crashes was the fact that IIS crashed several minutes after the last service operation had returned. We used a controlled test deployment, so we knew this for certain. The only code that could be running at the time of the crashes was therefore the threads.
Figure: One of the three errors seen in the Windows Application log relating to IIS crashing.
The source of the crashes
Taking a closer look at the Tasks mentioned earlier, it became clear that not all of them were wrapped in a try-catch block. In fact, some quick testing revealed that one was throwing uncaught exceptions because it could not reach a remote host.
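For Tasks you own, wrapping the body in a try-catch block keeps the Task from faulting in the first place. The sketch below shows one way to do that with a small helper; SafeTasks, StartSafely and the trace source name are hypothetical names chosen for this example, not part of the original service.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class SafeTasks
{
    // Hypothetical trace source name; a listener for it would be set up in the config file.
    private static readonly TraceSource ErrorTrace = new TraceSource("BackgroundTaskTrace");

    // Runs the work on a Task and catches anything it throws, so the Task never faults.
    public static Task StartSafely(Action work)
    {
        return Task.Factory.StartNew(() =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                ErrorTrace.TraceData(TraceEventType.Error, 0, ex);
            }
        });
    }

    public static void Main()
    {
        Task task = StartSafely(() =>
        {
            throw new InvalidOperationException("Remote host unreachable");
        });
        task.Wait(); // Does not throw, because the exception was caught inside the Task
        Console.WriteLine(task.Status); // prints RanToCompletion
    }
}
```

This does not replace the global event handler, which still acts as a safety net for any Task that was started without such a wrapper.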
Some more searching was necessary. It turned out that unobserved exceptions rethrown on the finalizer thread cause the w3wp.exe process to terminate! In other words, uncaught exceptions from faulted Tasks cause IIS to crash during .NET garbage collection. This is default behaviour: in .NET 4.0 an unobserved Task exception that reaches the finalizer thread terminates the host process unless it has been marked as observed. It was the default on at least Windows 7 and Server 2008, if not all Windows versions.
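The link between garbage collection and the event handler can be tried out in a small console program. This is only a sketch under a few assumptions: the class name UnobservedExceptionDemo is made up, the Sleep is a crude way to wait for the Task to fault, and whether the Task is actually collected by the forced GC can vary with build configuration and runtime version, so the handler is not guaranteed to fire on every run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class UnobservedExceptionDemo
{
    // Number of times the global handler has been called
    public static int HandlerCalls;

    // Started in its own method so that no local variable keeps the Task alive.
    private static void StartFaultingTask()
    {
        Task.Factory.StartNew(() =>
        {
            throw new InvalidOperationException("Nobody is watching this Task");
        });
    }

    public static void Run()
    {
        TaskScheduler.UnobservedTaskException += (sender, args) =>
        {
            Interlocked.Increment(ref HandlerCalls);
            Console.WriteLine("Unobserved: " + args.Exception.InnerException.Message);
            args.SetObserved(); // Marks the exception as observed so it is not escalated
        };

        StartFaultingTask();
        Thread.Sleep(500); // Crude wait for the Task to fault; fine for a demo only

        // Force the faulted Task to be collected and finalized.
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }

    public static void Main()
    {
        Run();
        Console.WriteLine("Handler calls: " + HandlerCalls);
    }
}
```

In a web application nobody forces a collection like this, which fits the observation that the crash came several minutes after the Task had faulted.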