Transient fault handling in gRPC with retries in C#

gRPC calls can be interrupted by transient faults. Transient faults include:

  • Momentary loss of network connectivity.
  • Temporary unavailability of a service.
  • Timeouts due to server load.

When a gRPC call is interrupted, the client throws an RpcException with details about the error. The client app must catch the exception and choose how to handle the error.

var client = new Greeter.GreeterClient(channel);
try
{
    var response = await client.SayHelloAsync(
        new HelloRequest { Name = ".NET" });

    Console.WriteLine("From server: " + response.Message);
}
catch (RpcException ex)
{
    // Write logic to inspect the error and retry
    // if the error is from a transient fault.
}

Duplicating retry logic throughout an app is verbose and error-prone. Fortunately, the .NET gRPC client now has built-in support for automatic retries. Retries are centrally configured on a channel, and there are many options for customizing retry behavior using a RetryPolicy.

var defaultMethodConfig = new MethodConfig
{
    Names = { MethodName.Default },
    RetryPolicy = new RetryPolicy
    {
        MaxAttempts = 5,
        InitialBackoff = TimeSpan.FromSeconds(1),
        MaxBackoff = TimeSpan.FromSeconds(5),
        BackoffMultiplier = 1.5,
        RetryableStatusCodes = { StatusCode.Unavailable }
    }
};

// Clients created with this channel will automatically retry failed calls.
var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});

References
https://devblogs.microsoft.com/dotnet/grpc-in-dotnet-6/
https://docs.microsoft.com/en-us/aspnet/core/grpc/retries?view=aspnetcore-6.0