API design: Handling platform specific errors

**MOS-6581** · 03-14-2015

Consider the situation where you're writing a cross-platform library that has functions for creating and manipulating windows. To create a window suitable for OpenGL rendering on Windows you need to call a bunch of different functions such as GetModuleHandle, RegisterClass, CreateWindow, SetPixelFormat, GetDC, wglCreateContext and so on. All of these functions can potentially fail in one way or another and the Windows documentation doesn't even attempt to list all the different error codes that these functions return.

Let's say your function for creating a window is called create_window(). What kind of error message should you return if any of these platform specific functions fail? It could be something as simple as MultiByteToWideChar failing for some reason when you convert the UTF-8 string your API uses internally to the UTF-16 string that the Windows API expects. I've read that you're not supposed to leak implementation details from your API but what are you supposed to return when an encoding conversion fails that is supposed to be completely transparent to the user of your library?

I've decided to simply return an error code indicating a platform specific error in those cases because there's no way I can anticipate all the different errors that the Windows API might return. The problem with that is that almost all the errors end up being platform errors and without a way to actually tell what the error is then you're basically saying that some unknown error happened. That's not very useful. You could have a function called get_platform_error() that you call to obtain the exact platform error that occurred, but I'm not a fan of such a design and I've had enough trouble with GetLastError() style error handling as it is.

An alternative to that is to create a struct that represents errors and have a field in it where you can store platform specific errors. I believe this is what Apple do with their NSError error object but I'm not sure how I feel about that approach.
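As a rough sketch of that struct approach: the error object below carries a portable code that callers branch on, plus the raw platform code and the name of the failing call for diagnostics. Every name here (lib_status, lib_error, lib_error_set, create_window_stub) is hypothetical and only for illustration. The stub stands in for create_window() and pretends RegisterClass failed with 1410, the numeric value of ERROR_CLASS_ALREADY_EXISTS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable error codes the library documents (hypothetical names). */
typedef enum {
    LIB_OK = 0,
    LIB_ERR_OUT_OF_MEMORY,
    LIB_ERR_INVALID_ARGUMENT,
    LIB_ERR_PLATFORM          /* anything the library could not classify */
} lib_status;

/* Error object: portable code plus optional platform detail. */
typedef struct {
    lib_status    code;           /* what callers are expected to branch on */
    unsigned long platform_code;  /* e.g. a GetLastError() value; 0 if none */
    const char   *origin;         /* name of the platform call that failed */
} lib_error;

/* Fill in an error object; returns the portable code for convenience. */
lib_status lib_error_set(lib_error *err, lib_status code,
                         unsigned long platform_code, const char *origin)
{
    assert(origin != NULL);
    if (err != NULL) {            /* callers may pass NULL if they do not care */
        err->code = code;
        err->platform_code = platform_code;
        err->origin = origin;
    }
    return code;
}

/* Stand-in for create_window(): pretends RegisterClass failed with 1410. */
lib_status create_window_stub(lib_error *err)
{
    return lib_error_set(err, LIB_ERR_PLATFORM, 1410UL, "RegisterClass");
}
```

The portable code stays part of the documented interface, while the platform fields are explicitly "for diagnostics only", so a caller that ignores them loses nothing.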

Assuming you implement a way to return the platform specific error code then the next part is obtaining a string that describes the error. Your library can obviously not implement error strings for every single error code that Windows might return so you're forced to use FormatMessage. The problem with that function is that you can't force it to return error messages in a specific language and it's not even guaranteed that you can get an English translation of the error string. I don't want to display error messages that mix English with the current language the operating system uses so how do I solve this problem?

Are there any good open source libraries I could study? The ones I've looked at so far have subpar error handling at best. Many of them don't even check the return values from platform specific functions.

**phantomotap** · 03-14-2015

What kind of error message should you return if any of these platform specific functions fail?

I've read that you're not supposed to leak implementation details from your API but what are you supposed to return when an encoding conversion fails that is supposed to be completely transparent to the user of your library?

I don't want to display error messages that mix English with the current language the operating system uses so how do I solve this problem?

O_o

I'll tell you what I do for my own coded applications.

I divide errors into "informative" and "debugging" considerations.

The debugging errors are not intended for consumption by a user. The debugging error messages are consumed exclusively by me so they need to be in English. The debugging messages are either trace, platform, or context messages. The trace messages are actual trace lines which only show up with the verbose flag under debugging builds. The platform messages are literally whatever the platform's error functions return, formatted for easier parsing. The context messages are basically a formatted dump of the function that failed, with some cleanup to prevent leaking sensitive information, showing up only when a platform message can't be harvested.

The informative lines are the ones I think you care about. The informative messages are intended for consumption by users so I use a system that defaults to English yet attempts to find a file on the system with translations associated by numeric identifier. (I further have a tool that can use such files to translate an English dump.) The informative messages are separated into error, warning, and success messages. The success message is simply a message which can be parsed to show progress and which only shows up with the verbose flag. The error and warning messages are the same class of error representing different severity. The error and warning messages are further separated into marshalling, privilege, system, resource, and a few other such conceptual error categories. The conceptual categories are the lines potentially translated and given untranslated (raw) context.

For example, a privilege message might look like "error:#,#:`user lacks read permission ($)'" when formatted with the English language.

The "#,#" marker is an optional line and column number for those messages which are associated with parsing a file.

The "$" is raw context. The example privilege message would, in our imagined case, have the name of the file ("~/test.txt") as requested by the user.
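To make the format concrete, here is a minimal sketch of a formatter for such informative messages. The names (severity, message_id, english_text, format_message) are invented for illustration and are not from any real code. It assumes the raw context is printed in double quotes inside the parentheses and that a missing line/column leaves the position field empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { SEV_ERROR, SEV_WARNING } severity;

/* Conceptual categories double as numeric identifiers for translation lookup. */
typedef enum { MSG_NO_READ_PERMISSION, MSG_COUNT } message_id;

/* English defaults; a translation file could replace entries by identifier. */
static const char *const english_text[MSG_COUNT] = {
    "user lacks read permission"
};

/* Writes e.g.  error:12,4:`user lacks read permission ("~/test.txt")'
 * A line or column <= 0 means "no position", which leaves the field empty.
 * The raw context is inserted untranslated. Returns snprintf's result. */
int format_message(char *out, size_t cap, severity sev, message_id id,
                   int line, int column, const char *raw_context)
{
    char position[32] = "";
    assert(out != NULL && raw_context != NULL && id < MSG_COUNT);
    if (line > 0 && column > 0)
        snprintf(position, sizeof position, "%d,%d", line, column);
    return snprintf(out, cap, "%s:%s:`%s (\"%s\")'",
                    sev == SEV_ERROR ? "error" : "warning",
                    position, english_text[id], raw_context);
}
```

Only the entries of english_text would ever need translating; the severity prefix, position and raw context pass through unchanged, which keeps the output machine-parseable in any language.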

I don't shift the burden of understanding a third-party error message onto the user. The above mechanism serves in the name of normalizing messages across platforms and across languages. I only worry about categories of errors so less work is necessary for a translation, but I do have enough such categories to cover most any situation a user could reasonably recover from before attempting again to use an application.

Soma

**MOS-6581** · 03-14-2015

Don't a lot of errors end up being "unexpected error" or something like that if you want to present normalized error messages? Creating a window on Windows might fail because RegisterClass failed but other platforms might not even use such a mechanism when creating windows. If the interface shouldn't leak implementation details then it would not be reasonable to return a "window class registration failed" error when most platforms don't even use such a mechanism. In that case you're left with returning "unexpected error/platform error" or leaking implementation details and having an error code for something that only happens on Windows.

In your code, how would you handle the case where a string encoding conversion fails that is supposed to be completely transparent to the user? For example, it's very common to use UTF-8 for strings in the interface and then change the encoding as needed whenever you need to interact with the underlying platform. This is yet another situation where you're forced to choose between leaking implementation details and returning a cryptic error message. Granted, it is unexpected that such an error should occur so maybe returning "unexpected error" isn't a disaster.

**phantomotap** · 03-14-2015

Don't a lot of errors end up being "unexpected error" or something like that if you want to present normalized error messages? Creating a window on Windows might fail because RegisterClass failed but other platforms might not even use such a mechanism when creating windows. If the interface shouldn't leak implementation details then it would not be reasonable to return a "window class registration failed" error when most platforms don't even use such a mechanism. In that case you're left with returning "unexpected error/platform error" or leaking implementation details and having an error code for something that only happens on Windows.

O_o

You lack imagination, and you apparently also lack experience with other platforms.

You simply don't have to account for as many errors as you apparently think for each interface for consumption by a user. You need to look more carefully at why most categories of error codes exist. You might indeed look at the `RegisterClass' interface, for example, where a lot of error codes deal with failures to correctly use the interface. A few common `RegisterClass' errors early in development are "the parameter is incorrect" and "class already exists" which are errors you don't need to translate for the user. You don't need to translate the "the parameter is incorrect", "class already exists", and many similar errors for two supremely important reasons: the developer should have caught the misuse of an interface during routine development and an average user can do nothing to solve the problem in any event. (You would need to be aware of the misuse of the `RegisterClass' interface in order to translate the "the parameter is incorrect" error, but the translated error will do the user no good because the average user has no measures to fix data in the `WNDCLASS' instance provided to the interface. You should have instead researched why the error occurs and simply prevented that error from occurring in the first place. You could default to printing the `GetLastError' or similar, but the average user will still have no means to correct functionality.)

The errors you can translate, such as "can't allocate class" or similar, can indeed be generalized with context which gives the user information necessary to attempt recovery. (The relevant message corresponding to exhausted "GDI" resources, for example, might be "can't allocate interface components" which corresponds with a help entry explaining that the message usually means that too many windows are open, implying that the user might close a few windows before trying to run the application again. The core "X" windows environment libraries do not have anything similar to native "GDI" resources, but the core "X" libraries do have similar limits on certain resources which can be reported by the same error message because the context and solution of closing a few windows is conceptually the same. You could report more context or platform specific context, but the average user would have no way to free exactly the relevant "GDI" or "X" resources which means simply closing a few applications would be the most reasonable solution in any event.) You need to focus errors for user consumption on giving the user information necessary to recover or correct the environment without overwhelming the user.
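One way to picture that split is a small lookup table that decides who a failure is for. This is only a sketch with invented names (audience, error_rule, classify, user_text_for). The numeric values are the Win32 codes for ERROR_INVALID_PARAMETER, ERROR_CLASS_ALREADY_EXISTS and ERROR_NOT_ENOUGH_MEMORY, and using the last one as the "exhausted resources" case is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    AUDIENCE_DEVELOPER,  /* interface misuse: log it, then fix the code */
    AUDIENCE_USER        /* environment problem the user may be able to fix */
} audience;

typedef struct {
    unsigned long platform_code;  /* Win32 error value */
    audience      who;
    const char   *user_text;      /* NULL when there is nothing to show the user */
} error_rule;

/* The table itself is illustrative, not a complete or recommended mapping. */
static const error_rule rules[] = {
    {   87UL, AUDIENCE_DEVELOPER, NULL },  /* ERROR_INVALID_PARAMETER */
    { 1410UL, AUDIENCE_DEVELOPER, NULL },  /* ERROR_CLASS_ALREADY_EXISTS */
    {    8UL, AUDIENCE_USER,               /* ERROR_NOT_ENOUGH_MEMORY */
      "can't allocate interface components" }
};

/* Returns the matching rule, or NULL for codes the table does not cover. */
const error_rule *classify(unsigned long platform_code)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (rules[i].platform_code == platform_code)
            return &rules[i];
    return NULL;
}

/* Text to show the user, or NULL if the failure belongs only in the bug log. */
const char *user_text_for(unsigned long platform_code)
{
    const error_rule *r = classify(platform_code);
    return (r != NULL && r->who == AUDIENCE_USER) ? r->user_text : NULL;
}
```

A table for another platform could map its own resource limits onto the same user text, which is the point of normalizing by what the user can do rather than by which call failed.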

The average user doesn't care in the slightest which call, to consider your example, in a chain of interface calls fails while setting up a rendering target if the context provided ("environment doesn't support minimum required resolution") is sufficient to inform a path to a solution. The average user can do nothing, again considering your example, if you as the developer fail to call `wglCreateContext', so providing the information that `wglMakeCurrent' failed is meaningless. You surely need such specific context during development, but the average user doesn't need such information.

In your code, how would you handle the case where a string encoding conversion fails that is supposed to be completely transparent to the user? For example, it's very common to use UTF-8 for strings in the interface and then change the encoding as needed whenever you need to interact with the underlying platform. This is yet another situation where you're forced to choose between leaking implementation details and returning a cryptic error message. Granted, it is unexpected that such an error should occur so maybe returning "unexpected error" isn't a disaster.

I could directly answer, but I think you'll get more insight into my method if I first ask some questions.

You need to take your time to answer.

Where does the conversion take place?

Why does the conversion fail?

What can the user do about the failure?

Soma

(I don't think we have a "hide" so the quote is the best I can manage.)

You will only ever see a few reasons for data conversion to fail. In the case of encoding strings, you will have to allocate the memory for the conversion as well as perform the actual conversion. If the allocation fails, the memory available is unlikely to be sufficient to perform whatever operation is necessary that requires the conversion so a generic memory allocation error shown at the point of failure is entirely appropriate and provides sufficient information to guide the user in recovering from the error. The actual conversion will only fail, assuming your implementation is robust, if the data was malformed in the first place. You might, for example, do the conversion as the data is processed which allows you to show a marshalling (`invalid identifier ("ReQuested$id3ntif_er")' with a help entry explaining what constitutes a valid identifier) error as soon as possible which again provides sufficient information to guide the user.
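As an illustration of catching malformed data when it is processed rather than deep inside a platform call, here is a small strict UTF-8 check. It is a sketch, not a replacement for a tested library, and the function name first_invalid_utf8 is made up. It reports the byte offset of the first malformed sequence so the caller can build a marshalling error that points at the offending input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the byte offset of the first malformed UTF-8 sequence in s[0..n),
 * or -1 if the whole buffer is well-formed. Rejects stray continuation
 * bytes, truncated sequences, overlong forms, surrogates and values above
 * U+10FFFF. */
long first_invalid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp, min;

        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1Fu; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0Fu; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07u; min = 0x10000; }
        else return (long)i;                          /* invalid lead byte */

        if (n - i < len)
            return (long)i;                           /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;                       /* bad continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        if (cp < min || cp > 0x10FFFFul || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;                           /* overlong or out of range */
        i += len;
    }
    return -1;
}
```

Running a check like this at the point where a name or path enters the library means the later platform conversion should only fail for reasons unrelated to the input, such as allocation failure.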

A generic "unexpected error" would not guide the user in correcting malformed data, but you could argue that a conversion could spuriously fail without such context necessary for a second attempt. I would, for our imagined spurious "codec" problem, inform the user about the nature of the error after a local recovery attempt, much the same way I would inform a user about a "307 temporary redirect" situation.

**MOS-6581** · 03-14-2015

Originally Posted by phantomotap

Where does the conversion take place?

The conversion takes place transparently in all functions that internally need to convert a UTF-8 string to UTF-16 and supply that to WinAPI functions. The user shouldn't even be aware that such a conversion is taking place.

Originally Posted by phantomotap

Why does the conversion fail?

I don't know. The WinAPI doesn't specify all the error codes that functions might return. I can test my code to make sure it's correct but in the unlikely event that something goes wrong then I should return some kind of error code. If you call MultiByteToWideChar twice with the exact same input then it should give you the same result twice but there's no guarantee that it will. Robust code should deal even with the exceptional cases.

Originally Posted by phantomotap

What can the user do about the failure?

Most of the time, nothing. If RegisterClass fails then it's not the user's fault, since the user shouldn't even be aware that the function exists, but it might not be my fault either. It should be an exceptional event, but when it happens I figured I might want to display a little more information than "Window creation failed: Unexpected error". Maybe that's not a good idea though. That's what I'm trying to ask. Technically, it is an unexpected error since even I as the programmer of the library couldn't anticipate it.

The internet is full of complaints from users about software that returned "unexpected error" though and if you receive a complaint like that yourself then you have no idea what went wrong.

**phantomotap** · 03-14-2015

The conversion takes place transparently in all functions that internally need to convert a UTF-8 string to UTF-16 and supply that to WinAPI functions.

O_o

The conversion-on-use is only one possibility. The point of the questions was to get you thinking about the actual context of an error. You can handle where the conversion happens in a multitude of ways. You can control where the conversion happens and thus also where to contextualize any errors related to conversion.

We should say you are using `MultiByteToWideChar' in a deeply nested implementation within your library. The `ERROR_INSUFFICIENT_BUFFER' error from `MultiByteToWideChar' signals that you've incorrectly used the interface, but the information "I didn't write correct code." doesn't do anything for the average user so the report is largely useless outside of bug reports so we don't need to translate the error. (You could say that we are intentionally leaking details, but I will argue that we aren't leaking details because we are the ones consuming the bug report.) The `ERROR_NO_UNICODE_TRANSLATION' error from `MultiByteToWideChar' signals malformed data, but the implementation doesn't have the context to make a good decision about formatting the error even though the user may be able to correct the environment. The deeply nested implementation is then arguably the wrong place to make the decision. Where do we make the decision? Where is the context of the error most relevant?

The deeply nested implementation, let's imagine, uses `MultiByteToWideChar' because the `CopyFileEx' interface expects, almost, a "UTF16" identifier. You don't want to be a crappy programmer, so you aren't going to call `CopyFileEx' with malformed data, which requires you to check that `MultiByteToWideChar' succeeds. (We could call `CopyFileEx' with malformed data, but the `CopyFileEx' interface would fail with some error.) The actual error is "label contains invalid codepoints", but the actual error is almost completely irrelevant. What does it mean for a user to see "label contains invalid codepoints" in a log? The user needs more context for the error to be useful. You can almost always format an error where more context is available. The deeply nested implementation doesn't have enough information, but we know the calling interface has more information.

The calling interface knows that the utility function failed, but why did the utility function fail? The calling interface doesn't care about the data being invalid ("label contains invalid codepoints") because the information remains essentially useless. (The calling interface can neither fix the data nor request correct data from the user.) What is the function of the deeply nested implementation? Let's say, for the sake of discussion, that the deeply nested implementation is part of a secure file copy mechanism. Let's say that the calling interface is a generalized, platform independent layer using multiple platform specific interfaces to fulfill the secure file copy mechanism. We know the highly specific yet virtually useless reason the mechanism failed, but the user only cares about why the application failed. The application failed because the secure file copy mechanism failed. We only need to report the context of the mechanism's failure to the user. What does it mean for a user to see "error::`could not copy file ("~/test\xFF.txt")'" in a log?

The "error::`could not copy file ("~/test\xFF.txt")'" message doesn't include the irrelevant details regarding the `MultiByteToWideChar' interface failing with `ERROR_NO_UNICODE_TRANSLATION' because the user can do nothing with such information, yet the information is sufficient to inform the user about the issue, and the message includes context necessary to guide a user in corrective measures when coupled with a help file ("valid characters in file names") or knowledgeable community without really leaking implementation details or private information.
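The layering described above could look roughly like the sketch below. All names are hypothetical, and convert_path_stub merely imitates a nested helper that would really call MultiByteToWideChar and GetLastError. The stub reports 1113, the numeric value of ERROR_NO_UNICODE_TRANSLATION, whenever it sees a 0xFF byte. The calling layer keeps that detail for the bug report and builds the user-facing message itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Numeric value of Win32 ERROR_NO_UNICODE_TRANSLATION. */
#define PLATFORM_NO_UNICODE_TRANSLATION 1113UL

/* Deeply nested helper (stub): it only knows the raw platform code. */
unsigned long convert_path_stub(const char *utf8_path)
{
    for (const unsigned char *p = (const unsigned char *)utf8_path; *p; p++)
        if (*p == 0xFF)               /* 0xFF never appears in valid UTF-8 */
            return PLATFORM_NO_UNICODE_TRANSLATION;
    return 0UL;
}

/* Copies src to dst, writing bytes outside printable ASCII as \xHH. */
void escape_bytes(char *dst, size_t cap, const char *src)
{
    size_t used = 0;
    assert(dst != NULL && src != NULL && cap > 0);
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        if (used + 5 > cap)           /* room for "\xHH" plus the terminator */
            break;
        if (*p < 0x20 || *p > 0x7E)
            used += (size_t)sprintf(dst + used, "\\x%02X", (unsigned)*p);
        else
            dst[used++] = (char)*p;
    }
    dst[used] = '\0';
}

/* Calling layer: it knows the operation and the user's input, so it owns
 * the message. user_msg is for the user, debug_msg only for bug reports.
 * Returns 0 on success and -1 on failure. */
int secure_copy(const char *path, char *user_msg, size_t user_cap,
                char *debug_msg, size_t debug_cap)
{
    char shown[256];
    unsigned long code = convert_path_stub(path);
    if (code == 0UL)
        return 0;                     /* the copy itself is left out here */
    snprintf(debug_msg, debug_cap,
             "MultiByteToWideChar failed with code %lu", code);
    escape_bytes(shown, sizeof shown, path);
    snprintf(user_msg, user_cap,
             "error::`could not copy file (\"%s\")'", shown);
    return -1;
}
```

The platform code never reaches the user-facing string, yet it is not thrown away either: it lands in the debugging channel where the developer can use it.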

Soma

**MOS-6581** · 03-14-2015

Originally Posted by phantomotap

The `ERROR_INSUFFICIENT_BUFFER' error from `MultiByteToWideChar' signals that you've incorrectly used the interface, but the information "I didn't write correct code." doesn't do anything for the average user so the report is largely useless outside of bug reports so we don't need to translate the error.

If you receive that error then it's most likely because you screwed up but it could be because you triggered a bug in MultiByteToWideChar or because radiation from outer space flipped a bit in the computer memory and made the function fail. Either way you need to return some kind of error when it happens and that error will be implementation specific unless you use a generic error code like "unexpected error" or "platform error". Based on your answers I'm thinking that I probably don't need to give a reason for the failure at all and just return "unexpected error" like I'm already doing. If create_window() then fails because MultiByteToWideChar failed with ERROR_INSUFFICIENT_BUFFER while converting the window title to UTF-16 then I guess the user of my library (that called create_window() in the first place) will just have to settle with not knowing why it failed.

**phantomotap** · 03-14-2015

Either way you need to return some kind of error when it happens and that error will be implementation specific unless you use a generic error code like "unexpected error" or "platform error".

O_o

Nonsense. You can, as I've shown, normalize error codes.

Based on your answers I'm thinking that I probably don't need to give a reason for the failure at all and just return "unexpected error" like I'm already doing.

I find the idea repellent. You should, as I've shown, contextualize the error without regurgitating useless detail.

If create_window() then fails because MultiByteToWideChar failed with ERROR_INSUFFICIENT_BUFFER while converting the window title to UTF-16 then I guess the user of my library (that called create_window() in the first place) will just have to settle with not knowing why it failed.

Foolish. You should, as I've shown, normalize the platform specific `ERROR_INSUFFICIENT_BUFFER' into the generic "out of memory".

Soma

**MOS-6581** · 03-14-2015

Originally Posted by phantomotap

Foolish. You should, as I've shown, normalize the platform specific `ERROR_INSUFFICIENT_BUFFER' into the generic "out of memory".

I don't agree with that. The error didn't occur because you ran out of memory but rather because something unexpected happened. Getting an "out of memory" error in such a situation would be more confusing than just saying that something unexpected happened.
