
Vulkan Apps Now Functional

After much effort I finally got a minimal Vulkan application working. And I mean minimal: it just creates a Vulkan instance and destroys it. After more wrestling, a small physical device enumeration example followed.

This is it. Vulkan is in Racket.

I have not proved 100% coverage, so on that front I cannot declare victory. But the physical device program is evidence that someone can write (require vulkan/unsafe) and hand-port a Vulkan program in C over to Racket. In my mind, that’s huge. It means that Racket is now a valid medium in which to learn and practice using Vulkan. It also means that if someone reports a bug, I can offer support and iterate on actual Vulkan applications.

I hope that this will be my last article before showing the output of an actual compute pipeline, but there is so much to learn that I have to get notes published before they grow too large for one page.

Let’s talk about my recent lessons.

“Forward Declarations” Didn’t Work

I jumped the gun when I said that declaring types as symbols helped me get around writing a topological sort for types named by the Vulkan registry. Since Racket will not allow you to redefine an identifier in a non-interactive context, I could not write a (define-cstruct) after my “forward declaration” for that struct.

I ended up writing an algorithm to sort the types in the Registry based on name references. What made this really tricky is that the name references come from more than one source depending on the surrounding type declaration. If one day the Khronos Group decides to do something weird like put a type behind a preprocessor macro referenced in the C text fragment of a function pointer spec (no, they are NOT above this), my sort will end up incorrect.
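As a rough illustration of sorting by name references, here is a depth-first topological sort in C. It is a sketch, not the generator's actual code: the dependency table is a small hand-written stand-in for what would be extracted from vk.xml, and the helper names (TypeDecl, find_decl, visit, sort_types) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

/* One registry entry: a type name plus the names it references. */
typedef struct {
    const char *name;
    const char *deps[MAX_DEPS];   /* NULL-terminated unless full */
} TypeDecl;

/* Illustrative input, deliberately listed out of dependency order. */
static const TypeDecl decls[] = {
    { "VkRect2D",   { "VkOffset2D", "VkExtent2D", NULL } },
    { "VkExtent2D", { "uint32_t", NULL } },
    { "VkOffset2D", { "int32_t", NULL } },
    { "uint32_t",   { NULL } },
    { "int32_t",    { NULL } },
};
enum { NUM_DECLS = sizeof decls / sizeof decls[0] };

/* Linear lookup of a declaration by name; -1 if the name is unknown. */
static int find_decl(const char *name) {
    for (int i = 0; i < NUM_DECLS; i++)
        if (strcmp(decls[i].name, name) == 0) return i;
    return -1;
}

/* Depth-first visit: emit every referenced type before the type itself.
   state: 0 = unvisited, 1 = in progress, 2 = emitted. */
static int visit(int i, int *state, const char **out, int *n) {
    if (state[i] == 2) return 0;     /* already emitted */
    if (state[i] == 1) return -1;    /* reference cycle */
    state[i] = 1;
    for (int d = 0; d < MAX_DEPS && decls[i].deps[d] != NULL; d++) {
        int j = find_decl(decls[i].deps[d]);
        if (j < 0) return -1;        /* reference to an undeclared name */
        if (visit(j, state, out, n) != 0) return -1;
    }
    state[i] = 2;
    out[(*n)++] = decls[i].name;
    return 0;
}

/* Fills out (capacity NUM_DECLS) in a valid declaration order.
   Returns the number of names written, or -1 on a cycle or unknown name. */
int sort_types(const char **out) {
    int state[NUM_DECLS] = {0};
    int n = 0;
    for (int i = 0; i < NUM_DECLS; i++)
        if (visit(i, state, out, &n) != 0) return -1;
    assert(n == NUM_DECLS);
    return n;
}
```

Calling sort_types with an array of NUM_DECLS string pointers yields an order in which VkRect2D comes after both types it references. The fragile part described above lives entirely in building the deps table: a reference hidden behind a macro never makes it into the table, so the sort cannot account for it.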

Sigh.

The good news is that my bindings are generated from a local mirror of vk.xml. I can compare diffs when it changes to see if I have to monkey-patch something. I’m not thrilled about that being a support task, but it’s better than an automated update causing chaos.

Creating Healthy Distance from ffi/unsafe Wrapper Functions

I’m learning Racket’s FFI collection as I go, so having a big soup of generated bindings gives me a great crash course. I feel like I was outside of the assumed audience for the collection’s documentation, which slowed me down a little. That said, I want to give a big thank you to Matthew Flatt for helping me over my hurdles.

There are now over 2200 Racket bindings for #defined constants, enumerated type names, enumerant names, struct names, union names with custom accessors, and function names. That’s not huge in an enterprise setting, but it’s big enough to hurt if I don’t represent the API faithfully.

Racket’s FFI collection maintains a strong boundary between Racket and C, such that the values you pass across this boundary are subject to different translation and allocation rules. This becomes apparent when you use the _fun form, which the Racket FFI docs wholeheartedly suggest. _fun implicitly handles common C concerns for you, but insists that you enable this help as part of declaring a Racket procedure to call a C function.

To understand the impact, note that Vulkan exports a handful of functions that you need to call twice to allocate memory for structured data. The first call reports the number of elements you need; the second populates the memory you allocate for that many elements.

uint32_t num;
f(&num, NULL);

X* array = (X*)malloc(num * sizeof(X));
f(&num, array);
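For readers who want to compile the pattern, here is a self-contained sketch. The enumerate_items function and the Item type are hypothetical stand-ins for a Vulkan enumeration function and its element struct, so the example builds without the Vulkan headers. The mock follows the same convention: a NULL array means "only tell me the count".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t id; } Item;

/* Hypothetical stand-in for a Vulkan enumeration function.
   With items == NULL it only reports how many elements exist.
   Otherwise it writes at most *count elements and stores the
   number actually written back into *count. */
static void enumerate_items(uint32_t *count, Item *items) {
    const uint32_t available = 3;
    if (items == NULL) { *count = available; return; }
    if (*count > available) *count = available;
    for (uint32_t i = 0; i < *count; i++) items[i].id = 100 + i;
}

/* Two-call pattern: returns a malloc'd array that the caller must free,
   or NULL when there is nothing to return or allocation fails. */
Item *fetch_items(uint32_t *out_count) {
    assert(out_count != NULL);
    *out_count = 0;

    uint32_t num = 0;
    enumerate_items(&num, NULL);               /* call 1: ask for the count */
    if (num == 0) return NULL;

    Item *array = malloc(num * sizeof *array); /* caller-side allocation */
    if (array == NULL) return NULL;

    enumerate_items(&num, array);              /* call 2: fill the array */
    *out_count = num;
    return array;
}
```

The detail that matters for the bindings is that the size of the allocation depends on the value written during the first call, so the two parameters cannot be treated independently.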

A previously generated signature for one of these functions looked like this:

(define-vulkan vkEnumerateInstanceLayerProperties
  (_fun (o0 : (_ptr o _uint32_t))
        (o1 : (_ptr o _VkLayerProperties))
        -> (r : _VkResult)
        -> (begin (check-vkResult r (quote vkEnumerateInstanceLayerProperties))
                  (values o0 o1))))

It’s wrong.

Look at the forms starting with o0 and o1. These are identifiers bound to the values produced using the type specification following the colon. The (_ptr o ...) form represents an “output” pointer type, making the Racket procedure act like a pass-by-reference function to C code. It is not exactly equivalent to a pointer type to non-const data, which was my initial understanding. It merely implies that.

What this signature actually does is tell Racket to allocate space for the given type in that parameter, pass it to C, and then make sure it’s bound for later return in the values part of the form.

Knowing this, we can see why the signature is wrong: An allocation for o1 would occur for each call, with no regard to the value Vulkan would write to the memory referenced by o0. In other words, the parameters do not have their intended relationship.

But that wasn’t the scary part. The scary part was that the (_ptr o ...) form reduces the arity of the wrapping Racket procedure.

That means the call would look like this in Racket.

(define-values (num pArray) (vkEnumerateInstanceLayerProperties))

It looks convenient and helpful, yes, but it won’t work. Also, it means the procedures I provide are not a faithful reproduction of the Vulkan API, despite attempts to follow the Vulkan API Registry. Functions are missing parameters, which prevents any workaround for allocation errors.

So, I have a design problem in front of me. Do I try to generate the most helpful signatures I can up front to reduce boilerplate and increase safety, or do I minimize the help and allow my clients to risk undefined behavior through incorrect usage?

If I generate safer bindings, I would not be providing the Vulkan API. I’d have to document every discrepancy from the standard across 346 (cross-platform) functions, and my users would have to learn them.

I cringe at the thought of one hypothetical sentence from the above example:

“Hey, this is Vulkan, except every pointer to non-const data you see in a parameter means that parameter is not actually there and is instead one of the return values.”

If I heard that during a conference I’d ask what was in the punch.

I think I can get away with automatically checking all return codes and converting errors to runtime exceptions, which is a nice perk of _fun. But I don’t even know if Vulkan experts would agree with that decision, since some success codes like VK_INCOMPLETE are assumed erroneous only in beginner Vulkan tutorials.
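To make the VK_INCOMPLETE concern concrete, here is a small sketch of two checking policies. In Vulkan, error codes are negative and success codes (including VK_INCOMPLETE) are non-negative. The Result enum below is a local stand-in using the same numeric values as the corresponding VkResult codes, so the sketch builds without the Vulkan headers. The function names are hypothetical and this is not the check-vkResult procedure shown earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for a few VkResult values. */
typedef enum {
    RESULT_SUCCESS = 0,                    /* VK_SUCCESS */
    RESULT_INCOMPLETE = 5,                 /* VK_INCOMPLETE */
    RESULT_ERROR_OUT_OF_HOST_MEMORY = -1   /* VK_ERROR_OUT_OF_HOST_MEMORY */
} Result;

/* Strict policy: anything other than plain success counts as a failure. */
bool failed_strict(Result r) { return r != RESULT_SUCCESS; }

/* Lenient policy: only negative (error) codes count as failures. */
bool failed_lenient(Result r) { return r < 0; }

/* Shows where the two policies disagree. */
void demo_policies(void) {
    assert(failed_strict(RESULT_INCOMPLETE));    /* strict rejects it */
    assert(!failed_lenient(RESULT_INCOMPLETE));  /* lenient lets it through */
    assert(failed_strict(RESULT_ERROR_OUT_OF_HOST_MEMORY));
    assert(failed_lenient(RESULT_ERROR_OUT_OF_HOST_MEMORY));
}
```

A binding that raises an exception under the strict policy takes away a caller's chance to react to VK_INCOMPLETE, for example by retrying with a larger array. That is the kind of decision an experienced Vulkan user might object to.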

This is how I started getting the impression that I wasn’t in the primary audience for the FFI documentation: In this project, I am a library author. If I were an application author that was hand-binding a subset of Vulkan, then I wouldn’t have any problem leaning on the FFI collection to manage patterned allocations and bindings for me.

So, I made the judgement call to keep unsafe bindings openly so, putting the onus on client code to do more C stuff. Vulkan is simply too complicated to generate tailored _fun forms in one pass.

There will be helpers, but any helpers I write or generate will supplement the unsafe bindings as an additional layer of code.

With this, my only critique of Racket’s FFI collection is that it is too confident in encouraging use of the _fun form. The collection—and _fun in particular—gives you access to C functions, but obliges you to hide C concerns at the same time. This works for 95%+ of projects, but complicates any attempt to represent a raw C API.

To elaborate on that point, here is one of the suggestions from the user list on handling my erroneous signature for the two-step allocation:

(define-vulkan vkEnumerateInstanceLayerProperties
  (_fun (o0 : (_ptr io _uint32_t))
        _pointer
        -> (r : _VkResult)
        -> (begin
             (check-vkResult r 'vkEnumerateInstanceLayerProperties)
             o0)))

o1 is gone, as well as the _VkLayerProperties pointer type. Apparently there was no custom function type I could declare to handle this using one binding, which is why _pointer is used at all. This means that the procedure is more “raw”. I interpret “raw” as “faithful to C”, so I took this discussion as evidence that I should lean towards less safety for the first set of generated bindings.

To be clear, I am not criticizing the Racket FFI collection for having a design flaw. I am only saying that the documentation strongly advises use of _fun and friends almost regardless of context, which made it harder to deliver Vulkan to Racket on an “as-is” basis.

Getting Struct Layouts Right

I got sloppy when generating structs, in that I did not account for members with an array of elements, and assumed some members were pointers when they were not. This of course impacts the size of the underlying C struct and opens the door to buffer overruns.

To make solving this problem even more fun, some struct members use API constants (via #define) to define array length. As I mentioned earlier, preprocessor directives screw with my topological sort of types. In this case, a named constant from the preprocessor doesn’t come from any declared type. I had to artificially categorize API constants and manually toss them before struct declarations.
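The effect of mistaking an array member for a pointer is easy to see in C. The sketch below declares a struct laid out like VkLayerProperties, whose two name fields are inline arrays sized by a #define, next to a version with the mistake applied. The struct and macro names here are local stand-ins; 256 is the value of VK_MAX_EXTENSION_NAME_SIZE and VK_MAX_DESCRIPTION_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The constant has to be defined before any struct that uses it. */
#define MAX_NAME_SIZE 256

/* Laid out like VkLayerProperties: the strings live inside the struct. */
typedef struct {
    char     layerName[MAX_NAME_SIZE];
    uint32_t specVersion;
    uint32_t implementationVersion;
    char     description[MAX_NAME_SIZE];
} LayerPropsCorrect;

/* The mistake: array members declared as pointers. */
typedef struct {
    char    *layerName;
    uint32_t specVersion;
    uint32_t implementationVersion;
    char    *description;
} LayerPropsWrong;

size_t correct_size(void) { return sizeof(LayerPropsCorrect); }
size_t wrong_size(void)   { return sizeof(LayerPropsWrong); }

/* Memory sized for the wrong layout is far too small for what the
   C side will write into it. */
void check_layout(void) {
    assert(correct_size() >= 2 * MAX_NAME_SIZE + 2 * sizeof(uint32_t));
    assert(wrong_size() < correct_size());
}
```

On a typical 64-bit platform the correct layout is 520 bytes and the pointer version is 24, so an array of the wrong struct handed to the C side would be overrun by a wide margin.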

I write this only to say to anyone parsing vk.xml that you really have to handle all the casework of structs up front. As technical debt goes, any shortcut you take here is a payday loan.

The struct sizing bug I encountered is now fixed, although no tests guard against a regression yet. The needed test could generate both Racket and C code that emit (ctype-sizeof X) and sizeof(X) data points, respectively, and fail on any mismatch.
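The C half of such a test might look like the sketch below. It assumes a hypothetical generator that emits one EMIT line per struct in the registry; the three structs here are local stand-ins shaped like VkOffset2D, VkExtent2D, and VkRect2D so the example builds without the Vulkan headers. The "Name size" line format is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int32_t x, y; } Offset2D;                   /* like VkOffset2D */
typedef struct { uint32_t width, height; } Extent2D;         /* like VkExtent2D */
typedef struct { Offset2D offset; Extent2D extent; } Rect2D; /* like VkRect2D */

/* Appends one "Name size" line to buf. Relies on buf, cap, and used
   being in scope at the point of use. */
#define EMIT(T) \
    do { \
        int w = snprintf(buf + used, cap - used, "%s %zu\n", #T, sizeof(T)); \
        assert(w > 0 && (size_t)w < cap - used); \
        used += (size_t)w; \
    } while (0)

/* Writes the size report into buf and returns the number of bytes written
   (not counting the terminating NUL). */
size_t write_size_report(char *buf, size_t cap) {
    size_t used = 0;
    EMIT(Offset2D);
    EMIT(Extent2D);
    EMIT(Rect2D);
    return used;
}
```

The Racket half would print the same lines using ctype-sizeof on the generated types, and the test would compare the two reports line by line.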

What’s Next?

Writing more Vulkan apps! I’m still limiting myself to simpler applications. A finished compute pipeline is the next milestone. After that, presentation to the GUI.


1 The FFI has more degrees of freedom regarding pointer type declaration because pointers are associated with Racket value “tags” to implement some Racket-side validation.