Exploring Dynamic Dispatch in Rust

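In computer science, dynamic dispatch is the process of choosing, at run time, which implementation of a polymorphic operation (a specific method or function) to call. Dynamic dispatch is commonly used in object-oriented programming (OOP) languages and systems, and is considered one of their defining characteristics.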

Suppose I want to create a struct CloningLab that contains a vector of trait objects (in this case, Mammal):
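A minimal sketch of such a struct. `CloningLab`, `Mammal`, `Cat`, `Dog`, `walk`, and `run` are named in the text; the `String` return values, the method bodies, and the helpers `walk_all`, `run_all`, and `sample_lab` are assumptions made for illustration.

```rust
// The trait and method names come from the text; the signatures are assumed.
trait Mammal {
    fn walk(&self) -> String;
    fn run(&self) -> String;
}

struct Cat;
struct Dog;

impl Mammal for Cat {
    fn walk(&self) -> String { "cat walks".to_string() }
    fn run(&self) -> String { "cat runs".to_string() }
}

impl Mammal for Dog {
    fn walk(&self) -> String { "dog walks".to_string() }
    fn run(&self) -> String { "dog runs".to_string() }
}

struct CloningLab {
    // Each element is a boxed trait object, so cats and dogs share one vector.
    subjects: Vec<Box<dyn Mammal>>,
}

impl CloningLab {
    // Calls `walk` on every subject; the concrete method is chosen at run time.
    fn walk_all(&self) -> Vec<String> {
        self.subjects.iter().map(|s| s.walk()).collect()
    }

    // Same idea for `run`.
    fn run_all(&self) -> Vec<String> {
        self.subjects.iter().map(|s| s.run()).collect()
    }
}

// Hypothetical helper that fills a lab with one cat and one dog.
fn sample_lab() -> CloningLab {
    let mut subjects: Vec<Box<dyn Mammal>> = Vec::new();
    subjects.push(Box::new(Cat));
    subjects.push(Box::new(Dog));
    CloningLab { subjects }
}
```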

This works fine. You can iterate over the vector of subjects and call run or walk as you would expect.

However, things break down when you try to add an additional trait to the trait object bounds like:
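A sketch of the kind of declaration being described. The rejected field is left in a comment so that the block still compiles. As far as I know, current compilers report error E0225 for it, because only auto traits may be added as extra bounds on a trait object. The `Send` variant and the helper `lab_with_one_cat` are assumptions added for contrast.

```rust
trait Mammal {
    fn walk(&self) -> String;
}

struct Cat;

impl Mammal for Cat {
    fn walk(&self) -> String { "cat walks".to_string() }
}

struct CloningLab {
    // The declaration the text describes. It stays commented out because
    // rustc rejects a second non-auto trait in a trait object:
    // subjects: Vec<Box<dyn Mammal + Clone>>,

    // An extra auto trait such as `Send` is accepted, shown here for contrast.
    subjects: Vec<Box<dyn Mammal + Send>>,
}

// Hypothetical helper that builds a lab holding a single cat.
fn lab_with_one_cat() -> CloningLab {
    let mut subjects: Vec<Box<dyn Mammal + Send>> = Vec::new();
    subjects.push(Box::new(Cat));
    CloningLab { subjects }
}
```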

This fails with the following error:

I found this surprising. In my mind, a trait object with multiple bounds would be analogous to multiple inheritance in C++. I would expect the object to have one vpointer for each 'base' and to dispatch through the appropriate one. Given that Rust is still a somewhat young language, I can appreciate why the developers might not want to introduce that complexity immediately (being stuck with a poor design forever would be a high cost for little reward), but I wanted to work out exactly how such a system might work (or not work).

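In C++, a class can inherit from several base classes, which is known as multiple inheritance. Each base class with virtual functions has its own virtual function table (vtable), and an object of the derived class holds several virtual pointers (vpointers), each pointing to the vtable for one base class. When a virtual function is called, the matching vpointer leads to the correct implementation.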

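The author originally expected Rust trait objects with multiple bounds to work in a similar way: each trait object would carry several vtables, one per trait, so that the correct method could be selected dynamically.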

Vtables in Rust

Like C++, Rust achieves dynamic dispatch through a table of function pointers (described in the Rust docs). According to that documentation, the memory layout of a Mammal trait object made from a Cat will consist of two pointers arranged like:
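The two-pointer layout can be checked from code. This is a sketch under assumptions: `TraitObjectModel` and `pointer_sizes` are hypothetical names, the field order in the model is chosen for illustration, and the exact representation of a trait object is an implementation detail rather than a documented guarantee.

```rust
use std::mem::size_of;

trait Mammal {
    fn walk(&self) -> String;
}

struct Cat {
    lives: u8,
}

impl Mammal for Cat {
    fn walk(&self) -> String {
        format!("a cat with {} lives walks", self.lives)
    }
}

// Illustrative model of a trait object reference, not a type from std.
#[repr(C)]
struct TraitObjectModel {
    data: *const (),   // points at the `Cat` value itself
    vtable: *const (), // points at the function pointer table for `Cat as Mammal`
}

// Hypothetical helper: byte sizes of a plain reference, a trait object
// reference, and the two-pointer model above.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<&Cat>(),
        size_of::<&dyn Mammal>(),
        size_of::<TraitObjectModel>(),
    )
}
```

On the toolchains I am aware of, the trait object reference is exactly twice the size of a plain reference, which matches the data pointer plus vtable pointer description.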

I was surprised to see that the data members of the object had an additional layer of indirection. This is unlike the (typical) C++ representation, which would look like this:

With the vtable pointer first and the data members immediately following.

The Rust approach is interesting. It incurs a cost when 'constructing' a trait object, unlike the C++ approach, in which a cast to a base pointer is free (or just some addition for multiple inheritance). But this cost is very minor. The Rust approach has the benefit that an object does not have to store the vtable pointer if it is never used in a polymorphic context. I think it is fair to say that Rust encourages the use of monomorphization, so this is probably a good trade-off.
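A small sketch of this trade-off. `CppStyleCat` is a hypothetical `#[repr(C)]` model of the C++ layout, not real C++ interop, and `walk_static` and `walk_dynamic` are illustrative names.

```rust
use std::mem::size_of;

trait Mammal {
    fn walk(&self) -> String;
}

// The Rust value stores only its own data, even though it implements a trait.
struct Cat {
    age: u64,
}

impl Mammal for Cat {
    fn walk(&self) -> String {
        format!("cat aged {} walks", self.age)
    }
}

// Model of a C++-style object that always carries its vtable pointer.
#[repr(C)]
struct CppStyleCat {
    vptr: *const (),
    age: u64,
}

// Static dispatch: compiled separately for each concrete `M`, no vtable involved.
fn walk_static<M: Mammal>(m: &M) -> String {
    m.walk()
}

// Dynamic dispatch: the vtable pointer lives in the reference, not in the `Cat`.
fn walk_dynamic(m: &dyn Mammal) -> String {
    m.walk()
}
```

The vtable pointer is only paid for at the point where a `&Cat` is turned into a `&dyn Mammal`; code that stays generic never sees it.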

Trait Objects with Multiple Bounds

Returning to the original problem, let's consider how it is resolved in C++. If we have multiple traits (purely abstract classes) that we implement for some structure, then an instance of that structure will have the following layout (e.g., Mammal and Clone):

Notice that we now have multiple vtable pointers, one for each base class that Cat inherits from (and that contains virtual functions). To convert a Cat* to a Mammal*, we don't need to do anything, but to convert a Cat* to a Clone*, the compiler will add 8 bytes (assuming sizeof(void*) == 8) to the this pointer. This is the key point, and it confused me at first: why is the Mammal* conversion free while the Clone* conversion needs 8 bytes added to this?

It made sense once Copilot explained it:

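In C++, when a class (such as Cat) inherits from several base classes (such as Mammal and Clone), it holds several vtable pointers, one per base class. These pointers drive dynamic dispatch, that is, deciding at run time which function to call. Converting a Cat* to a Mammal* needs no work, because Mammal is the first base class of Cat, so the Cat* and the Mammal* hold the same address. Converting a Cat* to a Clone* does require adjusting the pointer value, because Clone is not the first base class, so the Cat* and the Clone* hold different addresses. In this example the compiler adds 8 bytes to the this pointer (assuming sizeof(void*) == 8) so that it points at the Clone subobject, which begins with the Clone vtable pointer. That is why the Cat* to Clone* conversion needs special handling.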
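The pointer adjustment can be modelled in Rust. Rust has no C++-style inheritance, so `CppStyleCat` below is only a hypothetical `#[repr(C)]` stand-in for the object layout described above, and the helper functions are illustrative names.

```rust
use std::mem::size_of;

// `#[repr(C)]` keeps the fields in declaration order, like the layout described above.
#[repr(C)]
struct CppStyleCat {
    mammal_vptr: *const (), // vtable pointer for the first base, `Mammal`
    clone_vptr: *const (),  // vtable pointer for the second base, `Clone`
    age: u64,               // data members follow the vtable pointers
}

// Byte offset added to a `Cat*` to obtain a `Mammal*` in this model.
fn mammal_adjustment(cat: &CppStyleCat) -> usize {
    let base = cat as *const CppStyleCat as usize;
    let sub = &cat.mammal_vptr as *const *const () as usize;
    sub - base
}

// Byte offset added to a `Cat*` to obtain a `Clone*` in this model.
fn clone_adjustment(cat: &CppStyleCat) -> usize {
    let base = cat as *const CppStyleCat as usize;
    let sub = &cat.clone_vptr as *const *const () as usize;
    sub - base
}

// Hypothetical helper; the vtable pointers are null because nothing is dispatched here.
fn sample_cat() -> CppStyleCat {
    CppStyleCat {
        mammal_vptr: std::ptr::null(),
        clone_vptr: std::ptr::null(),
        age: 2,
    }
}
```

The first base sits at offset zero, so that conversion is free; the second base sits one pointer further along, which is the 8 bytes mentioned above on a 64-bit target.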

It is easy to imagine a similar thing for Rust:

So there are now two vtable pointers in the trait object. If the compiler needs to perform dynamic dispatch on a Mammal + Clone trait object, it can access the appropriate entry in the appropriate vtable and perform the call. Because Rust does not (yet) support struct inheritance, the problem of determining the correct subobject to pass as self does not exist. self will always be whatever is pointed at by the data pointer.
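This imagined layout can be simulated by hand. Everything below is hypothetical: the vtables are written manually instead of being generated by the compiler, `describe_clone` stands in for a real `clone` entry, and the drop pointer is left out for brevity.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

struct Cat {
    name: &'static str,
}

// Hand-written stand-ins for compiler-generated vtables.
// Both repeat the size and alignment of the concrete type.
struct MammalVtable {
    size: usize,
    align: usize,
    walk: unsafe fn(*const ()) -> String,
}

struct CloneVtable {
    size: usize,
    align: usize,
    describe_clone: unsafe fn(*const ()) -> String,
}

// Safety contract for both functions: `data` must point at a live `Cat`.
unsafe fn cat_walk(data: *const ()) -> String {
    let cat = unsafe { &*(data as *const Cat) };
    format!("{} walks", cat.name)
}

unsafe fn cat_describe_clone(data: *const ()) -> String {
    let cat = unsafe { &*(data as *const Cat) };
    format!("a copy of {}", cat.name)
}

static CAT_AS_MAMMAL: MammalVtable = MammalVtable {
    size: size_of::<Cat>(),
    align: align_of::<Cat>(),
    walk: cat_walk,
};

static CAT_AS_CLONE: CloneVtable = CloneVtable {
    size: size_of::<Cat>(),
    align: align_of::<Cat>(),
    describe_clone: cat_describe_clone,
};

// Hypothetical `Mammal + Clone` trait object: one data pointer, two vtable pointers.
#[repr(C)]
struct MammalCloneRef<'a> {
    data: *const (),
    mammal_vtable: &'static MammalVtable,
    clone_vtable: &'static CloneVtable,
    _lifetime: PhantomData<&'a ()>,
}

impl<'a> MammalCloneRef<'a> {
    fn from_cat(cat: &'a Cat) -> Self {
        MammalCloneRef {
            data: cat as *const Cat as *const (),
            mammal_vtable: &CAT_AS_MAMMAL,
            clone_vtable: &CAT_AS_CLONE,
            _lifetime: PhantomData,
        }
    }

    // Dispatch through the first vtable; the data pointer is always passed as `self`.
    fn walk(&self) -> String {
        // Sound here because `from_cat` pairs these vtables with a live `Cat`.
        unsafe { (self.mammal_vtable.walk)(self.data) }
    }

    // Dispatch through the second vtable, again with the same data pointer.
    fn describe_clone(&self) -> String {
        unsafe { (self.clone_vtable.describe_clone)(self.data) }
    }
}
```

Note that `size` and `align` appear in both vtables with identical values, which is the kind of duplication a combined vtable would avoid.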

This seems like it would work well, but this approach also has some redundancy. We have multiple copies of the type's size, alignment, and drop pointer. We can eliminate this redundancy by combining the vtables. This is essentially what happens when you perform trait inheritance like:
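A minimal sketch of the trait inheritance pattern. One assumption matters here: the real `Clone` requires `Sized`, so a trait that inherits from it cannot be used as a trait object. The sketch therefore uses a hypothetical object-safe stand-in called `Replicate`; `replicate_and_walk` is also an illustrative name.

```rust
trait Mammal {
    fn walk(&self) -> String;
}

// Hypothetical object-safe stand-in for `Clone`.
trait Replicate {
    fn replicate(&self) -> Box<dyn CloneMammal>;
}

// The combined trait: anything that is both `Replicate` and `Mammal`.
trait CloneMammal: Replicate + Mammal {}

// Blanket impl so every suitable type gets `CloneMammal` automatically.
impl<T: Replicate + Mammal> CloneMammal for T {}

struct Cat {
    name: String,
}

impl Mammal for Cat {
    fn walk(&self) -> String {
        format!("{} walks", self.name)
    }
}

impl Replicate for Cat {
    fn replicate(&self) -> Box<dyn CloneMammal> {
        Box::new(Cat { name: self.name.clone() })
    }
}

// Methods from both supertraits are reachable through one trait object.
fn replicate_and_walk(subject: &dyn CloneMammal) -> String {
    subject.replicate().walk()
}
```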

Using trait inheritance in this way is a commonly suggested trick to get around the normal limitation of trait objects. The use of trait inheritance produces a single vtable without any redundancy. So the memory layout looks like:
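One way to confirm that the combined trait object still carries a single vtable pointer is to compare reference sizes. `Replicate` is again a hypothetical object-safe stand-in for `Clone`, and `reference_sizes` is an illustrative helper.

```rust
use std::mem::size_of;

trait Mammal {
    fn walk(&self) -> String;
}

// Hypothetical stand-in for `Clone` that can be used in a trait object.
trait Replicate {
    fn describe_copy(&self) -> String;
}

trait CloneMammal: Replicate + Mammal {}
impl<T: Replicate + Mammal> CloneMammal for T {}

// Byte sizes of a single-trait reference and of the combined-trait reference.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&dyn Mammal>(), size_of::<&dyn CloneMammal>())
}
```

Both references are two pointers wide on the toolchains I am aware of: one data pointer and one pointer to the merged vtable.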

Much simpler! And you can currently do this! Perhaps what we really want is for the compiler to generate a trait like this for us when we try to make a trait object with multiple bounds. But hold on, there are some significant limitations. Namely, at the time of writing, you cannot convert a trait object of CloneMammal into a trait object of Clone. This seems like very strange behavior, but it is not hard to see why such a conversion won't work.

Suppose you attempt to write something like:

The line that converts the CloneMammal trait object into a Clone trait object must fail to compile, because the compiler cannot possibly find the appropriate vtable to put in the new trait object. It only knows that the object being referenced implements CloneMammal, but it doesn't know which concrete type it is. Of course, we can tell that it must be a Cat, but what if the code was something like:
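A sketch of that situation, with the concrete type chosen at run time. `Replicate` is a hypothetical object-safe stand-in for `Clone`, and `pick_subject` and `use_subject` are illustrative names. The conversion itself is left in a comment: compilers from the period this text describes reject it, while I believe newer toolchains accept it since trait upcasting was stabilized in Rust 1.86.

```rust
trait Mammal {
    fn walk(&self) -> String;
}

// Hypothetical stand-in for `Clone` that can be used in a trait object.
trait Replicate {
    fn describe_copy(&self) -> String;
}

trait CloneMammal: Replicate + Mammal {}
impl<T: Replicate + Mammal> CloneMammal for T {}

struct Cat;
struct Dog;

impl Mammal for Cat {
    fn walk(&self) -> String { "cat walks".to_string() }
}
impl Replicate for Cat {
    fn describe_copy(&self) -> String { "copy of a cat".to_string() }
}

impl Mammal for Dog {
    fn walk(&self) -> String { "dog walks".to_string() }
}
impl Replicate for Dog {
    fn describe_copy(&self) -> String { "copy of a dog".to_string() }
}

// The concrete type behind the trait object is only known at run time.
fn pick_subject(want_cat: bool) -> Box<dyn CloneMammal> {
    if want_cat { Box::new(Cat) } else { Box::new(Dog) }
}

fn use_subject(want_cat: bool) -> String {
    let clone_mammal: Box<dyn CloneMammal> = pick_subject(want_cat);

    // The conversion under discussion, left commented out (see the note above):
    // let clone: &dyn Replicate = &*clone_mammal;

    // Calling the supertrait method through the combined vtable works either way.
    clone_mammal.describe_copy()
}
```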

The problem is clearer here. How can the compiler know which vtable to put in the Clone trait object being constructed from clone_mammal? If clone_mammal refers to a Cat, then it should be the Cat vtable for Clone. If it refers to a Dog, then it should be the Dog vtable for Clone. So the trait-inheritance approach has this limitation. You cannot convert a trait object into any other kind of trait object, even when the trait object you want is less specific than the one you already have.

One thing I think the original article does not point out: although the author says in the next paragraph that multiple vtables would solve the problem, Rust does not actually work that way. It uses the approach described under trait inheritance.

The multiple vtable pointer approach seems like a good way forward to allowing trait objects with multiple bounds. It is trivial to convert to a less-bounded trait object with that setup. The vtable the compiler should use is simply whatever is already in the Clone vtable pointer slot (the second vtable pointer in the two-vtable trait object layout described earlier).

Now I get it!

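Cat and Dog implement walk and run differently. If the vtables are not kept separate, you would have to know in advance whether clone_mammal is really a Cat or a Dog in order to find the right Clone vtable to put into the new trait object, and that is impossible at compile time. With the redundant multiple-vtable layout it becomes possible, much as in the C++ implementation: pointer sizes are fixed, so the Clone vtable pointer can always be found.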

Comparing the two diagrams, consider what happens when you write: let clone: &Clone = &clone_mammal;

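At this point, &clone_mammal can be viewed as a pointer to the data pointer (because that is the first element of the trait object). In the left diagram (the single combined vtable), you would have to follow the vtable pointer, look at what it points to, and pick the Clone part out of it. That cannot be done, because what it points to depends on what clone_mammal actually is, and that cannot be decided at compile time. In the right diagram (multiple vtable pointers), no matter what clone_mammal actually is, the correct Clone vtable pointer is always found at &clone_mammal + 16 bytes (assuming 8-byte pointers), without needing to know the concrete type of the object.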
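The fixed-offset argument can be sketched with a hand-rolled model of the multiple-vtable layout. All names below are hypothetical, the vtables hold only a type name to keep the example short, and the data pointer is null because it is never dereferenced here.

```rust
use std::mem::size_of;

// Minimal stand-ins for per-type vtables.
struct MammalVtable {
    type_name: &'static str,
}
struct CloneVtable {
    type_name: &'static str,
}

static CAT_AS_MAMMAL: MammalVtable = MammalVtable { type_name: "Cat" };
static CAT_AS_CLONE: CloneVtable = CloneVtable { type_name: "Cat" };
static DOG_AS_MAMMAL: MammalVtable = MammalVtable { type_name: "Dog" };
static DOG_AS_CLONE: CloneVtable = CloneVtable { type_name: "Dog" };

// Hypothetical `Mammal + Clone` trait object: data pointer, then two vtable pointers.
#[repr(C)]
struct MammalCloneObject {
    data: *const (),
    mammal_vtable: &'static MammalVtable,
    clone_vtable: &'static CloneVtable,
}

// The less-bounded trait object we want to convert to.
#[repr(C)]
struct CloneObject {
    data: *const (),
    clone_vtable: &'static CloneVtable,
}

// The conversion copies the data pointer and the Clone slot.
// It never needs to know whether the object is a cat or a dog.
fn to_clone_object(obj: &MammalCloneObject) -> CloneObject {
    CloneObject { data: obj.data, clone_vtable: obj.clone_vtable }
}

// Byte offset of the Clone vtable slot from the start of the trait object.
fn clone_slot_offset(obj: &MammalCloneObject) -> usize {
    let base = obj as *const MammalCloneObject as usize;
    let slot = &obj.clone_vtable as *const &'static CloneVtable as usize;
    slot - base
}

// Hypothetical constructors for the two concrete cases.
fn cat_object() -> MammalCloneObject {
    MammalCloneObject {
        data: std::ptr::null(),
        mammal_vtable: &CAT_AS_MAMMAL,
        clone_vtable: &CAT_AS_CLONE,
    }
}

fn dog_object() -> MammalCloneObject {
    MammalCloneObject {
        data: std::ptr::null(),
        mammal_vtable: &DOG_AS_MAMMAL,
        clone_vtable: &DOG_AS_CLONE,
    }
}
```

The slot offset is the same for both objects (two pointer widths, so 16 bytes with 8-byte pointers), while the vtable found there differs per concrete type.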