DOD generally implies a high level of coupling in the system. That's its biggest weakness. It also pushes even more data-management problems onto programmers, which makes it easy to get things wrong.
You take those downsides and you trade them for higher performance.
Now, that doesn't mean that you can't have hybrid systems and get most of the benefits of both worlds. It does, however, mean that you will end up with a more complex system to manage and maintain.
A simple OO system is relatively easy to craft without a whole lot of ceremony. That's the prime benefit of OO.
It's not a weakness to integrate more information about the problem space into the solution. It's engineering. To _ignore_ information about the problem space is certainly a weakness, and partly explains why the Windows boot time appears to be a cosmological constant.
> It's not a weakness to integrate more information about the problem space into the solution. It's engineering.
You want your solution to depend on the right abstract model of the problem, not on the particulars of the problem.
Otherwise your solution is difficult to extend, and is generally expensive and error-prone to change when your problem changes, which happens with 100% probability.
If a well established abstraction solves the problem, then that's just a particular known about the solution space.
If your data changes, your problem changes. We only ever solve particular problems, given the distribution, shape and density characteristics of input data.
Let's say you have "struct Point {int x; int y;};", and all things are fine and dandy. You write your OOP version as such (or your DOD version), and things are going great.
Suddenly, you want "struct Point3D {int x; int y; int z;}". How would you change the code?
In OOP, maybe you'd compose a Point3D as such: struct Point3D { Point xy; int z; };. You can now "capture" all of the Point xy functionality, and extend it to a third "z" coordinate just fine. All old Point code still works, and you have a few functions that work with Point3D now (maybe with a few redundant wrapper functions, but such code is straightforward).
The equivalent in DOD seems less obvious. You'd have to make a new "z" array. But the arrays of Point and Point3D wouldn't line up anymore. (If you had 300 Points and 20 Point3Ds, is Point #315 a Point3D, or is it a Point? Do you "pad" the "z" array to ensure that the struct-of-arrays lines up?)
There's no obvious way to benefit from a "z-vector", extending the functionality of your Point to Point3D. I think the DOD design would create a third "z-vector" and shove a dummy value into z for all of the plain Points, to make sure all the vectors share an aligned set of indexes.
Which demonstrates the coupling factor: it's not really easy to compose objects in DOD. In OOP, composing and inheriting classes and objects is the very point.
You wouldn't have to mingle them if they're different types, in fact that would probably be one of the worst design decisions you could make.
In the DOD version you'd create a new record containing 3 arrays instead of 2 (x,y,z) and write or extend your functions to support this new record type.
In the OOP version (NB: blech, it's not OOP, it's just records) you'd also need new functions to support the new type.
There are definitely benefits and drawbacks to both approaches, but what you've written is not a drawback for structure-of-arrays or a benefit for array-of-structures.
> In the DOD version you'd create a new record containing 3 arrays instead of 2 (x,y,z) and write or extend your functions to support this new record type.
But now you're violating DRY: don't repeat yourself. Bugfixes found in Point3D will have to be manually "ported" to Point functions (and vice versa).
> In the OOP version (NB: blech, it's not OOP, it's just records) you'd also need new functions to support the new type.
Those new functions can largely be written as foo3D(Point3D a, Point3D b) {return foo(a.xy, b.xy); }.
DRY is kept: your bugs fixed in foo will automatically apply to foo3D. If you use inheritance (which... probably would be a mistake... but it's possible, and an "OOP" solution), your foo functions would automatically work on Point3D.
EDIT: Hmm... maybe Point and Point3D keep the Liskov Substitution Principle and therefore work for inheritance. Maybe inheritance is a valid solution in this case?
So your OOP version creates many layers, which impede understanding and extension as well.
DRY is also not a law which must always be followed. You often don't even know when you're repeating yourself until you do, and the first time you see a repetition is not when you should eliminate it (in most cases). You often see people reference the rule of three here.
The first repetition should make you pay attention. The second lets you know that it's probably (not necessarily) time to refactor. Premature DRY is as bad as premature optimization.
If you don't care about DRY in this case, that's fine. There's more than one problem with your proposed DOD solution.
Your proposed solution violates the open-closed principle. Which means you have to modify "old working code" to get anything done.
If your "solutions" to fixing code issues is "rewrite and extend the old code to cover the new case", then it becomes impossible to build libraries.
----------
Let's say ApplicationFoo has been written using Point2D, either using OOP or DOD principles. In the real world, ApplicationFoo is often written by another team outside of your direct control.
Your ApplicationBar wants Point3D, and realizes that Point2D largely implements the functionality you want (2d-distances, line-drawing, intersections, etc. etc.)
Your solution forces you to rewrite Point2D, which therefore creates a change in ApplicationFoo. (Or more likely, ApplicationFoo is going to refuse to update to the new library: ApplicationFoo is now stuck on Point2D 1.0, while ApplicationBar is using Point2D 1.1. And now your organization is no longer sharing code effectively).
> Your solution forces you to rewrite Point2D, which therefore creates a change in ApplicationFoo.
I don’t see how this is the case. You don’t change Point2D at all, you simply add a second type Point3D. ApplicationFoo can happily continue to use Point2D from your library while you use Point3D.
I think the example of Point2D and Point3D is not very well chosen. Your proposed solution of Point3D = { xy : Point2D, z : Float } also leads to lots of duplication, in the sense that (for example) to add two of these points you would have to do
{ xy = add2D(xy1, xy2), z = z1 + z2 }
where the computation of `z` is a copy of the computations of `x` and `y` in Point2D. So should you detect an error in Point2D you might still have to manually port it to Point3D.
For more complicated operations on points, the maths is mostly completely different in 2D and 3D (the same concepts might not even make sense, especially if you don’t generalize to n dimension right away), so sharing of code between the two points seems difficult.
The main problem with this solution, I think, is that you picked one specific projection from 3D space to 2D space. It might line up with a projection that is relevant to your domain but it probably won’t. So if you use inheritance and call a function requiring a Point2D with one of your Point3D’s you secretly project the point to the xy-plane which seems like something that should be explicit in the code.
> I don’t see how this is the case. You don’t change Point2D at all, you simply add a second type Point3D. ApplicationFoo can happily continue to use Point2D from your library while you use Point3D.
Point2D, under DOD, doesn't exist. There's "array_x" and "array_y". See the article in question. Point2D would be the index into the "x" and "y" arrays, or an "id" returned by the ECS system.
The "x" and "y" fields, under DOD, are dispersed into different areas, allegedly for SIMD / auto-vectorization benefits. I'm trying to discuss the effects of a decision like this.
> I think the example of Point2D and Point3D is not very well chosen.
There are multiple users who understand software engineering who disagree with me, but understand what I'm trying to discuss.
There are also multiple users who prefer to be pedantic and focus on this issue. For the most part, I'm able to ignore these unimaginative users pretty easily. So the example is working pretty well as a filter.
If you wish for more people to participate in the discussion without getting distracted, maybe a better example would have been recycling the example from the article:
Doesn't really matter. There are plenty of data / classes where you need to just "add one more parameter" to a previously created class to make it perfectly work. The above "GalaxianEnemy" and "FastEnemy" classes follow this pattern.
---------
But if we focus on this pedantry, we wouldn't be able to discuss software engineering. Surely you've come across an example in your programming life where code-reuse would be useful with a subset of parameters, but needed to be extended into an additional parameter?
> The main problem with this solution, I think, is that you picked one specific projection from 3D space to 2D space. It might line up with a projection that is relevant to your domain but it probably won’t. So if you use inheritance and call a function requiring a Point2D with one of your Point3D’s you secretly project the point to the xy-plane which seems like something that should be explicit in the code.
Is it so hard to imagine a video game, where my Point2D and Point3D interpretation is in fact correct? The point of the discussion is to point out software engineering principles: recycle code where possible, open-to-extension but closed-to-modification, and other such principles.
I think you are dramatically underestimating the differences between operations in 2D and 3D space. There is zero chance you would want to reuse the vast majority of your 2D code for 3D points.
A popular one is to group your entities by "archetype"- you have two arrays of Point2Ds, one for those without a Z coordinate and one for those with a Z coordinate. Now if you want all Point2Ds, you loop over both arrays; if you want all Point3Ds, you just loop over one. (This generalizes cleanly to larger numbers of "components.")
In some ways, this is actually quite a bit easier to use than OOP composition+inheritance. You don't need to build your compositions up front as classes or types- you just build them directly as values, throw them all into your archetype system that manages the arrays, and query them however you like. It feels similar to a relational database.
> A popular one is to group your entities by "archetype"- you have two arrays of Point2Ds, one for those without a Z coordinate and one for those with a Z coordinate. Now if you want all Point2Ds, you loop over both arrays; if you want all Point3Ds, you just loop over one. (This generalizes cleanly to larger numbers of "components.")
But now your Point2D code needs to be aware of the Point3D code. Violating the open-closed principle (which is an OOP principle, but one that's extremely useful in software engineering).
You NEVER want to change working code. That's the fundamental basis of the open-closed principle. Figuring out how to extend the code to new functionality ("open to extension"), WITHOUT risking any breaks on old code ("closed to modification"), is core to software engineering principles.
Any "solution" that relies upon changing Point2D is null-and-void to my software engineering brain. You cannot build libraries or reusable code if your solution is "rewrite the old code to support new functionality".
> But now your Point2D code needs to be aware of the Point3D code.
Not at all! You write the archetype management code once, and then the Point2D code just asks it for Point2Ds, nothing else, and that doesn't change no matter how you use Point2D elsewhere.
I'm having difficulty understanding how your proposal works.
In the beginning, there was one struct-of-arrays, the Point2D x and y arrays.
int Point2D_x[];
int Point2D_y[];
We also have a number of functions that use these two arrays. Point2D_foo(Point2D_index), Point2D_bar(Point2D_index).
Then, later, we discovered the need of Point3D code. Leading to the creation of one more struct-of-arrays (or 3-more arrays).
int Point3D_x[];
int Point3D_y[];
int Point3D_z[];
Point3D_foo(), Point3D_bar(). There's a third function, Point3D_baz(), that is specific to 3D code, but foo() and bar() are defined to be identical to the 2D versions. I haven't really figured out what the parameters of these 3D versions of foo, bar, and baz would look like.
All "old" Point2D code was written only knowing about the first two arrays (Point2D_x and Point2D_y). The new Point3D code can be written knowing about all 5 arrays.
But I'm having difficulty seeing how you extend the old Point2D code to work on the new Point3D arrays in the DOD case.
You don't just have bare arrays sitting there. You have a separate part of your program that is responsible for managing them, playing a role similar to a relational database.
Now, to be clear, DOD/ECS/whatever doesn't mean "change all your Point2D methods to take an index instead of a `this` pointer." That gains you nothing on its own. If your `foo` and `bar` are just working with individual Point2Ds, then you can just write them that way- hand them a `Point2D* p` or `int* x, int* y` or something.
It's when you're working with large collections of these things, which exist as pieces of larger (conceptual) entities, that the arrays become important. And you can't know what that looks like until you have a particular operation in mind- are you stepping a physics simulation, or computing a shortest path, etc?
Now the trick is to write those larger-scale bulk operations in terms of queries, instead of direct access to the arrays. If your pathfinding algorithm is 2D, you can run it on all Point2Ds with a query for x and y components. If you later add a z component to some of your entities that already have x and y components, the pathfinding algorithm will keep working and just ignore the z component.
Hmm, I'll admit that this is the first time I've heard of the ECS pattern. So I'm not too familiar with your argument. From your description, and from various links discussing the pattern, it seems like an adequate solution to the problem I posed.
I would argue that an ECS is quite far removed from what the original article was talking about however. But perhaps the original article was too superficial in its discussion of DOD.
Excuse me? If it's called Point2D, it must have two coordinates. The type should be called Point and it should have an arbitrary number of coordinates (which seems a bit silly, since a Point2D and a Point3D are different types, as are points and vectors that have exactly the same representation)
I'm just running with the original example, contrived as it may be. Nobody here is actually talking about anything specific to points of any dimension.
Is there any particular case in which you'd want Points and Point3Ds to be indexed interchangeably in the same array? This is just as cumbersome in the array-of-structs case: some structs are larger than others in the same array, you need some signalling mechanism to know which structs are and aren't Point3Ds (lest you invite the wrath of your optimizer), etc. In Rust the AOS case would be an Enum of Point and Point3D, and the SOA case would be two Vecs of ints and one Vec of Option<int>s.
More generally, this sounds like a language support problem: most languages are built around arrays of structs because it's easier to codegen for. If you're composing structures, just treat the composed-over struct as one very wide field and continue counting your offsets for the next fields down. Need to reference a struct? Just take a pointer to it and all the fields are a fixed offset. Want a bunch of the same struct? Put them next to one another in memory and you can go from one to the next by just bumping a pointer.
What we need is some sort of language support for "intrusive arrays", where you can declare a struct, create an intrusive array of it, and that array is laid out as structs of arrays. When you request an intrusive array, the language would need to recursively construct intrusive arrays for each non-primitive field (e.g. your IntrusiveVec<Point3D> would hold IntrusiveVec<Point> plus Vec<int>, which flattens down to 3 Vec<Int>s). References would consist of a pointer to the whole array plus an index, and methods on structs (er, "classes", in this case) would need to accept this new pointer-to-intrusive-vector format.
> This is just as cumbersome in the array-of-structs case: some structs are larger than others in the same array, you need some signalling mechanism to know which structs are and aren't Point3Ds
Strong disagree. The traditional OOP approach is passing indirect pointers to everything and incurring an inefficient level of indirection.
Ex:
vector<Point*> blah;
blah.push_back(&somePoint2D);
blah.push_back(&somePoint3D.xy);
foo(blah); // "Function foo" works on Point2Ds and Point3Ds, none the wiser
// Note that foo may do blah[10]->x += 5;
// and that updates the original Point2D or Point3D. This
// wouldn't work for a copied array.
I think OOP's inefficiency is well known and often criticized. Critics are correct: OOP methodologies are inefficient (compared to other techniques). But OOP seems to have a win on various software-engineering metrics: Extendability, Open-Closed, DRY, etc. etc.
> In Rust the AOS case would be an Enum of Point and Point3D, and the SOA case would be two Vecs of ints and one Vec of Option<int>s.
Rust is a general-purpose language. I'm not too good at Rust, but the AOS case would simply be a Box<Point2D>.
> More generally, this sounds like a language support problem <snip>
ISPC has SOA types, by the way. Check them out if you're interested:
You don't need a very large SOA-type to take advantage of SIMD or auto-vectorization. You can "gather" into an SOA, then "scatter" back to your main representation.
Right, but your example isn't an array of structs anymore. It's an array of pointers. You are correct that it's easier to take an array of structs and turn it into an array of pointers, or to take a single struct and get a pointer to one of its members, whereas a struct of arrays would have to be scattered in order to get anything referenceable.
ISPC sounds pretty cool.
How much of a performance penalty is scatter/gather into and out of an SOA? I assumed you'd want to store all of your data-parallel fields in SOA format, rather than gathering into one every time you wanna do some maths-heavy operation.
It's O(n) to gather and to scatter. So it's definitely expensive if you are doing only a single pass.
But if you are doing anything bigger than O(n) (ex: matrix multiplication, which is O(n^2.4) or so), then it's relatively cheap. Indeed, optimized matrix multiplications transpose the data to speed things up!
There is something to be said about gather and scatter patterns as a methodology, especially as a first step into optimizing your algorithms. It's not the ultimate speed form, but relatively straightforward and simple to implement.
It should also be noted that in ISPC, CUDA, ROCm, OpenCL (etc. etc.), your thread-local variables are compiled into SOA form for maximum performance.
SIMD architectures benefit more from SOA, and as such, making maximum use of SOA is built into those languages implicitly. If you find yourself reaching for SIMD techniques, maybe it's time to use a dedicated SIMD language.
From what I've seen in DOD systems, there are generally common structures that hold data for the entire application. The entire application has to know about these structures in order to function.
Take ECS [1] as an example.
In order to create a new entity on the system, you might need to talk to a location component, a health component, and a velocity component, all to register a brand-new entity on the system. Now, you might have an entity-creation service that hides those creation details from you, and that's fine. But you need to make sure that, as you are talking to each of the components, you are doing the right thing.
This is the coupling.
The traditional OO approach here would be to create a new entity with health, position, velocity, all contained within the same object and potentially referencable via a common interface for each.
Now, for ECS in a game there are definite benefits to this coupling. For example, it is a LOT easier in games to create universal systems which handle the physics for all objects irrespective of object type. Further, composing behavior onto entities can be much easier. You are free from a strict hierarchy. There's a reason this approach is popular there.
Now consider a very simple REST webservice. Now imagine trying to do that as an ECS system. You might have one component that is the header component and one component that is the body component, and an entity that contains both. Now imagine processing that entity through the system. Who would be in charge of making sure all the right interactions happen at the right time? Who would be in charge of responding? Who would be in charge of deleting requests that had been fully handled? I'm sure that's all possible (I wonder if anyone has tried it? Could be a fun side project), but I'm also sure I'd very likely not want to deal with such a system, no matter how fast it is. It would be unreasonable to try and figure out all the life cycles of everything.
For business applications with a lot of rules to follow, this sort of system would be nightmarish to maintain. It'd be almost impossible to track down what is changing what. You need a lot of gatekeepers because the goal of business apps isn't to enable novel and unexpected interactions, but rather to very explicitly define what happens when and why. For a game, those novel and unexpected interactions are a huge boon and a feature. They make games fun!
As a side note, the coupling of ECS systems is one thing that makes it hard to properly thread games. That's because you've got this globally shared mutable state being accessed through a ton of systems throughout the game. It's frankly impressive that games are threaded at all!
https://youtu.be/W3aieHjyNvw I think this talk on how an ECS actually reduced coupling in the case of Overwatch would be a good concrete real-world example to look at. It enabled some architectural benefits for implementing replay cameras or for networking code in their game through actual reduced coupling between input -> commands and commands -> simulation.
I don't know about DOD in general, but something seems fundamentally wrong with your understanding of ECS -- it conflicts with almost all of the reading I've done on it so far (though I can't say much about it in practice)
> But you need to make sure as you are talking to each of the component you are doing the right thing.
I'm not clear on "doing the right thing". If you're looking at e.g. specs, at creation time the only thing you're doing is constructing the entity correctly (adding values to the relevant arrays) with valid values. There's no real hooking into anything, or any intelligent behavior really, since you're just treating the entity as nothing more than the values it's composed of.
The systems just pick it up "magically", and treat it as it would any other entity -- it doesn't need to know anything really about the new entity, except that all of the necessary components have been slotted, before executing. And if they aren't.. it won't execute, at least not for that entity.
Which I think is almost by definition de-coupled. The system knows nothing about the entity, and the entity knows nothing about the system -- the system just looks for anything that fulfills the correct set of components, and the entity just looks to fulfill all components needed to describe it. What behavior that will lead to... the entity itself has no idea. And of course, new systems and new entities (and new components) can be plugged in without knowing anything about the others.
>As a side note, the coupling of ECS systems is one thing that makes it hard to properly thread games. That's because you've got this globally shared mutable state being accessed through a ton of systems throughout the game. It's frankly impressive that games are threaded at all!
Something seems very wrong here -- bevy, specs, legion etc (rust ecs frameworks) all basically suggest multithreading comes for almost free, because you're being very clear about what you're accessing and when (the systems are defined with the components they rely on, and have access to). So they can represent systems as a dependency graph, and parallelize independent systems appropriately. eg https://specs.amethyst.rs/docs/tutorials/09_parallel_join.ht...
The problem is really the other way around -- since it's parallelized for you, the difficulty is in defining an ordered sequence of events (when needed); I believe the simple solution is generally adding a component that's really just a flag, and the more general solution is utilizing event queues and event chaining.
specs doesn't iterate the array itself in parallel though, at least not automatically, but there's nothing stopping you (since it's already locked the array on your behalf), hence it provides par_iter
Entities aren’t coupled because they exist as a concept one level higher. Similarly in the GameObject and MonoBehavior model from Unity the GameObject isn’t much more concrete than an ID since it’s really just a container.
The trouble ECS can get into is where you want a set of components to relate to an Entity but don't want one or more of the Systems that update that set to run. You end up with more complex System definitions that need to exclude Entities based on extra Component criteria. It gets quite messy to work out, just from looking at the Components that make up an Entity, which Systems will run on it.