I represent Sweden in ISO WG14, and I voted for the inclusion of #embed into C23. It's a good feature. But it's not a necessary feature, and I think JeanHeyd is wrong in his criticism of the pace of WG14's work. I have found everyone in WG14 to be very hardworking and serious about their work.
C's main strengths are its portability and simplicity. Therefore we should be very conservative and not add anything quickly. There are plenty of languages to choose from if you want a "modern" language with lots of conveniences. If you want a truly portable language there is really only C. And when I say truly, I mean for platforms without file systems or operating systems, where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0, and so on.
We are the stewards of this, and the work we put in, while large, is tiny compared to the impact we have. Any change we make needs to be addressed by every compiler maintainer. There are millions of lines of code that depend on every part of the standard. A 1% performance loss is millions of tons of CO2 released, and billions in added hardware and energy costs.
In this privileged position, we have to be very mindful of the concerns of our users, and take the time to look at every corner case in detail before adding any new features. If we add something, then people will depend on its behavior, no matter how bad, and we will therefore have great difficulty fixing it in the future without breaking our users' work, so we have to get it right the first time.
> for platforms without file systems or operating systems, where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0, and so on.
This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
I get that these strange architectures need a language. Why does it have to be C or C++? They can use a nonstandardized variant of C, but why hobble the language that is 99% used on normal hardware with misfeatures justified by truly obscure platforms?
It doesn't have to be C, but as of today there is no other option. No one is coming up with new languages with these kinds of features so C it is. People should, but language designers today are more interested in memory safety and clever syntax, than portability.
I would like to caution you against thinking that these weird platforms are old machines from the 60s that only run in museums. For instance, many DSPs have 32-bit bytes (the smallest memory unit that can be individually addressed), so if you have a pair of new fancy noise-cancelling headphones, it's not unlikely you are wearing a platform like that on your head every day.
Unusual platforms like DSPs usually have specific (usually proprietary) toolchains. Why can't those platforms implement extensions to support 32-bit bytes? Why must everyone else support them? In practice ~no C code is portable to machines with 32-bit bytes. That's okay! You don't choose a DSP to run general purpose code. You choose it to run DSP code, usually written for a specific purpose, often in assembly.
"Weird" platforms often do have their own tool-chains but they do have the ability to leverage LLVM, MISRA, and an array of common tools and analyses that exists for C. One of the reason we got new platforms like RISC-V is that today its possible to use existing OSS software to build a platform with a working OS and Development environment, that common basic libraries can be built for is that all this software is written in C and can be targeted towards a new platform.
Ok here’s a concrete one, ARM is experimenting with a brand new CHERI implementation that throws out a lot of “obvious” things that are supposed to be true for pointers and integers. The only way this has been able to work is that the standard is flexible enough to let C(++) work here. Rust is getting breaking changes to support the platform.
Unsurprisingly Rust can target Morello already but it comes at a performance penalty. Whereas C has myriad integer types for special purposes, Rust only has a pair - a signed and unsigned integer of the same size as pointers. Since on CHERI a pointer is a different size from an address, Rust pays the price of using a 128-bit integer (to store pointers) as its array index.
The potential change would make these types address-sized, so you can't fit a whole pointer in them on Morello where the pointer isn't just an address. On other platforms this change makes no practical difference.
The point is that new exploration of the design space only works when there’s a familiar environment to build on. The old days of each architecture being its own hermetic environment are gone.
Because C already does this, and has from the beginning. C was designed to be portable in an era where there were significant differences in fundamental CPU design decisions between platforms. C is widely used to write software for all kinds of weird platforms. Changing that would be far more work than just making a new language.
They also tend to be non-standard for a variety of reasons anyway! C bends backwards to support odd architectures but is often also insufficient (or at least the vendors cannot justify the effort to achieve full compliance and their customers don't care significantly anyway).
Perhaps Carbon is the first in a series of new low level languages that free us from the impossible tensions of C/C++ having to be all things to all (low level) programmers.
I would love a new language for implementing high level languages. I've worked on several of these projects and we use mostly unstandardized dialects of C++ and it's really not fit for purpose.
> AFAICT, Zig is halfway step between C and C++ for desktop and mobile developers.
What does this even mean? Zig very much has embedded use as a target as well, with freestanding use being a first-class citizen. The majority of the standard library works without an OS (and if you wish, you can provide your own OS-like interfaces quite easily, and use the OS-dependent parts in a freestanding environment). I've written a UEFI bootloader in Zig, and right now I'm using it on an RP2040 as well, cross compiling from an M1 Mac without the need to install any additional cross compilers.
I'd argue that it might eventually be even better for embedded than C/C++, as unlike them, allocation is much more tightly controlled, with a convention that only functions (or data structures) that take an allocator as an argument will allocate memory on the heap. Future versions may even restrict recursion in certain contexts to guarantee max stack depths.
Man, it hasn’t even fucking reached 1.0 yet. The fact that the default install can target multiple architectures (including C dependencies) is nothing short of impressive.
Does GCC target any of those in its standard build? Oh. Wait. It doesn’t. It builds for only one architecture at a time and if you want anything else you’ll need to recompile the compiler.
LLVM is even intended to be optional in future, and the Zig team is well on their way to making that a reality with the work on stage2.
What backend would you suggest for embedded support?
Zig is 6 years old, and years-old issues exist in their task tracker requesting support for some of these targets.
You _cannot_ simply recompile LLVM to support the targets I listed without using unsupported, risky patches from community sources; if any such patches exist. The multiple architecture support in one binary is a pointless feature when installing multiple GCC builds only takes a tiny, insignificant fraction of my development machine's disk space. Never in my life have I thought "Wow, I'd really speed up development if GCC was one enormous binary holding all possible targets."
Zig's best path forward is to support targeting C as a first-tier target, which they seem to be interested in doing.
We can! Many of us still use C89. (C99 has problems, like variable-length arrays.)
The reality, however, is that you can't escape newer versions entirely. Not all code you interact with was written in the subset you want, so when your favorite OS or library starts using header files with newer features, you need to run that version of the language too.
Another less appreciated detail is that a lot of WG14 work is not about adding new features but about clarifying how existing features are meant to work. When the text is clarified, the clarification gets back-ported to all previous versions of C in major compilers. An example of this is "provenance", a concept that has implicitly been part of the standard since the first ISO standard but is only now being formalized. This means that if you want to adhere to the C89 standard, you will find a lot of clarifications about how things should work in the C23 standard.
They are one of the few things added that require the target platform, rather than the compiler, to do something different. (And the other big things like that, like atomics, are similarly optional.)
If it were to focus on stability, it would probably be LLVM IR. That said, there's plenty of C++ being written for these applications. And Ada.
> so if you have a pair of new fancy noise-cancelling headphones, it's not unlikely you are wearing a platform like that on your head every day.
Chip shortage aside, the likelihood of these devices using obscure hardware like discrete DSPs is going down as cheaper low power architectures are becoming commoditized.
And pay the price: there are tons of places that are stuck with some ossified "LLVM 3.2" toolchain or similar because they built their stuff on an unstable IR.
Yea and again, I said "if". We live on the else branch of that if statement.
My point is that a lot of this sentiment is treating C as the portable backend of a compiler because there is no portable front end. It holds us back in a lot of ways from iterating on systems languages in ways that are interesting and valuable.
Ideally there would be an IR with a stable textual and binary format that can be compiled into machine code for various ISAs and supports the extensions silicon manufacturers need for the exotic bits. I've used exotic toolchains for weird ISAs where the exotic bits are sugar added to a GCC front end, and it always feels wrong and limiting.
> This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
Sure, but it's the same line of reasoning that made C relevant in the first place, and keeps it relevant today - some library your dad wrote for a PDP-whatever is still usable today on your laptop running Windows 10.
Because it's antiquated, it's also extremely easy to support, and to port to new and/or exotic platforms.
The library my dad wrote (lol) for the PDP-11 is probably full of undefined behaviour and won't work now that optimizers are using any gap in the standard to miscompile code.
> using any gap in the standard to miscompile code
For code to be miscompiled, there has to be a definition of what correctly compiling it would mean, and if there were, it would not be undefined behavior.
We are talking about code written before the standard, so every bit of UB in the standard is in play here.
Eg the fact that overflowing a signed int can cause the compiler to run amok would certainly be a surprise to the person who wrote code for the PDP-11.
-Wall -Werror is mostly designed to catch dangerous but totally well-defined idioms, not UB. It doesn't warn on every signed arithmetic operation or unchecked array access, for example.
See "useful" — it may not be not quite as strong as you're thinking. It may be possible to write a minimal C program without UB, but I'm thinking of larger programs, more than a few hundred lines. Common UB includes: array access out of bounds, dereferencing a null pointer, use after free, use of uninitialized variables. -Wall -Werror can catch some instances of some types of UB, and runtime libraries like UBSan can catch more. But they're not exhaustive.
When I first learned C, it was K&R, pre-ANSI with old style function parameters. It is trivial to convert to ANSI C. The truth is C has barely changed in decades.
You should take a look at plan9port - a bunch of userspace tools from Plan 9, carefully ported to Linux with few changes. Maybe that's not PDP-whatever, but it is Sun-whatever. Either way, it's code that was written decades ago for a dead architecture.
C is pretty much the only language in common use for programming microcontrollers. Microcontrollers seldom have filesystems. To break the language on systems without filesystems or terminals would be to break the software of pretty much every electronics manufacturer out there.
Of course, I can't see a reason why #embed wouldn't be useful for microcontrollers. In fact, I imagine they're a key target market for a feature like that; resource managers are complex, and tools like bin2c have always felt like a terrible hack.
I was solely replying to the commenter who said that all reasonable modern systems have filesystems so I put one in for the embedded software developers.
Yeah, if the number of platforms using an obscure corner of the spec were exactly zero, we wouldn't be talking about whether the niche size is worth the hassle.
My point was that this niche doesn't cover the entirety of the embedded / IoT space.
This feature is precisely most useful when you don't have a filesystem to read data from at runtime. Embedding the data in the binary you flash is so much simpler with this.
I'll flip that around: if you want to serve on a language standards committee, there are a lot of other languages to choose from. Why be on the C standards committee with the express purpose of blocking progress?
I would say that one should be pretty cautious when baking assumptions about such a fleeting thing as hardware into such a lasting thing as a language.
C itself carries a lot of assumptions about computer architecture from the PDP-7 / PDP-11 era, and this does hold current hardware back a bit: see how well the cool, nonstandard, and fast Cell CPU fared.
A language standard should assume as little about the hardware as possible, while also, ideally, allowing one to describe properties of the hardware somehow. C tries hard, but the problem is not easy at all.
All memory is uniform, for instance. There is one scalar data-processing unit that finishes a previous operation and then issues the next: there is no way to naturally describe SIMD, for instance, and no way to speak about the asynchronous things that happen on a Cell CPU all the time, as far as I can judge. (I never programmed it, but I remember that people who did said they had to use assembly extensively.)
OTOH you can write stuff like `*dst++ = *src++`, and it would neatly compile into something like `movb (R1)+, (R2)+`, a single opcode on a PDP-11.
It’s worse: almost all of them already use a nonstandard variant of C. The committee is bending over backwards to accommodate them, but they literally _do not care what the standard says_, so this doesn’t even benefit them. Most will keep using a busted C89 toolchain with a haphazard mix of extensions no matter what the standard does.
This reasoning has always rung mostly hollow for compiler features (#embed, typeof) rather than true language features (VLAs, closures).
Modern toolchains must exist for marginal systems. It's understandable to want to write code for a machine from 1975, or a bespoke MCU, on a modern Thinkpad. It is not necessary to support a modern compiler running on the machine from 1975 / bespoke MCU. You might as well argue against readable diagnostic messages because some system out there might not be able to print them!
I could also see this, though perhaps it's a step too far for C, applying to Unicode encoding of source files.
The 1970s mainframe this program will run on has no idea that Unicode exists. Fine. But, the compiler I'm using, which must have been written in the future after this was standardised, definitely does know that Unicode exists. So let's just agree that the program's source code is always UTF-8 and have done with it.
Jason Turner has a talk where the big reveal is, the reason the slides were all retro-looking was that they were rendered in real time on a Commodore 64. The program to do that was written in modern C++ and obviously can't be compiled on a Commodore 64 but it doesn't need to be, the C64 just needs to run the program.
This seems a step too far for me. Compatibility with existing source files which may not be trivial to migrate does also matter. (Well, except for `auto`, C23 was right to fuck with that.) At the very least you'll need flags that mean "do whatever you did before".
Sure, I don't seriously expect C to embrace that, even though I think it'd be worth the effort I'm sure plenty of their users don't.
For auto I think the argument is that if you poke around in real software the storage specifier was basically never used because it's redundant. That's the rationale WG21 had to abolish its earlier meaning in C++ before adding type deducing auto.
As I read it, N2368 (which I think is what they took?) gives C something more similar to the type inference found in many languages today (which gives you a diagnostic if it can't infer a unique type from available information) whereas C++ got deduction which will choose a type when ambiguous, increasing the chance that a maintenance programmer misunderstands the type of the auto variable.
However it got inference from return, which I think is a misfeature (although I think I can see why they took it, to make generics nicer). With inference from return, to figure out what foo(bar)'s type is, I need to read the implementation of foo because I have to find out what the return statements look like. It's more common today to decide we should know from the function's signature.
This is somewhat mitigated by the fact that N2368 says auto won't work in extern context, so we can't just blithely say "This object file totally has a function which returns something and you should figure out what type that is" because that's clearly nonsense. You will have the source code with the return statements in it.
Ah, great. I don't write very much C any more, but the auto described in N3007 (well, the skim of N3007 I just did) feels very much like what I'd want from this feature in C and perhaps more importantly, what I'd assume auto does if I see it in a snippet of somebody else's code I'm trying to understand.
Or how to make everyone's life worse for the few weirdos that don't use an LSP/IDE. The type of an auto function return can be, and is, automatically shown, e.g. https://resources.jetbrains.com/help/img/rider/2022.1/inlay_...
With this kind of non-consideration of developer comfort, it is clear C is an obsolete language.
No, I'm mainly talking about targeting. My point is not so much about #embed, but rather that almost anything you think you know about how computers work isn't necessarily true, because C targets such a wide group of platforms. Almost always when someone raises a question along the lines of "No platform has ever done that, right?", someone knows of a platform that has done that, and it turns out to have very good reasons for doing so.
For this reason, everything is much more complicated than you first think. For me, joining WG14 has been an amazing opportunity to learn the depths of the language. C is not big, but it is incredibly deep. The answer to "Why does C not just do X?" is almost always far more complicated and thought through than one thinks.
Everyone in WG14 who has been around for a while knows this, and therefore assumes that even the simplest addition will cause problems, even if they can't come up with a reason why.
Yeah, but then I have to side with the author: how could a compile-time-only feature which doesn't even introduce new language semantics possibly be affected by the multitude of build targets?
Unless "it's more complicated than you think" is the catchall answer to any and all proposals for new language features. In which case, how to make progress at all?
Also, I find the point about the language being "truly portable" a bit ironic, considering the whole rationale of #embed was that the use case of "embed large chunks of binary data in the executable" was completely non-portable and required adding significant complexity to the build scripts if you were targeting multiple platforms.
It's easy to make a language portable on paper if you simply declare the non-portable parts to not be your responsibility.
> Everyone in WG14 who has been around for a while knows this, and therefore assumes that even the simplest addition will cause problems, even if they can't come up with a reason why.
Look at #embed as an example. Look how complex it is: dealing with empty files, different ways of opening files, files without lengths, null termination... the list goes on. This is typical of a proposal for C: it starts out simple ("why can't I just embed a file into my code?") and then it gets complicated, because the world is complicated.
I worry a lot about people loading in text files and forgetting to add null termination to embeds. I would not be surprised if in a few years that produces a big headline on Hacker News, about how that shot someone in the foot and how C isn't to be trusted. The details matter.
Null termination is not to be added by #embed. #embed produces a const-sized buffer of unsigned bytes, not a string. Files are not strings; files can contain \0.
And I still don't get why #embed is so much better than xxd-generated buffers. It's more convenient, sure, but 10x faster?
#embed is faster because it skips the parsing step. xxd generates source-code tokens, which the parser must then decode back into bytes; #embed just reads the bytes directly.
> I worry a lot about people loading in text files and forgetting to add null termination to embeds. I would not be surprised if in a few years that produces a big headline on Hacker News, about how that shot someone in the foot and how C isn't to be trusted. The details matter.
The compiler should insert the null terminator if it's not in the embedded file.
This is another issue here. If loads of compilers start doing this, then programs start relying on it, and then it becomes a de-facto undocumented feature. That means if you move compilers/platforms you get new issues. A lot of what the C standard does is mopping up these kinds of issues.
Then require in the standard that compilers implement it. I think it's really backwards to ignore the toolchain and its ability to prevent bugs from entering software.
It's stuff like this that leaves us writing C that relies on implementation-defined behavior. Under-specification that leaves easy holes will be filled by the compiler, and we will come to rely on them. Just like type punning.
This is the problem. Things get complicated fast. If we mandate null termination, then it's impossible to have multiple embeds in a row to concatenate files, or we somehow need rules for when to add null termination and when not. These rules in turn are not going to be read by all users, so some people will just assume that #embed always adds null termination when it doesn't, and then we are back to square one. The more we add, the more corner cases there are.
I don't see how that would prevent sequential embeds, unless you have defined the semantics of #embed to be a textual include; if that's the case, then the mistake is the semantics, not the constraints on compilers or embedded files.
edit: why is it desirable to concatenate files with #embed? Does that not seem out of scope if not contrived?
This is literally the premise of this comment chain. Can you see how the standard can get dragged around by various competing needs? People want to embed strings, with null termination, some just want to embed binary data. Neither is particularly “wrong”. And in this thread everyone is presenting “obvious” solutions that are one or the other.
No, at least in this case I think the feature chooses the correct path on this. If you want to embed a string that's null terminated, then just save the string into a file as a null-terminated string, or copy it from the array to a proper string type. It looks like you may even be able to use this with a struct, which might let you add a null terminator after the array.
Though yes I agree lots of features get bogged down trying to handle everything. But in this case not adding it creates even more complexity. You can store strings natively in C. You can’t include binary blobs in C in a platform independent way so you have hacks that explode compile times for everyone using that software.
The ability to add vendor specific attributes also allows for those use cases to evolve naturally while still solving the core problem of embedding binary data.
I, FWIW, agree with the implementation. I'm just saying that this thread has someone saying "the compiler should add null termination" in it so the choice to make is not obvious.
I don't think adding a null terminator is useful for binary files which are not null-terminated strings, and may even have embedded 0 bytes in the middle.
It doesn't have to, you just add a zero byte at the end of the embedded byte sequence in the object file. It's up to the programmer to make the choice how to interpret that.
That said, for binary embeds you almost always need the length embedded as well, which has been the case for every tool I've used to embed files in object code. You usually get something like:
> It doesn't have to, you just add a zero byte at the end of the embedded byte sequence in the object file. It's up to the programmer to make the choice how to interpret that.
Like, are you proposing that the ideal semantics for `#embed "foo"` if foo contained 0x10 0x20 0x30 0x40 to be to expand to `16, 32, 48, 64, 0`? That seems more annoying than the opposite, given that:
> That said for binary embeds you almost always need the length embedded as well, which has been the case for every tool I've used to embed files in object code. You usually get something like
TFA demonstrated it by relying on sizeof() of arrays doing the right thing:
static_assert((sizeof(sound_signature) / sizeof(*sound_signature)) >= 4,
"There should be at least 4 elements in this array.");
You'd need to change this to subtract one every time, which sounds more annoying when embedding binary resources than adding the zero to strings would be.
I was on X3J11, the ANSI committee that created the original C standard and my experience was similar. It was a great opportunity to learn C at depth and get an understanding of many of the subtle details. We rejected a great many suggestions because our mandate was to standardize existing practice, address some problem areas, and not get too creative. (We occasionally did get too creative. The less said about noalias the better.)
Maybe you can answer a question I have: what companies are still supporting C compilers for sign-magnitude and 1s-complement machines today? I've been programming for almost 40 years now, and I have never come across any machine that is sign-magnitude or 1s-complement (I have encountered real analog computers---a decent sized one too---about 9' (3m) long, 6' (2m) high, and about 3' (1m) deep, requiring hundreds of patch cables to program).
"""Codify existing practice to address evident deficiencies. Only those concepts that have some prior art should be accepted. (Prior art may come from implementations of languages other than C.) Unless some proposed new feature addresses an evident deficiency that is actually felt by more than a few C programmers, no new inventions should be entertained."""
well, basic string support would be fine, wouldn't it? the C standard still having no proper string library for decades didn't harm its popularity, but still.
You cannot find non-normalized substrings (strings are Unicode nowadays), and UTF-8 is unsupported. coreutils and almost all tools lack proper string (= Unicode) support.
Isn't there literally a single GPU for which that is true?
Asking because every time this surfaces, someone inevitably asks for an example, and the only example I've seen over the years was of one specific (Nvidia?) GPU that uses a NULL of 0xFFFFFFFA (or something similar).
That is, do you know how common it is for NULL to not be 0?
There are a lot of platforms where you might want to do this. If you're programming bare-metal, "address 0" might be a physical address that you expect stuff to exist at, so it might be relevant to use the bit pattern 0xffffffff instead. If you're targeting a blockchain or WASM VM, you may also not have memory protection to work with, just a linear array of memory. And some machines don't even have bit patterns for pointers, like, say, a Lisp machine.
For an older CPU example, x86 (in 16-bit mode) maps the interrupt vector table at physical address 0. So to tell the CPU what handler to use for the 0th interrupt, you have to do:
*(u16 *)0 = offset;  // dereference of null!
*(u16 *)2 = segment;
Granted, most 16-bit OSes were written in assembly, not C, but if you were to write one in C, you’d have this problem.
IIRC, the M68k (which was a popular C target with official Linux support) did the same thing.
For a more recent example, AVR (popularized by Arduino) maps its registers into RAM starting at address 0. So, if you wanted to write to r0, you could write to NULL instead. Although one would be using assembly for this, not C.
People who call C simple have some weird definition of simple. How many C programs contain UB or are pure UB? Probably over 95%. The language is not simple at all.
A straight razor is simple and that's why it's the easiest to cut yourself with. An electric razor is much safer precisely because much engineering went into its creation.
It's also worth remembering that a lot of higher-level languages have runtimes/VMs implemented in C. Web applications rely heavily on databases, JavaScript VMs, network stacks, system calls, and operating system features, all of which are implemented in C.
If you are a software developer and want to do something about climate change, consider becoming a compiler engineer. If you manage to get a couple of tenths of a percent performance increase in one of the big compilers during your career, you will have materially impacted global warming. Compiler engineers are the unsung heroes of software engineering.
> If you manage to get a couple of tenths of a percent performance increase in one of the big compilers during your career, you will have materially impacted global warming.
I've heard this kind of claim a number of times and I think it's more complicated than the crude statistical measurement makes it sound. Personally, I think that most programs are not run frequently enough to matter from an emissions perspective. For programs that are, like ML training programs, users will just train more data if the algorithms are faster so most energy efficiencies will get wiped out by the increased usage.
Even if that theory is wrong, what if there is a language that is 10% better than C for 95% of common C use cases? Wouldn't it be better for compiler engineers to focus on developing that language than micro-optimizing C?
No JavaScript VM is implemented in C. They are all written in a language that's a bit like C++ but has no exceptions and relies on lots of compiler behaviour that is not defined by the C++ standard.
Hm? I can think of two pure C JS engines off the top of my head: Duktape and Elk. I believe Samsung or another vendor also has their own; they’re all somewhat common in the embedded space.
Fair. I'm not familiar with the tiny JS VMs. But really the main point stands: It's not possible to build a decent GC without violating strict aliasing so C and C++ as standardized are not suitable for this task.
This is incredibly pedantic, even for Hacker News. I suspect you do not wish to be responded to like this in general, so I struggle to see why it is appropriate here.
The point is that if you are programming in a C dialect or a C++ dialect you will be told this again and again.
Ask a question on Stackoverflow about code that requires nonstandard flags or uses UB. You will be told your program is not really in C/C++ and that your question makes no sense. You will be lectured on nasal demons for the nth time.
File a bug against a compiler asking the optimizers to back off using UB to subvert the intentions of the programmer and you will be told there's no way to even know the intentions of the programmer given that the standards don't apply.
So we can all move to a nomenclature where C is used loosely to indicate a family of languages, one of which happens to be standardized. I'd be happy to do that.
But let's not bait and switch, using standards pedantry to dismiss feature requests and bug reports, but then turning around and saying C/C++ is suitable for implementing runtimes of other languages when nothing can really be achieved in that space without going beyond the language spec.
And really in 2022 runtimes, JITs, GCs should be the primary use of C/C++. Many other uses (systems software, compilers, desktop apps, phone apps) are not suited for C/C++ due to security, stability, ease of development and newer languages that make more sense for these domains.
Well…the answer is that the response you get will depend. If you are familiar with the standard and how it is implemented in compilers, you will get a sense for which UBs are implicitly blessed by compilers and which you can file bugs against. "I double freed a pointer and expected something reasonable to happen" is never going to get you a serious response. But if you ask something like "I tagged this pointer's bit pattern and untagged it and the pointer I got back isn't valid" you will generally be heard out. FWIW, I would recommend picking a language other than C to write your language runtime in 2022; using C to support your new language is often a good way to invite memory-unsafety bugs into your code.
It’s a valuable distinction! This is a thread about the C standard. It’s misleading to talk about projects that use a specific C compiler on a specific platform but plainly need implementation details past what’s in the standard, or even straight up violating them.
Well, yes and no. I do actually agree with your claim that Linux uses a nonstandard C, because it builds with special flags and without them it would not work. But for code that just happens to have unintentional bugs that result in UB, I definitely consider that C. Code that has a gentleman's agreement with the compiler on certain UBs is straddling the line, but I would still usually call it C, unless someone tried to claim that because GCC accepted something it must be part of the standard. My point really isn't that you're wrong or that this isn't useful in the right context (that's why I called it pedantic), but the difference between "we build our code with special flags and this doesn't work with any other code that is ostensibly in this language because of that" and "this works on major compilers but is not standards compliant" is kind of large, and I don't think you really needed to go where you did. Sort of like if you made a typo in a comment about the English language, I don't go "you must be using a language other than English, because clearly this is not valid"; I say you have a typo. But if you're one of the people who makes everything lowercase and doesn't use punctuation, I might reasonably say "it's mostly English but I'm not sure I can really call it that without qualification".
How would such a platform without file systems handle #include?
Reading further, I don't think this was ever addressed when someone else brought it up. I cannot for the life of me imagine a system where #include works but #embed doesn't. Again, it's fine if some systems implement non-standard subsets of the C standard... but why hobble the actual standard for code compiled on systems that do have a filesystem (which, by the way, is what handles #include) for the sake of systems without filesystems?
> How would such a platform without file systems handle #include?
I don't think it would; you'd cross-compile for it on a platform with a file system. I think the parent poster's point was that C is the only option for some ultra-low-resource platforms and that a conservative approach should be taken to adding new features in general. I don't think they were saying specifically that not having a filesystem is problematic for this particular inclusion.
include is with regards to the source platform, not the target platform
i.e., you (generally) need a filesystem to compile, but you don't need a filesystem to run what you compiled
If I may gripe about C for a bit though. I do truly appreciate C's portability. It's possible to target a very diverse set of architectures and operating systems with a single source. Still, I do wish it would actually embrace each architecture, rather than try to mediate between them. A lot of my gripes with C are due to undefined behaviour which is left as such because of platform differences. I've never seen my program become faster if I remove `-fwrapv -fno-strict-aliasing`, but removing them has resulted in bugs due to compiler optimisations. I really wish that by default "undefined behaviour" would become "platform-specific behaviour", with an officially blessed way to tell the compiler it can perform further optimisations based on data guarantees.
C occupies a very pleasant niche where it lets you write software for the actual hardware, rather than for a VM, while still being high level enough to allow for expressiveness in algorithms and program organisation. I just wish by default every syntactically valid program would also be a well-defined program, because the alternative we have now makes it really hard to reason about and prove program correctness (i.e. that it does what you think it does).
I'm curious what you think of UB from a standard perspective --- were things left undefined and not just implementation-defined because there was simply so much diversity in existing and possibly future implementations that specifying any requirements would be unnecessarily constraining? I can hardly believe that it was done to encourage compiler writers to do crazy nonsensical things without regard for behaving "in a documented manner characteristic of the environment" which seems like the original intent, yet that's what seems to have actually happened.
>I'm curious what you think of UB from a standard perspective
I think a lot about that! I'm a member of the UB study group and the lead author of a Technical Report we hope to release on UB.
In short, "Undefined behavior" is poorly named. It should have been called "things compilers can assume the program won't do". With what we call "assumed absence of UB", compilers can and do do a lot of clever things.
Until we get the official TR out, you may find a video I made on the subject interesting:
> And when I say truly, I mean for platforms without file systems, or operating systems or where bytes aren't 8 bits, that doesn't use ASCI or Unicode, where NULL isn't on address 0 and so on.
Genuine question: why do we want these platforms to live, rather than to be forced to die? They sound awful.
I understand retrocomputing, legacy mainframes, etc; but 99% of that work is done in non-portable assembler and/or some flavor of BASIC; not in C.
Many of these platforms are microcontrollers, DSPs or other programmable hardware that are in every device nowadays, so it's not retro; it's very much current technology.
Once again — I can understand wanting to program this hardware, but who's programming it in C, rather than writing directly to the metal in order to squeeze every cycle out of these?
Enjoy the naysayers if you like! I'm glad someone spent the time and effort to push past them. Bit too late for me - I have moved on to Rust which had support for this from version 1.0.0.
> There's also the standard *nix/BSD utility "xxd".
> Seems like the niche is filled. Or, at least, if you want to claim that
>...do NOT completely fill this evolutionary niche
> This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
> Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
> I'll point out that this is a non-issue for Qt
applications that can simply use Qt's resources for this sort of business.
(Though credit to Matthew Woehlke, he did point out a solution which is basically identical to #embed)
> I find this useless specially in embedded environments since there should be some processing of the binary data anyway, either before building the application
In fairness there was a decent amount of support. But given the insane amount of negativity around an obviously useful feature I gave up.
I wonder if there was a similar response to the proposal to include `string::starts_with()`...
> > Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
What a pompous douche whoever wrote that was.
> > This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
So, this might be a valid point, although not enough to reject the feature over. It's true that it's a feature that could potentially see overuse and abuse. But then, so did templates :-P
> told me this form was non-ideal and it was worth voting against (and that they’d want the pure, beautiful C++ version only[1])
I heard about #embed, but I didn't hear about std::embed before. After looking at the proposal, to me it does look a lot better than #embed, because reading binary data and converting it to text, only to then convert it to binary again seems needlessly complex and wasteful. I also don't like that it extends the preprocessor, when IMHO the preprocessor should at worst be left as is, and at best be slowly deprecated in favour of features which compose well with C proper.
Going beyond the gut reaction and moving on to hard data, as you can expect from this design, std::embed of course is faster during compilation than #embed for bigger files (comparable for moderately-sized files, and a bit slower for tiny files).
I'm not a huge fan of C++, but the fact that C++ removed trigraphs in C++17 and that it's generally adding features replacing the preprocessor scores a point with me.
Compilers follow the "as if" principle, they don't have to literally follow the formal rules given by the standard. They could implement #embed by doing as you say, pretty printing out numbers and then parsing them back in again. But that would be an extremely roundabout way to do it, so I doubt anyone will actually do it that way. Unless you're running the compiler in some kind of debugging mode like GCC's -E.
I don’t think the implication is that the C compiler must encode the binary file as a comma-separated integer list and then re-parse it, only act as if it did so.
How would that work? It would need to depend on the grammar of surrounding C code. This directive isn't limited to variable initialisers. You can use it anywhere. So e.g. you can use it inside structure declaration, or between "int main()" and "{". etc. etc. Those will generate errors in subsequent phases, but during preprocessing the compiler doesn't know about it. Then there is also just that:
int main () {
    return
#embed "file.bin"
    ;
}
There are plenty of cases, where it will all behave differently. And if you're going to pretend even more that the preprocessor understands C syntax, then why not just give this job to compiler proper, which actually understands it?
Preprocessing produces a series of tokens, so you would implement it as a new type of token. If you're using something like `-E` you would just pretty-print it as a comma-delimited list of integers. If you're moving on to translation phase 7, you'd have some sort of rules in your parser about where those kinds of tokens can appear. Just like you can't have a return token in an initializer, you wouldn't be allowed to have an embed token outside of one (or whatever the rules are). And you can directly instantiate some kind of node that contains the binary data.
People don't dislike it because they are unaware how helpful it can be. They dislike it because they are aware how hacky, fragile and error-prone it is. They want something more robust than text substitution.
People that don't like it have generally used macros that are more sophisticated than just blindly copy-pasting text into your source files, and have become aware of how absurd that is.
which obliterate tooling such as IDEs. Of course, this is a contrived example, but the preprocessor is just one big footgun, which offers no benefits over other ways of solving the problems you mentioned, such as constexpr and perhaps additional, currently unimplemented solutions.
A tool being very useful doesn’t mean that it is a very good tool.
There are better tools for the functionality the C preprocessor attempts to provide. Other languages have module inclusion systems and very powerful macros that don’t have the enormous footguns of the C preprocessor.
Edit: to be clear, I think #embed is a fine idea; I’d use it and it would make my sourcebase cleaner in some places.
My carpenter has a lot of tools that can be dangerous if misused. Of course better tools can be devised, but useful things have been done with them (and he still has all his fingers)
Yes, but we’re in a thread about ways to improve the language, not about how to make the best with what’s there. This type of argument holds back improvement.
This serves the same use as Rust's `include_bytes!` macro, right? Presumably most people just use this feature as a way to avoid having to stuff binary data into a massive array literal, but in our case it's essential because we're actually using it to stuff binaries from earlier in our build step into a binary built later in the build step. Not something you often need, but very handy when you do.
This has different affordances than std::include_bytes! but I agree that if you were writing Rust and had this problem you'd reach for std::include_bytes! and probably not instead think "We should have an equivalent of #embed".
include_bytes! gives you a &'static [u8; N] which for non-Rust programmers means we're making a fixed size array (the size of your file) full of unsigned 8-bit integers (ie bytes) which lives for the life of the program, and we get an immutable reference to it. Rust's arrays know how big they are (so we can ask, now or later) but cannot grow.
#embed gets you a bunch of integers. The as-if rule means your compiler is likely to notice if what you're actually doing is putting those integers into an array of unsigned 8-bit integers and just stick all the file bytes in the array, short cutting what you wrote, but you could reasonably do other things, especially with smaller files.
As the article quotes, in C the lack of standardisation makes this tricky when you want to support more than one compiler, or even when you want to support just one compiler (cf email about the hacks to make it work on GCC with PIE).
> Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
That does sound soul-crushing. Congrats on this achievement!
This is simply wrong. We (the ISO WG14) don't hold the cards; compilers are free to implement whatever they want, and users are free to use whatever tools or languages they want.
We exist only as long as we are trusted to be good stewards, and only go forward with the consensus of the wider community.
It's amazing that you and the ISO team are good stewards of the C standard. Thank you for being part of that.
And it can also be true that it was "hell" and "hardly worth it" for the OP to get a new feature added to the language. I believe it was a miserable experience that has him questioning how he spends his time.
Both can be true. Thank you for your efforts. And thank the OP for his efforts too.
> > Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
> This is simply wrong. We (the ISO WG14) don't hold the cards; compilers are free to implement whatever they want, and users are free to use whatever tools or languages they want.
This is an incredibly oblivious realization of JeanHeyd's point.
I think in our reality the prerequisite for holding all the cards is the lack of competence in knowing how to improve the world. We've gotten where we are now through sheer force of will of those that are empty handed.
The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
This reminds me, I'd argue that the explosion of JS frameworks can be mainly blamed on one thing: the lack of an <include src="somemodule.html"> tag. If you have that you basically have 80% of vue.js already natively supported. No clue why this was never added in any fashion. Change my mind.
HTML imports were part of the original concept of Web Components, and I think they were supported in Chrome. If you look up examples of things built with Polymer 1.x, it was used extensively.
It was actually pretty neat, because you could have an HTML file with a template, style, and script section.
Safari rejected the proposal, so it had to get dropped.
But ESM makes it a bit redundant anyway. The end-goal is to allow you to import any kind of asset, not just JS. There have been demos and examples of tools supporting this going back over half a decade at this point.
Not the parent comment, but my personal use case is for rendering a selectable list. The server side would render a static list with fragment links (ex. `#item-10`) and include elements with corresponding IDs, and a `:target` css rule to unhide the element. This would hopefully be paired with lazy loading the include elements.
edit:
My goal is to avoid reloading the page for each selection and rendering all items eagerly. JS frameworks are the only ones that really allow this behavior.
Honestly I'm usually very wary of additions to C, as one of its greatest strengths (to me) is how rather straightforward it is as a language in terms of conceptual simplicity. There just aren't that many big concepts to understand in the language. (On the other hand there's _many_ footguns but that's another issue.)
That said, to me this seems like a great addition to the language. It's very single-purpose in its usage (so it doesn't seem to add much conceptual complexity to the language) and it replaces something genuinely painful (arcane linker hacks). I'm very much looking forward to using this as I often make single-executable programs in C. The only thing that's unfortunate is I'm sure it'll take decades before proprietary embedded toolchains add support for this.
The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
I did move from C being "the best compromise" to "the less worse compromise".
I wish we had a "C-like" language, a kind of high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible C++ syntax), has sized primitive types (u64/s64, f32/f64, etc.) at its core, has sized literals (42b, 12w, 123dw, 2qw, etc.), has no typedef/generic/volatile/restrict or that sort of horrible thing, has compile-time and runtime "const"s, and I am forgetting a lot.
Among the main issues: the kernel gcc C dialect (roughly speaking, each Linux release uses more gcc extensions), and aggressive optimizations that can break some code (while programming some hardware, for instance).
Maybe I should write assembly, expect RISC-V to be a success, and forget about all of this.
I wish we had something like typed Lua without Lua’s weird quirks (e.g. indexing from 1), designed with performance and safety in mind, and with the features you mention.
But like Lua, the base compiler is really small and simple and can be embedded. And it’s “pseudo-interpreted”: ultimately it’s an ahead-of-time language to support things like function declarations after references and proper type checking, but compiling unoptimized is practically instant and you can load new sources at runtime, start a REPL, and do everything else you can with an interpreted language. Now, having a simple compiler with all these features may be impossible, so worst case there is just a simple interpreter, a separate type-checker, and a separate performance-optimized JIT compiler (like Lua and LuaJIT).
Also like Lua and high-level assembly, debugging unoptimized is also really simple and direct. By default, there aren’t optimizations which elide variables, move instructions around, and otherwise clobber the data so the debugger loses information, not even tail-call optimization. Execution is so simple someone will create a reliable record-replay, time-travel debugger which is fast enough you could run it in production, and we can have true in-depth debugging.
Now that I’ve written all that, I realize this is basically ML. But OCaml still has weird quirks (the object system), SML too honestly, and I doubt their compilers are small and simple enough to be embedded. So maybe a modern ML dialect with a few new features and none of the more confusing things which are in Standard ML.
Check out Nim! It does much of what you describe, and it's great. The core language is fairly small (not quite Lua-simple but probably ML-comparable). It compiles fast enough that a Nim REPL like `inim` is usable to check features and for basic maths. It requires a C compiler, but TCC [4] works perfectly. Essentially Nim + TCC is pretty close to your description, IMHO. Though I'm not sure TCC supports non-x86 targets.
I've never used it, but Nim does support some hot reloading as well [3]. It also has a real VM if you want to run user scripts, and has a nice library for it [1]. It's not quite Lua-flexible, but for a generally compiled language it's impressive.
Recently I made a wrapper to embed access to the Nim compiler's macros at runtime [2]. It took 3-4 hours probably and still compiles in tens of seconds despite building in a fair bit of the compiler! It was useful for making a code generator for a serializer format. Though I'm not sure it's small enough to live on even beefy M4/M7 microcontrollers. Though I'm tempted to try.
GCC or Clang with all warnings turned on will give you almost what you want: -Wconversion, -Wdouble-promotion and hundreds of others. A good way to learn about warning flags (apart from reading the docs) is Clang's -Weverything, which will give you many, many warnings.
I agree (with a lot of caveats), but a key value of C is that we do not break people's code, and that means that we can't easily remove things. If we do, we create a lot of problems. This makes it very difficult to keep the language as easy to implement as we would like. As a member of WG14, I intend to propose that we make this our prime priority going forward.
> I wish we had a "C-like" language, which would kind of be a high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible c++ syntax), has sized primitive types (u64/s64,f32/f64,etc) at its core, has sized literals (42b,12w,123dw,2qw,etc), has no typedef/generic/volatile/restrict/etc well that sort of horrible things, has compile-time and runtime "const"s, and I am forgetting a lot.
Unsafe Rust code I think fits this model better than C does: it relies on sized primitive types, it has support for both wrapping and non-wrapping arithmetic rather than C's quite frankly odd rules here, it has no automatic implicit casts, it has no strict aliasing rules.
> The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
100% agreed. I've always viewed C as a "bootstrappable" language, in which it is relatively straightforward to write a working compiler (in a lower level language, likely Asm) which can then be used to bring up the rest of an environment. The preprocessor is actually a little more difficult in some respects to get completely correct, and arguably #embed belongs there, so it's debatable whether this feature is actually adding complexity to the core language.
Your wish for a "C-like" language sounds very much like B.
There is so much more to remove: 1 loop statement is enough, loop {}, enum should go away with the likes of typeof, etc.
I wonder if all that makes writing a naive "B+" compiler easier (time/complexity/size) than a plain C compiler. I stay humble since I know removing does not mean easier and faster all the time, the real complexity may be hidden somewhere else.
The question here is why this did not already exist as an extension in some compilers. Getting something standardized that exists already in compilers and is used is far easier.
The people on the C standards committee are gold bricks that say things like we won't add anything to the standard that isn't implemented in two or more compilers.
The compiler writers program in C++. And say things like if you want that feature that's a good reason to use C++ and stop using C. But if the standards committee adds it of course we will.
Also, embed is exactly the feature that end users would find very useful and compiler writers would not care about at all.
I’m really amazed at how divisive this one is, and the number of comments here questioning what seems to me a really useful and well-thought-out feature, something I’d have loved to have used many many times over the years.
I guess the heated arguments here help me understand how it could have taken so long to get this standardised, though, so that’s something!
Congratulations and thank you to the OP for doing this, and thanks also for this really interesting (if depressing) view of the process.
This is a really, really good feature and I am so glad it is finally getting standardized. C23 is shaping up to be a very good revision to the C standard. I’m hoping the proposal to allow redeclaration of identical structs gets in as well, as you would finally be able to write code using common types without having to coordinate, which would allow interoperability between independently written libraries.
Congratulations to the author. Things like this are why I hope Carbon succeeds. Evolving C++ seems like a dumpster fire, despite whatever compelling arguments about compatibility you are going to drop on me.
The issue is that a lot of people just think about languages in a wrong way, which is the whole reason for pointless things like C++ expansions, Carbon, Rust, and stuff like this.
One of the fundamental ideas that people run with in language creation/expansion is "programmer is stupid and/or make mistakes" -> "lets add language features that intercept and control his stupidity/mistakes".
And there is a very valid reason for this: it allows programmers of lesser skill and knowledge to pick up codebases and develop safe software, which has economic advantages in being able to hire less experienced devs to write software at lower salary points and spend next to no time fixing segfault issues due to complex memory management. The whole reason Java got so popular over C++ was its GC: both C++ and Java supported fairly strong typing with classes, but C++ still had a lot of semantics around memory management that had to be taken care of, whereas with Java you simply don't do anything.
However, people are applying this idea towards lower level languages, because they want the high performance of a compiled language with a whole bunch of features that make writing code as mistake free as possible.
And my challenge to that is this - why not spend the time making just smarter compilers/tooling?
Think about a hypothetical case where Rust gets all the features added to it that people want, and is widely used as the main language over all others. Looking at all the code bases, there will be a lot of common use patterns, a lot of the safety code duplicated over and over in predictable patterns, etc. And you will see these common things added to Rust. Just like with Java, a lot of the predictable use patterns got abstracted into widely used libraries like Lombok, Spring, etc., where you don't have to worry about correctness in lieu of using a library. And you essentially start to move towards more and more stuff being handled for you automagically, which is all part of the compiler/toolchain.
In the same way, #embed can be solved by a smart compiler. Have a constant string that opens a file, and read the contents into a buffer that doesn't change? Auto-include that file in the binary if you want to target performance rather than executable size. No need for a special directive; just be smart about how you handle an open call, and leave the fine tuning of this to specific compiler options.
And from the economic perspective of ease of use from above, you would have a language like Python which is super easy to pick up and program in, except instead of the interpreter, you would have a compiler that spits out binaries. Python is already widely adopted primarily because of how easy it is to set up and use. Now imagine if you had the option to run a super smart compiler that highlights any potential issues that come with dynamic typing because it understands what you are trying to do, fixes any that it can, and once everything is addressed, spits out an optimized memory-safe executable. With Rust, you code, compile, see you made a mistake somewhere with a reference, fix it, repeat. With this, you would code, compile, fix the mistake somewhere that the compiler warns you about, repeat. No difference.
Focusing on the toolchain also lets you think about integrating features from languages like Coq with provability, where you can focus not only on correctness processing/memory-wise, but also on "is the output actually correct". I.e., any piece of code can be specified to have a guaranteed bounded output set for all given input, which you can integrate into IDE tools to provide real-time feedback for you to design the code in a way that avoids things like URL-parsing mistakes, which all the safety features of Rust won't catch.
As for C, you leave it a version that has a stable, robust ABI, and then anything that you need to support will be delegated to custom tools. That way, in the future where compute will likely be full of specialized ML chips, instead of worrying about writing the frontend to support every feature, you quickly get a notional tool chain made and are able to run existing C code.
I would expect that to produce an error, though. If I had a regular file that was not infinite in size, and I specified the wrong length for the array, I would find it more useful to have the compiler inform me as to the discrepancy rather than truncate my file.
The preprocessor needs to run before the compiler, though, and isn't complex enough to understand the context of the code that it's in. That would be a substantially complex thing to implement.
This will indeed require delaying population of the array to the compilation stage. However it's worth the convenience and the succinctness of the syntax, and it's not that substantially complex to implement.
Interesting. I look forward to this. What I've been doing now to embed a source.png file is something like this, where I generate source code from a file's data:
What about creating object files from raw binary files and then linking against them? That's what I (and of course many others) do for linking textures and shaders into the program. It's a bit ugly though that with this approach you can't generate custom symbol names, at least with the GNU linker.
This #embed feature might be a nice alternative for small files. For large files you usually don't even want to store them inside the binary anyway, so the compilation overhead might be minuscule, since the embedded files are, by intention, small.
When I read the introduction of the article - about allowing us to cram anything we want into the binary - I was hoping to see a standard way to disable optimizations (when the compiler deletes your code and you don't even notice).
You reminded me of Bethesda Softworks games, which always seem to have 1GB+ executables for some reason. I hope it isn't all code. Maybe they embed the most important assets that will always need to be loaded.
My guess is that the files are not truly embedded, as that would require loading the entire file into memory before running the application, which seems wasteful.
More likely, the actual executable is only a small part of the file which accesses the rest of the file as an archive, like a self-extracting zip. There may also be some DRM trickery going on.
Off the top of my head, I think there's some niche use in embedding shaders so that they don't need to be stored as strings (no IDE support) or read at runtime (slower performance).
There are a lot of use cases for baking binary data directly into the program, especially in embedded applications. For instance, if you are writing a bootloader for a device that has some kind of display you might want to include a splash screen, or a font to be able to show error messages before a filesystem or an external storage medium is initialized. Similarly, on a microcontroller with no external storage at all you need to embed all your assets into the binary; the current way to do that is to either use whatever non-standard tools the manufacturer's proprietary toolchain provides, or to use xxd to (inefficiently) generate a huge C source file from the contents of the binary file. Both require custom build steps and neither is ideal.
You can get some IDE support with a simple preprocessor macro[1].
It's a crutch, but at least you don't need to stuff the shader into multiple "strings" or have string continuations (\) at the end of every line. Plus you get some syntax highlighting from the embedding language. I.e. the shader is highlighted as C code, which for the most part seems to be close enough.
nullptr, since we have type inference now, and NULL need not be a pointer. auto, because otherwise everybody would create their own hacky auto using the new typeof.
That’s the same that `xxd` does, for instance, and it works with C89/C99/C11.
The advantage is that since it uses the same syntax as C23, it is easier to switch to using the compiler: just remove the Cedro pragma `#pragma Cedro 1.0 #embed` and it will compile as C23.
Yes, but it's linker-specific and non-portable. It can also come with some annoying limitations, like having to separately provide the data size of each symbol. In some cases this might be introspectable, but again comes at the expense of portability.
ELF-based variants of the IAR toolchain, for example, provide a means of directly embedding a file as an ELF symbol, but without the size information being directly accessible.
GNU ld and LLVM lld do not provide any embedding functionality at all (as far as I can see). You would have to generate a custom object file with some generated C or ASM encoding the binary content.
MSVC link.exe doesn't support this either, but there is the "resource compiler" to embed binary bits and link them in so they can be retrieved at runtime.
Having a universal and portable mechanism which works everywhere will be a great benefit. I'll be using it for compiled or text shaders, compiled or text lua scripts, small graphics, fonts and all sorts.
This article[1] shows how you can use GCC toolchain along with objcopy to create an object file from a binary blob, link it, and use the data within in your own code.
The article addresses this directly. If you're only targeting one platform then this is reasonably easy (albeit still not as easy as #embed), but if you need to be portable then it becomes a nightmare of multiple proprietary methods.
Sure, but to add binary data to any executable on any platform is more involved.
As an example, see [1]. That will turn any file into a C file with a C array, and I use it to embed a math library ([2]) into the executable so that the executable does not have to depend on an external file.
5 years ago I wrote a small python script [1] to help me solve "the same problem".
It reads the files in a folder and generates a header file containing the files' data and filenames.
It's very simple and was written to help me on a job. It has limitations, so don't be too hard on me :)
This is a cool feature and I'll likely be using it in the years to come. However, the widely available xxd command and its -i option can achieve this capability portably today.
It will be useful to have it directly in the preprocessor, however. I wonder how quickly it can be added to cpp?
> It's also only suitable for tiny files: compile time and RAM requirements will blow up once you go beyond a couple of megabytes.
Do you know what makes it so? Is there a technical argument why the compiler could do better, except maybe for xxd not being specifically optimized for this use case?
The compiler is allowed to do whatever it wants so long as the results are as if it did what the standard says. So even though the standard says this makes a big list of integers like your xxd command, the compiler won't actually do that, because (as a C compiler) it knows perfectly well it would just parse those integers back into bytes again, just like the ones it got out of the binary file. It knows the integers would all be valid (it made them) and fit in a byte (duh), so it can skip the entire back-and-forth.
If I understand correctly what you're saying, it's not a problem with xxd itself, but with the format of the data, which in xxd's case is a pair made out of an array of bytes and its length, while the compiler is free to include the binary verbatim and just provide access to it as if it were an array/length pair. Am I right about that?
The article spends a fair bit of time discussing the build speed and memory use problems with that approach. Like, the benchmark results [0] linked to from this post literally have xxd as one of the rows. It's not a viable option for embedding megabytes of data.
Scary, it's as if the preprocessor has become type-aware. I guess I'd better not imagine the result of the preprocessing as looking similar to, and following the same rules as, something I would have written by hand.
This might make manual inspection of the preprocessed file a bit painful.
It's not really a preprocessor stage. Probably better to think of it more like a pointer cast to some binary blob. Though it'd be interesting to see what `gcc -E` would produce.
One particular scenario that people have highlighted is developing for an embedded system that doesn't have any storage except flash memory, and no filesystem. In this kind of system, embedding static resources in the executable is the only reasonable option you have.
> “Touch grass”, some people liked to tell me. “Go outside”, they said (like I would in the middle of Yet Another Gotdang COVID-19 Spike). My dude, I’ve literally gotten company snail mail, and it wasn’t a legal notice or some shenanigans like that! Holy cow, real paper in a real envelope, shipped through German Speed Mail!! This letter alone probably increases my Boomer Cred™ by at least 50; who needs Outside anymore after something like this?
Touch grass indeed. Sure, #embed is a nice feature, but this self-indulgent writing style I can’t stand.
That’s not the case, if you take a bit of care. Look at STB for example (https://github.com/nothings/stb) -- I’ve successfully used STB functions on a bunch of different platforms. In both C and C++, even!
C89 is where C should've stayed at. If you need to convert a file to a buffer and stick that somewhere in your translation unit, use a build system. Don't fuck with C.
> "Did you read the snail mail letter from someone who does just that?"
I did. The author struggled embedding files into their executables with makefiles. We don't know anything else beyond that. So what?
People also struggle with memory management in C, an arguably much more difficult and widespread problem. Should we introduce a garbage collector into the C spec? How about we just pull in libsodium into the C standard library because people struggle with getting cryptography right?
OP mentions #embed was a multi-year long uphill battle, with a lot of convincing needed at every turn. That in itself is enough proof that people aren't in clear agreement over there being a single "right" solution. Hence, leave this task to bespoke build systems and be done with it. Let different build systems offer different solutions. Allow for different syntaxes, etc. Leave the core language lean.
Don't the people making the standards have other things to do, like integrating useful features, instead of duplicating incbin.h [0] years after that feature already worked?
> The directive is well-specified, currently, in all cases to generate a comma-delimited list of integers.
While a noble act, this is nearly as inefficient as using a code generator tool to convert binary data into intermediate C source. Other routes to embed binary data don't force the compiler to churn through text bloat.
It would be much better if a new keyword were introduced that could let the backend fill in the data at link time.
You should read or re-read the article and references. There are multiple benchmarks showing this not to be the case. Actually, half the article is a (well deserved) rant about how wrong compiler devs were in thinking that parsing intermediate C sources could ever match the new directive. The compiler's internal representation of an array of integers also doesn't require a big pile of integer ASTs.
According to the benchmarking data this extension is even 2x faster than using the linker `objcopy` to insert a binary at link time as you suggest.
The article definitely isn’t a glowing praise of C/C++. In fact, including this simple, useful feature that rust has had for a decade now has taken an immense amount of effort and received so much pushback from various parties, in part due to the strangled mess of various compiler limitations and in part because of design-by-committee stupidity.
> Surprisingly, despite this journey starting with C++ and WG21, the C Committee is the one that managed to get there first
Later it mentions presenting their first formal attempt at this at Belfast 2019. That's a C++ meeting; it was too late for this to go into C++20 at that point, but it easily could have been in C++23 (it is not).
Officially, Rust's R-cog logo is the symbol of Rust. It is a registered trademark of the Rust Foundation.
But it's a bit boring. Unofficially, Rust has a mascot, in the form of a crab named "Ferris". The crab mascot appears in lots of places, and the Unicode crab emoji U+1F980 is often used by Rust programmers to indicate Rust in text. Unlike the trademarked logo, you can have a bit of fun with such an unofficial symbol, for example Jon Gjengset's "Rust for Rustaceans" book cover has a stylised crab wearing glasses with a laptop apparently addressing a large number of other crabs.
> vendor extensions ... were now a legal part of the syntax. If your implementation does not support a thing, it can issue a diagnostic for the parameters it does not understand. This was great stuff.
I can’t be the only one who thinks magic comment is already an ugly escape hatch, adding a mini DSL to it that can mean anything to anyone just makes it ten times worse. It’s neither beautiful nor great.
> do extensions on #embed to support different file modes, potentially reading from the network (with a timeout), and other shenanigans.
To be completely honest, I find the fact that this was raised by the committee to be really obtuse and unnecessary. The same "complaint" could be raised about #include as well.
If you want to include data from a continuous stream from a device node, then you could just as easily have the data piped into a temporary file of defined size and then #embed that. No need to have the compiler cater for a problem of your own making.
As for the custom data types: it's a byte array. Why not leave any structure you wish to impose on the byte array up to the user? They can cast it to whatever they like. I'm not sure why that's anything to do with the #embed functionality.
Both these things seem to be massive overthinking on the part of the committee members. I'm glad I'm not participating, and I really do thank the author for their efforts there. We've needed this for decades, and I'm glad it's got in even if those ridiculous extensions were the compromise needed to get it there.
I’ve known C for close to two decades, thank you. I’m using the not at all well defined term “magic comment” to loosely refer to everything that’s not strictly speaking code but has special meaning, which include pre-processor directives.
> I’ve known C for close to two decades, thank you. I’m using the not at all well defined term “magic comment”
Please forgive those of us who've been using C since the 80s, or earlier, for assuming you don't know C when you invent your own terminology for preprocessor directives.
This is not a preprocessor directive though, from reading the post I don’t think cpp is expanding the #embed into an array initializer, otherwise there’s no performance benefit at all.
The deliberate wording expands #embed to a C initializer list.
The major performance benefit is from the as-if rule. The compiler is entitled to do whatever it wants so long as the result is as-if it worked the way the standard describes.
So a decent compiler is going to notice that you're #embed-ing this in a byte array, and conclude it should just shovel the whole file into the array here. If it actually made the bytes into integers, and then parsed them, they would of course still fit in exactly one byte, because they're bytes, so it needn't worry about actually doing that which is expensive.
Does it work if you try to #embed a small file as parameters to a printf() ? Yeah, probably, go try it on Godbolt (this is an option there) but for the small file where that's viable we don't care about the performance benefit. It's just a nice trick.
The problem with the preprocessor is often that it is a language inside another language, and the preprocessor is designed to be almost completely agnostic of the language it is embedded in. So there might be subtle ways to use the preprocessor such that implementing as-if becomes very unintuitive. I don't have a good intuition about whether this case is 100% designed in a way that can never provoke such subtle side effects. Basically, what might end up happening is that the preprocessor has to learn some part of the C language to decide whether such an as-if transformation is possible, and then branch to either do it or not.
This isn't adding a new feature. This is replacing a feature that everyone already implemented independently in every other project - some with xxd, some with special embedders such as rcc or windres or whatever, some through CMake directly (like this: https://gist.github.com/sivachandran/3a0de157dccef822a230 for instance) - in a standard and more performant way. Instead of being paid per-project, this cost will now be paid per-compiler-implementation, which is unambiguously good, as there are far fewer compilers than C projects.
You replied to my quote about pulling network resources and “other shenanigans”, which certainly isn’t what “everyone already implemented independently”. Plus that’s a potential vendor extension, i.e., some may implement it independently, some may not, implementations will likely differ.
> You replied to my quote about pulling network resources and “other shenanigans”, which certainly isn’t what “everyone already implemented independently”.
?? Pulling network resources works, today, with what people are using. There's zero difference between "cat foo.txt | xxd -i" and "curl https://my.api/foo.json | xxd -i".
Personally I feel the C committee should've disbanded after the first standard (and the C++ one after the 2003 technical corrigendum). I didn't mind C99 much, but it looks like C(++)reeping featuritis is a nasty habit.
These gratuitous standards prompt newbies to use the new features (it's "modern") and puzzled veterans to keep up and reinternalize understanding of new variants of languages they've been using for decades. There's no real improvement, just churn. Possibly it's one of the instruments of ageism. More incompatibility with existing software and existing programmers.
One problem of this new millennium is that the field has developed a tendency to do old things with new languages instead of doing new things with old languages.
Reinventing another polygonal wheel approximation while constantly tweaking the theory used serves to segregate the experienced (who may have trouble accepting such tweaks or their necessity - they already know about the existence of wheels anyway) from the newbies (who have no previous intuitions and no taste for legitimate and spurious novelty).
Newbies are cheap, and new ideas are hard. Let's do some mental rent seeking.
This is a cool feature, but the author doesn't do himself any favors with his style of writing, which greatly overestimates the importance of his own feature in the grand scheme of things. Remarks like "an extension that should’ve existed 40-50 years ago" make me wonder whether we really should have bothered all compiler vendors with implementing this 40-50 years ago. After all, you can already a) directly put your binary data in the source file, like what's shown after the preprocessor step, and b) read a file at runtime. I'm not saying this isn't useful, but it's a niche performance improvement rather than a core language feature.