As someone who writes a lot of Rust, I have a long-standing dream of building a compiler for a subset of the language. I don't use every feature of Rust, and my hope is that this would give me fast debug builds (in terms of compilation time) while I develop, with a simplified borrow checker and so on, letting me iterate faster. That's one of my dreams. But I haven't had time to even look into it yet, as the other code I work on takes precedence at the moment.
I'd like a GC'd (no borrow checker) version of Rust. There are so many cool concepts in the language that I'd like to know how useful they are in different contexts. (Compiled without a VM, option types, no nulls, easy error handling without exceptions...)
I think a scripting version of rust with a GC and gradual typing would be really nice. Especially one that allows incrementally converting a script into a compiled rust program.
You could try OCaml (it does have exceptions, but they're usually "opt in"); it's a really nice language and probably one of the closest things to a GC'd Rust. There's also Swift, which inspired some features.
Huh? OCaml is pretty obvious in its semantics and compilation strategy. Maybe flambda changed something, but before that I'd used the compiler source code as a reference on how one would compile an ML.
Flowtype has horrible performance, which in my opinion partially explains why it has lost so many users to TypeScript, which has much more predictable and fast performance even though it's written in JS. The project is basically dead.
Here is the article about Darklang leaving OCaml; nothing in it is about performance: https://blog.darklang.com/leaving-ocaml/. It's about ecosystem and dev experience.
As for Flow, I've never used it, so I can't say, but ReScript (which is written in OCaml) compiles 10 to 100 times faster than TypeScript.
I've used neither Flow nor TypeScript, but there's an essential difference between the two: Flow is a type inference engine, whereas TypeScript is a type checking engine. The latter is fundamentally a much simpler problem (both semantically and algorithmically) than the former. In addition, Flow picks a particularly nasty niche for type inference: OCaml itself uses type inference, and it's fast (with quadratic blowup in edge cases) because OCaml's types are simple, whereas Flow IIRC tries to support both subtyping and some of JS's dynamicity, so I'm not surprised it's slower.
Darklang is being rewritten in .NET because they were unsatisfied with the OCaml ecosystem.
Horrible performance is generally more a question of algorithms and software architecture than a problem with the language, which only accounts for a flat percentage of performance.
You might want to check out Scala Native. It’s conceptually similar to Rust without borrow checking. The only issue is that it’s way less mature and community is much smaller.
We use D a lot at work, we genuinely can only laugh at people who refuse to entertain the idea of a GC. The GC allows our code to exist without extensive memory book-keeping.
When we want to avoid the GC we know how, when we want the productivity we use the GC and fuggedaboutit
I don't know how else to word it. Every thread about D has people going on and on about the GC, and they aren't always wrong (there is always good information here), but fundamentally it doesn't matter. Many D programmers come from PHP rather than C++; D isn't attractive to them because of systems programming but because of high-level programming, e.g. metaprogramming.
There seems to be at least one reason to have a non-GC option: You want to write code that integrates with an environment that already has its own GC and you want that to decide things. For example you want to integrate existing libraries in C/C++/D/Go/Rust/whatever with a Lisp process, since there's lots of those libraries and they do useful things, and you don't have a Lisp library for that (yet). In that case having one point of deciding when to deallocate things seems to be better than having two of them.
Using GC basically locks you into a particular memory model or framework.
Having spent 20 years mostly working in GC'd languages, it's totally worth it! However, writing portable libraries like SQLite or libpng is very difficult to do without bringing in unacceptable overhead for your consumers.
I feel much at home with Swift, and it is my other favorite language to use besides Rust. I use Rust for server-side things and command-line utilities and Swift for iOS and macOS apps. I know that server-side Swift is a thing that some people do. But I don't feel that Swift would be what I am personally looking for server-side, so I maintain my dream of faster development in Rust with a subset of Rust and a custom compiler :)
In fact I would like to be able to integrate Swift and Rust so that I could also use Rust in my iOS and macOS apps directly. But for now and for the types of applications that I am currently working on it is not worth going down that particular route. But in the future I wish to write some games where I want to write all of the game logic in Rust and use Swift + Metal for the rendering.
I just totally disagree. Swift might look a bit more like C#, but in terms of features it's way more similar to Rust for all the ADT-related features mentioned in the grandparent comment.
Also, regarding memory safety, it's not as safe as Rust - it's basically only memory-safe in the single-threaded context, but it's still relatively safe compared to other languages.
I don't really find it easier to crash than Rust; thanks to optionals you get the same guarantees against NPEs. You can choose not to take advantage of them, but you can also force-unwrap in Rust if you want.
It's memory-safe in a single-threaded context. The actor model would theoretically bring multi-threaded memory safety (with a different set of tradeoffs than Rust).
I don't think borrow checking plays a significant role in most projects' compile times. Running `cargo check` generally runs a lot faster (after the first run) than `cargo build`, even though it still runs borrow checking, etc. AFAIR macro expansion, LLVM code generation, and linking take a lot of the time, but it depends on the project.
I think you would be better served by simpler, less optimized code generation, as from what I understand LLVM is the thing that takes a lot of the time. You can see this pattern in Crystal and Swift, for example, which also use LLVM and are also relatively slow to compile compared to OCaml or Go, which have their own backends.
The D compiler uses the backend from Walter's last compiler. It is very useful to have a really fast debug build. It's also remarkable how slow LLVM is, given that the D compiler's backend is actually pretty inefficient in places yet still beats LLVM handily on debug builds.
That's fair; Crystal seems to also take a lot of time with type inference. However, LLVM codegen is almost always slower than the alternatives. For example, Haskell has a non-LLVM backend and an LLVM backend, and the LLVM backend is slower. Zig uses LLVM, and they are working on a non-LLVM backend to speed up development builds. A sibling comment mentioned that the situation is the same for D. For C, I'm almost certain that TCC is faster than Clang with LLVM.
There seems to be a pattern here. I'm not saying that LLVM is bad, in fact if it's used by so many projects it's for a reason. But it does have a cost.
LLVM doesn't support every target. If you want to see what isn't supported, the story about the Python cryptography package migrating from C to Rust is a good start: https://lwn.net/Articles/845535/.
I don't see why you would want to do that, as you can always ssh into a system where LLVM is supported, compile to C there, copy the C sources back to your system, and compile rustc using a C compiler.
How are you compiling to C? With the LLVM-IR-to-C backend that was (is?) maintained by the Julia folks? My experience from a couple of years ago was that it didn't produce output that would compile with baroque C toolchains for platforms that LLVM doesn't support... I'm giving very unsubtle side eye to you, TriCore.
Well, in any case, writing a backend for rust or llvm that generates C is probably still less work than writing an entire rust compiler. Maintaining it is even orders of magnitude less work than maintaining a C++ "shadow" version of the rust compiler.
Niko Matsakis, of Rust's lang and compiler teams, has a talk[1] on how the machinery behind the borrow checker is being rethought. The idea is that in the future, the compiler won't really "check borrows" but will instead "track origins."
Reframing the system in this way will (hopefully) make things be flexible enough to Just Work™ without surprises, while providing the same guarantees as the current borrow checker. The video itself has a motivating example or two that shows where this stuff would be helpful.
It would be a garbage collector. I assumed that the person asking the question was primarily looking for a simpler implementation option, not to re-create rustc exactly.
Haven’t looked at the Rust compiler codebase, but often with languages you can hack together subsets of existing compilers via deletions, comments, and flags.
Hadn't heard of this book yet! From my cursory glance this looks like quite a good introduction into language implementation, which is just what I've been looking for myself, thanks a lot!
My favourite FizzBuzz is in Haskell; I found it in this talk by Kevlin Henney[0], and it looks like this:
fizzes = cycle ["", "", "Fizz"]
buzzes = cycle ["", "", "", "", "Buzz"]
fbWords = zipWith (++) fizzes buzzes -- renamed from `words` to avoid clashing with Prelude's words
numbers = map show [1..]
fizzbuzz = zipWith max fbWords numbers
Henney explains it in detail in the video, but it makes use of lazy evaluation to create an infinite list of FizzBuzz results, and it uses no if statements. I find it intellectually exciting, but I make no claims about readability.
Most of the language ideas I have aren't focused on syntax these days, but rather on automating some tedious part of programming. Two that have stuck with me for a few years now:
1.) A language for plumbing. Basically, this is a mini-language for defining data structures and how they're shipped around different devices and processed. Think of protobufs or Apache Avro, but also including functionality like conditionals & flow control, rich collection operations like map/filter/fold, account & device management libraries, and most importantly, an Actor-based syntax (somewhat like Erlang) where entities like "the user's iPhone" or "the user's smartwatch" or "the database" just exist like process handles, and you can send messages to and from any of them. It'd then compile down into idiomatic Swift/Kotlin/Java/C/SQL to run on the appropriate machine, so that your UI code just includes a library and you don't need to rewrite all your data plumbing, serialization, and business logic for each client platform.
2.) A language where you can literally use machine learning like if-statements. Basically it'd have functionality to dump out a feature vector (an array-of-structs), visualize it in a Jupyter notebook, label it (or send it off to Mechanical Turk for labeling), and then feed the labeled data back into any of multiple classifier types for use in an if-statement. Once trained, the model and training data would be checked in as if they were source code, and could be attached to code and versioned the same way that an algorithm would be. You'd run your program in two modes: in training mode, the program executes as much of the code path as it can until it gets to an untrained classifier, then dumps out the data for that and starts the labeling/training process to generate the trained model. In execution mode, it uses the generated models to actually make flow-control decisions.
Doubtful I'll ever have time to implement either one of these, but at least at the moment, they're somewhat timely and don't have convenient solutions. I think syntax is basically a solved problem, and don't really care about it.
I would build around a constrained high-level concept. Something like actors, perhaps. It would be assembly-like, command-oriented.
I would then build a visualization environment which shows what the actors are doing, possibly step by step. It would show them moving values between registers and buffers. It would show messages passing between actors. It would show new actors being created, finished actors dying, and errors getting dumped onto the canvas.
Your concept as an end-user would be that you're programming these automatons. You could clearly visualize what's occurring and debug control-flow or algorithmic issues by stepping through their movements.
Maybe not a great idea in the long run, but if it was a weekend lang jam, that's what I'd do.
2. Apparently there's going to be a theme? So that might make whatever wacky language I've been mulling a bad fit. I'm rather curious about how that will play out
Another "crazy" idea: I'd like the compiler to reformat code to match strict style conventions.
Why? Sometimes there's no "right" style. Sometimes novices need training wheels. In the long run, style doesn't matter, but in big projects code is easier to read when style is consistent. It's also easier to onboard when code follows industry-standard styles.
Wouldn't it be better if a language gently pushed everyone toward the same style?
This has been a part of Go culture from the beginning. You avoid style conflicts because you just run everything through 'gofmt' before checking it in.
It's slowly catching on in other languages; e.g. Clang now has clang-format, and IntelliJ can auto-reformat your code according to a rule config you set before each check-in.
This is almost always a subjective opinion. I've worked on many projects where people reformat large portions of the codebase to make it "easier to read", and in the end they waste a bunch of time, make using `git blame` a pain, and subjectively either make no difference in code readability or make the code harder for half the team to read.
> I'd like the compiler to reformat code...
Why is this the compiler's job? Most people aren't reading code after it's been compiled. In most of the languages I've worked with, this is handled by a formatter + styleguide/config that either runs in the IDE on save or on a git hook, or both.
> Ever start a new job with a bulk of code you didn't write? Worse, ever take over code written by novices who ignore common conventions?
Yes, I have seen lots of this in scientific computing. However, things like too many/not enough spaces, line widths, etc, are never a huge hindrance for me.
What does make code "hard to read" are things like bad and inconsistent variable/function/class names, bad inheritance practices, bad file organization, and not adhering to common language idioms. That stuff is rarely, if ever, caught by linters.
That’s an idea I’ve been playing around with for my personal language. If I do decide to release it, I have a feeling there will be common style rules required for any ‘library’ code (it would be as if C++ files had to meet the Google style guide to be posted to some central collection repo). I think the compiler could implement the conversion, but it could require a programmer-supplied ‘translation’ file (like including Unicode-to-ASCII substitutions, or required macro unrolling to some core set of language features). That would allow anyone to keep their code formatted in the way most logical for them, while allowing a common format for distribution. I would also think going the other way, from public style to private, should be a possibility.
I don’t know if I will ever actually put my personal language out for release, but I’ve had two acquaintances repeatedly tell me (along with several HN commenters) that I really should at least release some blog posts about the language. Currently it is just my daily driver for work related programming and it compiles to C++, C, or JavaScript.
The usage of `unless` strikes me as unmaintainable and "smart", not in a good way tbh. The types are all over the place, too. What is it even doing exactly? Appending a string to a null value..?
Appending a string to an undefined value, via array addition, which turns them into strings but doesn't turn null into 'null' or undefined into 'undefined'. It's a bit of a hack.
As is, it's just doing the first 100 iterations of FizzBuzz as either a number or text depending on the step, but not printing or saving anything. If you wrap it in a console.log() it'll create a 100-element array of values which are the answer for fizzbuzz[index+1] and log them.
All that said I've never been a fan of dynamically typed languages myself. Fast to make things in but difficult to reason about later.
I think there's a lot of opportunity in the 2d domain, in both directions.
You might have a picture represent a program, like Piet [1]. Or you might have a language specifically designed to show certain types of pictures, like TeX or Processing or more narrowly, PlantUML.
Another area to explore is the distributed space. Perhaps a language with Kubernetes capabilities as first-class objects.
I'd like a language that makes it easy to mock and dependency inject without doing anything special.
I.e., if in my business logic I write "new Foo()", I'd like to be able to write a unit test where I say something like, "whenever I wrote 'new Foo()', swap in this mock object instead."
Or, in a module that's an entry-point, I'd like to say, "whenever I wrote 'new Foo()' swap in this subclass instead."
I've spent so much time refactoring to make code mocking and dependency-injectable.
Take a look at Newspeak (Smalltalk-ish; it avoids global references, forcing each component to receive its dependencies as arguments, i.e. it has DI built in) and nesC (C with components, which are composed at build time).
Neither is popular but they're pretty interesting!
It started with XPath, then came XPath 2, then XQuery 1, then XPath 3.0 and XQuery 3.0, and then XPath 3.1 and XQuery 3.1.
Six new languages, each more expressive than the previous one. That is why it takes forever to implement them. And now people are working on XPath 4.0 and XQuery 4.0.
Maybe they should work on a new language with a new name instead of stuffing features into the existing languages? And let users decide what they like better?
And this is a post about creating lots of new languages. This doesn't have the goal of having languages that run in production, it has the goal of exploring the solution space.
> You can code in any programming language you'd like to create your project, so long as the language is part of the Debian/Ubuntu or Arch package repo (or one of the language-specific repos, like Rust's cargo).
So seems like it should be fine! Go ahead and make a racket :)
When you think "language", try not to let your mind jump straight to "compiler" or "interpreter". Think more, "specification".
If I asked you to write me a C compiler and you delivered on that, I'm not left with a new language. You didn't design a new language, only the compiler for an already specified language.
Designing a language is more about defining a specification. Then 10 different compiler engineers may pick that spec up and implement 10 different implementations of the same language, for example.
As I was following the implementation recipe, I broke it down into "educational steps". Although it isn't a true FORTH, it is pretty easy to understand and useful enough to embed inside other applications.
Now and again I consider doing it again, but using a real return-stack to remove the hardcoded control-flow words from the interpreter, but I never quite find the time.