Lang Jam: create a programming language in a weekend (github.com/langjam)
272 points by ingve on July 31, 2021 | hide | past | favorite | 130 comments


As someone that writes a lot in Rust, I have a longstanding dream to make a compiler for a subset of the language, since I don’t use all features of Rust and my hope is that this way I could have fast debug builds (in terms of compilation time) while I develop, with simplified borrow checker and so on. So then I can iterate faster. That’s one of my dreams. But I have not had time to even look at it yet, as the other code I work on takes precedence at the moment.


I'd like a GC (no borrow checker) version of Rust. There are so many cool concepts in the language, and I'd like to know how useful they are in different contexts. (Compiled without a VM, option types, no nulls, easy error handling without exceptions...)


I think a scripting version of rust with a GC and gradual typing would be really nice. Especially one that allows incrementally converting a script into a compiled rust program.


You could try OCaml (it does have exceptions, but they're usually "opt in"); it's a really nice language and probably one of the closest to GC'd Rust. There's also Swift, which inspired some features.


OCaml is often dropped due to its unpredictable performance


Interesting. I’ve actually seen OCaml being chosen because of its predictable performance. But that was by people coming from Haskell.


Well, almost everything has more predictable performance than Haskell. And I love Haskell.


Huh? OCaml is pretty obvious in its semantics and compilation strategy. Maybe flambda changed something, but before that I'd used the compiler source code as a reference on how one would compile an ML.


I've heard the opposite about it, that its predictable performance is one of its strengths. Do you have any sources?


Darklang is being rewritten in .Net

Flowtype has horrible performance, which in my opinion partially explains why it has lost so many users to TypeScript, which has much more predictable and faster performance even though it's written in JS. The project is basically dead.


Here is the article about Darklang leaving OCaml, nothing here is about performance https://blog.darklang.com/leaving-ocaml/. It's about ecosystem and dev experience.

About Flow, I've never used it so I can't say, but Rescript (which is written in OCaml) compiles 10 to 100 times faster than Typescript.


I've used neither Flow nor TypeScript, but there's an essential difference between the two - Flow is a type inference engine, whereas TypeScript is a type checking engine. The latter is fundamentally a much simpler problem (both semantically and algorithmically) than the former. In addition, Flow picks a particularly nasty niche for type inference - OCaml itself uses type inference, and it's fast (with quadratic blowup in edge cases) because OCaml's types are simple, whereas Flow IIRC tries to support both subtyping and some of JS's dynamicity, so I'm not surprised if it's slower.
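The gap between the two can be seen in miniature: checking a fully annotated expression tree is a single bottom-up pass, while inference has to solve for unknown types. A toy checker (purely illustrative; this has nothing to do with how Flow or TypeScript are actually implemented):

```python
def check(expr, env):
    """expr: ('lit', n) | ('var', name) | ('add', e1, e2).
    env maps variable names to programmer-supplied type annotations."""
    kind = expr[0]
    if kind == 'lit':
        return 'int'
    if kind == 'var':
        return env[expr[1]]          # annotation is simply looked up
    if kind == 'add':
        t1 = check(expr[1], env)
        t2 = check(expr[2], env)
        if t1 == t2 == 'int':
            return 'int'
        raise TypeError(f"cannot add {t1} and {t2}")
    raise ValueError(f"unknown node {kind}")

print(check(('add', ('lit', 1), ('var', 'x')), {'x': 'int'}))  # int
```

An inference engine instead has to introduce type variables for unannotated names and solve constraints between them, which is where subtyping and JS's dynamism make the problem much harder.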


None of this seems related to OCaml performance?

Darklang is being rewritten in .NET because they were unsatisfied with the OCaml ecosystem.

Horrible performance is generally more a question of algorithms and software architecture than a problem with the language, which only accounts for a flat percentage of performance.


I don't think I've ever heard that. Ocaml performance is surprisingly very predictable.


You might want to check out Scala Native. It's conceptually similar to Rust without borrow checking. The only issue is that it's way less mature and the community is much smaller.


We use D a lot at work, we genuinely can only laugh at people who refuse to entertain the idea of a GC. The GC allows our code to exist without extensive memory book-keeping.

When we want to avoid the GC we know how, when we want the productivity we use the GC and fuggedaboutit


> we genuinely can only laugh at people

That's perhaps not the best wording.

Nonetheless, I do agree with you. D can be very flexible in that regard.


I don't know how else to word it. Every thread about D has people going on and on and on and on about the GC, and they aren't always wrong (there is always good information here) but fundamentally it doesn't matter. Many D programmers come from PHP rather than C++ - D isn't attractive to them because of systems programming but rather high level programming e.g. metaprogramming.


There seems to be at least one reason to have a non-GC option: You want to write code that integrates with an environment that already has its own GC and you want that to decide things. For example you want to integrate existing libraries in C/C++/D/Go/Rust/whatever with a Lisp process, since there's lots of those libraries and they do useful things, and you don't have a Lisp library for that (yet). In that case having one point of deciding when to deallocate things seems to be better than having two of them.


Using GC basically locks you into a particular memory model or framework.

Having spent 20 years mostly working in GC languages, it's totally worth it! However, writing portable libraries like SQLite or libpng is very difficult to do without bringing in unacceptable overhead for your consumers.


You just essentially described Swift


I feel much at home with Swift, and it is my other favorite language to use besides Rust. I use Rust for server-side things and command-line utilities and Swift for iOS and macOS apps. I know that server-side Swift is a thing that some people do. But I don't feel that Swift would be what I am personally looking for server-side, so I maintain my dream of faster development in Rust with a subset of Rust and a custom compiler :)

In fact I would like to be able to integrate Swift and Rust so that I could also use Rust in my iOS and macOS apps directly. But for now and for the types of applications that I am currently working on it is not worth going down that particular route. But in the future I wish to write some games where I want to write all of the game logic in Rust and use Swift + Metal for the rendering.


Swift is more like C# or Java than Rust. It's more like adapting C# to the Objective-C runtime than Rust.

It's also not memory safe. It's very easy to crash the process.


I just totally disagree. Swift might look a bit more like C#, but in terms of features it's way more similar to Rust for all the ADT-related features mentioned in the grandparent comment.

Also, regarding memory safety, it's not as safe as Rust - it's basically only memory-safe in the single-threaded context, but it's still relatively safe compared to other languages.

I don't really find it easier to crash than Rust - thanks to optionals you get the same guarantees against NPEs. You can choose not to take advantage of that, but you can also force-unwrap in Rust if you want.


Absolutely not. Both Swift and Rust are actively taking inspiration from each other.


Whether or not Swift succeeds at being memory-safe, in theory, it's supposed to be.


It's memory-safe in a single-threaded context. The actor model would theoretically bring multi-threaded memory safety (with a different set of tradeoffs than Rust).


Thanks for clarifying.


I don't think borrow checking plays a significant role in most projects' compile times. Running `cargo check` generally runs a lot faster (after the first run) than `cargo build`, even though it runs borrow checking, etc. AFAIR macro expansion, LLVM code generation and linking take a lot of time, but it depends on the project.

On the perf site you can see some examples: https://perf.rust-lang.org/


I think you would be better served by simplified, less optimized code generation, as from what I understand LLVM is the thing that takes a lot of time. You can see this pattern in Crystal and Swift, for example, which also use LLVM and are relatively slow to compile compared to OCaml or Go, which have their own backends.


There's a Cranelift backend, which is apparently faster at creating debug builds than LLVM: https://github.com/bjorn3/rustc_codegen_cranelift


The D compiler uses Walter's own backend, carried over from his last compiler. It is very useful to have a really fast debug build. It's also remarkable how slow LLVM is, given that the D compiler's backend is actually pretty inefficient in places yet still beats LLVM on debug builds.


Or Haskell, which also has multiple backends to choose from.


That is definitely not the case. Compiling C with LLVM is very fast.

In Swift, type inference is one thing that can be extremely slow.


That's fair, Crystal seems to also take a lot of time with type inference. However the LLVM codegen is almost always slower than the alternatives. For example, Haskell has a non-LLVM backend and an LLVM backend, and the LLVM backend is slower. Zig uses LLVM, and they are working on a non-LLVM backend to speed up development builds. A sibling comment mentioned that the situation is the same for D too. For C, I'm almost certain that TCC is faster than Clang with LLVM.

There seems to be a pattern here. I'm not saying that LLVM is bad, in fact if it's used by so many projects it's for a reason. But it does have a cost.




Why would someone write a bootstrapping rust compiler in C++ when you can just compile the rust to C using a proper llvm backend?


LLVM doesn't support everything. If you want to see what is not supported, the story about Python cryptography migrating from C to Rust is a good start https://lwn.net/Articles/845535/.


Bootstrapping on platforms where llvm isn't supported?


I don't see why you would want to do that as you can always ssh into a system where llvm is supported, then compile to C, copy the C sources back to your system and compile rustc using a C compiler.


How are you compiling to C? With the LLVM IR to C backend that was (is?) maintained by the Julia folks? My experience from a couple years ago was that it didn't produce output which would compile with baroque C toolchains for platforms that LLVM doesn't support... I'm giving very unsubtle side eye to you TriCore.


Well, in any case, writing a backend for rust or llvm that generates C is probably still less work than writing an entire rust compiler. Maintaining it is even orders of magnitude less work than maintaining a C++ "shadow" version of the rust compiler.


Perhaps the generated C isn't as portable (endianness, byte alignment, etc.).


Any ideas on how one can simplify the borrow checker?


Use a garbage collected language instead.

Eventually, as an optimization, for the few use cases where ultimate performance is needed, provide a subset of borrow checker capabilities.

Examples of current efforts in this direction: D, Swift, Haskell, Chapel, and some ongoing .NET research.


Niko Matsakis, of Rust's lang and compiler teams, has a talk[1] on how the machinery behind the borrow checker is being rethought. The idea is that in the future, the compiler won't really "check borrows" but will instead "track origins."

Reframing the system in this way will (hopefully) make things be flexible enough to Just Work™ without surprises, while providing the same guarantees as the current borrow checker. The video itself has a motivating example or two that shows where this stuff would be helpful.

[1]: https://youtu.be/_agDeiWek8w


For those who don't want to sit through a video there is also the book[1].

[1]: https://rust-lang.github.io/polonius/


You could implement it as a runtime check


Why not just use GC at that point? I thought the whole USP is the zero runtime cost


It would be a garbage collector. I assumed that the person asking the question was primarily looking for a simpler implementation option, not to re-create rustc exactly.
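Rust itself already ships such a runtime-checked fallback in `RefCell`, which enforces the aliasing rules at run time and panics on violation. The core idea fits in a few lines; here is a Python sketch (the `RefCell` name is borrowed from Rust, everything else is made up for illustration):

```python
from contextlib import contextmanager

class BorrowError(Exception):
    pass

class RefCell:
    """Toy runtime borrow checker: any number of readers OR one writer."""
    def __init__(self, value):
        self.value = value
        self._readers = 0
        self._writing = False

    @contextmanager
    def borrow(self):
        # Shared, read-only borrow: fails if a mutable borrow is live.
        if self._writing:
            raise BorrowError("already mutably borrowed")
        self._readers += 1
        try:
            yield self.value
        finally:
            self._readers -= 1

    @contextmanager
    def borrow_mut(self):
        # Exclusive borrow: fails if any other borrow is live.
        if self._writing or self._readers:
            raise BorrowError("already borrowed")
        self._writing = True
        try:
            yield self   # caller mutates through cell.value
        finally:
            self._writing = False

cell = RefCell(1)
with cell.borrow_mut() as c:
    c.value += 1
with cell.borrow() as v:
    print(v)  # 2
```

The difference from a GC is that this only enforces aliasing discipline; it does nothing about deciding when to free memory.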


Haven’t looked at the rust compiler code base, but often with languages you can hack together subsets via deletions and comments and flags of existing compilers.


Crafting Interpreters came out just in time for this :)

http://craftinginterpreters.com


Hadn't heard of this book yet! From my cursory glance this looks like quite a good introduction into language implementation, which is just what I've been looking for myself, thanks a lot!


You should definitely check out https://interpreterbook.com/ and https://compilerbook.com/ too. They're very well written and very thorough. Highly recommend them as approachable primers on the topic.


I've been working on a toolkit for creating interpreted languages, right now I'm liking Swift a lot as host language:

https://github.com/codr7/swifties


I love working with Swift, but it's such a pain to maintain and deploy it


Depends on the purpose, for those that only care about Apple platforms it is alright.


If you're looking for similar hackathons, you may be interested in https://quirkylanguages.com/


What kind of language ideas do you folks have? What would your fizzbuzz look like?

I'm not sure I've ever seen a more readable compact fizzbuzz than this version in CoffeeScript

    ['fizz' unless i%3] + ['buzz' unless i%5] or i for i in [1..100]


My favourite FizzBuzz is in Haskell. I found it in this talk by Kevlin Henney[0], and it looks like this:

  fizzes   = cycle ["", "", "Fizz"]
  buzzes   = cycle ["", "", "", "", "Buzz"]
  words    = zipWith (++) fizzes buzzes
  numbers  = map show [1..]
  fizzbuzz = zipWith max words numbers
Henney explains in detail in the video, but it makes use of lazy evaluation to create an infinite list of FizzBuzzes, and also uses no if statements. I find it intellectually exciting, but make no claims on readability

[0]https://youtu.be/LueeMTTDePg?t=2891
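The same lazy-list trick ports to any language with infinite iterators; for example, a sketch using Python's itertools (`cycle` plays the role of Haskell's `cycle`, and `max` is the same lexicographic hack):

```python
from itertools import count, cycle, islice

fizzes = cycle(["", "", "Fizz"])
buzzes = cycle(["", "", "", "", "Buzz"])
words = (f + b for f, b in zip(fizzes, buzzes))   # "", "", "Fizz", ...
numbers = map(str, count(1))                      # "1", "2", "3", ...
fizzbuzz = (max(w, n) for w, n in zip(words, numbers))

print(list(islice(fizzbuzz, 15)))
```

The `max` hack works because digits sort below uppercase letters, so any non-empty word beats the corresponding number string.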


"max" seems like an odd choice. Wouldn't that start to fail at large numbers?

Edit: Nevermind, max is lexicographic, not based on string length. Brain fart.


Yes, that part is hacky. It's easy to write a short function to do the same thing explicitly but then you lose the concise charm


Thanks for explaining it!


Seems pretty readable to me if you’re comfortable with Haskell.


I think 0 should be a FizzBuzz?

Also I'm not sure how that solution would work for negative numbers...

Then again what is the right answer for 7i + 24?

Corner cases duck!


I've never seen fizzbuzz defined in such a way that zero or negative numbers are part of the problem.

Typically it is a counting game, starting at 1 and going up to some arbitrary value.


For 0 being a FizzBuzz you can `cons` a "FizzBuzz" to the start of the list

  fizzbuzz = "FizzBuzz" : zipWith...
To make negative numbers work you'd need a new numbers definition

  numbers = map show [-1, -2..]
Should yield all the negative integers eventually.

I've never heard of FizzBuzz defined for complex/imaginary/2-d numbers. That's interesting to consider, I'll be thinking about this for a while


Most of the language ideas I have aren't focused on syntax these days, but rather on automating some tedious part of programming. Two that have stuck with me for a few years now:

1.) A language for plumbing. Basically, this is a mini-language for defining data structures and how they're shipped around different devices and processed. Think of protobufs or Apache Avro, but also including functionality like conditionals & flow control, rich collection operations like map/filter/fold, account & device management libraries, and most importantly, an Actor-based syntax (somewhat like Erlang) where entities like "The user's iPhone" or "The user's smartwatch" or "the database" just exist like process handles, and you can send messages to and from any of them. It'd then compile down into idiomatic Swift/Kotlin/Java/C/SQL to run on appropriate machine, so that your UI code just includes a library and you don't need to rewrite all your data plumbing, serialization, and business logic for each client platform.

2.) A language where you can literally use machine learning models like if-statements. Basically it'd have functionality to dump out a feature vector (an array-of-structs), visualize it in a Jupyter notebook, label it (or send it off to Mechanical Turk for labeling), and then feed the labeled data back into any of multiple classifier types for use in an if-statement. Once trained, the model and training data would be checked in as if they were source code, and could be attached to code and versioned the same way that an algorithm would be. You'd run your program in two modes: in training mode, the program executes as much of the code path as it can until it gets to an untrained classifier, then dumps out the data for that and starts the labeling/training process to generate the trained model. In execution mode, it uses the generated models to actually make flow-control decisions.

Doubtful I'll ever have time to implement either one of these, but at least at the moment, they're somewhat timely and don't have convenient solutions. I think syntax is basically a solved problem, and don't really care about it.


I would build around a constrained high-level concept. Something like actors, perhaps. It would be assembly-like, command-oriented.

I would then build a visualization environment which shows what the actors are doing, possibly step by step. It would show them moving values between registers and buffers. It would show messages passing between actors. It would show new actors being created, finished actors dying, and errors getting dumped onto the canvas.

Your concept as an end-user would be that you're programming these automatons. You could clearly visualize what's occurring and debug control-flow or algorithmic issues by stepping through their movements.

Maybe not a great idea in the long run, but if it was a weekend lang jam, that's what I'd do.



> What kind of language ideas do you folks have?

1. That would be telling.

2. Apparently there's going to be a theme? So that might make whatever wacky language I've been mulling a bad fit. I'm rather curious about how that will play out


Another "crazy" idea: I'd like the compiler to reformat code to match strict style conventions.

Why? Sometimes there's no "right" style. Sometimes novices need training wheels. In the long run, style doesn't matter, but in big projects code is easier to read when style is consistent. It's also easier to onboard when code follows industry-standard styles.

Would it be better if a language gently pushed everyone toward the same style?


This has been a part of Go culture from the beginning. You avoid style conflicts because you just run everything through 'gofmt' before checking it in.

It's slowly catching on in other languages, eg. Clang now has clang-format and IntelliJ can auto-reformat your code according to a rule config you set before each check-in.


> code is easier to read when...

This is almost always a subjective opinion. I've worked on many projects where people reformat large portions of the codebase to make it "easier to read", and in the end they waste a bunch of time, make using `git blame` a pain, and subjectively either make no difference in code readability or make the code harder for half the team to read.

> I'd like the compiler to reformat code...

Why is this the compiler's job? Most people aren't reading code after it's been compiled. In most of the languages I've worked with, this is handled by a formatter + styleguide/config that either runs in the IDE on save or on a git hook, or both.


> Most people aren't reading code after it's been compiled

Ever start a new job with a bulk of code you didn't write? Worse, ever take over code written by novices who ignore common conventions?

The whole point is to see if having a standard style is easier in the long run.


> Ever start a new job with a bulk of code you didn't write? Worse, ever take over code written by novices who ignore common conventions?

Yes, I have seen lots of this in scientific computing. However, things like too many/not enough spaces, line widths, etc, are never a huge hindrance for me.

What does make code "hard to read" are things like bad and inconsistent variable/function/class names, bad inheritance practices, bad file organization, and not adhering to common language idioms. That stuff is rarely, if ever, caught by linters.

Great talk by Raymond Hettinger about this: https://www.youtube.com/watch?v=wf-BqAjZb8M


Still, why would you be reading compiled code?


That’s an idea I’ve been playing around with for my personal language. If I do decide to release it, I have a feeling there will be common style rules that will be required for any ‘library’ code (it would be like if C++ files had to meet Google style guides to be posted to some central collection repo). I think the compiler could implement the conversion, but it could require a programmer-supplied ‘translation’ file (like including Unicode to ASCII substitutions, or required macro unrolling to some core set of language features). That would allow anyone to keep their code formatted in the most logical way for them, but would allow for a common format for distribution. I would also think going the other way, from public style to private, should be a possibility.

I don’t know if I will ever actually put my personal language out for release, but I’ve had two acquaintances repeatedly tell me (along with several HN commenters) that I really should at least release some blog posts about the language. Currently it is just my daily driver for work related programming and it compiles to C++, C, or JavaScript.


It's definitely been done before, Zig comes to mind immediately but I'm pretty sure it wasn't the first place I've seen it done first party.


I have that at work with javascript/typescript and eslint.


In other domains, that is called authoritarianism.


Gotta wait to see what the theme is, but I’m going to be building a 2D tree language (https://jtree.treenotation.org/designer/). Probably with a spreadsheet IDE like https://youtu.be/0l2QWH-iV3k


Python can do it with similar compactness:

    [(("fizz" if not i%3 else "")+("buzz" if not i%5 else "")) or i for i in range(1, 101)]
Gotta love list comprehensions.


The usage of `unless` strikes me as unmaintainable and "smart", not in a good way tbh. The types are all over the place, too. What is it even doing exactly? Appending a string to a null value..?


> Appending a string to a null value..?

Appending a string to an undefined value, via array addition, which turns them into strings but doesn't turn null into 'null' or undefined into 'undefined'. It's a bit of a hack.

`'a' + undefined === 'aundefined'`

`[ 'a' ] + [ undefined ] === 'a'`

Also the empty string '' evaluates to false.

`[ undefined ] + [ undefined ] == ''`

This is wat - https://archive.org/details/wat_destroyallsoftware


> This is wat

For a second, I thought it was JavaScript


That's what I said :)


As is, it's just doing the first 100 iterations of fizzbuzz as either a number or text depending on the fizzbuzz step, but not printing or saving anything. If you wrap it in a console.log() it'll create a 100-element array of values which are the answer for fizzbuzz[index+1] and log them.

All that said I've never been a fan of dynamically typed languages myself. Fast to make things in but difficult to reason about later.


    // Generated by CoffeeScript 2.4.1
    (function() {
      var i, j;

      for (i = j = 1; j <= 100; i = ++j) {
        [!(i % 3) ? 'fizz' : void 0] + [!(i % 5) ? 'buzz' : void 0] || i;
      }

    }).call(this);


I think there's a lot of opportunity in the 2d domain, in both directions.

You might have a picture represent a program, like Piet [1]. Or you might have a language specifically designed to show certain types of pictures, like TeX or Processing or more narrowly, PlantUML.

Another area to explore is the distributed space. Perhaps a language with Kubernetes capabilities as first-class objects.

1. https://www.dangermouse.net/esoteric/piet.html


I'd like a language that makes it easy to mock and dependency inject without doing anything special.

I.e., if in my business logic I write "new Foo()", I'd like to be able to write a unit test where I say something like, "whenever I wrote 'new Foo()', swap in this mock object instead."

Or, in a module that's an entry-point, I'd like to say, "whenever I wrote 'new Foo()' swap in this subclass instead."

I've spent so much time refactoring to make code mocking and dependency-injectable.


Clojure does something very much like this using the with-redefs [0] form.

If you have a function like:

    (defn send-mail [receiver-ids title text]
      ...)
You can mock it out in your tests like:

    (with-redefs [send-mail (fn [r-ids title text]
                              (println r-ids title text))]
      ;; send-mail will just print
      ...)
[0] https://clojuredocs.org/clojure.core/with-redefs


Python already has everything you need. The most important aspect is that you don't have to change your code to use it.

https://docs.python.org/3/library/unittest.mock.html


Take a look at Newspeak (Smalltalk-ish; it avoids global references by forcing each component to receive its dependencies as arguments, i.e. it has DI built in) and nesC (C with components, which are composed at build time).

Neither is popular but they're pretty interesting!


I'm dubious about the order of operations here.


JavaScript was made in a weekend, wasn't it?


Even more motivation to give it a shot! Certainly you can’t do any worse ;)


But took 25 years to make ES6 ;)


10 days, IIRC


A weekend is 10 days, right? (I'll show myself out...)


0b10 is the best 10



Ten days, to be more exact.


Reminds me of the PLTGames from years ago. It didn't last that long.

https://twitter.com/pltgames


I could use something similar to jq, but for XML.

Or something like a lovechild of sed, awk, seq, and cut, with regexp match groups and string interpolation.


> I could use something similar to jq, but for XML.

That would be XPath/XQuery

I have spent 15 years implementing that in Xidel

>Or something like a lovechild of sed, awk, seq, and cut, with regexp match groups and string interpolation.

Sounds like Perl


The argument that you can express something in an existing language doesn't mean you shouldn't try to create something more expressive.


They are making new languages far too often

It started with XPath, then came XPath 2, then XQuery 1, then XPath 3.0 and XQuery 3.0, and then XPath 3.1 and XQuery 3.1.

Six new languages each more expressive than previous one. That is why it takes forever to implement them. And now people are working on XPath 4.0 and XQuery 4.0.

And Perl got Perl 6 and Perl 7 and Raku


Maybe they should work on a new language with a new name instead of stuffing features into existing languages? And let users decide what they like better?

And this is a post about creating lots of new languages. This doesn't have the goal of having languages that run in production, it has the goal of exploring the solution space.


Sounds like a really good use of Common Lisp and Esrap.


Would it be cheating to use Racket?

https://racket-lang.org/


> You can code in any programming language you'd like to create your project, so long as the language is part of the Debian/Ubuntu or Arch package repo (or one of the language-specific repos, like Rust's cargo).

So seems like it should be fine! Go ahead and make a racket :)


This looks cool. My most recent attempt at language creation was a bit esoteric, for code golf [0]

[0] https://github.com/Slord6/Spice


Does anyone have a good resource on writing compiled languages? I'd like to join this Jam but I don't know where to start.


Maybe I'm not reading correctly, but it seems the project is about making a language compiler and not a language itself?


From the site:

> You can build an interpreter or a compiler, so long as it can run or build examples of code in the programming language you create.

No, you create a language, but it needs to be able to run. So you need to build either a compiler or an interpreter to make the language do things.


What’s the difference?


When you think "language", try not to let your mind jump straight to "compiler" or "interpreter". Think more, "specification". If I asked you to write me a C compiler and you delivered on that, I'm not left with a new language. You didn't design a new language, only the compiler for an already specified language.

Designing a language is more about defining a specification. Then 10 different compiler engineers may pick that spec up and implement 10 different implementations of the same language, for example.


It would be very fun to participate in this! How much experience would one need to be useful as part of a team though?


Looks like a lot of fun! I hope I get to see what people will come up with.


And that's how we get another javascript...


Nice! I guess I've got to face ASTs again:)


Write a Forth. ;)


Or a Lisp and let the user face them instead.


Forth seems easier to do over a weekend, though.


There's even a recipe posted in a couple of comments here:

https://news.ycombinator.com/item?id=13082825

I followed that guide to implement a simple FORTH-like system in golang:

https://github.com/skx/foth

As I was following the implementation recipe I broke it down into "educational steps". Although it isn't a true FORTH it is pretty easy to understand and useful enough to embed inside other applications.

Now and again I consider doing it again, but using a real return-stack to remove the hardcoded control-flow words from the interpreter, but I never quite find the time.


Definitely easier; it has no structure at all, it's all in your head/stack.

https://github.com/codr7/fipl
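For scale, the core of a Forth-like interpreter really is weekend-sized: a data stack, a dictionary of words, and integer literals. A minimal sketch in Python (no return stack, user definitions, or control flow):

```python
def forth(src):
    """Evaluate a whitespace-separated Forth-ish program, return the stack."""
    stack = []
    words = {
        "+":    lambda: stack.append(stack.pop() + stack.pop()),
        "*":    lambda: stack.append(stack.pop() * stack.pop()),
        "dup":  lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        "drop": lambda: stack.pop(),
        ".":    lambda: print(stack.pop()),
    }
    for tok in src.split():
        if tok in words:
            words[tok]()            # execute a known word
        else:
            stack.append(int(tok))  # everything else must be a number
    return stack

print(forth("2 3 + dup *"))  # [25]
```

User-defined words (`: square dup * ;`) are a small extension: collect tokens between `:` and `;` into the dictionary and replay them when invoked.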


More like parser jam


I don't see how?

> You can build an interpreter or a compiler, so long as it can run or build examples of code in the programming language you create.



