Very cool to see your work! Early in my PhD I did some work with GNN accelerators on FPGAs (which I think later ended up in some form as a collab with some CERN or Fermilab folks) and have chatted a bit in the past with the FastML, HLS4ML, and HEP folks.
I have since pivoted a lot of my PhD work (still related to HLS and EDA). But I wonder what the main limitations/challenges of building these trigger systems in hardware are today. For example, in my mind the EDA and tooling can be a big limitation, such as the reliance on commercial HLS tools, which can be buggy, hard to use, and hard to debug. From experience, this makes it harder to build different optimized architectures in hardware or build co-design frameworks without high HLS expertise or a lot of extra engineering/tooling effort. Tool runtimes also lengthen the design and debug cycle, especially if you are trying to do DSE on post-implementation metrics, since that brings the implementation tools into the loop as well.
But I might be way off here and the real challenges are with other aspects beyond the tools.
Vitis HLS is garbage. Catapult might be better. But fundamentally, synthesizing an FSM from imperative code is just an ill-posed problem. There is a reason people always say that no one in industry uses HLS - not just because it's true (it is), but because HLS only works for "toy" designs.
Thank you for the comment, and the questions are great.
The problems you described here are pretty much spot on. In the past, and mostly still now, we are relying on the commercial Vivado/Vitis HLS toolchains to deploy these networks through hls4ml, a template-based compiler from quantized models to HLS projects. For this class of fully parallel (II=1) models, the tools usually give fine results, but they can indeed be wrong sometimes (a great recent example from our colleague's post: https://sioni.web.cern.ch/2026/03/24/debugging-fastml).
Tool runtime is another issue. The models discussed in this post are not larger than ~30K LUTs, and with their low complexity (~dense-only), synthesis time was fine. But for larger ones, like the ones here (https://arxiv.org/abs/2510.24784), it can take up to... a week for one HLS compilation while eating ~80G of RAM. It can get worse if time-multiplexing is in place and things like #pragma HLS dataflow are used...
Personally, I do not usually do DSE on post-implementation/HLS results, since for these unrolled logic blocks an ok-ish performance model can be obtained without doing the synthesis (via ebops as defined in HGQ, or better, via heuristics based on the rough cost of the low-level operations the design will translate to). But there are works doing DSE based on post-HLS results (https://arxiv.org/pdf/2502.05850, real Vitis synthesis), or using some other surrogate to get around the problem (e.g., https://arxiv.org/abs/2501.05515, using bops). High-level surrogate models are also being developed (https://arxiv.org/pdf/2511.05615).
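To make the surrogate idea concrete, here is a minimal sketch of a bops-style cost estimate for one fully unrolled dense layer. The formula (charging each MAC roughly w_bits*a_bits bit ops for the multiply plus w_bits+a_bits for the accumulate) and the LUT conversion factor are illustrative simplifications, not the actual ebops definition from HGQ.

```python
def dense_bops(n_in: int, n_out: int, w_bits: int, a_bits: int) -> int:
    """Approximate bit operations for a fully unrolled n_in x n_out dense layer.

    Each of the n_in * n_out MACs is charged w_bits * a_bits bit ops for
    the multiply plus (w_bits + a_bits) for the accumulate.
    """
    macs = n_in * n_out
    return macs * (w_bits * a_bits + w_bits + a_bits)

def estimate_luts(bops: int, luts_per_bop: float = 0.02) -> float:
    """Convert bit ops to a rough LUT count; the factor here is made up
    and would in practice be fitted against synthesized designs."""
    return bops * luts_per_bop

# Example: a 16->32 dense layer with 6-bit weights and 8-bit activations.
layer_bops = dense_bops(n_in=16, n_out=32, w_bits=6, a_bits=8)
print(layer_bops, round(estimate_luts(layer_bops)))
```

The appeal of this kind of model is that it is instant to evaluate, so a DSE loop over bitwidths and layer sizes can run thousands of candidates without ever invoking the synthesis tools.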
We are also trying to get alternatives to the commercial HLS toolflows. For instance, I'm working on a direct-to-RTL codegen route (da4ml, optionally via XLS); the current work-in-progress is at https://github.com/calad0i/da4ml/tree/dev if you are interested. All combinational or fully pipelined designs are supported, with a reasonable performance model (~10% error in LUTs and ~20% error in latency), but multicycle or stateful design generation still needs a lot of manual intervention (not automated) and is to be implemented in the future. Since some stages of the trigger chain are/will be time-multiplexed, such functionality will be needed down the line.
Other work in this direction includes adding new open-source backends to hls4ml (e.g., openhls/XLS), as well as alternatives like chisel4ml (https://github.com/cs-jsi/chisel4ml). Hopefully, we will no longer be reliant on the commercial tools down to RTL for the upcoming upgrade. That being said, Vivado still appears to be the only choice for the post-RTL stages for us.
Prof. Cunxi Yu and his students at UMD are working on this exact topic and published a paper on agents for improving SAT solvers [1].
I believe they are extending this idea to EDA / chip design tools and algorithms, which are also computationally challenging. They have an accepted paper applying this to logic synthesis, which will come out soon.
I am not the original author, but I posted this since it mirrors some experiences I have had in my PhD so far submitting papers. This kind of tweaking of the paper and writing even happens when drafting the first version, or sometimes even in the conception of the research idea or in deciding how to go about the implementation and experimentation.
There is a half-joke in our lab that the more times a paper is rejected, the bigger or more praised it will be once it's accepted. This simply alludes to the fact that reviewers often can't be bothered to see value in certain ideas or topics in a field unless the work is "novel" or the paper is written in a way geared towards them; otherwise it gets relegated to "just engineering effort" (this is my biased experience). However, tailoring and submitting ideas/papers to venues that value that specific kind of work is the best workaround I have found (though even then it takes some time to really understand which conferences value which style of work, even if it appears they value it).
I do think there is some saving grace in the section the author writes about "The Science Thing Was Improved," implying that these changes in the paper make the paper better and easier to read. I do agree very much with this; many times, people have bad figures, poor tables or charts, bad captions, etc., that make things harder to understand or outright misleading. But I only agree with the author to a certain extent. Rather, I think that there should also be changes made on the other side, the side of the reviewer or venue, to provide high-quality reviews and assessments of papers. But I think this is a bit outside the scope of what the author talks about in their post.
There are other posts in the author's series. He was a co-author of BERT! Yet his paper was scoffed at as "just engineering". He knows what he is talking about.
Omg, I was not a BERT coauthor! But thank you so much for writing this, I had no idea that other post could have accidentally implied this. I will revise that section.
I have a running joke with my friends: "If your paper has never been rejected, what kind of science are you doing?" Either you spent too much time on one project or you're aiming low.
To offer a non-AI alternative with a similar game concept, I highly recommend TimeGuessr: https://timeguessr.com/.
You are shown historical photos (from the past 100 years, even up to the past year) and need to guess the location on a map and the year of the image. Extremely fun and varied gameplay because of the range of events the photos capture (some mundane, others more recognizable).
Timeguessr is neat. The advantage there is that, unlike Time Portal, I don't feel the pain and depression of the countless anachronisms made up by AI that people will think bear some resemblance to reality (I'm a historian).
While watching, I started playing a fun game where I try to guess the location of the video, GeoGuessr style. Very interesting when it comes to the odd handheld angles and low quality of some of the video clips. Would recommend.
Am I lost, or are the cores not open source? I cannot find any Verilog, VHDL, or bundled IP blocks to download. Very strange for what on the surface appears to be a hobby FPGA project.
This heavily overlaps with my current research focus for my Ph.D., so I wanted to provide some additional perspective to the article. I have worked with Vitis HLS and other HLS tools in the past to build deep learning hardware accelerators. Currently, I am exploring deep learning for design automation and using large language models (LLMs) for hardware design, including leveraging LLMs to write HLS code. I can also offer some insight from the academic perspective.
First, I agree that the bar for HLS tools is relatively low, and they are not as good as they could be. Admittedly, there has been significant progress in the academic community on open-source HLS tools and on integrations with existing tools like Vitis HLS to improve the HLS development workflow. Unfortunately, substantial changes are largely in the hands of companies like Xilinx, Intel, Siemens, Microchip, MathWorks (yes, even Matlab has an HLS tool), and others that produce the "big-name" HLS tools. That said, academia has not given up, and there is considerable ongoing HLS tooling research with collaborations between academia and industry. I hope that one day, some lab will say "enough is enough" and create an open-source, modular HLS compiler in Rust that is easy to extend and contribute to—but that is my personal pipe dream. However, projects like BambuHLS, Dynamatic, MLIR+CIRCT, and XLS (if Google would release more of their hardware design research and tooling) give me some hope.
When it comes to actually using HLS to build hardware designs, I usually suggest it as a first-pass solution to quickly prototype designs for accelerating domain-specific applications. It provides a prototype that is often much faster or more power-efficient than a CPU or GPU solution, which you can implement on an FPGA as proof that a new architectural change has an advantage in a given domain (genomics, high-energy physics, etc.). In this context, it is a great tool for academic researchers. I agree that companies producing cutting-edge chips are probably not using HLS for the majority of their designs. Still, HLS has its niche in FPGA and ASIC design (with Siemens's Catapult being a popular option for ASIC flows). However, the gap between an initial, naive HLS design implementation and one refined by someone with expert HLS knowledge is enormous. This gap is why many of us in academia view the claim that "HLS allows software developers to do hardware development" as somewhat moot (albeit still debatable—there is ongoing work on new DSLs and abstractions for HLS tooling which are quite slick and promising). Because of this gap, unless you have team members or grad students familiar with optimizing and rewriting designs to fully exploit HLS benefits while avoiding the tools' quirks and bugs, you won't see substantial performance gains. All that to say, I don't think it is fair to completely write off HLS as a lost cause or a failure.
Regarding LLMs for Verilog generation and verification, there's an important point missing from the article that I've been considering since around 2020 when the LLM-for-chip-design trend began. A significant divide exists between the capabilities of commercial companies and academia/individuals in leveraging LLMs for hardware design. For example, Nvidia released ChipNeMo, an LLM trained on their internal data, including HDL, tool scripts, and issue/project/QA tracking. This gives Nvidia a considerable advantage over smaller models trained in academia, which have much more limited data in terms of quantity, quality, and diversity. It's frustrating to see companies like Nvidia presenting their LLM research at academic conferences without contributing back meaningful technology or data to the community. While I understand they can't share customer data and must protect their business interests, these closed research efforts and closed collaborations they have with academic groups hinder broader progress and open research. This trend isn't unique to Nvidia; other companies follow similar practices.
On a more optimistic note, there are now strong efforts within the academic community to tackle these problems independently. These efforts include creating high-quality, diverse hardware design datasets for various LLM tasks and training models to perform better on a wider range of HLS-related tasks. As mentioned in the article, there is also exciting work connecting LLMs with the tools themselves, such as using tool feedback to correct design errors and moving towards even more complex and innovative workflows. These include in-the-loop verification, hierarchical generation, and ML-based performance estimation to enable rapid iteration on designs and debugging with a human in the loop. This is one area I'm actively working on, both at the HDL and HLS levels, so I admit my bias toward this direction.
For more references on the latest research in this area, check out the proceedings from the LLM-Aided Design Workshop (now evolving into a conference, ICLAD: https://iclad.ai/), as well as the MLCAD conference (https://mlcad.org/symposium/2024/). Established EDA conferences like DAC and ICCAD have also included sessions and tracks on these topics in recent years. All of this falls within the broader scope of generative AI, which remains a smaller subset of the larger ML4EDA and deep learning for chip design community. However, LLM-aided design research is beginning to break out into its own distinct field, covering a wider range of topics such as LLM-aided design for manufacturing, quantum computing, and biology—areas that the ICLAD conference aims to expand on in future years.
The main author of KANs did a tutorial session yesterday at MLCAD, an academic conference focused on the intersection of hardware / semiconductor design and ML / deep learning.
It was super fascinating, and KANs seem really good for what they are advertised for: gaining insight and interpretability for physical systems (symbolic expressions, conserved quantities, symmetries). For science and mathematics this can be useful, but for engineering this might not be the main priority of an ML / deep learning model (to some extent).
There are still unknowns around learning hard tasks and learning capacity on harder problems. Even choices for things like the basis function used for the KAN “activations”, and which other architectures these layers can be plugged into with some gain, are still unexplored. I think as people mess around with KANs we’ll get better answers to these questions.
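For readers unfamiliar with the basis-function choice mentioned above: a KAN edge replaces a scalar weight with a learnable 1D function. The sketch below uses the simplest possible basis, a piecewise-linear (hat-function) interpolant on a fixed grid; real KAN implementations typically use B-splines, and the knot grid and coefficient initialization here are purely illustrative.

```python
knots = [-2.0 + 0.5 * i for i in range(9)]   # fixed grid on [-2, 2]
coeffs = [k * k for k in knots]              # "learnable" values at the knots,
                                             # initialized so the edge ~ x^2

def kan_edge(x: float, knots: list, coeffs: list) -> float:
    """Evaluate the learnable edge function by linear interpolation
    between knot values.

    Training would update `coeffs` by gradient descent; swapping the
    hat basis for B-splines, RBFs, etc. is exactly the open design
    choice discussed above.
    """
    if x <= knots[0]:
        return coeffs[0]          # clamp below the grid
    if x >= knots[-1]:
        return coeffs[-1]         # clamp above the grid
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            t = (x - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - t) * coeffs[i] + t * coeffs[i + 1]

print(kan_edge(-1.0, knots, coeffs), kan_edge(0.25, knots, coeffs))
```

A KAN layer then sums such edge functions over its inputs, which is what makes the learned functions amenable to symbolic regression afterwards.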
Very cool to learn where my Linux file manager GUI app came from.
One point that is light on details in the wiki is what exactly their monetization plan was, or whether they even had one at all. It sounds like they wanted to integrate business/enterprise features and support, or perhaps features that plug into other internet services they could monetize. Specifically, the "network user experience" with "Eazel Online Storage" and the "Software Catalog" seem to have been their initial monetization ideas.