Hacker News | solomatov's comments

Does the GitHub Copilot ToS allow this?


This is very interesting. This could allow custom harnesses to be used economically with Opus. Depending on the usage limits, this may be cheaper than their API.

I don't see why not. It's just using the GitHub Copilot API.

Do they have any sandbox out of the box?

I built Fence for this! https://github.com/Use-Tusk/fence

fence -t code -- opencode


I use bubblewrap. This ensures it only has access to the current working directory and its own configuration. No ability to commit or push (since it doesn't have access to SSH keys) or to run aws commands (no access to the awscli configuration), and so on. It can't read anything from my .envrc, since it doesn't have access to direnv or the parent directory. You could lock down the network even further if you wanted to limit web searches.

  # Launch opencode in a bubblewrap sandbox: fresh PID/IPC/UTS namespaces,
  # shared network, and only the explicitly bound paths visible.
  exec bwrap \
    --unshare-pid \
    --unshare-ipc \
    --unshare-uts \
    --share-net \
    --bind "$OPENCODE_ROOT" "$OPENCODE_ROOT" \
    --bind "$CURRENT_DIR" "$CURRENT_DIR" \
    --bind "$HOME/.config/opencode/" "$HOME/.config/opencode/" \
    --ro-bind /bin /bin \
    --ro-bind /etc /etc \
    --ro-bind /lib /lib \
    --ro-bind /lib64 /lib64 \
    --ro-bind /usr /usr \
    --bind /run/systemd /run/systemd \
    --tmpfs /tmp \
    --proc /proc \
    --dev /dev \
    --setenv OPENCODE_EXPERIMENTAL_LSP_TOOL true \
    --setenv EDITOR emacs \
    --setenv PATH "$OPENCODE_BINDIR:/usr/bin:/bin" \
    --setenv HOME "$HOME" \
    -- \
    "opencode" "$@"

nope - most folks wrap it in nono: https://nono.sh/docs/cli/clients/opencode

What do you mean by a custom LMStudio license? Does your employer require reviews of proprietary EULAs, or are you trying to get a custom licensing deal from LMStudio?

Employer must review all EULA terms, yes. The license is a hand-crafted proprietary license, not a standard OSS license.

Could you compare it to other similar software? E.g. Codex App, Conductor, and others? Why your app?


We connect to remote servers via SSH, are provider-agnostic, and are open-source. E.g., in Codex you can only run OpenAI models, not Gemini, Amp, you name it. Give it a spin :)


How does the quality of what Qwen 8B provides compare to proprietary models? Is it good enough for your use case?


For the mechanical stages (scanning, scoring, dedup), the output is indistinguishable from proprietary models. These are structured tasks: "score this post 1-10 against these criteria" or "extract these fields from this text." An 8B model handles that fine at 30 tok/s on a consumer GPU.

For synthesis and judgment — no, it's not close. That's exactly why I route those stages to Claude. When you need the model to generate novel connections or strategic recommendations, the quality gap between 8B and frontier is real.

The key insight is that most pipeline stages don't need synthesis. They need pattern matching. And that's where the 95% cost savings live.
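As a rough sketch (not the actual pipeline; the stage names here are hypothetical), the routing decision described above is just a lookup from stage type to model tier:

```python
# Mechanical, structured stages go to the cheap local model; stages that
# need synthesis or judgment go to the frontier model. Stage names are
# illustrative stand-ins, not the real pipeline's vocabulary.
MECHANICAL_STAGES = {"scan", "score", "dedup", "extract"}
SYNTHESIS_STAGES = {"synthesize", "recommend"}

def pick_model(stage: str) -> str:
    """Return which model tier a pipeline stage should use."""
    if stage in MECHANICAL_STAGES:
        return "local-8b"   # e.g. Qwen 8B on a consumer GPU
    if stage in SYNTHESIS_STAGES:
        return "frontier"   # e.g. Claude
    raise ValueError(f"unknown stage: {stage}")
```

The cost savings come from the fact that most calls hit the first branch.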


You aren't supposed to read code, but do you from time to time, just to evaluate what is going on?


No. But I do ask questions (in $CODING_AGENT) so that I always have a good mental model of everything I'm working on.


Is it essentially using LLMs as a compiler for your specs?

What do you do if the model isn't able to fulfill the spec? How do you troubleshoot what is going on?


Using models to go from spec to program is one use case, but it's not the whole story. I'm not hand-writing specs; I use LLMs to iteratively develop the spec, the validation harness, and then the implementation. I'm hands-on with the agents, and hands-off thanks to our workflow style, which we call Attractor.

In practice, we try to close the loop with agents: plan -> generate -> run tests/validators -> fix -> repeat. What I mainly contribute is taste and deciding what to do next: what to build, what "done" means, and how to decompose the work so models can execute. With a strong definition of done and a good harness, the system can often converge with minimal human input. For debugging, we also have a system that ingests app logs plus agent traces (via CXDB).
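The loop above can be sketched roughly like this; `agent_step` and `run_validators` are hypothetical stand-ins for the real agent and harness, not an actual API:

```python
# plan -> generate -> run tests/validators -> fix -> repeat, until the
# "definition of done" (the validators) passes or we give up.
def converge(spec, run_validators, agent_step, max_iters=10):
    """Drive an agent against a validation harness until it passes."""
    artifact = agent_step(spec, feedback=None)   # initial generation
    for _ in range(max_iters):
        ok, feedback = run_validators(artifact)  # the definition of done
        if ok:
            return artifact
        artifact = agent_step(spec, feedback=feedback)  # fix and retry
    raise RuntimeError("did not converge; tighten the spec or the harness")
```

The human leverage in this framing is entirely in `spec` and `run_validators`: decide what "done" means and decompose the work so the loop can actually close.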

The more reps you get, the better your intuition for where models work and where you need tighter specs. You also have to keep updating your priors with each new model release or harness change.

This might not have been a clear answer, but I am happy to keep clarifying as needed!


But what is the result of your work? What do you commit to the repo? What do you show to new folks when they join your team?


> What do you show to new folks when they join your team?

I think this is an interesting question because we have not fully figured out the best way to onboard people to our codebases. Each person is responsible for multiple codebases (yay microservices!), and no one else commits to a repository while they have dibs. We also have conventions for how agents write documentation around deployments and validations.

In theory, when a new person joins the team or is handed a repository, they can throw some tokens at the codebase, interrogate it, and ask questions about how things are implemented.

> But what is the result of your work?

The end result is a final, working codebase. The specs and sprint plans are also committed to the repository for posterity, so agents in a fresh session can see what work has been completed and the trajectory we are moving toward.


> But it does reduce by an order of magnitude the amount of money you need to spend on programming a solution that would work better

Could you share any data on this? Are there any case studies you could reference or at least personal experience? One order of magnitude is 10x improvement in cost, right?


I'm not sure it's a perfect example, but at least it's a very realistic one, from a company that really doesn't have time and energy for hype or fluff:

We are currently sunsetting our use of Webflow for content management and hosting, and are replacing it with our own solution which Cursor & Claude Opus helped us build in around 10 days:

https://dx-tooling.org/sitebuilder/

https://github.com/dx-tooling/sitebuilder-webapp


Thanks for the link.

So, basically you made a replacement for Webflow for your use case in 10 days, right?


That's fair to say, yes, with the important caveat that it isn't a 1:1 replacement of Webflow, which is exactly the point.


I’m not sure the world needed yet another CMS


It doesn't. The person is saying they built just the functionality they needed. Probably 25% of a CMS. That's the point.


Exactly.

And the big advantage for us is twofold: Our content marketers now have a "Cursor-light" experience when creating landing pages, as from their point of view this is a "text-to-landingpage" LLM-powered tool with a chat interface; no fumbling around in the Webflow WYSIWYG interface anymore.

And from the software engineering department's point of view, the results of the content marketers' work are simply changes/PRs in a git repository, which we can work on in the IDE of our choice. Again, no fumbling around in the Webflow WYSIWYG interface.


This is the benefit few understand properly. The storage layer is where you get a lot of benefits.


Is it open source? Do they disclose which framework they use for the GUI? Is it Electron or Tauri?


lol ofc not

looks like the same framework they used to build chatgpt desktop (electron)

edit - from another comment:

> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!


>I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.

Do you have any advice to share (or resources)? Have you experienced it yourself?



This all sounds interesting, but how effective are they? Does anyone have experience with any of them?


Yes, agentic search over vector embeddings. It can be very effective.
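As a rough illustration (not any particular product's implementation), the retrieval tool behind such agentic search boils down to ranking embedded documents by cosine similarity to an embedded query:

```python
# Minimal sketch of the "search" tool an agent would call repeatedly.
# In a real system the vectors come from an embedding model; here they
# are supplied directly so the example is self-contained.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, top_k=3):
    """Rank (doc, vector) pairs in `index` by similarity to the query."""
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The "agentic" part is letting the model reformulate the query and call `search` again when the first results are weak, rather than doing a single retrieval pass.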


It's a very well-known pattern. But what about the others? There's a lot of very interesting stuff there.


Tool Use Steering via Prompting. I’ve seen that work well also, but I don’t know if I’d quite call it an architectural pattern.
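For what it's worth, the pattern amounts to nudging tool choice in the system prompt rather than changing the tool schema. A toy sketch (tool names here are made up):

```python
# Steer which tool the model reaches for by stating a preference in the
# prompt, leaving the actual tool definitions untouched.
TOOLS = ["grep_search", "vector_search", "web_search"]

def steering_prompt(preferred: str) -> str:
    """Build a system-prompt fragment that biases tool selection."""
    if preferred not in TOOLS:
        raise ValueError(f"unknown tool: {preferred}")
    return (
        "You have these tools: " + ", ".join(TOOLS) + ". "
        f"Prefer `{preferred}` for codebase questions; "
        "fall back to the others only if it returns nothing."
    )
```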


I’m eager to tackle issues and PRs.

