Well, I was wondering about bias in the model, so I entered "a president" as the prompt. Looks like it has a bias alright, but it's even more specific than I expected...
Schnell is definitely worse in quality, although still impressive (it gets text right). Dev is the really good one that arguably outperforms the new Midjourney 6.1
What's the difference between pro and dev? Is the pro one also 12B parameters? Are the example images on the site (the patagonia guy, lego and the beach potato) generated with dev or pro?
I think the open ones are mainly -dev and -schnell. Both models are 12B. -pro is the most powerful, raw model; -dev is a guidance-distilled version of it, and -schnell is a step-distilled version (where you can get pretty good results in 2-8 steps).
Something about pro must be better than dev or it wouldn't be API-only. But what exactly? How does guidance distillation affect it, and how much quality remains in dev?
I think they may have turned on the gating some time after this was submitted to HackerNews. Earlier this morning I definitely ran the model several times without signing in at all (not via GitHub, not via anything). But now it says "Sign in to run".
Just to make it clear, the actual Stable Diffusion 3 (the 8B variant, the one they are exclusively serving as an API) that they announced in March is still closed source, and there is no indication of when or if it will be released.
~38 TOPS at fp16 is amazing, if the quoted number is for fp16 (the ANE is fp16 according to this [1], but that honestly seems like a bad choice when people are going smaller and smaller even on high-end datacenter cards, so I'm not sure why Apple would use it instead of fp8 natively).
The backlight is now the main bottleneck for consumption-heavy uses. I wonder what the main advancements happening there are to optimize the wattage.
If the use cases involve working on dark terminals all day, watching movies with dark scenes, or a generally dark theme, maybe the new OLED display will help reduce display power consumption too.
AMD GPUs have "Adaptive Backlight Management", which reduces your screen's backlight but then tweaks the colors to compensate. For example, my laptop's backlight is set at 33%, but with ABM it reduces to 8%. Personally I don't even notice it is on; my screen seems just as bright as before. But when I first enabled it I did notice a slight difference in colors, so it's probably not suitable for designers/artists. I'd 100% recommend it for coders though.
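A rough sketch of how that kind of compensation could work (illustrative only, not AMD's actual implementation): decode the gamma-encoded pixel, boost it in linear light by the ratio of old to new backlight level, and clip the highlights that can't be recovered, which is presumably where the slight color shifts come from.

```python
def compensate(pixel, old_bl=0.33, new_bl=0.08, gamma=2.2):
    """Scale an 8-bit sRGB pixel in linear light so perceived brightness
    stays roughly constant after the backlight drops from old_bl to new_bl."""
    gain = old_bl / new_bl  # ~4.1x more pixel-level light needed
    out = []
    for c in pixel:
        linear = (c / 255) ** gamma        # decode gamma to linear light
        boosted = min(1.0, linear * gain)  # compensate; bright values clip
        out.append(round(255 * boosted ** (1 / gamma)))
    return tuple(out)

print(compensate((30, 30, 30)))     # dark pixel: fully compensated
print(compensate((255, 255, 255)))  # already at max: clips, detail is lost
```

The clipping at the top end is why bright, saturated colors look slightly off while dark content (terminals, editors) is basically unaffected.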
Strangely, Apple seems to be doing the opposite for some reason (color accuracy?): dimming the display doesn't reduce the backlight as much as you'd expect, and they're using a combination of backlight reduction and software dimming, even at "max" brightness.
Evidence can be seen when opening iOS apps, which seem to glitch out and reveal the brighter backlight [1]. Notice how #FFFFFF white isn't the same brightness as the white in the iOS app.
The max brightness of the desktop is gonna be lower than the actual max brightness of the panel, because the panel needs to support HDR content, and full panel brightness would be too much for most cases.
This was a photo of my 15" MBA, which doesn't have an HDR-capable screen afaik. Additionally, this artifacting happens at all brightness levels, including the lowest.
It also just doesn't seem ideal that some apps (the iOS ones) appear much brighter than the rest of the system. HDR support in macOS is a complete mess, although I'm not sure if Windows is any better.
Dang, yeah, this is the opposite of what I had in mind
I was thinking, like, a couple hundred dollar Kindle the size of a big iPad I can plug into a laptop for text-editing out and about. Hell, for my purposes I'd love an integrated keyboard.
Basically a second, super-lightweight laptop form-factor I can just plug into my chonky Macbook Pro and set on top of it in high-light environments when all I need to do is edit text.
Honestly not a compelling business case now that I write it out, but I just wanna code under a tree lol
I think we're getting pretty close to this. The Remarkable 2 tablet is $300, but can't take video input and software support for non-notetaking is near non-existent. There's even a keyboard available. Boox and Hisense are also making e-ink tablets/phones for reasonable prices.
If that existed as a drop-in screen replacement for the Framework laptop, with a high-refresh-rate color Gallery 3 panel, I'd buy it at that price point in a heartbeat.
I can't replace my desktop monitor with eink because I occasionally play video games. I can't use a 2nd monitor because I live in a small apartment.
I can't replace my laptop screen with greyscale because I need syntax highlighting for programming.
Maybe the $100 nano-texture screen will give you the visibility you want. Not the low power of an epaper screen, though.
Hmm, emacs on an epaper screen might be great if it had all the display update optimization and "slow modem mode" that Emacs had back in the TECO days. (The SUPDUP network protocol even implemented that at the client end and interacted with Emacs directly!)
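The core of that kind of redisplay optimization is just diffing the old and new screen contents and repainting only what changed, which is exactly what you'd want for e-paper's slow partial refresh. A toy sketch (not Emacs's actual redisplay code):

```python
# Toy version of slow-terminal redisplay: diff two frames line by line
# and emit only the lines that need repainting.

def changed_lines(old_frame, new_frame):
    """Return (line_number, new_text) for every line that differs."""
    return [(i, new) for i, (old, new) in enumerate(zip(old_frame, new_frame))
            if old != new]

old = ["def f(x):", "    return x", "", "print(f(1))"]
new = ["def f(x):", "    return x + 1", "", "print(f(1))"]
print(changed_lines(old, new))  # only line 1 needs repainting
```

Real terminal redisplay goes further (scrolling regions, within-line diffs), but even this line-level version would avoid full-panel flashes for typical editing.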
QD-OLED is an engineering improvement, i.e. combining existing researched technologies to improve the resulting product. I wasn't able to find a good source on what exactly it improves in efficiency, but it's not a fundamental improvement in OLED electrical→optical energy conversion (if my understanding is correct).
In general, OLED screens seem to have an efficiency of around 20-30%. Some research departments seem to be trying to bump that up [https://www.nature.com/articles/s41467-018-05671-x], which I'd be more hopeful about…
…but, honestly, at some point you just hit the limits of physics. It seems internal scattering is already a major problem; maybe someone can invent pixel-sized microlasers and that'd help? More than 50-60% seems like a pipe dream at this point…
…unless we can change to a technology that fundamentally doesn't emit light, e.g. e-paper and the like. Or just LCD displays without a backlight, using ambient light instead.
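To put the efficiency numbers in perspective, here's the back-of-the-envelope arithmetic (illustrative figures, not measurements): at ~25% electrical-to-optical efficiency, three quarters of the panel's power budget is lost as heat, and a hypothetical 50% panel would halve the draw for the same light output.

```python
def panel_power(optical_watts, efficiency):
    """Electrical power needed to emit a given amount of light."""
    return optical_watts / efficiency

light = 1.0  # watts of emitted light for some brightness level
print(panel_power(light, 0.25))  # 4.0 W drawn at today's ~25%
print(panel_power(light, 0.50))  # 2.0 W at a hypothetical 50% panel
```

That's why the gains from better emitters are real but bounded: even a perfect 100% panel only gets you a 4x improvement over 25%, whereas reflective tech sidesteps the emission cost entirely.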
Is the iPad Pro not on OLED yet? All of Samsung's flagship tablets have had OLED screens for well over a decade now. It eliminates the need for backlighting, has superior contrast, and is pleasant to use in low-light conditions.
The iPad that came out today finally made the switch. iPhones made the switch around 2016. It does seem odd how long it took the iPad to switch, but Samsung definitely switched too early: my Galaxy Tab 2 suffered from screen burn-in that I was never able to recover from.
I'm not sure how OLED and backlit LCD compare power-wise exactly, but OLED screens still need to put out a lot of light; they just do it directly instead of with a backlight.
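The practical difference is that LCD backlight power is roughly constant regardless of content, while OLED power scales with how bright the pixels themselves are. A crude model (all numbers made up for illustration):

```python
# Crude content-dependent power model: OLED scales with mean pixel
# brightness, LCD pays the backlight cost no matter what's on screen.

def oled_power(mean_pixel_level, max_watts=5.0):
    """mean_pixel_level in [0, 1]; linear scaling is a simplification."""
    return max_watts * mean_pixel_level

def lcd_power(mean_pixel_level, backlight_watts=4.0):
    return backlight_watts  # content doesn't matter

print(oled_power(0.1), lcd_power(0.1))  # dark terminal: OLED wins big
print(oled_power(0.9), lcd_power(0.9))  # bright white document: OLED loses
```

So whether OLED saves power depends heavily on whether you live in dark themes or white web pages.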
we considered which one adheres to the prompt more, which one has the best overall aesthetics, etc., but ended up with a simple "which one is overall better?" type question. it is easier for people to vote and decide on, and still applicable as preference data at a larger scope (trading volume for simplicity).
the dataset is open source and we plan to train an aesthetics picker on it, but obviously we have to do proper evals (with at least 1M data points) to come to a reasonable conclusion.
As feedback, I can tell you it's not easy at all for me to decide which one is better. Maybe if the prompts included the art style I would be able to clearly identify the better ones, but they don't. Style is where I see most of the differences.
comparing it to lmsys chatbot arena, what sort of an option would you expect? the prompts essentially come from public HF datasets like parti prompts where they test a bunch of stuff (prompt adherence, attention mapping [something in front of something else etc], aesthetics, photo-realism, etc.) so it is hard to ask about each category.
The question is ok, but I need a clear input to decide which one is better. For example: "A serene forest night, a lamp-lit path leads to a cozy wooden house." One model comes up with a very detailed, almost photorealistic image of the scene, while the other produces a very well-painted one. What do I choose? The input didn't mention anything about style, so it's very hard to pick a winner unless (like I said) I'm incredibly subjective.
i see, and yeah, you are right about that case. we probably need realistic/artistic tags as you mentioned. thanks for the example! we'll probably include something like that in the next release and group models by Elo in different categories (analogous to how it's done for language model arenas).
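For reference, the Elo update from a single pairwise vote is simple enough to sketch (a minimal textbook version, not necessarily the arena's actual implementation); per-category ratings would just mean keeping one of these tables per tag.

```python
# Minimal Elo update for one "image A beats image B" vote.

def elo_update(r_winner, r_loser, k=32):
    """Shift both ratings by k times the winner's surprise factor."""
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

a, b = 1500.0, 1500.0   # two models starting equal
a, b = elo_update(a, b)
print(a, b)  # 1516.0 1484.0 — an even-odds win moves each side by k/2
```

An upset (low-rated model beating a high-rated one) moves the ratings further, which is what lets the leaderboard converge from noisy individual votes.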
How transferable is open source experience from major projects (like being a core developer of the Python language itself) when it comes to the O-1 criterion of "reviewing others' work"?
Pretty neat implementation. In general, for these sorts of exercises (and even if the intention is to go to prod with custom kernels), I lean towards Triton for writing the kernels themselves. It is much easier to integrate into the toolchain, and it allows a level of abstraction that doesn't hurt performance at all while providing useful constructs.
It was written with cutlass? No wonder Peter Kim found it valuable and worthwhile to de-obfuscate. Adopting a new programming language invented by OpenAI doesn't sound like a much better alternative. I'd be shocked if either of them were able to build code for AMD GPUs, where it's easy to adapt CUDA code, but not if it's buried in tens of thousands of lines of frameworks. I like open source code to have clarity so I can optimize it for my own production environment myself. When people distribute code they've productionized for themselves, it squeezes out all the alpha and informational value. Just because something's open source doesn't mean it's open source. I think people mostly do it to lick the cookie without giving much away.
zero cost abstractions exist. doesn't mean all abstractions are zero-cost. or being zero-cost somehow invalidates their abstractness/genericness. but maybe we differ on the definition of abstractions.
So does perpetual motion :shrug: but my point is Triton is not an abstraction in the least. Source: 1) I spent 6 months investigating targeting other backends 2) Phil himself said he doesn't care to support other backends https://github.com/openai/triton/pull/1797#issuecomment-1730...
It's amazing how heavily moderated HN is. I have a response here that's been deleted that is like 15 words, including a link to a source that corroborates my claim, but that response contains a transcribed emoji and so it's been deleted by dang or whomever. Lol, super rich environment for discourse we've got going here.
Latency is really important, and that is honestly why we rewrote most of this stack ourselves, but the project with the guarantee of <25ms looks interesting. I wish there was an "instant" mode where, when enough workers are available, it could just do direct placement.
To be clear, the 25ms isn't a guarantee. We have a load testing CLI [1] and the secondary steps on multi-step workflows are in the range of 25ms, while the first steps are in the range of 50ms, so that's what I'm referencing.
There's still a lot of work to do on optimization though, particularly to improve the polling interval when there aren't workers available to run the task. Some people might expect to set a max concurrency limit of 1 on each worker and have each subsequent workflow take 50ms to start, which isn't the case at the moment.
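One common shape for that tradeoff (a hypothetical sketch, not the project's actual scheduler) is to poll aggressively at first, so an idle worker picks the task up within tens of milliseconds, then back off exponentially when nobody is free, with a cap so a worker that frees up later isn't left waiting too long:

```python
# Capped exponential backoff for the "no worker available" polling loop.

def poll_intervals(initial_ms=10, factor=2, cap_ms=500, attempts=8):
    """Return the sequence of wait times between successive polls."""
    interval, out = initial_ms, []
    for _ in range(attempts):
        out.append(interval)
        interval = min(interval * factor, cap_ms)
    return out

print(poll_intervals())  # [10, 20, 40, 80, 160, 320, 500, 500]
```

The cap is the knob that bounds worst-case start latency once capacity frees up; a push-based wakeup from the worker would remove the polling cost entirely, which I assume is what "direct placement" would look like.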
(available without sign-in) FLUX.1 [schnell] (Apache 2.0, open weights, step distilled): https://fal.ai/models/fal-ai/flux/schnell
(requires sign-in) FLUX.1 [dev] (non-commercial, open weights, guidance distilled): https://fal.ai/models/fal-ai/flux/dev
FLUX.1 [pro] (closed source [only available thru APIs], SOTA, raw): https://fal.ai/models/fal-ai/flux-pro