> This is a partly solved problem right now. I'd agree with the partly. I have y...

mh- · on Nov 10, 2024

I'm far (far) from an expert in this field, but when you think about how audio is quantized into digital form, I'm really not sure how one solves this with the current approaches.

That is: frequencies from one instrument will virtually always overlap with another one (including vocals), especially considering harmonics.

Any kind of separation will require some pretty sophisticated "reconstruction", it seems to me, because the mixing operation is inherently destructive. And then the problem becomes one of how faithful the "reproduction" is.
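To make the overlap point concrete, here is a minimal sketch that lists where the harmonics of two notes land on the same frequency. The note choices (110 Hz and 165 Hz, a just-intonation fifth), the harmonic count, and the helper names are all assumptions for illustration; real instruments have inharmonic and noisy partials, so this understates the problem if anything.

```python
def harmonics(f0, n):
    """Return the first n integer harmonics of fundamental f0, in Hz."""
    return [f0 * k for k in range(1, n + 1)]


def shared_partials(f_a, f_b, n=8, tolerance_hz=1.0):
    """Find harmonic pairs of two notes that fall within tolerance_hz of each other."""
    pairs = []
    for i, ha in enumerate(harmonics(f_a, n), start=1):
        for j, hb in enumerate(harmonics(f_b, n), start=1):
            if abs(ha - hb) <= tolerance_hz:
                pairs.append((i, j, ha, hb))
    return pairs


# Hypothetical example: a bass note at 110 Hz against a vocal note at 165 Hz.
overlaps = shared_partials(110.0, 165.0)
for i, j, ha, hb in overlaps:
    print(f"harmonic {i} of 110 Hz ({ha} Hz) collides with harmonic {j} of 165 Hz ({hb} Hz)")
```

Any energy at those shared frequencies belongs to both sources at once, so a separator has to decide how to split it rather than simply filter it.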

This feels pretty similar to the inpainting/outpainting stuff being done in generative image editing (a la Photoshop) nowadays, but I don't think anywhere near the investment is being made in this field.

Very interested to hear anyone with expertise weigh in!

nineteen999 · on Nov 12, 2024

I won't say expertise, but what I've done recently:

1) used PixBim AI to extract "stems" (drums, bass, piano, all guitars, vocals). Obviously a lossless source like FLAC works better than MP3 here.

2) imported the stems into Pro Tools.

3) from there, I will usually re-record the bass, guitars, pianos and vocals myself. Occasionally the drums as well.

This is a pretty good way I found to record covers of tracks at home, re-using the original drums if I want to, keeping the tempo of the original track intact etc. I can embellish/replace/modify/simplify parts that I re-record obviously.

It's a bit like drawing using tracing paper: you're creating a copy to the best of your ability, but you have a guide underneath to help you with placement.

Earw0rm · on Nov 15, 2024

It's not really digital quantisation that's the problem, but everything else that happens during mixing - which is a much more complicated process, especially for pop/rock/electronic etc., than just "sum all the signals together".

There's a bunch of other stuff that happens during and after summing which makes it much harder to reliably 100% reverse that process.
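A rough sketch of why post-summing processing matters, under loud assumptions: two pure sine "stems", and a `tanh` soft clipper standing in for whatever happens on the mix bus. That stand-in is my choice for illustration, not a claim about how any particular record was mixed. With a plain sum, subtracting a known stem gives back the other one; after the nonlinearity, the same subtraction leaves a large error.

```python
import math


def sine(freq_hz, n_samples, sample_rate=8000, amp=0.8):
    """Generate n_samples of a sine wave as a list of floats."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def max_abs_error(a, b):
    """Largest per-sample difference between two equal-length signals."""
    return max(abs(x - y) for x, y in zip(a, b))


vocal = sine(440.0, 256)  # hypothetical "vocal" stem
bass = sine(110.0, 256)   # hypothetical "bass" stem

# Case 1: mixing is only a sum, so subtracting the known bass recovers the vocal.
linear_mix = [v + b for v, b in zip(vocal, bass)]
recovered_linear = [m - b for m, b in zip(linear_mix, bass)]

# Case 2: the summed signal passes through a soft clipper (assumed stand-in
# for bus processing), and the same subtraction no longer works.
saturated_mix = [math.tanh(m) for m in linear_mix]
recovered_saturated = [m - b for m, b in zip(saturated_mix, bass)]

linear_error = max_abs_error(recovered_linear, vocal)
saturated_error = max_abs_error(recovered_saturated, vocal)
print(f"linear sum error:    {linear_error:.2e}")
print(f"after soft clipping: {saturated_error:.2e}")
```

A lone `tanh` is invertible if you know it is there, so this toy case could still be undone. A real chain is unknown to the listener and stacks several such stages (compression, EQ, reverb, limiting), which is where the difficulty comes from.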

mh- · on Nov 15, 2024

I didn't mean to say that quantization was the problem, just that you're basically trying to pick apart a "pixel" (to continue my image-based analogy) that is a composite of multiple sounds (or partially-transparent image layers).
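The "composite pixel" idea in miniature, as a sketch with made-up sample values: two different pairs of stems that sum to exactly the same mix. The numbers and the `mix` helper are hypothetical; the point is only that the mixed samples alone don't say which pair was the original.

```python
def mix(stem_a, stem_b):
    """Sum two equal-length stems sample by sample."""
    return [a + b for a, b in zip(stem_a, stem_b)]


# Two hypothetical decompositions of the same three-sample mix.
guess_1 = ([0.5, -0.25, 0.75], [0.25, 0.5, -0.5])
guess_2 = ([0.75, 0.0, 0.0], [0.0, 0.25, 0.25])

same_mix = mix(*guess_1) == mix(*guess_2)
print(mix(*guess_1), mix(*guess_2), same_mix)
```

Picking between such candidates needs outside knowledge of what each source plausibly sounds like, which is the "reconstruction" part.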

I was sincere when I said:

> I'm really not sure how one solves this with the current approaches.

I was hoping someone would come along and say it is, in fact, possible. :)