In practice, though, RLHF isn't a survey of every living human's personal style or preferences. Its purpose is to make the model more useful in the eyes of the vendor, mainly by getting cheap third-world labor to nudge the model according to the vendor's instructions. You don't get a subservient, sycophantic and "safe" chat interface out of unstructured data without putting your thumb on the scale, hard.