I see a lot of people here expressing doubts and confusion. I want to try to clear up some of that.
The key notion here is scale relativity. This is the reason why transformer models have been so, well, transformative. Bigger models are better than smaller models in a proportional manner. That is, they display scale relativity. Where is the limit? Where does this break down? We don't know. We haven't found the ceiling yet.
Another important notion is multimodality. When you can cross-reference your text-based knowledge of an apple with your image-based knowledge of an apple, you can use this information as leverage. Archimedes said, "Give me a place to stand on, and I'll move the Earth." It might seem ridiculous to say that the same is true when it comes to information, but it is. Informational leverage is powerful. Multimodality allows you to make very accurate predictions. The McGurk effect is a nice demonstration of how we do the exact same thing. We rely on visual information from a speaker's lips to predict what they're going to say. In other words: we make use of multimodal leverage.
The twin notions of scale relativity and multimodality explain what makes MUM possible. As some of you have pointed out, there's another aspect that we can't ignore: utility. Google will be using MUM to make money. Which means that they'll have to train MUM to make you spend it. But if you're uncomfortable with this idea, you are uncomfortable with capitalism in general. Which is fair, but I think it's important to keep it in mind.
As I'm sure they've already considered at Google, MUM can be used to revolutionize education. Imagine people all over the world having access to an expert instructor who can answer all of your questions. You might think this sounds like a dream, but we're a mere stone toss away from achieving it. That's the true power of scale relativity + multimodality: we can now make advanced systems that can communicate with us.
I appreciate the skeptics and naysayers here: you keep the rest of us sane. For that, I thank you. At the same time, I want you to open your eyes to the possibility that something very important and transformative is happening right now. You don't have to go full Kurzweil, but I think you would benefit from reflecting on the opportunities this new technology might offer.
Yeah, I'm a little surprised at all the negativity here considering the game-changing potential of this sort of research. The HN crowd has always been a pretty cynical bunch, but come on! A single model that can extract information from images, text, and webpages across multiple languages and generate answers in response to natural language questions written by a user? This feels like straight-up wizardry!
The key notion here is scale relativity. This is the reason why transformer models have been so, well, transformative. Bigger models are better than smaller models in a proportional manner. That is, they display scale relativity. Where is the limit? Where does this break down? We don't know. We haven't found the ceiling yet.
Another important notion is multimodality. When you can cross-reference your text-based knowledge of an apple with your image-based knowledge of an apple, you can use this information as leverage. Archimedes said, "Give me a place to stand on, and I'll move the Earth." It might seem ridiculous to say that the same is true when it comes to information, but it is. Informational leverage is powerful. Multimodality allows you to make very accurate predictions. The McGurk effect is a nice demonstration of how we do the exact same thing. We rely on visual information from a speaker's lips to predict what they're going to say. In other words: we make use of multimodal leverage.
The twin notions of scale relativity and multimodality explain what makes MUM possible. As some of you have pointed out, there's another aspect that we can't ignore: utility. Google will be using MUM to make money. Which means that they'll have to train MUM to make you spend it. But if you're uncomfortable with this idea, you are uncomfortable with capitalism in general. Which is fair, but I think it's important to keep it in mind.
As I'm sure they've already considered at Google, MUM can be used to revolutionize education. Imagine people all over the world having access to an expert instructor who can answer all of your questions. You might think this sounds like a dream, but we're a mere stone toss away from achieving it. That's the true power of scale relativity + multimodality: we can now make advanced systems that can communicate with us.
I appreciate the skeptics and naysayers here: you keep the rest of us sane. For that, I thank you. At the same time, I want you to open your eyes to the possibility that something very important and transformative is happening right now. You don't have to go full Kurzweil, but I think you would benefit from reflecting on the opportunities this new technology might offer.