The downloads are (presumably) already compressed.
And there are strong ties between LLMs and compression. LLMs work by predicting the next token. The best compression algorithms work by predicting the next token and encoding the difference between the predicted token and the actual token in a space-efficient way. So in a sense, an LLM trained on Wikipedia is kind of a compressed version of Wikipedia.
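To make the prediction/compression link concrete, here is a minimal sketch. It is not an LLM and not a real compressor: it uses a toy adaptive character-frequency model (my assumption, chosen for brevity) and adds up the ideal cost of each character, which is -log2 of the probability the model gave to the character that actually occurred. An entropy coder such as an arithmetic coder can get close to that total. The function name and the 256-symbol alphabet are illustrative only.

```python
import math
from collections import Counter

def ideal_code_length_bits(text, alphabet_size=256):
    """Bits an ideal entropy coder would need when driven by a toy adaptive model."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for ch in text:
        # Probability the model assigned to the character that actually came next
        # (Laplace smoothing, so unseen characters never get probability zero).
        p = (counts[ch] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        # Update the model only after "coding" the character, as a decoder would.
        counts[ch] += 1
        seen += 1
    return total_bits

sample = "the cat sat on the mat. " * 40
bits = ideal_code_length_bits(sample)
raw_bits = 8 * len(sample)  # one byte per character, no modelling at all
```

The better the model predicts, the higher the probabilities it assigns to what actually comes next and the fewer bits are needed, which is why a stronger predictor tends to mean a stronger compressor.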
Yes, they're uncompressed. For reference, `enwiki-20250620-pages-articles-multistream.xml.bz2` is 25,176,364,573 bytes; you could get that lower with better compression. You can do partial reads from multistream bz2, though, which is handy.
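A rough sketch of what a partial read looks like, assuming only that a multistream file is a concatenation of independent bz2 streams. The demo builds a tiny stand-in file in memory and computes the stream offsets itself; with the real dump the offsets would have to come from an index, and `read_one_stream` is a hypothetical helper name, not part of any Wikipedia tooling.

```python
import bz2
import io

def read_one_stream(f, offset, chunk_size=64 * 1024):
    """Decompress exactly one bz2 stream that starts at byte `offset` of `f`."""
    f.seek(offset)
    decomp = bz2.BZ2Decompressor()
    pieces = []
    # BZ2Decompressor stops at the end of its stream and sets .eof;
    # any bytes read past that point are simply left in .unused_data.
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise EOFError("file ended before the bz2 stream did")
        pieces.append(decomp.decompress(chunk))
    return b"".join(pieces)

# Tiny stand-in for a multistream file: three streams, compressed separately.
parts = [b"<page>Alpha</page>\n", b"<page>Beta</page>\n", b"<page>Gamma</page>\n"]
offsets = []
blob = b""
for part in parts:
    offsets.append(len(blob))  # byte offset where this stream begins
    blob += bz2.compress(part)

# Jump straight to the second stream without touching the first.
second = read_one_stream(io.BytesIO(blob), offsets[1])
```

Only the bytes from the chosen offset onward are read, so nothing before the stream you want has to be decompressed.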
Kiwix (what the author used) uses "zim" files, which are compressed. I don't know where the difference comes from, but Kiwix is a website image, which may include some things the raw Wikipedia dump doesn't.
And 57 GB to 25 GB would be pretty bad compression. You can expect a compression ratio of at least 3 on natural English text.
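For the arithmetic behind that, a quick sketch using only the two sizes quoted above (57 GB and 25 GB) and the ratio of 3 as the rule-of-thumb target:

```python
def compression_ratio(uncompressed_gb, compressed_gb):
    """Ratio of original size to compressed size (higher is better)."""
    return uncompressed_gb / compressed_gb

implied_ratio = compression_ratio(57, 25)  # what 57 GB -> 25 GB would imply
size_at_ratio_3 = 57 / 3                   # size if 57 GB compressed at ratio 3
```

So going from 57 GB to 25 GB would be a ratio of about 2.3, while a ratio of 3 would bring 57 GB down to 19 GB.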
Can someone explain the impact of widely distributed fiber on DDoS attacks? As I understand it, if 1-2 of these nodes get compromised, then add in the 10-100x amplification via DNS, you're looking at 10-100 gigabits of bandwidth off only two nodes? Compared to the recently published high of 300, this seems disproportionately high.
Pedantically speaking, it's difficult - in fact darned near impossible - for a single residential-type host (i.e., likely a laptop running Windows) to utilize anywhere near the full 1Gb capacity offered.
That said, a botnet host running on this network would be substantially more capable of causing damage than a host on an 8Mbps upstream.
Hopefully Google has plans, or has already implemented some ability, to mitigate that type of problem...
I'm curious what the size of the upstream pipe is? Is it 1G up/down? Even without DNS amplification, it only takes infecting a couple hundred machines on fiber to have a massive botnet. Hopefully ISPs will better handle DDoS attacks, because likely a few host machines could take down many sites.
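Back-of-the-envelope numbers for the scenarios in this thread, as a sketch. The 1 Gbps upstream is an assumption (nobody here has confirmed the real upstream size), the 10-100x amplification range and the 8 Mbps comparison are taken from the comments above, and `attack_gbps` is just an illustrative helper. The `utilization` factor is there because a residential host probably can't fill its whole link.

```python
def attack_gbps(hosts, uplink_gbps, amplification=1.0, utilization=1.0):
    """Aggregate attack traffic in Gbps under the stated assumptions."""
    return hosts * uplink_gbps * utilization * amplification

UPLINK_GBPS = 1.0  # assumed upstream per fiber host

# Two compromised hosts, DNS amplification of 10x to 100x, links fully used.
two_hosts_low = attack_gbps(2, UPLINK_GBPS, amplification=10)
two_hosts_high = attack_gbps(2, UPLINK_GBPS, amplification=100)

# Same two hosts if each only manages half of its link.
two_hosts_half_low = attack_gbps(2, UPLINK_GBPS, amplification=10, utilization=0.5)

# A couple hundred hosts with no amplification at all.
two_hundred_hosts = attack_gbps(200, UPLINK_GBPS)

# How many 8 Mbps upstream hosts one 1 Gbps host is worth.
fiber_vs_8mbps = (UPLINK_GBPS * 1000) / 8
```

Under these assumptions two hosts land at 20-200 Gbps with full links (10-100 Gbps at half utilization), a couple hundred unamplified hosts reach 200 Gbps, and one fiber host is worth 125 hosts on an 8 Mbps upstream.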
(Author here.) I've long wanted to write a follow-up piece addressing exactly these questions, but haven't gotten around to it.
First, find a PM to talk to, especially one you admire who's made the switch. Everybody's situation is different, and having someone to mentor you through it is invaluable.
Second, make sure you have the right motivations. A friend of mine says "run to product management, not from engineering." If you want to switch because your current manager is a jerk, you hate the project you're working on, or (gasp!) your PM is an idiot, you might be doing it for the wrong reasons.
You could test it out with a small project at first. Stepping up and saying "I'd love to take on more of a PM role on this feature/bug/release" will often be positively received by a healthy organization.
Startups are ideal places to make the switch (that's where I did it). Everyone is used to wearing many hats, and the "all hands on deck" attitude is much more welcoming of someone raising their hand and volunteering to be a PM. Smart, bigger companies (like Google or Facebook) often have formal programs for moving into PM. At Google, for example, we even have a six-month rotation program that lets you try it on for size.
And if you're convinced you want to do it and your current company discourages you, find a new one.
I first wanted to say thanks for putting up this essay on product management. I don't think quality pieces on the subject are easy to come by, and I think you give a lot of valuable and tangible insight on it.
Second, it looks like you favor past experience as a PM --- but what about college grads? Do you think it's usually a better idea for someone just beginning their career in the "real world" to start out as an engineer first? Maybe this sounds obvious as I've stated it, but I have a few friends that started their careers as PMs and have been successful. Interviewing a college grad might be a little trickier, no?
Lastly, I definitely agree that a big part of being a good PM, or maybe a good manager in general, is making good, or even just reasonable, decisions on a regular basis. There are seemingly endless small choices that need to be made that add up to a lot, and many of these choices (I've found) aren't going to be dead obvious. A decision, however, needs to be made, and it's important to make a decent one without dedicating too much time to it, based on your understanding of the goals of the project, your experience, and your gut instinct.
I'd love to read such a follow-up post. I'm doing exactly what you describe, and for the right reason, with this essay as something of a guide, so thanks for the help and validation!
"In my experience, relatively few students appreciate how much they're learning in my course while they're in it."
How true! I hated CS61A when I was in it, and I thought nothing was practical and everything was a trivial example. Sorry Brian! I failed to grasp the depth of all the 'trivial' examples. I never appreciated the complexities of the class until I started being a TA for it, and I never truly loved the class until I lectured it.
New York, NY. 1010data -- Full Stack Engineers / Front End Developers / Security Devs / Analysts
1010data is a database specializing in big data analytics. We build a trillion-row spreadsheet, served over the browser, capable of analyzing that amount of data in seconds, not hours.
We're in need of front end devs and full stack engineers working on the front end to develop a better product to help customers analyze their data, visualize it, and make sense of it all. Other teams around the company are hiring as well, so if you're interested, please see: http://www.1010data.com/about-us/careers. I work on the UI team now, so feel free to email me at george@1010data.com if you've got any questions.
As for me, I've been working here for two years now. My favorite thing about the company is that it's a culture that encourages you to figure out the best place to use and develop your talents and to wear different hats. A few of the things I've worked on are below:
* Building a grid capable of serving to a browser a virtually infinite amount of data.
* Building tools to help visualize that data.
* Learning to manage client relationships.
* Figuring out ways to classify and understand customers based on their shopping patterns and habits.
* Attending trade shows and conferences to demo our product and generate leads.
* Enjoying Scotch-o-clock with teammates and friends.
It's a fun and awesome place to work. We sponsor H1B, and interns in their Junior+ year are welcome to apply.