Hacker News | uuwp's comments

The same way we feed a variable-size sequence of characters or sound samples into this RNN. Instead of raw samples at the 16 kHz rate, we'd have one sequence with 1 sample per second, another with 1 sample per 0.5 seconds, and so on. We could go as far as 1 sample per 1/48000 sec, but I don't think that's practical (though this is what these music generators do).


What do you mean by “sample” when you say “sequence of 1 sample per second”?


We can think of an ML model that takes 1 second of sound as input and produces a fixed-length vector that describes this sound:

S[0..n] = the raw input, 48000 bytes per second of sound
F[1][k..k+48000] -> [0..255], maps 1 second of sound to a "sound vector"
F[2][k..k+96000] -> ..., same, but takes 2 seconds of sound as input

Now instead of the raw input S, we can use the sequences F[1], F[2], etc. Presumably, F[10] would detect patterns that change every 10 seconds. It's common in soundtracks to have some background "mood" melody that changes a bit every 10-15 seconds, then a louder, faster melody that changes every 5 seconds, and so on, up to very frequent patterns like F[0.2] that are used in drum'n'bass and electronic music in general.
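A minimal sketch of building those multi-rate F[i] sequences. I'm using simple per-window statistics as a stand-in for a learned encoder; all names and shapes here are made up for illustration:

```python
import numpy as np

SAMPLE_RATE = 48_000  # samples per second, matching the 48000/sec above

def make_feature_sequence(samples, window_sec):
    """Split raw audio into windows of `window_sec` seconds and summarize
    each window with a tiny fixed-length feature vector (mean, std, peak).
    A real system would use a learned encoder instead of these statistics."""
    win = int(SAMPLE_RATE * window_sec)
    n_windows = len(samples) // win
    feats = []
    for i in range(n_windows):
        chunk = samples[i * win:(i + 1) * win]
        feats.append([chunk.mean(), chunk.std(), np.abs(chunk).max()])
    return np.array(feats)

# 10 seconds of fake audio standing in for a real recording
audio = np.random.randn(10 * SAMPLE_RATE).astype(np.float32)
f1 = make_feature_sequence(audio, 1.0)  # one vector per second -> shape (10, 3)
f2 = make_feature_sequence(audio, 2.0)  # one vector per 2 sec  -> shape (5, 3)
```

The point is just that each F[i] is a much shorter sequence than the raw samples, at a rate matched to the time scale of the patterns it's meant to expose.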

This is how music is composed by people, I guess. Most electronic music can be decomposed into 5-6 patterns that repeat with almost mathematical precision. The artist only randomly changes the parameters of each layer over the course of the track, e.g. layer #3 with a period of 7 seconds slightly changes frequency for the next 20 seconds, etc.

Masterpieces have the same multilayered structure, except that those subpatterns are more complex.


> We can think of a ML model that takes 1 second of sound as input and produces a vector of fixed length that describes this sound

You mean like an autoencoder?

Ok, assuming we have those sequences (F1, F2, F10, etc), how would you combine them to train the model?


I'm not an ML guy, so I can't say if this is an autoencoder.

We can combine multiple sequences in any way we want. Obviously, we can come up with some nice-looking "tower of LSTMs" where each level of the tower processes the corresponding F[i] sequence: sequence F1 goes to level T1, which is a bunch of LSTMs; then F2 and the output of T1 go to T2, and so on. The only things that I think matter are (1) feeding all these sequences to the model and (2) having enough weights in it. And obviously a big GPU farm to run experiments.
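A toy sketch of that "tower" wiring, with a plain tanh-RNN cell standing in for each bank of LSTMs. Weights are random and untrained, and the sequence shapes are hypothetical; the only point is how F2 and T1's output are combined at level 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_level(inputs, hidden_size):
    """One level of the tower: a plain tanh-RNN cell standing in for a
    bank of LSTMs. Returns one hidden state per input step. Weights are
    random here; training is out of scope for this sketch."""
    in_size = inputs.shape[1]
    Wx = rng.standard_normal((in_size, hidden_size)) * 0.1
    Wh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:
        h = np.tanh(x @ Wx + h @ Wh)
        states.append(h)
    return np.array(states)

# Hypothetical feature sequences: F1 has 2x the steps of F2 (1s vs 2s windows).
F1 = rng.standard_normal((10, 3))
F2 = rng.standard_normal((5, 3))

T1 = rnn_level(F1, hidden_size=8)            # level 1 sees F1 only
T1_down = T1[1::2]                           # align T1 to F2's 2-second rate
T2 = rnn_level(np.hstack([F2, T1_down]), 8)  # level 2 sees F2 plus T1's output
```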


Ok, but if we are using a hierarchical model like multilayer lstm, shouldn’t we expect it to learn to extract the relevant info at multiple time scales? I mean, shouldn’t the output of T1 already contain all the important info in F2? If not, what extra information do you hope to supply there via F2?


T1 indeed contains all the info needed, but T1 also has limited capacity and can't capture long patterns. T1 would need hundreds of billions of weights to capture minute-long patterns. I think this idea is similar to the often-used skip connections.


But the job of T1 is not to capture long term patterns, it’s to extract useful short scale features for T2 so that T2 could extract longer term patterns. T3 would hopefully extract even longer scale patterns from T2 output, and so on. That’s the point of having the lstm hierarchy, right?

Why would you try to manually duplicate this process by creating F1, F2, etc?

The idea of skip connections would be like feeding T1 output to T3, in addition to T2. Again, I’m not sure what useful info F sequences would supply in this scenario.
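For what it's worth, the skip-connection variant described here could be sketched like this (random untrained weights, purely illustrative shapes):

```python
import numpy as np

rng = np.random.default_rng(1)

def level(inputs, out_size):
    """Stand-in for one LSTM level: a random per-step linear map + tanh."""
    W = rng.standard_normal((inputs.shape[1], out_size)) * 0.1
    return np.tanh(inputs @ W)

x  = rng.standard_normal((10, 4))    # input sequence, 10 steps
t1 = level(x, 8)                     # level 1
t2 = level(t1, 8)                    # level 2 sees only T1's output
t3 = level(np.hstack([t2, t1]), 8)   # skip connection: T3 sees T2 and T1
```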


This sounds reasonable, but I think in practice the capacity of T1 won't be enough to capture long patterns, and the F2 sequence is supposed to help T2 restore the lost info about the longer patterns. The idea is to make T1 really good at capturing small patterns, like speech in pop music, while T2 would be responsible for background music with longer patterns.

Don't we already do this with text translation? Why not let one model read printed text pixel by pixel and another model produce a translation, also pixel by pixel? Instead we choose to split printed text into small chunks (that we call words), give every chunk a "word vector" (those word2vec models), and produce text one word at a time as well.


This proof is largely irrelevant in the real world. An interesting question would be how much can be approximated with a model that has 1 MB worth of weights and can use only relu/tanh/softmax activations.


The first paragraph of the conclusion addresses that this is merely a proof of what is possible, not what is practical:

> The explanation for universality we've discussed is certainly not a practical prescription for how to compute using neural networks! In this, it's much like proofs of universality for NAND gates and the like. For this reason, I've focused mostly on trying to make the construction clear and easy to follow, and not on optimizing the details of the construction. However, you may find it a fun and instructive exercise to see if you can improve the construction.


Of course they can, because an NN can be any function. If we can pick tanh as the "activation", then we can just as easily pick arctan as the activation and say our NN computes arctan. What an achievement! A better question is whether conv+relu based NNs can approximate any function. But that's most likely false, because there are many weird functions that are impossible to compute, let alone approximate (I'm talking about those curious counterexamples in math).


This is a non sequitur in this context. The universality described here depends only on changing connection weights, not the neuronal activation functions. An important caveat is that the approximated function must be continuous, but that covers a very large family.


I don't think every continuous function can be approximated this way, because we can make an infinitely complex but continuous function whose every n-th derivative is also continuous. I'm thinking about those weird Riemann-zeta-style functions. In order to approximate such a function we'd need a huge model that couldn't be computed or stored even by a universe-sized perfect computer.


It’s a theorem, so it’s been proven: https://en.wikipedia.org/wiki/Universal_approximation_theore...

Another caveat that I forgot in my previous comment is the domain has to be compact (closed and bounded). But if so, then it doesn’t really matter how weird your continuous function is, because compactness of the domain guarantees uniform continuity, i.e. your delta only depends on epsilon and not x in the epsilon-delta criterion of continuity. That allows you to partition the domain into patches of diameter delta, in which very simple functions are sufficient to approximate within epsilon.
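To make the partition-into-patches argument concrete, here's a small sketch approximating sin on the compact interval [0, 2π] with piecewise-constant patches; shrinking the patch width drives the uniform error below any epsilon (the function and constants are just an example):

```python
import math

def piecewise_constant_approx(f, a, b, n):
    """Approximate f on [a, b] by n constant patches, each taking f's
    value at the patch midpoint. Uniform continuity on the compact
    interval guarantees that for any epsilon there is an n making the
    worst-case error below epsilon."""
    width = (b - a) / n
    def g(x):
        i = min(int((x - a) / width), n - 1)  # which patch x falls in
        mid = a + (i + 0.5) * width
        return f(mid)
    return g

g = piecewise_constant_approx(math.sin, 0.0, 2 * math.pi, n=1000)
# worst-case error over a dense grid of test points
err = max(abs(math.sin(x) - g(x))
          for x in (i * 2 * math.pi / 10_000 for i in range(10_001)))
# since |sin'| <= 1, err is bounded by half the patch width, about 0.0032
```

A shallow network with step-like activations can encode exactly this kind of patchwork, which is the intuition behind the universality construction.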


It's not so much about the density. It's about the ludicrous mortgage that they signed up for in the past; they want their house to at least maintain its price.


It's not the corporations, but the current mortgage system. Corp workers with 100-200k pay pair up and put their entire life earnings (30 years) towards a house. Banks back this deal and the couple gets a house they can't really afford. Same with our education system and healthcare. If the government made a law that a mortgage can't exceed 1 year's income, prices would adjust to that number.


More like a 2M house and a 120K car. 500K/year isn't enough to live in a 10M house because it's 20 years of work (30, accounting for other expenses) and that assumes no cut in income. To get into a 10M house one needs to start a very successful business, which is quite possible after all the experience at such a job.


The advice that I would give to myself in my early 20s is (1) nobody cares about your career except you: people come and go, change companies and have their own careers; (2) the only way to move up the ranks is by shifting between companies and getting multiple competing offers; this applies even to outstanding devs; so you are expected to move every 3-4 years if you don't want to get stuck at your entry level pay.


It's also far easier for an L4 to get competing offers for amounts far exceeding what L5s make. That's also how many (most?) people get to the L5 level: they negotiate with HR when they have the leverage (competing offers). Getting to L5 via the promo committee is probably the hardest way to get there.


650 is L6, which is maybe 5% of Google's population. The median pay would be around 300.


I don't get this statement. Google doesn't force anyone to work overnight. It doesn't force you to work at all, to be honest. Some people can spend a workday skiing and then arrive at 6 pm for dinner. TC is 400K. And once you're bored or feel undercompensated, you just go to FB, Netflix or Snapchat and get a 30-50% pay rise.


But you're an inconsequential cog, and in order to get a decent bonus and _any_ RSU refresh at all (which is the majority of that 400K btw) you have to jump through insane hoops and shave yaks all day. At some point it feels like Dostoevsky's labor camp description: you dig a hole and then you fill it back up. Except this is a very comfy labor camp, with 3 meals a day, and you can leave if you want to.

But some of us like to actually make things, and have a sense of purpose, and other things higher up on Maslow's pyramid of needs. For them the Google of 2019 is mostly not a good place, unless they end up on teams (and in positions on those teams) where they can do work that's meaningful to them, rather than copying proto buffers in some soon-to-be-deprecated backend. Meaningful work is scarce there, and has been for at least the last decade, and a lot of people are competing for it.


> But you're an inconsequential cog

In the grand scheme of things everybody is an inconsequential cog. You, me, and everybody you know are average people who will grind away at whatever thing we happen to do. You aren't gonna change the world. I'm not gonna change the world. Accept this and move on.

> Meaningful work is scarce there

"Meaningful work" is in the eye of the beholder. Learning to find joy in whatever task you are working on is an important skill to learn.

Being a very highly paid "inconsequential cog" at a mega-corp versus working below market at some dinky startup can be the difference between actually affording a house or not. It can mean you get to retire years earlier than you would have otherwise. It can mean putting your kids through a top-notch education program. It buys you a lot of things.


The entirety of one's world view is defined by their perception. If you can convince yourself you're not a cog at Google, hey, more power to you, enjoy those golden handcuffs. But if not, there are plenty of options out there which let you pay mortgage, put your kids through college, and "buy a lot of things". It doesn't have to be FANG.

