What is the optimization algorithm for these, if not backpropagation? The page mentions using hyperparameters for optimization and compares the method to an adaptive stochastic gradient descent method, which makes me think it does use backpropagation.