> To submit to arxiv you need to be approved by someone as a legitimate researcher, or beg for reviews from people you’ve never met without any anonymity.
Well that’s good for you, isn’t it! What about everyone else? Not everyone is automatically approved, because of things like their email address. That’s my point: they focus on reputation and credentials rather than the blind value of the work. It’s less accessible.
The confusion comes from the "journal", i.e., the topic category on arXiv, which can have different settings for when an author can submit. Some are more stringent than others. It is not an arXiv-wide issue.
You use grid search for hyperparameter optimization and state that at some point you would like to add a Bayesian approach. One simple change that could boost performance would be to use random search in place of grid search. Grid search is known to perform worse than random search when not all hyperparameters are of similar importance [1]. Intuitively, grid search spends many evaluations on the same setting of an important hyperparameter while only varying the unimportant ones.
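The intuition can be shown in a few lines. Below is a minimal sketch (the objective function and hyperparameter names are made up for illustration): with the same budget of 9 evaluations, grid search only ever tries 3 distinct values of the important hyperparameter, while random search tries 9.

```python
import itertools
import random

random.seed(0)

# Toy objective: only the first hyperparameter (lr) really matters.
# This function is purely illustrative, not from the original post.
def objective(lr, momentum):
    return -(lr - 0.3) ** 2 + 0.001 * momentum

# Grid search: 3 x 3 = 9 evaluations, but only 3 distinct lr values tried.
grid_lrs = [0.1, 0.5, 0.9]
grid_moms = [0.0, 0.5, 0.9]
grid_results = [objective(lr, m)
                for lr, m in itertools.product(grid_lrs, grid_moms)]

# Random search: same budget of 9 evaluations, but 9 distinct lr values,
# so the important axis is sampled three times as densely.
random_results = [objective(random.uniform(0.0, 1.0),
                            random.uniform(0.0, 1.0))
                  for _ in range(9)]

print("best (grid):  ", max(grid_results))
print("best (random):", max(random_results))
```

With more hyperparameters the effect gets stronger: the number of distinct values per axis that grid search can afford shrinks exponentially with the dimensionality, while random search always covers each axis with as many distinct values as it has evaluations.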
It depends on the number of evaluations. The more evaluations, the stronger the model built by the TPE algorithm. With very few evaluations, we would expect TPE to only match random sampling. This effect can, for example, be seen in the plots of the "Bayesian Optimization and Hyperband" paper [1, 2], where the plotted "Bayesian Optimization" approach is TPE.
Also, there might be model bias: for example, if the objective function is stochastic (e.g., a reinforcement learning algorithm that only converges sometimes) or not very smooth, TPE might exploit areas that are not actually good on the basis of a single lucky evaluation. In those cases TPE might perform worse than random search! To alleviate the effect of model bias in model-based hyperparameter optimization (e.g., TPE) and to obtain convergence guarantees, people often sample every k-th hyperparameter setting from a prior distribution (i.e., random search); this is also the case for the plots in [1, 2].
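The interleaving trick is easy to sketch. The snippet below is not a real TPE implementation; the model-based proposal is a stand-in (perturbing the incumbent), and the objective is a made-up noisy function. The point is only the control flow: every k-th evaluation falls back to a draw from the prior.

```python
import random

random.seed(1)

# Illustrative noisy objective (stand-in for, e.g., a stochastic RL run).
def objective(x):
    return -(x - 0.7) ** 2 + random.gauss(0.0, 0.05)

def prior_sample():
    return random.uniform(0.0, 1.0)

def model_suggest(history):
    # Stand-in for a model-based proposal such as TPE: perturb the best
    # configuration seen so far. A real implementation would fit a model.
    best_x, _ = max(history, key=lambda h: h[1])
    return min(1.0, max(0.0, best_x + random.gauss(0.0, 0.1)))

k = 3  # every k-th evaluation comes from the prior (plain random search)
x0 = prior_sample()
history = [(x0, objective(x0))]

for i in range(1, 30):
    x = prior_sample() if i % k == 0 else model_suggest(history)
    history.append((x, objective(x)))

best_x, best_y = max(history, key=lambda h: h[1])
print(best_x, best_y)
```

Because a third of the budget is spent on the prior, the search cannot get permanently stuck exploiting a region that only looked good due to noise, which is exactly the convergence argument mentioned above.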
If you are wondering which HPO algorithm you should use (or HPO in general), I would highly recommend the first part of the AutoML tutorial at NeurIPS2018 [3] given by my advisor.
The NFL only applies to settings where your task distribution is uniform random over all possible tasks. It is my intuition that this kind of task distribution is almost surely not something we would encounter.
Your algorithm description sounds like a variant of greedy local search with a NN heuristic. While I personally do not know of a paper that does this kind of local search (I have not looked), I do want to add that there is a rich literature on "learning to learn", including "learning to optimize" approaches, e.g., https://arxiv.org/abs/1606.04474.
The field of automatic machine learning (abbreviated as AutoML) concerns all endeavours to automate the process of machine learning. To provide a sense of what could constitute AutoML, let me post a list from the "Call for Papers" of the International Workshop on Automatic Machine Learning (ICML 2018) [1]:
* Model selection, hyper-parameter optimization, and model search
* Neural architecture search
* Meta learning and transfer learning
* Automatic feature extraction / construction
* Demonstrations (demos) of working AutoML systems
* Automatic generation of workflows / workflow reuse
* Automatic problem "ingestion" (from raw data and miscellaneous formats)
* Automatic feature transformation to match algorithm requirements
* Automatic detection and handling of skewed data and/or missing values
* Automatic acquisition of new data (active learning, experimental design)
* Automatic report writing (providing insight on automatic data analysis)
* Automatic selection of evaluation metrics / validation procedures
* Automatic selection of algorithms under time/space/power constraints
* Automatic prediction post-processing and calibration
* Automatic leakage detection
* Automatic inference and differentiation
* User interfaces and human-in-the-loop approaches for AutoML
> I don't see "Automatic design of novel algorithms" in this list. Can AutoML produce something as novel as a GAN, CapsNet, WaveNet, Transformer, Neural ODE, etc? Is that even considered to be one of its goals. In my opinion, there's a clear separation between a group of people trying to improve AutoML so that it's more useful in doing all those tasks on the list, and a group of people trying to invent next gen ML algorithms or DL architectures.
I agree with you from the view of the current state-of-the-art methods and the current state of the AutoML / fundamental ML research communities. Current methods are very limited, but I cannot think of a reason why a sufficiently general search space of architectures/pipelines could not produce something like a GAN or a WaveNet.
I do not think that designing algorithms as novel as the ones you listed is currently a goal of AutoML, as that is not something we have an angle of attack for. However, I do think that with increasing capabilities, the field of AutoML will seek to automate every step of the machine learning pipeline, including the design of algorithms. E.g., once/if there are ways to apply NAS that yield truly novel architectures, I think NAS researchers will be happy to do just that -- wouldn't you call that AutoML then?
I don't see "Automatic design of novel algorithms" in this list.
Can AutoML produce something as novel as a GAN, CapsNet, WaveNet, Transformer, Neural ODE, etc? Is that even considered to be one of its goals?
In my opinion, there's a clear separation between a group of people trying to improve AutoML so that it's more useful in doing all those tasks on the list, and a group of people trying to invent next gen ML algorithms or DL architectures.
The parent did not specifically talk about NNs. As I understand it, AutoML could apply to all statistical endeavours that involve estimation (classical or Bayesian).
> “AutoML could apply to all statistical endeavours that involve estimation”
Yes, this is the part that sounds like parody to me. At least, as a working statistician, I can tell you that the concept of AutoML could not apply to the vast majority of things I work on.
Could you give an example? I have a hard time understanding what you could mean, as Algorithm Configuration & Selection is such a general framework. If you are solely talking about the current state of the art, I would agree that techniques from AutoML do not have the generality and autonomy of an expert human.
For example, look into Chapter 5 on logistic regression from the Gelman & Hill book on hierarchical models & regression.
It walks through an example with arsenic data in wells and a problem of estimating how distance, education and some other factors relate to a person’s willingness to travel to a clean well for water.
Deciding on how to standardize the input features, how to rescale for regression coefficients to be interpretable in meaningful human units, how to interpret statistics of the fitted model to decide whether a feature is helping or hurting by adding it (since this cannot be deduced from raw accuracy metrics alone), how to interpret deviance residual plots for outlier analysis, etc.
All those things have nothing to do with changing the architecture of the model, except possibly including or excluding features. In that example there were no hyperparameters to tune, and the inference problem would not make sense for hyperparameter tuning on raw accuracy outputs anyway, since the goal was not optimizing prediction but rather understanding the impact of features that have semantic meaning in the context of possible policy choices that could be adopted.
By way of contrast, applying an automated subset selection algorithm to automatically choose the features would be a naive idea with likely bad results in that case, and setting up an optimization framework that would optimize over possible transformations or standardizations of the inputs seems equally dubious compared with expert, context-aware human judgment.
And this is a very trivial example. If you modify a problem like this to address causal inference goals, or add some type of cost optimization on top of it, it becomes more and more complex, but exactly in a way that a tool like AutoML can’t help with.
In other words, making an AutoML system that can truly apply to all types of estimation or inference problems is no easier than fully solving strong-AI computer vision and natural language problems, since you need contextual reasoning and creative proposals for inventing features and sleuthing out the goodness of fit of a given model architecture in light of the human-level inference goal you're trying to reach.
Automatically building a scikit-learn estimator may involve many conditional hyperparameters and also a very large number of them overall (>100) [1]. However, joint architecture and hyperparameter search can be framed on a much simpler search space; e.g., for a recent paper that aims to automate the design of RNA molecules, we formulated a 14-dimensional search space with very few conditional hyperparameters [2].
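To make "conditional hyperparameter" concrete, here is a minimal sketch of a small conditional space, written in plain Python rather than any specific configuration library; the model and parameter names are illustrative only. The kernel and C parameters exist only in the SVM branch, so a sampled configuration never carries inactive hyperparameters.

```python
import random

random.seed(0)

# Sketch of a tiny conditional search space: which hyperparameters are
# active depends on the value of the "model" hyperparameter.
def sample_configuration():
    config = {"model": random.choice(["svm", "random_forest"])}
    if config["model"] == "svm":
        # Conditional hyperparameters: only meaningful for the SVM branch.
        config["C"] = 10 ** random.uniform(-3, 3)   # log-uniform scale
        config["kernel"] = random.choice(["rbf", "linear"])
    else:
        # Conditional hyperparameters for the random-forest branch.
        config["n_estimators"] = random.randint(10, 500)
        config["max_depth"] = random.randint(2, 20)
    return config

configs = [sample_configuration() for _ in range(5)]
for c in configs:
    print(c)
```

A full scikit-learn pipeline space multiplies such branches across preprocessors, feature transformations, and estimators, which is how the hyperparameter count climbs past 100; a flat 14-dimensional space like the RNA-design one avoids that branching almost entirely.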
The tools included in the repository are very broadly applicable and only a few of them are specifically targeted at neural architecture search.
I did not have to do something like this.