“This is an unprecedented overflow”: Why the progress of his field alarms Yurii Nesterov
ETH Zurich, supported by the NCCR Automation, recently had the honour of hosting Yurii Nesterov from UCLouvain in Belgium. Professor Nesterov was a guest speaker at the John von Neumann Symposium on Game Theory, and we were pleased to sit down with him for an hour to talk about the many challenges facing today’s mathematicians.
First, a little background. In Optimization, the aim is often to find the minimal possible value for a quantity that depends on (possibly many) decision variables and constraints. For example, the lowest cost for a delivery route might depend on the distance covered, the amount of petrol used, and the time taken, and it is constrained by road rules, such as one-way streets or night-time closures. When there are many such decision variables or constraints, finding the minimum is typically possible only by using a computer to laboriously calculate the cost for a particular set of decision variables, check whether any nearby set of decisions gives a lower cost, and repeat until all nearby decisions give a higher cost. For a full basic explainer, see "What is gradient descent and why should I care?"
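That iterate-and-improve loop can be sketched in a few lines of code. The cost function below is a hypothetical stand-in, not a real routing model; a minimal gradient-descent sketch might look like this:

```python
# Minimal gradient descent sketch: repeatedly step "downhill" until
# no nearby point gives a lower cost. The cost function here is a
# hypothetical one-variable example, not an actual routing model.

def cost(x):
    # Example cost curve with its minimum at x = 3
    return (x - 3.0) ** 2 + 1.0

def gradient(x):
    # Derivative (slope) of the cost above
    return 2.0 * (x - 3.0)

def gradient_descent(x0, step=0.1, tol=1e-8, max_iters=10_000):
    x = x0
    for _ in range(max_iters):
        g = gradient(x)
        if abs(g) < tol:      # slope is ~0: no nearby decision is better
            break
        x -= step * g         # move a small step downhill
    return x

x_min = gradient_descent(x0=10.0)
print(round(x_min, 4))  # prints 3.0
```

Real problems have many decision variables rather than one, but the loop is the same: follow the slope downhill until nothing nearby improves.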
Nesterov found a way to perform this kind of optimization much faster, opening up many new possibilities in the fields of Data Science, Automation and Control. His work has earned him major prizes and led to new algorithmic methods based on “Nesterov momentum”. But what, out of all that, he feels most proud of is not a question he’s comfortable answering.
“I have been working in optimization for 45 years,” he says. “This is a long time. I have had moments where I managed to understand some things, but I don’t think I can say, ‘This is the best.’ The nice thing about our field – Optimization with Numerical Methods – is that you can really see the progress you’ve achieved. Before you started to think about something, you had methods which were able to somehow solve the problem, and it took an hour to find the solution. And then after your contribution, it’s reduced to one minute. So you can see that you did something useful.”
And yet, there is still so much more to be discovered.
“I see only more progress”
“This is a very special time for Optimization,” Nesterov believes. “It is a turning point. Several years ago, we started to study higher-order methods, and it appears that they cannot be explained by standard Optimization Theory. And now we understand that in standard Optimization we took only the first step by developing the first-order methods. So – if we think about methods of the next level, which are much faster and more efficient – we need in some sense to develop a new and general theory, which covers all possible situations and methods.”
By “next level”, he means methods that use more than the first derivative. The first derivative is useful because it tells you the downhill slope of the optimization landscape; higher derivatives describe its curvature, which can speed up convergence dramatically, but they are also costly to compute. So the trick is to keep the benefits while reducing the cost. We already have several very encouraging examples. However, there is a lot of work, and discovery, still ahead.
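To see why curvature helps, compare with the classic second-order scheme, Newton’s method. This is a sketch for intuition only (on a made-up smooth function), not one of Nesterov’s actual higher-order schemes:

```python
# Newton's method sketch: use the second derivative (curvature) as
# well as the slope. Each step fits a local quadratic model, which
# typically needs far fewer iterations than plain gradient descent.
import math

def f(x):       # hypothetical smooth cost
    return math.log(1.0 + math.exp(x)) - 0.3 * x

def f1(x):      # first derivative (slope)
    return 1.0 / (1.0 + math.exp(-x)) - 0.3

def f2(x):      # second derivative (curvature)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def newton(x0, tol=1e-10, max_iters=50):
    x = x0
    for i in range(max_iters):
        if abs(f1(x)) < tol:
            return x, i
        x -= f1(x) / f2(x)   # Newton step: slope divided by curvature
    return x, max_iters

x_star, iters = newton(0.0)  # converges in a handful of iterations
```

Each Newton step requires evaluating (and, in many variables, inverting) the second-derivative information, which is exactly the cost that higher-order theory tries to tame.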
“This was really unexpected. Usually, after 50 years of intensive developments, you would think we are close to the end. It’s not true. Of course, it is good for the next generation, but for mine it’s a bit disappointing!”
And yet, while the vast vistas of work still to be done and problems still to be solved are exciting, Nesterov is worried about the fragmentation of his discipline.
“It’s changing,” he admits. “In some sense, it is an unprecedented overflow. There is too much. In this field, over the past decades, we have got many different methods, and all of them are still working. So we don’t say, ‘Forget about this, now there’s a better method.’ There are very many methods that are useful and are necessary in order to understand what happens in this field. If you think about education, how you can teach all of that to students, I have no idea. For the existing educational structure, it is definitely too much.”
He wants to see more focus on Optimization as an independent academic discipline. “We need a special program where students have more time to study such things than they have in standard universities. It’s a very interesting field, which was probably underestimated at the beginning of Numerical Mathematics, 50 or 60 years ago. If you look now, the place of Optimization in the structure of science – it is always complementary. You can find departments like Optimization and Statistics, Optimization and Operations Research, Optimization and Optimal Control. You always find Optimization in addition to something. But now Optimization is much bigger than all these additions. It’s growing enormously and I don’t see any end to this process. We get more and more interesting problems, so we need more and more efficient methods for the next decades. I see only more progress.”
Nesterov would like to see specialized optimization institutes, where students have the opportunity to investigate a vast range of applications – “from models of rational human behavior and intuitive optimization to optimal design of mechanical structures”. And despite his observation that there is already too much to study, Nesterov expects to see more and more problems treated with Convex Optimization Methods.
Non-convexity is just the first step – never the answer
“When I started to work in Optimization, everything was non-convex,” he says. “People didn’t think about convexity at all. So my activity was to prove that convexity was very important; this is a natural concept which ensures solvability, which ensures the efficiency of the methods. Now, in many important models, we can see a hidden convexity. It’s not visible right away, but if you introduce the correct variables and coordinate system, it becomes convex.
“If there is no convexity, we can find only a local minimum. We don’t know how many other local minima there are, or whether they are better or not. In a sense, this contradicts the intrinsic goal of research. Before we studied the situation, we didn’t know the best answer. And afterwards, with a non-convex problem, we still don’t know. We cannot be sure of anything. So what’s the point? It’s very easy to get non-convex problems but for me, this means that we didn’t think enough. The final step, to say this problem is solved – this is where we get convex formulations. Non-convexity is just the first step.”
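A tiny illustration of “hidden convexity”, using a toy function of our own choosing rather than any model from Nesterov’s work: f(x) = (log x)² is non-convex in x, yet the change of variables u = log x reveals a convex problem.

```python
# "Hidden convexity" sketch: f(x) = (log x)**2 is non-convex in x
# (its curvature turns negative for x > e), but in the coordinate
# u = log x it becomes g(u) = u**2, which is convex. Minimizing g
# and mapping back solves the original problem exactly.
import math

def f(x):
    return math.log(x) ** 2

def f2(x):
    # Second derivative of f: (2 - 2*log x) / x**2
    return (2.0 - 2.0 * math.log(x)) / x ** 2

# f is not convex: its curvature changes sign
assert f2(1.0) > 0 and f2(10.0) < 0

# In the new coordinate u = log x, g(u) = u**2 is convex, with
# unique minimizer u* = 0, i.e. x* = exp(0) = 1.
u_star = 0.0
x_star = math.exp(u_star)
print(x_star, f(x_star))  # prints 1.0 0.0
```

The right change of variables turns a problem with unverifiable local minima into one whose global minimum is certain.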
It’s an interesting point, and a challenging one. And of course it is very relevant to a lot of the work that’s being done now. “This is happening with Neural Networks. It’s a very popular field, with many papers and so on. But the models are non-convex. People are applying different algorithms for non-convexity – we still don’t understand what happens. There is no convex model in this field yet. The work will be done when such a convex model can be found.”
Optimization is, for Nesterov, “a kind of philosophy of Numerical Mathematics”.
“Many things can be explained by Optimization. Many processes in society, or even in nature. We don’t see them, because we are participants in these processes. But they are really very efficient. Look at the rational behaviour of people in social life. This is the most interesting question for the future: finding out how all that can work, considering the forgivable weakness of normal people even in Arithmetic.”
That is what attracted him to Mathematics: the real-world applicability of numerical modelling; the ability even to predict the future, check results against real events and adapt the model in response. And it is why he can’t imagine having been anything other than a mathematician.
“What is important is to prove that you are right,” he argues. “It’s only Mathematics which allows that. Other fields are just in a verbal form, which often is not very convincing. When you prove a theorem, then you get a solid foundation, once and forever.”
The curse of technology
While the field of Optimization has thrived as a direct result of the increase in computing power now available, Nesterov does not see computers as the ultimate solution. In fact, they come with their own problems, and with the allure of (possibly unjustified) trust.
“If you’re using them properly, they help, but we should explain to students that you shouldn’t trust computers blindly,” he warns. “Computers give you the final result of computations, and if this process is not stable, if there is some noise or whatever, the computer can be absolutely wrong. Take Complexity Theory, for example. Perhaps there is a result that says, in a certain time, we can be close to the minimum of the function. But how can you trust that? It is impossible to guarantee, without additional conditions on the function, on the reasonable approximation of the parameters, etc. You get something from the computers, but you must understand that it could be unreliable. This is the situation with AI now, and it is what makes it dangerous. People trust the answers too much; they are encouraged to stop thinking. You need a critical mind. You need to check your answers using alternative methods.”
He is concerned, too, that students in the digital age have developed an entirely different way of thinking. “Students often don’t even try to think, they try to search. They go to Google. Maybe this is good, maybe not; we will see from future results. Clearly, the possibilities for independent problem solving are going down, but with the substantial help from computers maybe the final results will still improve. We don’t know yet.”
And while computers may be doing the thinking for students, they pose a different challenge for researchers.
“Every morning I open my mailbox and see 50 new emails. I need at least to read them, just in order to understand whether I should respond. Some lead to an exchange. If I answer all of them, the next morning I get 100. It’s enormously different from the situation in my early time. I got a letter once a month. I had to go to another floor to fetch it. It was a beautiful time. Maybe AI can do something to help here.”
He echoes this nostalgia for a slower pace when speaking of the pressure on today’s academics. “We have very many people now working in the same direction. This was different when I started 40 years ago. There was no rush; you had time to think and to write a paper, check different variants etc. Now it’s just impossible. You go to a conference and present something, you can be sure that tomorrow hundreds of people will try to improve on it.”
But while he believes the number of people involved now results in a less efficient process, it is inevitable. “They want to do research, they need to publish. You can’t stop this process. Before, if you didn’t publish, it didn’t affect your salary. Now you are forced to do something to get an increase. It becomes part of your professional activity. Not very comfortable for me.”
Nesterov’s proposed solution? Maybe a new Citation Index is needed to quantify the value of published research. “The impact of one paper should be divided by the number of authors,” he argues. “Even in our field, where a paper normally has two or three authors, there may now be 50 or 60. It is meaningless. We need to know who is a good researcher, who isn’t. It should be done in a proper way; we should have a new Citation Index. I’m surprised that this is not done yet. The existing indexes motivate people to increase the number of papers, not the amount of personal valuable research. They are clearly counterproductive.”
Do we really need a numerical picture of science? For Nesterov, numbers are clearly the only solution that really works, especially in the forthcoming era of Artificial Intelligence. And this will certainly help in understanding the real beauty and power of Mathematics.
Article by Robynn Weldon