Chapter five

Local minima, and why your therapist keeps telling you to try something new.

There is a particular kind of being stuck that mathematics has a name for, and that name explains, retroactively, almost every piece of bad advice you have ever ignored.

The name is local minimum.

Imagine, because we are going to use this picture for the rest of the chapter, a hilly landscape. Not flat. Not catastrophic. Just a normal piece of terrain, with mountains and valleys and small dimples and large bowls, the way most of the world actually looks if you walk far enough across it. Now imagine, on this landscape, a single small ball. The ball, being a ball, rolls downhill. It rolls until it stops. Where does it stop?

If you have not had the pleasure of doing this on purpose, the answer is: the lowest place it can reach without having to roll uphill first. The ball does not know what is on the other side of the small hill it has, on its way down, declined to climb. The ball cannot see. The ball only knows that, right here, every direction it could roll is up, and therefore the place it is currently in is, locally, the lowest available point.

This is a local minimum. It is the lowest point in the immediate neighborhood. It is, very specifically, not guaranteed to be the lowest point on the whole landscape. On our landscape there is, somewhere over the small hill, a much deeper valley the ball will never reach, because the ball would have to roll up to get there. The ball, being a ball, does not do that. The ball stays where it is. The ball calls the place it is in the bottom, because every available move makes things worse.

I want to be precise about something. The ball is not stupid. The ball is doing exactly what physics requires of it. The ball is, in a strict local sense, behaving optimally. It has, given the laws governing its motion and the information available at its position, made the best possible choice at every moment. The trouble is that the best possible choice at every moment has carried it to a place that is, taken in the larger view, not where it ought to be. The optimization succeeded. The result was wrong.

This is, I want to say plainly, what depression often is. Not always. Not in every case. But often, and in a way that mathematics describes more accurately than most of the language we have been given for it. The depressed mind is a ball that has rolled into a local minimum. Every move it could make from where it currently sits would, in the short term, feel worse. Getting up feels worse. Calling a friend feels worse. Going outside feels worse. Eating breakfast feels worse. The mind, doing precisely what minds are designed to do, has correctly identified that all available moves are uphill, and has, sensibly, chosen the only option that does not require effort, which is to stay where it is. The staying is the bug. The staying is also, in a strict mathematical sense, the right answer to the question the mind has been asking.

The mind has been asking the wrong question. The mind has been asking which move, right now, makes me feel better. The correct question is which move, eventually, gets me to a deeper valley. These are not the same question. The first one has a clear answer at every moment. The second one requires the ball to do something that, mathematically, balls cannot do on their own.

Here, because we will need it, is the picture.

Figure 5.1. The ball is at a local minimum. The deeper valley is real. The only way to reach it is to go up first.

The shape of the figure is the entire chapter. The ball is comfortable where it is, in the sense that it cannot fall any further. It is also, simultaneously, in the wrong place, because there is a deeper valley to its right that it cannot see from where it sits. The only move available to it is up. Up feels like the wrong direction. Up is the wrong direction, in every short window. Up is also the only way out.

∗ ∗ ∗

I want to write down, briefly, the algorithm that produces this trap, because the algorithm has a name, and naming it makes everything that follows clearer.

The algorithm is called gradient descent, and it is, with apologies to the rest of mathematics, the single most important algorithm in modern machine learning, modern optimization, and the modern human nervous system. It is also extraordinarily simple. Here it is.

def gradient_descent(start, landscape, step_size, steps):
    """
    Roll downhill. Stop when you can no longer go down.

    Inputs:
        start:     where you begin on the landscape
        landscape: a function that, given a position,
                   returns how high you are there
        step_size: how big each move is
        steps:     how many moves to try

    Returns:
        the position where you ended up.

    Known issue: returns a local minimum, which it cannot
    distinguish from a global one. The function does not
    know there is a deeper valley over the hill. The
    function does not know there is a hill. The function
    only knows which direction is down, right here.
    """
    position = start

    for _ in range(steps):
        # estimate the slope at the current position
        slope = (landscape(position + 0.001)
                 - landscape(position - 0.001)) / 0.002

        # move in the downhill direction
        position = position - step_size * slope

        # if the slope is essentially zero, we have stopped
        if abs(slope) < 1e-6:
            break

    return position

This is real, working code. You could paste it into a Python file, define a landscape with several valleys, and watch the function find one of them. Which one it finds depends entirely on where you started it, and it will refuse to leave even when you tell it there is a deeper valley a short walk away. The algorithm has no mechanism for going uphill. The algorithm was not built for that. The algorithm was built for the much simpler task of finding a low place, and it does that task extremely well.
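For readers who want to try it, here is a minimal sketch of that experiment. The landscape is a hypothetical one I have made up for illustration (a quartic with a shallow valley on the left and a deeper one on the right), and the function is a compact copy of the one above, repeated only so the snippet runs on its own. The starting points and step size are assumptions, not recommendations.

```python
def landscape(x):
    """Hypothetical terrain: a shallow valley near x = -0.93
    and a deeper valley near x = 1.06, with a small hill between."""
    return x**4 - 2 * x**2 - 0.5 * x


def gradient_descent(start, landscape, step_size, steps):
    """Compact copy of the chapter's function, so this file runs alone."""
    position = start
    for _ in range(steps):
        # estimate the slope at the current position
        slope = (landscape(position + 0.001)
                 - landscape(position - 0.001)) / 0.002
        # move in the downhill direction
        position = position - step_size * slope
        # if the slope is essentially zero, we have stopped
        if abs(slope) < 1e-6:
            break
    return position


# Same landscape, same algorithm, two different starting points.
left = gradient_descent(-1.5, landscape, step_size=0.01, steps=10_000)
right = gradient_descent(1.5, landscape, step_size=0.01, steps=10_000)

print(f"started at -1.5, stopped at {left:.2f}, height {landscape(left):.2f}")
print(f"started at  1.5, stopped at {right:.2f}, height {landscape(right):.2f}")
```

The ball that starts on the left settles in the shallow valley and stays there. Nothing in the loop ever asks whether the other valley exists.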

The human mind, asked to optimize its own mood, runs roughly this algorithm. It tries small moves. It keeps the ones that feel better. It discards the ones that feel worse. Over time, by doing this thousands of times a day for years, it settles into whatever local arrangement of habits, beliefs, and avoidances produces the least immediate pain. That arrangement is, in a strict mathematical sense, a local minimum of its own pain function. The mind has worked very hard to find it. The mind is proud of having found it. The mind will defend it, in many cases, against any external suggestion that the arrangement is suboptimal, because the suggestion is, in the immediate frame, asking the mind to do something that would make things worse.
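Taken literally, "try small moves, keep the ones that feel better" is a close cousin of gradient descent rather than the same thing: it never measures a slope, it only compares. A rough sketch of that cousin follows, sometimes called greedy local search or hill climbing. The name keep_what_feels_better, the pain function, and every parameter value are hypothetical, chosen only for illustration.

```python
import random


def pain(x):
    """Hypothetical pain function: shallow valley near x = -0.93,
    deeper valley near x = 1.06."""
    return x**4 - 2 * x**2 - 0.5 * x


def keep_what_feels_better(start, pain, move_size=0.05, tries=5000, seed=0):
    """Greedy local search: propose a small random move,
    keep it only if it hurts less than staying put."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    position = start
    for _ in range(tries):
        candidate = position + rng.uniform(-move_size, move_size)
        if pain(candidate) < pain(position):
            position = candidate  # feels better: keep it
        # feels worse or the same: discard it and stay where we are
    return position


settled = keep_what_feels_better(-1.5, pain)
print(f"settled at {settled:.2f}, pain {pain(settled):.2f}")
```

Because no single move is larger than move_size and no worsening move is ever kept, this search cannot cross the hill between the two valleys. It ends in the shallow one for the same reason the ball does.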

This is, again, not a moral failing. This is the algorithm working exactly as designed. The mind is not lazy. The mind is, in the most precise sense, locally optimal.

∗ ∗ ∗

Mathematicians, after several decades of staring at this problem, came up with fixes. The best known is called simulated annealing. Its relatives include random restarts and, in the deep learning literature, stochastic gradient descent and momentum. The names vary, and so do the mechanisms. The idea they share, which simulated annealing states most directly, is one idea.

The idea is that you sometimes, on purpose, make a move that is worse.

Not recklessly. Not without limit. But on a schedule, and with a tunable willingness to climb a small hill in the hope that there is a deeper valley on the other side. You take a step up, knowing it is up. You feel briefly worse. You keep going. After a while, if you have chosen well, you find yourself rolling down again, into a new valley you could not have reached by descent alone. The new valley is deeper. The temporary uphill was the price.
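Here is a minimal sketch of that idea in the form of simulated annealing, offered as one illustration under stated assumptions rather than a definitive implementation. The landscape is the same made-up quartic with a shallow valley on the left and a deeper one on the right. The "schedule" is a temperature that cools geometrically, and the "tunable willingness" is the chance of accepting a worse move, which shrinks as the temperature falls. All names and parameter values here are hypothetical.

```python
import math
import random


def landscape(x):
    """Hypothetical terrain: shallow valley near x = -0.93,
    deeper valley near x = 1.06."""
    return x**4 - 2 * x**2 - 0.5 * x


def simulated_annealing(start, landscape, steps=20_000,
                        start_temp=1.0, end_temp=0.001,
                        move_size=0.1, seed=0):
    """Mostly roll downhill, but sometimes accept an uphill move.
    Returns the lowest position seen during the walk."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    position = start
    best = start
    for i in range(steps):
        # the schedule: temperature cools from start_temp to end_temp
        fraction = i / max(steps - 1, 1)
        temp = start_temp * (end_temp / start_temp) ** fraction

        candidate = position + rng.uniform(-move_size, move_size)
        change = landscape(candidate) - landscape(position)

        # downhill moves are always kept; uphill moves are kept with a
        # probability that shrinks as the climb grows or the temperature drops
        if change <= 0 or rng.random() < math.exp(-change / temp):
            position = candidate

        if landscape(position) < landscape(best):
            best = position
    return best


# Start at the bottom of the shallow valley, where descent alone is stuck.
found = simulated_annealing(-0.93, landscape)
print(f"lowest point seen: {found:.2f}, height {landscape(found):.2f}")
```

Early on, while the temperature is high, the walk crosses the small hill easily. Later it becomes reluctant to climb at all, and by then it has usually settled into the deeper valley. Returning the lowest point seen, rather than the final position, is a common convenience and my own choice here.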

If you have ever had a good therapist, or a kind friend who knows your patterns, or a book that knocked something loose in you, you have, in mathematical terms, been pushed up the side of a small hill. The push felt bad. The push was meant to. The push was an attempt to dislodge you from a local minimum that you had, by years of careful local optimization, found and then defended.

The reason your therapist keeps telling you to try something new is not that they are short on ideas. It is that the algorithm you are running cannot, by its own internal logic, escape the place it has found. The algorithm needs an input it cannot generate on its own. The input is a willingness to feel briefly worse. The input arrives, usually, from outside. From another person, from a deliberate practice, from a system that has been designed, by people who understand the problem, to apply a measured upward push in a measured way until the ball is over the hill and rolling again.

∗ ∗ ∗

I want to talk about a specific time in my life when I was the ball.

I described, briefly and without detail, in Chapter 2, the eight months of clinical depression that followed the end of my marriage. I do not want to describe the eight months. I want to describe, instead, the geometry of them, because the geometry is what the chapter is about and the geometry is what the mathematics fits.

I was, during those months, in a local minimum of extraordinary depth and remarkable stability. The minimum was not, by any external measure, comfortable. It was simply locally optimal. Every available move would have made things worse in the short window I was capable of evaluating. Getting up was worse than not getting up. Eating was worse than not eating. Speaking was worse than not speaking. Leaving the apartment was worse than staying. I was not, in any meaningful sense, choosing to stay. I was running gradient descent on the only landscape I could perceive, and gradient descent had found its answer, and the answer was here, and the algorithm had no mechanism for considering elsewhere.

The mathematics in my own head, the equations I had spent years building, was useless to me during this period. I want to say this clearly, because the version of this book that pretends otherwise would be lying. Calculus does not get you out of a local minimum of clinical depression. Calculus does not call you in the morning. Calculus does not return your texts.

What got me out was external. What got me out was, in the precise mathematical sense, a push.

A debt acknowledged

The push, in my case, was administered by Anna Chandy, a counsellor practicing in Bangalore, and by the work of The Live Love Laugh Foundation. This chapter, and the man who wrote it, exist in their current form because of what they did. I owe both of them more than I can mathematically express, which is a thing a mathematician does not say lightly.

Anna is a practitioner of Transactional Analysis, which is a framework for understanding the human mind that, in its own vocabulary, says something almost exactly like what this chapter has been saying in mathematical vocabulary. Transactional Analysis observes that adults run, in their heads, old scripts written by much younger versions of themselves, and that the scripts continue to execute, long past the situations they were designed for, because the adult has not been given the tools to halt them. The job of a Transactional Analyst, very roughly, is to help you see the script you are running, name it, and write a new one. This is, in optimization terms, exactly the simulated annealing move. You are made to see, briefly, that the local minimum you are in is a script. The seeing is the temporary uphill move. The new script is the new valley.

I will not, in this book, attempt to teach Transactional Analysis. Anna has written her own book about it, called Battles in the Mind, which is good, and she has spent decades training counsellors in the framework, and the work is hers to teach, not mine. I will only say that the vocabulary of TA fits the vocabulary of this book in a way that is not, I think, a coincidence. Both frameworks treat the mind as a system running algorithms it did not consciously author, and both frameworks propose that the way out is to make the algorithm legible to its operator. Anna does this in the language of ego states, scripts, and permissions. This book does it in the language of functions, derivatives, and gradients. The same patient, examined by two different specialists, gets two different but compatible diagnoses. Both of them, applied carefully, work.

The Live Love Laugh Foundation is, separately, an organization in India that has done more to make professional mental health support accessible and unembarrassing than any institution I am aware of. They run helplines. They run rural programs. They train counsellors. They normalize, in a country that has been slow to do so, the idea that asking for help is not a moral failure but a sensible thing to do when the algorithm you are running has produced a local minimum you cannot, by yourself, escape. If you are an Indian reader and you are stuck, in the specific sense this chapter has been describing, their website is thelivelovelaughfoundation dot org, and the helpline section of that website is, in my opinion, the single most useful URL in the country for someone in the room I have been describing.

I want to be careful with one thing, because the version of this paragraph that did it badly would be embarrassing. I am not recommending Anna or LLL because they are friends, or because I owe them publicity, or because the book needed a feel-good moment. I am recommending them because they are, factually, the person and the institution that gave me the upward push my own algorithm could not generate. If you are stuck in the way I was stuck, you will need an external input. They are two examples of where that input can be found. There are many others. The point is not these two specifically. The point is that the input has to come from somewhere outside the algorithm, and a good therapist or a good organization is one of the most reliable places to find it.

∗ ∗ ∗

I want to close the chapter with a small honest observation about the geometry of escape, because the cheap version of this argument is wrong, and the cheap version is, again, what most books would do.

The cheap version says: just push yourself out of your comfort zone. The cheap version has been printed on a million posters, and the cheap version, in practice, almost never works, because the person being told this has no mechanism for choosing how much uphill to climb, no mechanism for knowing whether there is a deeper valley on the other side, and no mechanism for handling the situation when the uphill move turns out to lead nowhere.

Real escape from a local minimum is not a single heroic uphill move. Real escape is a tuned, careful, repeated process. You move up a little. You see how it feels. You either continue or you return. You try a different direction. You move up a little more. You let someone watch you, because the algorithm cannot watch itself. You repeat this, on the order of months, sometimes years, until either the valley you are in becomes shallower (which sometimes happens, simply by the landscape changing over time) or you find yourself, almost by accident, rolling down into something new.

The role of a therapist, of a foundation, of a friend, of a book, is to be the part of the system that can do what gradient descent cannot. They can see further than you can. They can hold steady while you climb. They can tell you, with some confidence, that the small hill you are currently on is in fact a small hill and not the beginning of an infinite mountain. They cannot move you. The ball still has to roll. But they can tell you, without lying, that the hill ends, and they can tell you, more or less honestly, what is on the other side.

That, in mathematical language, is what therapy is for.

∗ ∗ ∗

A small exercise

Name your local minimum.

Pick something in your life that you have, for some time, known is not where you want to be. A job. A pattern of avoidance. A habit. A relationship. A specific Tuesday you have been having for fifteen years.

Now ask, with the calm of a mathematician examining a function, the following question. What is the smallest possible move I could make away from this, that I have not made, because the move would feel briefly worse?

You do not have to make the move. You only have to name it. The naming is half the work. The naming is the moment the local minimum becomes visible to the mind that has been sitting in it, and a local minimum that has become visible has, in some quiet way, already begun to lose its grip.

Write the name down. Put it somewhere you will see it tomorrow. Tomorrow is not the day you make the move. Tomorrow is the day you notice you still have not made the move, and that the local minimum is, in a way it was not yesterday, no longer invisible. That is the entire exercise. The hill has not moved. You, however, have begun to see it.

Chapter 6 is about a different shape of trap, which is the trap of chasing a value that, by its mathematical nature, retreats faster than you can approach it. The value is called happiness. The shape of the trap is called an asymptote. The chapter will explain, in plain calculus, why optimizing directly for happiness is the most reliable known method of not getting any.

For now, the page closes here. The ball is in the valley. The valley is not the bottom. The hill is real. The hill is also, eventually, finite. Somebody, somewhere, has been over the hill before, and they are, on the whole, willing to tell you what they saw.
