Compression, and the lossy art of telling your story.

A small thought experiment, to begin with.

You have lived, by the time you are reading this, somewhere on the order of one billion seconds. Each of those seconds contained, in technical terms, an extraordinary amount of information. Visual fields full of colored pixels. Audio streams with thousands of distinguishable frequencies. Proprioceptive signals from every muscle. Olfactory data. Tactile data. Interior states of mood, hormonal load, gut bacteria population. Each second is a few megabytes of raw experiential data, by any reasonable estimate. Multiply by a billion seconds, and the lifetime is, in storage terms, on the order of petabytes.

Someone asks you, at a party, how your week was.

You have approximately forty seconds in which to answer, before the asker's attention wanders or another conversation absorbs them. Forty seconds of speech, at a normal speaking rate of about a hundred and fifty words a minute, is about a hundred words, which is, as plain text, somewhere around six hundred bytes.

You have a few petabytes of raw data, and a six-hundred-byte channel, and you have to fit one into the other. The ratio is unfavorable. Closing it is, on inspection, an act of compression at a scale that no engineering team has ever attempted in earnest.
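The arithmetic can be checked in a few lines. This is a back-of-envelope sketch, not a measurement: the three constants are the rough figures from the paragraphs above, with "a few megabytes" taken, as an assumption, to mean three.

```python
# Back-of-envelope figures from the text; every constant is a rough assumption.
SECONDS_LIVED = 1_000_000_000     # "on the order of one billion seconds"
BYTES_PER_SECOND = 3_000_000      # "a few megabytes" per second, assumed to be 3 MB
CHANNEL_BYTES = 600               # the forty-second answer, as plain text

source_bytes = SECONDS_LIVED * BYTES_PER_SECOND
ratio = source_bytes / CHANNEL_BYTES

print(f"Source size:  {source_bytes / 1e15:.0f} petabytes")
print(f"Channel size: {CHANNEL_BYTES} bytes")
print(f"Ratio:        roughly {ratio:.0e} to one")
```

On these assumptions the ratio comes out in the trillions to one. Change any constant by a factor of ten and the conclusion does not move.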

This is, in technical terms, what telling your story is. It is what answering "how are you" is. It is what introducing yourself at a party is. It is what writing a CV is. It is what writing this chapter is. Every act of self-description, from the tiniest small-talk to the longest memoir, is a lossy compression problem, and the chapter you are reading is an attempt to be honest about what that means, and useful about what you can do with the fact.

∗ ∗ ∗

The piece of mathematics we need, briefly, comes from a paper Claude Shannon published in 1948 called A Mathematical Theory of Communication. The paper is, on most lists, one of the three or four most consequential pieces of twentieth-century mathematics. It founded the field of information theory, and it gave us, among many other things, the formal definitions of information, entropy, and compression, all of which we are about to need.

Shannon's central insight, simplified to a fault, was this. The amount of information in a message is not the number of letters or sounds it contains. The amount of information is the degree to which the message could not have been predicted in advance. A message that says exactly what you expected it to say carries very little information. A message that says something you could not have predicted carries a great deal. The unit Shannon proposed for measuring this was the bit, the smallest possible piece of information, corresponding to the answer to a single yes-or-no question.
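The standard way to put a number on "could not have been predicted" is to say that an outcome with probability p carries -log2(p) bits. The sketch below assumes that definition; the function name `surprisal_bits` is mine, for illustration only.

```python
import math

def surprisal_bits(probability):
    """Information, in bits, carried by an outcome of the given probability."""
    return -math.log2(probability)

# A fair coin flip answers exactly one yes-or-no question.
print(surprisal_bits(0.5))

# A message you were 99% sure of tells you almost nothing.
print(surprisal_bits(0.99))

# A one-in-a-thousand message tells you a great deal.
print(surprisal_bits(0.001))
```

The expected message is worth a small fraction of a bit. The unexpected one is worth about ten.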

Information theory then asks: given a source that produces messages, what is the minimum number of bits needed, on average, to represent its output faithfully? This minimum is called the entropy of the source, and Shannon proved that a compression scheme can, in principle, approach it but cannot go below it without losing information.
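Entropy can be estimated, crudely, from a piece of text. The sketch below makes a simplifying assumption that real sources do not satisfy: it treats each character as independent of its neighbors, so it measures only how uneven the letter frequencies are. The name `entropy_bits_per_char` is illustrative.

```python
import math
from collections import Counter

def entropy_bits_per_char(text):
    """Shannon entropy of the character frequencies in text, in bits per
    character. Assumes characters are independent, which is a simplification."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# One symbol, endlessly repeated: nothing to learn from the next character.
print(entropy_bits_per_char("aaaaaaaa"))

# Two equally likely symbols: one bit per character.
print(entropy_bits_per_char("abababab"))

# Eight equally likely symbols: three bits per character.
print(entropy_bits_per_char("abcdefgh"))
```

The more predictable the source, the lower the number, and the less room the source needs on the wire.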

This gives us a clean distinction, which the rest of the chapter will lean on.

Lossless compression is compression in which the original message can be exactly recovered from the compressed version. ZIP files are lossless. The file you compressed is exactly the file you get back when you decompress it. Lossless compression works by finding redundancies in the data, like repeated patterns, and representing them more efficiently. There is, however, a hard limit. Shannon's theorem says you cannot, even in principle, losslessly compress a message below its entropy. The entropy is the floor. Below the floor, you must, mathematically, lose some of the original.
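The floor is easy to see, roughly, by handing a lossless compressor two inputs of the same length. The sketch uses Python's standard `zlib` module; the two inputs are my own illustrations, one made of a single repeated sentence and one made of random bytes, which have essentially no redundancy to exploit.

```python
import os
import zlib

redundant = b"the same thing again. " * 200   # highly predictable
random_bytes = os.urandom(len(redundant))       # essentially no redundancy

for label, data in [("redundant", redundant), ("random", random_bytes)]:
    compressed = zlib.compress(data, 9)
    # Lossless means the round trip is exact, byte for byte.
    assert zlib.decompress(compressed) == data
    print(f"{label:>9}: {len(data)} -> {len(compressed)} bytes")
```

The repeated sentence shrinks to a small fraction of its size. The random bytes do not shrink at all, and typically grow by a few bytes of overhead, because there is nothing below the floor to remove.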

Lossy compression is compression in which some information is deliberately discarded in exchange for a much smaller representation. JPEG images are lossy. MP3 audio is lossy. Most spoken human communication is lossy. The discarding is, in well-designed lossy schemes, not random. The discarding is strategic: you throw away the parts of the signal that the receiver is least likely to need or notice, and you keep the parts that carry the most meaning per bit. A good lossy compression scheme is one that throws away a great deal and still produces, on the other end, something the receiver experiences as faithful enough.
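A minimal sketch of strategic discarding, assuming the simplest lossy scheme there is: round every number to the nearest multiple of a chosen step, and transmit only the rounded value. Real codecs such as JPEG are far more elaborate, though they apply a loosely similar rounding step to frequency coefficients. The names `quantize` and `reconstruct` and the sample signal are illustrative.

```python
def quantize(samples, step):
    """Keep each sample only to the nearest multiple of step (as a small integer)."""
    return [round(s / step) for s in samples]

def reconstruct(codes, step):
    """Rebuild an approximation of the original from the transmitted integers."""
    return [c * step for c in codes]

signal = [0.12, 0.48, 0.51, 0.97, 1.03]
codes = quantize(signal, step=0.5)
approx = reconstruct(codes, step=0.5)

print("original:     ", signal)
print("transmitted:  ", codes)
print("reconstructed:", approx)

# The error is bounded by half a step, but the detail is gone for good:
# 0.48 and 0.51 are now the same value, and nothing can tell them apart again.
worst_error = max(abs(a - s) for a, s in zip(approx, signal))
print("worst error:  ", worst_error)
```

The receiver gets five small integers instead of five precise measurements. Whether that is faithful enough depends entirely on what the receiver needed them for.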

The choice between lossless and lossy is not a moral choice. It is an engineering choice. It depends on what the channel can carry and what the receiver actually needs. A doctor's notes about a patient should probably be lossless. A summary of the patient's case for a hallway conversation between colleagues should be lossy. Both are useful. Both are appropriate. The lossy version is not a worse version of the lossless one. It is a different artifact, with different costs and different uses.

[Figure 16.1 diagram. The source: forty years, several petabytes of raw experience; unspeakable, unwriteable, and unhearable in full. Lossless compression: still petabytes; exact, complete, unreadable. Lossy compression: six hundred bytes; approximate, useful, sayable. The lossy version is not a worse version. It is a different artifact.]

Figure 16.1   The source is a life, in full. The lossless representation, even if it were possible, would be useless to a listener. The lossy representation, by discarding most of the source, becomes the version a human conversation can actually carry.

∗ ∗ ∗

Now I want to apply this to the act of telling your own story.

The first, immediate observation is that you cannot tell your story in full. The full story, even if you had decades of uninterrupted speaking time, would not fit through any available channel, because the channel is your voice and the receiver is a human listener with a finite attention span and limited working memory. The compression is not, in any meaningful sense, optional. You are going to compress whether you choose to think about it or not. The only choice is whether to do it deliberately or to let it happen by default.

The second observation, which follows immediately, is that every telling of your story is a different compression. The version you tell a new colleague is not the version you tell a close friend. The version you tell a therapist is not the version you tell your sister. Each version selects different parts of the underlying signal, throws away different parts, and arrives at a different summary. None of the versions is, in the strict sense, false. They are different compressions of the same source, optimized for different channels and different receivers.

The anxious mind is, here, particularly prone to a specific kind of error, which I want to name carefully. The anxious mind, having told a particular compression of its story many times to many people, will, after enough repetitions, come to believe that the compression is the story. The thing you have been telling yourself about your childhood, your career, your relationships, is not the underlying signal. It is a particular compression you have settled on, with particular things kept and particular things discarded, and the discarding is so old and so habitual that you no longer remember what was discarded. The compression has, with time, become the only version you can access.

This is, in technical terms, what bad therapy can sometimes do. It can entrench a particular compression by encouraging you to repeat it. This is also, in technical terms, what good therapy can sometimes do, in the opposite direction. A good therapist, in the sense I have come to understand the work, is somebody who helps you generate different compressions of the same underlying signal, deliberately, until you have access to several of them, and can choose which to use in which context, and can notice that the original signal is much larger than any single compression has been able to capture.

I will not say more about therapy here, because Chapter 5 said what needed to be said. I will only say that one of the things the right therapist does, in addition to everything else, is to help you renegotiate the compression of your own life.

∗ ∗ ∗

I owe you a small piece of honesty about this very book, because the book is, in technical terms, an extended compression of my own life, and the chapter we are in is the one where it would be cowardly not to admit that.

Chapter 2 of this book is approximately twenty-two hundred words. Twenty-two hundred words is, as plain text, on the order of ten kilobytes. The events the chapter compresses cover roughly four decades. The compression ratio, very conservatively, is at least ten thousand to one. Most of what happened, by that compression ratio, was thrown away.

I want to tell you, briefly, what was thrown away. The chapter did not mention my sister beyond a single factual sentence. My sister was in the same house I grew up in. She had her own experience of that house, her own version of the years there, her own relationship with the same uncle. None of that is in the chapter. I made the decision to leave it out for a specific reason, which is that her story is not mine to tell, but the decision was a piece of compression, and the compression has a cost. A reader who reads Chapter 2 might come away thinking the household contained only me, when it did not.

The chapter did not mention specific friendships from the years between eighteen and twenty-one, the years I spent in Bhilai working at the computer institute. Those years contained people. The people were real. Some of them are still in my life. Some of them are not. The chapter discarded all of them, because the channel was a chapter about my brain becoming what it became, and the friendships, real as they were, would have made the signal harder to read for the reader who was, at that moment, trying to understand a single specific thread.

The chapter did not mention the books I read between eighteen and twenty-five. I have, over the years, read more carefully than I have done almost anything else, and the books I read in those years are, in some real sense, the second parents of the man writing this paragraph. The chapter mentioned, in passing, that I taught myself mathematics. It did not say which books. The which-books is, in some quiet way, the entire interior story of my twenties, and it is, in the chapter, almost entirely absent.

None of this is, in the strict sense, a confession of dishonesty. The chapter was not lying. The chapter was performing a specific lossy compression, designed to convey the maximum useful information for a specific purpose, which was to earn the credentials for the rest of the book. The compression was, I believe, correct for that purpose. But the compression is not the life. The life had a sister in it. The life had friendships. The life had books. The life had thousands of afternoons and tens of thousands of conversations and hundreds of small moments that I have not, even once, in any version of telling the story, mentioned to anyone.

I tell you this not because I want to add the missing material now. The chapter is over. The chapter is its own artifact. I tell you this because the chapter is, in microcosm, what your own story is. You have been telling yourself a compression. The compression is not the life. The life is bigger than the compression. There is, in your case, also a sister, also some friendships, also some books, also some afternoons that the version of yourself you have been telling has, without thinking about it, left on the cutting room floor.

∗ ∗ ∗

Here is a small piece of code that demonstrates the principle on a tiny text, because seeing compression happen in numbers makes the abstract claim concrete.

import zlib

def lossless_ratio(text):
    """
    Return the compression ratio achieved by zlib's lossless
    compression. A ratio of 0.5 means the compressed version
    is half the size of the original.
    """
    original = text.encode('utf-8')
    compressed = zlib.compress(original, level=9)
    return len(compressed) / len(original)


def lossy_summary(text, target_length):
    """
    A toy lossy compression: keep only the words that are
    most likely to carry meaning, and discard the rest.
    Real lossy compression on text is much more sophisticated;
    this is a cartoon for illustration.
    """
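    # Function words to discard. Possessives and 'when' are included so
    # that the summary keeps only content words.
    common = {'the', 'a', 'an', 'and', 'or', 'but', 'is', 'was',
              'are', 'were', 'be', 'been', 'being', 'to', 'of',
              'in', 'on', 'at', 'by', 'for', 'with', 'as', 'i',
              'you', 'he', 'she', 'it', 'we', 'they', 'this',
              'that', 'have', 'has', 'had', 'my', 'your', 'his',
              'her', 'our', 'their', 'when'}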
    words = text.split()
    kept = [w for w in words if w.lower() not in common]
    return ' '.join(kept[:target_length])


sample = (
    "My mother died when I was four. My sister was one. "
    "My father drank, gambled, and slept around, in roughly "
    "that order of priority, and in roughly that order of "
    "competence. My sister and I were handed to his younger "
    "brother, my uncle, in the manner one might hand off a "
    "difficult parcel one had not asked to receive."
)

print(f"Original length:        {len(sample)} characters")
print(f"Lossless ratio (zlib):  {lossless_ratio(sample):.2f}")
print("Lossy summary (10 words):")
print(f"  '{lossy_summary(sample, 10)}'")

# Sample output:
# Original length:        314 characters
# Lossless ratio (zlib):  0.74
# Lossy summary (10 words):
#   'mother died four. sister one. father drank, gambled, slept around,'

The lossless compression by zlib achieves a ratio of about 0.74 on this short paragraph, which is to say, the compressed version is roughly three-quarters the size of the original, and the original can be exactly reconstructed from it. The lossy summary, by contrast, is roughly a fifth of the original size, conveys the basic skeleton of what happened, and could not, even in principle, be used to reconstruct the original. Both representations are useful for different things. The lossy version is the version you would say at a party. The lossless version is the version you would archive.

The Python lossy summary above is, I want to be honest, a cartoon. Real lossy compression on text is much more sophisticated. But the cartoon makes the point. The summary throws away grammatical glue, articles, pronouns, and keeps the content words that carry the most semantic mass per token. A real conversation about your life does something analogous. You throw away the texture, the qualifications, the days that were neither good nor bad, the ordinary Tuesdays. You keep the high-information events: the deaths, the moves, the relationships, the turning points. The story you can tell in forty seconds is, in technical terms, the high-information backbone of your life. The rest, including most of the days you have actually lived, is on the cutting room floor.

∗ ∗ ∗

I want to close the chapter with three small operational consequences, because the chapter is incomplete without them.

First, your story is not your life. The compression has, at high ratio, removed most of the original. The compression is useful, and it is correct, and it is, in many cases, kind, because the listener does not have the channel to receive the full signal. But the compression is not the source. When you find yourself believing the compression is the truth, remind yourself that the truth is, in storage terms, several petabytes larger than anything you have ever said about it. There is, in the underlying signal, room for more than the story you have been telling.

Second, you have permission to tell different compressions. The version of your story that you tell a new friend does not have to be the version you tell a therapist. The version you tell a colleague does not have to be the version you tell your sister. These are different channels with different listeners and different purposes, and using a different compression for each is not, in any meaningful sense, inconsistency. It is engineering. A good engineer uses the right compression for the right channel. The reader who finds themselves anxious about telling slightly different versions of their life to different people can, I think, take comfort in the fact that the variation is not dishonesty. The variation is the appropriate response to a variable transmission medium.

Third, and this is the hardest one, you are allowed to update the compression. The story you have been telling about yourself was generated, in many cases, by a much younger version of you, who chose the compression with the information available at the time. The information available now is different. The listener you are talking to today is different. The purposes you have for telling the story have evolved. The compression should, at long intervals, be redone. Not the underlying source. The source has been what it has been. But the projection of the source into a few thousand words, the chosen highlights, the chosen emphases, the chosen things that get included and the chosen things that get left out, are all up for renegotiation. The version of your story you have been telling for twenty years is not, on inspection, a more honest version than a freshly compressed one. It is simply the older one. The older compression has the advantage of being practiced. It has the disadvantage of having been optimized for a listener and a purpose that may no longer exist.

∗ ∗ ∗

A small exercise

Tell three versions.

Pick a single event from your life. A real one, not a representative one. Something that actually happened, that has appeared in some version of the story you tell about yourself.

Now write three different one-paragraph descriptions of it. The first, for a colleague you have just met. The second, for a close friend who knows the rough outlines of your life. The third, for a therapist who is interested in the underlying pattern rather than the surface event.

Notice, when you have finished, that all three are about the same event. Notice, also, that all three are different. The first compressed the event for a low-bandwidth channel and a listener who does not yet have context. The second compressed it for a higher-bandwidth channel and a listener who has context. The third compressed it for a different purpose entirely, namely the diagnosis of pattern rather than the recounting of incident.

None of the three is, on inspection, the event. The event was something the three compressions are all approximations of. The point of the exercise is not to find the best version. The point of the exercise is to notice that the act of telling has degrees of freedom that, until now, you have probably been spending unconsciously.

Chapter 17 takes a final step in Part Two, into combinatorial optimization, and into the question of why some problems in life resist all our efforts to solve them efficiently, not because we are unintelligent but because they belong to a particular class of mathematical problems that has, in a precise technical sense, no known efficient solution. The chapter is named for the traveling salesman, but it is really about the question of how to live well in a world full of decisions that, individually, are too hard to optimize.

For now, the page closes here. Your life is a source. Your story is a compression. The compression is not the source. The compression is useful. The compression is even, in many cases, kind. But it has room, at the edges, for the parts of the life that the older versions left out. Those parts are still there, in the source. They have not gone anywhere. They are, on most days, waiting for a chapter you have not yet written.
