Alvy Ray Smith is a pioneer of computer animation and a co-founder of Pixar who left the legendary studio after disagreements with Steve Jobs. In his book «Пиксель. История одной точки» (A Biography of the Pixel), a Russian translation of which has been published by Individuum, he describes how the technologies emerged that changed our notions of art and of the world at large. We publish an excerpt about how, as early as 2000, Smith found himself reassuring actors worried about progress.
In 1996, Pixar employees received a special technical achievement award from the American Academy of Motion Picture Arts and Sciences, the first of many to follow. The technical awards are presented at a ceremony as glamorous as the Oscars themselves: the same tuxedos and gowns, limousines and movie stars, a lavish banquet and brief thank-you speeches. The difference is that television does not broadcast it to the world, and famous journalists do not interview the nominees on the red carpet. The Academy reasonably assumes that the broad public takes little interest in fog machines, artificial cobwebs, and the other technical achievements for which the award has been given over the years.
The event is always hosted by a movie star. In 1996 it was Richard Dreyfuss, known for many roles, especially as Curt in George Lucas's American Graffiti. Ed Catmull, a few other Pixar employees nominated for the award, and I sat at one table. The triumphant premiere of Toy Story had taken place just a few months earlier.
At the start of his ceremonial speech Dreyfuss noted that actors and technical specialists depend on each other, and that this other, untelevised Oscars ceremony matters a great deal to actors like him. “We all, actors and techies, are going into the future together,” he said. But then came the barb. He pointed at our table and said: “Note, you guys from Pixar, that I said together!” A nervous chuckle ran around the room. Many actors, evidently, had heard overbold claims from my colleagues in the industry that “any day now we will replace living actors with simulations.”
In 2000 I was invited to write an article for Scientific American on exactly this topic: the possibility of replacing living actors. In it I argued that there is something special in humans, something we cannot yet even explain, let alone replace.
I call this creativity, though the term is not quite precise. I mean what Turing, Kotelnikov, and Fourier did, what programmers, engineers, and builders of models do, what animators and actors do.
It is what Turing did when he invented machine computation and the stored-program computer seemingly out of nothing. That was a remarkable creative leap, one of the greatest in history, a form of theoretical creativity practiced in an ivory tower. Kotelnikov did the same when he formulated the sampling theorem, another great creative leap. And he, of course, built on the great creative idea of Fourier.
It is what programmers do, whatever it is that lets them turn a very long list of seemingly meaningless computer instructions into a program that does something meaningful, such as computing Toy Story. The relentless improvement of incredibly fast computers described by Moore's Law is another example, and so is the construction, inside a computer, of complex models, characters, say, out of geometry and a shading language.
And it is what animators do when they breathe life into their characters and make us believe that a stack of triangles is self-aware and feels pain. That is artistic creativity. Actors do the same when they persuade us that minds belonging to entirely different people inhabit their bodies. Indeed, actors and animators believe they exercise the same skill: in job interviews, Pixar selects animators for their acting ability.
What I wrote in 2000 remains relevant today, two decades later: we have no idea how to replace living actors. But we can replace the appearance of an actor on screen. The image on the screen embodying the actor is called an avatar. We can replace the actor on screen with a convincing avatar — even in a close-up, conveying emotion. I know this is possible and has been done more than once. Look at Brad Pitt in The Curious Case of Benjamin Button (2008), where Brad Pitt is not Brad Pitt, but his avatar, a digital representation of his appearance. But the point is that the avatar is “driven” by a great actor, namely Brad Pitt himself. The avatar replaced not him or his skill, but only his screen appearance. Convincing emotions belong to the actor, not to any computer program.
I predicted in 2000 that we could shoot a film with living actors without using a film camera, provided the actors controlled their avatars. The prediction, an extrapolation of the steady progress of computer animation, came true eight years later in The Curious Case of Benjamin Button.
<…>
Then, in 2000, I waved my hands a little and suggested that, since it had taken 20 years to get from the idea of a computer-animated film in 1975 to its realization in 1995, perhaps another 20 years would pass before the first “camera-less”, though not “actor-less”, film. Well, 2020, the year in which I am making the final edits to this chapter, has arrived, and my hand-waving clearly did not work. There is no evidence that an emotionally convincing film can be shot using only human avatars, with no real people in the frame. And there is certainly no sign of an approaching replacement of actors or animators by computer simulations. Richard Dreyfuss can relax: no such thing is foreseen in the foreseeable future.
<…>
A couple of years ago, when I was at King's College, Cambridge, where my wife was spending her sabbatical, in the very place where Alan Turing wrote his foundational paper, a former colleague of mine from the pixel business, John Bronskill, came up to me. “Alvy, we won't need to program any more!” He stunned me with that assertion. John had made his name creating plug-ins for the graphics editor Adobe Photoshop, perhaps the most widely used pixel-based application in the professional world.
“What do you mean?” I asked. “Read this,” he said, handing me a scientific journal. It was open to a paper from the AI research laboratory at the University of California, Berkeley. The paper described a neural network of a certain type, trained on 1000 unlabeled photographs of horses and 1000 unlabeled photographs of zebras. The horse photos contained varying numbers of horses, of different colors, in arbitrary arrangements. The zebra photos were just as varied, except that zebras do not vary in color. All of these photos were digital, made of pixels. After the appropriate training (I won't describe how it works), the network learned to perform the following remarkable trick: given an arbitrary input image of zebras, it replaced each zebra with a horse. In effect, it was simply repainting each zebra in the colors of a horse, or vice versa.
“How does this work? — I asked, and added: — I don’t even think this problem has a precise definition.”
John simply shrugged: “I don’t know. And no one knows. It just does it! It’s too hard to reverse-engineer.”
The same neural network is capable of other astonishing things. If trained on landscape photographs and paintings by Van Gogh, it will turn any nature shot into a painting in Van Gogh’s style. Or vice versa. Or in Monet’s style. Or it will transform summer landscapes into winter ones, or vice versa.
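To make the trick a little more concrete, here is a minimal sketch of the cycle-consistency idea behind this kind of unpaired translation (the Berkeley paper is known as CycleGAN). It is an illustrative toy of mine, not the authors' code: the tiny networks, the random placeholder batches, and the omission of the adversarial losses are all simplifications.

```python
# A toy sketch of unpaired image-to-image translation with cycle
# consistency, in the spirit of CycleGAN; not the Berkeley authors' code.
import torch
import torch.nn as nn

def tiny_translator():
    # Stand-in for a real generator: a couple of convolutions only.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
    )

G = tiny_translator()  # horses -> zebras
F = tiny_translator()  # zebras -> horses
opt = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=2e-4)
l1 = nn.L1Loss()

# Placeholder batches; real training would load the 1000 unpaired
# photographs of each animal. Note there is no horse/zebra pairing.
horses = torch.rand(4, 3, 64, 64)
zebras = torch.rand(4, 3, 64, 64)

# Cycle consistency: translating there and back should return the input.
# (The full method adds adversarial losses from two discriminators, which
# is what pushes G(horses) to actually look like zebras.)
loss = l1(F(G(horses)), horses) + l1(G(F(zebras)), zebras)
opt.zero_grad()
loss.backward()
opt.step()
```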
I mention this here to ask: what comes next in the Digital World? I confess I do not understand what is happening or how important it is in the long run. But let us ponder a little.
Recall that Turing allowed his universal machine, that is, a computer with a stored program, to perform operations on the program itself as data. That is the essence of his invention, the stored-program computer. Is the horse-and-zebra program a case of a program modifying itself? Turing was intrigued by that possibility, as he was by the prospect of artificial intelligence. Operating systems on modern computers typically forbid programs from modifying themselves, to avoid total chaos.
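As a toy illustration (my own, not Turing's formulation) of what it means for the program to be data, here is a little machine whose instructions live in ordinary memory, so that one instruction can overwrite another before it runs:

```python
# A toy stored-program machine: instructions are data in `memory`,
# and the "store" instruction rewrites another instruction in place.
memory = [
    ("store", 2, ("print", "rewritten")),  # addr 0: overwrite addr 2
    ("print", "running address 2 next"),   # addr 1
    ("print", "original"),                 # addr 2: replaced before it runs
    ("halt",),                             # addr 3
]

pc = 0  # program counter
while memory[pc][0] != "halt":
    op = memory[pc]
    if op[0] == "print":
        print(op[1])
    elif op[0] == "store":  # the self-modifying step
        memory[op[1]] = op[2]
    pc += 1
# Output: "running address 2 next", then "rewritten" (never "original").
```

Modern operating systems typically map code pages read-only precisely to rule out this kind of rewriting.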
A neural network is simulated on an ordinary computer, and the program doing the simulating does not modify itself. But suppose the neural network were an actual neural network, not merely a simulation. Could it be understood as a program that modifies itself? I think so. Our brain is undoubtedly a neural network, and as far as we know it contains no store of programs separate from its store of data. And it probably does nothing beyond the bounds of Turing computation: in the 80 years since the concept appeared, we have found no other algorithmic process.
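A sketch of my own may make that claim concrete: the network's “program” is just an array of numbers, its weights, and a training step rewrites that array in place, changing what the program does.

```python
# The "program" of this toy network is the data in `w`; training
# modifies w, and with it the function the network computes.
w = [0.0, 0.0, 0.0]                      # weights: the program-as-data

def network(x):                          # behavior fully determined by w
    return w[0] * x * x + w[1] * x + w[2]

target = lambda x: 2.0 * x               # toy function to learn
lr = 0.1                                 # learning rate

for _ in range(200):                     # gradient descent on squared error
    for x in (0.0, 0.5, 1.0):
        err = network(x) - target(x)
        w[0] -= lr * err * x * x         # d(err^2/2)/dw0
        w[1] -= lr * err * x             # d(err^2/2)/dw1
        w[2] -= lr * err                 # d(err^2/2)/dw2

print(network(1.0))                      # close to 2.0 after training
```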
In 1965 I entered graduate school at Stanford because it was one of only two universities I knew of that taught an intriguing new subject, artificial intelligence (today commonly abbreviated AI). The other was MIT. I studied under John McCarthy, the founding father of artificial intelligence at Stanford, and several times spoke at length with Marvin Minsky of MIT, another father of the field.
After a couple of years I gave up working on AI, having decided that no breakthrough would come in my lifetime. Perhaps that judgment was premature, given that I probably still have a couple of decades in reserve, but in the meantime I helped make the first digital film. With that done, I have had time to return to thinking about AI. Though, in truth, I never stopped thinking about it.
I was struck by John Bronskill's remark. I had always assumed that once someone explained to me how an AI worked, I would understand it. Yet here was an example of machine learning, perhaps not yet developed enough to be called AI, and I understood nothing. Perhaps that is because the network modifies its own program? We know that, in general, one cannot be sure even of something as simple as whether a program will halt, so it is not surprising that we cannot understand how this horse-and-zebra program works.
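That remark about halting is the classic undecidability result, and its proof is short enough to sketch in code. The construction below is the standard diagonal argument (my illustration, nothing from the paper): any candidate “halting oracle” is refuted by a program built from the oracle itself.

```python
# Diagonalization: no function halts(f) can correctly predict, for every
# zero-argument function f, whether calling f() would terminate.
def defeat(halts):
    """Given any candidate oracle halts(f) -> bool, build a function g
    on which that oracle must be wrong."""
    def g():
        if halts(g):       # if the oracle claims g halts...
            while True:    # ...then g loops forever;
                pass
        # ...and if the oracle claims g loops, g returns immediately.
    return g

optimist = lambda f: True  # a (wrong) oracle claiming everything halts
g = defeat(optimist)
print(optimist(g))         # True -- yet g() would in fact never return
```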
In any case, its ability to make a zebra out of a horse is far from perfect. The paper Bronskill showed me included the famous photograph of Vladimir Putin riding a horse bare-chested. In the converted image, the Russian president and his horse merged into a two-headed striped centaur.
The essence of today’s revolution lies in our inability to forecast it, to see more than one order of magnitude ahead. We simply must ride the wave, and see where it carries us — to exciting and even mysterious places.
Translated from English by Alexey Snigirov. Cited from the edition: Alvy Ray Smith. Пиксель. История одной точки. Moscow: Individuum, 2023.
