Joe Letteri is the Senior Visual Effects Supervisor at Weta Digital. He’s the man ultimately responsible for the outstanding quality of work from the best digital effects powerhouse in the world today.
I rank Letteri as one of the true magicians of cinema, able to almost effortlessly convince me that I’m looking at something impossible when I’m not looking at such a thing at all. I still can’t quite believe King Kong, even. How was there never anything physical there? How was there never any photographic element to that great ape? He looks tangible.
I met up with Letteri recently to chat about Weta’s work. As The Hobbit: An Unexpected Journey is just arriving on DVD and Blu-ray – it’s out on Monday April 8th in the UK, already out in the US – we used this film’s cutting-edge images as a way to unlock a pretty structured run-down of the newest and best tricks and tactics of digital movie effects.
So here’s some of what Letteri told me, starting with a legacy of trickery that stretches back to the very beginnings of motion pictures. There’s a glossary at the foot of the story for some of the more esoteric terms.
If you go back to The Lord of the Rings trilogy, back to Fellowship, one of the things that was charming about it was the way we created the Wizards and Hobbits at two different sizes using very old-fashioned, forced-perspective kinds of tricks. That goes back to the beginning of cinema. Bring one character closer to camera, push the other one farther away, and as long as they’re both in focus you can maintain the illusion.
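The geometry behind this trick is simple enough to sketch. The function below is illustrative only (the heights and distances are hypothetical, not measurements from the production), but it shows why a full-size actor pushed farther from the lens subtends exactly the same angle as a smaller figure closer in:

```python
import math

def apparent_size_deg(height_m: float, distance_m: float) -> float:
    """Angular size (degrees) a subject of the given height subtends at the camera."""
    return math.degrees(2 * math.atan(height_m / (2 * distance_m)))

# A 1.8 m "wizard" and a 1.2 m "hobbit" standing side by side at 4 m look
# very different in size on screen...
wizard_near = apparent_size_deg(1.8, 4.0)

# ...but push the full-size actor back to 6 m and he subtends the same
# angle as a 1.2 m figure at 4 m -- the forced-perspective illusion.
wizard_far = apparent_size_deg(1.8, 6.0)
hobbit = apparent_size_deg(1.2, 4.0)
```

As the glossary below notes, the illusion only holds while both subjects stay in focus and the camera (and viewpoint) stays put, which is exactly why it breaks down in stereo.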
But now fast forward to The Hobbit and we need to do that in stereo 3D. In stereo that illusion breaks down really quite easily; if the camera moves any significant amount it no longer works. So, if you look at the shot in Bag End where Gandalf comes through the door and bangs his head on the chandelier, in Fellowship that was done with traditional forced perspective: we built two stages at two different scales. But in The Hobbit, it’s in stereo.
So what we had to do was synchronise two different cameras so that they moved at different scales. One was on a green screen stage and one was moving through the set. The actors had to learn complex choreography, with Gandalf on the green screen and everybody else on the set, and we layered the images together in stereo. It took about a year to complete that one shot.
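A minimal sketch of the idea, assuming a master/slave rig where one camera’s motion is replayed at a different scale about a pivot point (the function names and the 2:1 scale are hypothetical, not Weta’s actual motion-control setup):

```python
def slave_position(master_pos, pivot, scale):
    """Replay the master camera's motion about a pivot, shrunk by `scale`,
    so the parallax of the two plates matches when composited in stereo."""
    return tuple(p + (m - p) / scale for m, p in zip(master_pos, pivot))

# If the set camera tracks 2 m right and 4 m forward of the pivot, a rig
# scaled 2:1 must move half as far to keep the illusion consistent.
green_screen_cam = slave_position((2.0, 0.0, 4.0), (0.0, 0.0, 0.0), 2.0)
```

The hard part in practice, as Letteri describes, isn’t the arithmetic but keeping two physical cameras and two groups of actors in lockstep, take after take.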
What had been a simple camera technique became something that takes months and months of work to do.
Take the character of Azog as a very different example. In the Lord of the Rings trilogy, any of the orcs that had dialogue would all have been actors in make-up and costume. But now we have this character who has scars cut deep into the flesh, almost to the bone. He can’t possibly exist but he needs to feel as real as though he were an actor on the set.
So Azog had to be created entirely digitally, with an actor captured on a performance capture stage and then rendered with this physical rendering system we use, creating his bones, his body tissue and the facial animation. We have to integrate all of that into one character that has to look photorealistic. In just ten years we’ve gone from old fashioned, physical techniques to completely digital techniques, but they’re creating things that we couldn’t do, that you wouldn’t get to see, in any other way.
When we created Gollum ten years ago, you can look at what we had as a digital version of an animatronic character. There was an armature in him, and a kind of skin over a form and inside that form there were bladders that we could inflate to simulate muscle movement, the same thing you’d do with an animatronic.
But now, we build these characters so that they have biology. If you look at Gollum or Azog as they are in The Hobbit, there’s a skeleton inside them. And it’s a real skeleton, and all of the bones are the proper shape. All of the muscles are attached to the bones and represented physically, with all the correct lengths, attachments, ligaments and so forth. Surrounding the muscles is a fascia layer that holds them all together, and then there’s a layer of dense tissue on top of that, which represents organs or fat and fills out the rest of the form of the body. And then there’s skin that goes on top.
When you see Gollum in the Riddles in the Dark sequence in The Hobbit, watch him move and look at what’s happening, both to his skin and underneath his skin. You will see his bones and muscles moving under the skin. This is very different from treating the skin like a sack of cloth that wobbles over the body, which is how things have been done for most creatures. This simulated biology is as real as we can currently get these characters to be.
Back when we were starting on Gollum for The Two Towers, digital skin looked like dinosaur skin: big, thick skin. Human skin doesn’t look that way at all; it has a kind of translucency to it, and if you don’t get that right the character will look like it’s made out of plastic. We identified the key component to making skin look correct as subsurface scattering, where light diffuses through the skin and makes it look translucent. We did this for the first time on Gollum for The Two Towers.
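The essence of subsurface scattering can be sketched as a blur of the surface lighting with a distance-based falloff. This toy profile (a plain exponential, standing in for the measured multi-lobe profiles production renderers actually use; the 1 mm falloff distance and 0.5 mm sample spacing are illustrative) shows why light bleeds softly past a shadow edge on skin instead of cutting off hard, as it would on plastic:

```python
import math

def diffusion_profile(r_mm: float, d_mm: float = 1.0) -> float:
    """Toy falloff: light entering the skin re-emerges nearby with a weight
    that decays (roughly exponentially) with distance travelled inside."""
    return math.exp(-r_mm / d_mm)

def scatter(irradiance, d_mm=1.0):
    """Blur a 1-D strip of surface irradiance with the profile, normalising
    the weights -- the reason lit skin 'glows' past shadow boundaries."""
    out = []
    for i in range(len(irradiance)):
        w_sum = acc = 0.0
        for j, e in enumerate(irradiance):
            w = diffusion_profile(abs(i - j) * 0.5, d_mm)  # samples 0.5 mm apart
            w_sum += w
            acc += w * e
        out.append(acc / w_sum)
    return out

# A hard shadow edge softens: light bleeds into the dark side.
strip = [1.0] * 5 + [0.0] * 5
soft = scatter(strip)
```

In a real renderer this happens over the 2-D surface, with separate profiles per colour channel (which is where the reddish glow of backlit ears and fingers comes from), but the blurring idea is the same.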
That became the basis for characters who have been created since then, both by us and by other studios. But we’ve since realised that the simplified approach we took, which was all that we understood in those days, didn’t allow for a lot of the very thin translucency you get in some areas, especially around the eyes and lips. This effect required us to really understand what light does after it has gone through the outer layer of skin.
So, we combined this new knowledge with another technique where we do these very high resolution scans to get skin detail on different parts of the body. Skin detail is partly constructed out of the way it moves and flows and folds, and so you need to respect that to get wrinkles, for example, to form in the right way. All of this now works together in characters like Gollum.
The eyes are one of the most complex character elements to create. The light transport inside the eyes is very complex. Eyes gather light from 180 degrees, they soak up light from the whole world and focus it down into these tiny spots. And they draw the attention of the viewers, as we’re used to looking into other people’s eyes, or even into animals’ eyes, to try and get an understanding of whether it’s friend or foe, if it’s going to attack or if it’s going to make us laugh. Most of the cues for communication are in the eyes.
And so we have these layered depths, from the lenses through the cornea and sclera, and it’s all created physically in our system to capture the way the light travels and bounces around the eyes.
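Part of that light transport is plain refraction at the cornea. A small sketch using Snell’s law (with a standard textbook refractive index of about 1.376 for the cornea; nothing here is Weta’s renderer) shows how incoming rays bend toward the optical axis, which is how such a wide field of light gets focused down so tightly:

```python
import math

CORNEA_INDEX = 1.376  # standard physiological value for the human cornea

def refract_angle_deg(incident_deg: float, n1: float = 1.0, n2: float = CORNEA_INDEX) -> float:
    """Snell's law (n1 sin a1 = n2 sin a2) at the air/cornea boundary:
    rays entering the denser medium bend toward the surface normal."""
    s = (n1 / n2) * math.sin(math.radians(incident_deg))
    return math.degrees(math.asin(s))

# A ray arriving at 45 degrees off the normal is bent to a shallower angle.
bent = refract_angle_deg(45.0)
```

A physically based eye model stacks several such boundaries (tear film, cornea, aqueous humour, lens) and bounces light between them, which is what Letteri means by capturing the way light travels around the eye.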
Then you can add on top of that the really fine micro-movements of the eyes. That’s somewhere we could really take advantage of 48fps in this film. It’s one area in which, creatively, 48fps allowed us to do something much, much better than we could ever do with 24fps. If you look at Gollum in some of his closeups you get a sense of realism and performance that is much better than we could have achieved any other way.
If we see an opportunity to put in a micro-movement that wasn’t in the original performance, we’ll take that opportunity. As much as we enjoy and respect working with actors like Andy Serkis, ultimately our responsibility is to create a character. An actor will do whatever they can, physically, to the limit that they can, but if we need to push that for whatever reason, whether it’s an exaggerated stunt or a leap or, at the other end of the scale, a micro-expression, then we’ll do that.
It’s been one of the hallmarks of Gollum, going back to The Two Towers, that his micro-expressions, and indeed his expressions themselves, draw on the work that Paul Ekman had done. The Two Towers was, as far as I’m aware, the first time anybody coded the expressions of a character based on the FACS system. FACS is great for expressions but not for dialogue, so we had to extend it to be able to put both expressions and dialogue together. We used micro-expressions very deliberately, fleeting changes hidden in there.
We don’t allow for cheating. Bones aren’t allowed to cross over into other bones. Traditional animation poses would dictate what the character needed to be, and that’s okay if you’re cartooning, it’s a gestural expression, it’s meant to convey an emotion, but if it’s something that breaks accurate physicality, you lose realism. We really don’t allow that to happen. If the physicality breaks the animators go back and rework it, and we make sure that we still convey the same emotion with the correct physicality.
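The rule could be sketched as a validation pass over the animated pose. Everything here is hypothetical (the joint names and limit values are illustrative, not Weta’s rig data), but it captures the idea of flagging poses that break physicality so the animators can rework them, rather than silently accepting a gestural cheat:

```python
# Illustrative anatomical limits per joint, in degrees.
JOINT_LIMITS_DEG = {
    "elbow_flex": (0.0, 150.0),   # an elbow can't hyperextend backwards
    "knee_flex": (0.0, 140.0),
    "head_yaw": (-80.0, 80.0),
}

def validate_pose(pose_deg: dict) -> list:
    """Return the joints whose animated values break their physical limits."""
    broken = []
    for joint, value in pose_deg.items():
        lo, hi = JOINT_LIMITS_DEG[joint]
        if not (lo <= value <= hi):
            broken.append(joint)
    return broken

# An exaggerated pose gets caught and sent back for rework.
flagged = validate_pose({"elbow_flex": 170.0, "head_yaw": 30.0})
```

The emotional read of the pose then has to be recovered within those limits, which is the "same emotion with the correct physicality" Letteri describes.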
As much as we talk about motion capture and performance capture, what we are trying to do is distill the performance. So we use the original performance as much as we can but, partly because of differences in the design of the character and the body of the actors, our animators have to constantly change and reinterpret the movements. Some of the movements can be trained and learned so it becomes automatic for the actor, but a lot of it will always be animators making choices on top of the performance capture data, adjusting poses and so forth.
We need to convey the key elements of any expression. Is it the tilt of the eyes and the head, or where the hand is? Or maybe it’s the shrug of the shoulder? Whatever it is, we try to maintain that, to convey the emotion.
It’s very difficult to understand emotion and how we interpret emotion, but we do want to help people understand that what we do is work with actors to create these characters but the specifics aren’t simple. It’s not plug and play. Performance capture really does require animation in collaboration with what the actors give us.
We can now separate the dynamics of performance from the filmed representation of it and that’s never before been possible. You can look at Andy Serkis side by side with Gollum and say “Yes, that’s Andy’s performance” but if you look at the subtleties of the motion, you realise that it’s not 1:1 with his recorded performance. The art is in making the character true to itself. The actor’s performance is responsible for the authorship of that, but we’re responsible for creating the character as you ultimately see it on screen.
What we’ve really realised lately is that trying to simulate things from the outside in, trying to mimic physicality almost as if you were drawing it frame by frame but in a three-dimensional sense, will not give you the necessary physical detail that you need to convey realism. All of these things, even the micro-expressions have to be driven by what the underlying physical form of the character can perform. That’s why we don’t allow poses to go the wrong way, or any physical dynamics to overshoot what they could do in reality.
Our research is being driven by our understanding of what is physically correct. It’s about how to merge what physics says something will need to be doing in order to look real with what we creatively need to make something look fantastic. It’s in the area where those two overlap that we do our work.
There must have been some times where we still cheated on this but, typically, we tried not to. That’s the philosophy across the whole of Weta. What we’re trying to do is to give you the illusion that what you’re seeing on film existed in front of a camera when the camera was rolling on location.
Thanks again to Joe for taking the time to talk to me. Coming up later today: his comments on Man of Steel, so stay tuned.
And another reminder that The Hobbit, cutting edge digital characters and all, is available across the UK on DVD and Blu-ray from Monday April 8th.
Here’s that glossary, in the order that the terms are used above.
Forced-perspective: Something further away from you appears to be smaller, so with no stereo depth cues to tell you otherwise, a filmmaker can use actors and props at different distances to change their apparent size. Moving the camera or not having sufficiently deep focus can spoil the effect.
Armature: A simplified skeletal structure. In an animatronic, this is the mechanical structure under the “skin.” Used here to give an idea of the simple skeleton inside the “old” Gollum.
48fps: The Hobbit: An Unexpected Journey was the first major feature film shot at 48 frames per second, and many of its screenings were in this format too. The Blu-ray version is at 24fps.
Micro-expressions: very fast, involuntary expressions, as famously identified and discussed by Paul Ekman. They are said to betray emotions that an individual is looking to hide or suppress.
Paul Ekman/FACS: Paul Ekman is a pioneering psychologist who catalogued facial expressions and their meanings using FACS, the Facial Action Coding System.