Debrief: Assholes, meat, and shit ideas (also Aussie internet sucks)

General / 17 February 2021

Well that was interesting! I just finished up working on the first feature film of my career and... it was what some might call "interesting" and what others might call "normal" 😅 . I can't get into the details unfortunately but let's just say that there's a reason my last blog post was in November and it's not just because I'm slack!

PSA

With that out of the way: believe it or not, I'm not writing this just to fish for invisible sympathy from the one or two of you reading. This post is a rundown of some technical oddities I stumbled upon over the course of the project. I've seen a distinct lack of people talking about this stuff online, and it mostly seems to come down to one of three things:

  1. NDAs. A lot of what goes on during a project is unfortunately confidential and can't be shared. There just isn't a way around it.
  2. People are busy. It's hard to find time to interact with the community when you're already working overtime and crunching (not condoning this, by the way).
  3. Some people don't want to share their secrets for fear that someone will steal their job. This one's actually simple, and there's already a well-known name for people like this 👉 

a s s h o l e s ™

Seriously people, go fuck yourself with that smug ass shit. If giving away your "secret" means that your job is at risk of being taken, maybe you're just not good enough to do your job? Get better.

Meat

That project was a learning experience, to say the least. We went into it thinking we would use EmberGen for the fire effects, and for a bit it was alright. However, there was one main drawback: the way EmberGen (at the time) handled exporting made it very difficult to get things lined up properly in other software. Maybe there was a way around it, but Houdini got a major update, and we just thought "fuck it, let's do that, it looks better" (we're small, so we can afford to be a bit more agile).

GPU minimal solver

What I found was that the GPU minimal solver, though designed for lookdev, is actually capable of outputting production-quality results given appropriate hardware and proper constraints. As with simulating on the GPU using the standard OpenCL method, the biggest limiting factor with the minimal solver is GPU memory. Luckily, the cards we were using at the time (an RTX 6000 and a P6000) had decent enough memory for most use cases, but I still hit limits with large sims.

The issues with the minimal solver didn't end there. There was also a problem where any start frame other than the default flat out wouldn't work. I worked around it with a Time Shift node, which was a sad solution in all honesty. It worked, and luckily the problem has been fixed in a later update.

Another problem was wind. I soon learned that a lot of the pyro forces and nodes don't work with OpenCL. By "a lot" I mean basically all of them, including wind. I worked around that by (don't cringe) changing the direction of gravity. It was good enough for those particular use cases, but it definitely wasn't accurate. The reason is that while changing the direction of gravity does push the fire and smoke in a certain direction, it does so through the buoyancy, and real wind doesn't do that. The correct behaviour is that hot air/smoke rises, and as it cools, the effect of the wind becomes more dominant.
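To make the difference concrete, here's a toy parcel model (every number in it is made up for illustration, nothing here comes from an actual sim) contrasting the tilted-gravity hack with drag-style wind:

```python
# Toy model of a single hot parcel of smoke, contrasting two ways of
# faking wind. All constants are illustrative, not from any real setup.

def final_sideways_speed(tilted_gravity, steps=1000, dt=0.01):
    T = 1.0           # parcel heat (arbitrary units), drives buoyancy
    vx = vy = 0.0     # sideways / upward velocity
    wind = 2.0        # ambient wind speed
    for _ in range(steps):
        buoyancy = 9.81 * T
        if tilted_gravity:
            # Hack: tilting gravity pushes the parcel sideways, but the
            # push is tied to buoyancy, so it dies off as the parcel cools.
            vx += 0.3 * buoyancy * dt
        else:
            # Real wind: drag pulls vx toward the ambient wind speed,
            # regardless of how hot the parcel still is.
            vx += 1.5 * (wind - vx) * dt
        vy += buoyancy * dt
        T *= 1.0 - 0.5 * dt   # the parcel cools over time
    return vx
```

With the drag model, the cooled parcel ends up drifting at roughly the ambient wind speed; with tilted gravity, the sideways push fades away along with the heat, which is exactly the wrong behaviour.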

Are you ready for the solution? Two words: "wind" and "tunnel". If you dive into the pyro solver and then dive in one more layer, you should find the smoke object. One of its parameters is called "wind tunnel", and all it does is add a wind force along whatever vector you specify. We did this, and it was easy. There's also another solution we found but didn't use: create a new velocity field wherever you want it and source it in. It acts the same way, but you get more control (at the cost of a more complicated setup).

PilotPDG

When it comes to this kind of work, my goal is to keep my machine's resource utilisation at 100% at all times. That means that if one GPU is being used and the other is idle, something is wrong (same goes for CPU). This thing is made to be used and damn it I'm going to use it! That's where PilotPDG comes into play.

The way I organised the project was that each shot got its own project folder and file, with $JOB set to a shared directory of assets used between all .hip files.

Here's where PilotPDG became very useful:

  1. PilotPDG supports cooking nodes from external .HIP files meaning, with one graph, you could hook up dependencies and queue up all of your simulations and renders at once.
  2. PilotPDG is lightweight compared to Houdini, meaning more VRAM can go toward actually cooking the nodes.
  3. You're separating cooking from developing. PilotPDG starts separate processes for each job whereas Houdini cooks in process.

Those are the benefits I found; here's how they helped us:

  1. Using Environment Edit and Python nodes, you can tell specific branches of your node tree to use specific hardware devices. There were times when I had multiple simulations to run and multiple renders to get out. As my machine had two GPUs, I created two branches for simulating: one on the RTX 6000 and the other on the P6000. Then, when it came to rendering, the branches combined to render with Redshift on both cards simultaneously. There was also the option of keeping the branches separate for the render too, but that's a bit more taxing on other parts of the system. For some machines that would be the preferred option, and to be clear, I did go that route sometimes.
  2. Normally with Redshift you need to restart Houdini if you want to swap render devices. In PilotPDG, that's not the case. As it starts a new process each time you run a job (by default; it's not actually required to work that way), it's effectively the same as a restart as far as Redshift is concerned. For me, that meant I could keep an instance of PilotPDG running and use it to render my current working file, swapping cards whenever necessary.
  3. Rendering with PilotPDG also meant that if the render crashed for any reason, the worst it could do was take down PilotPDG. Houdini, and by extension my working project, would be left unscathed.
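The device split in point 1 can be sketched like this. The branch names and device indices are made up for illustration; HOUDINI_OCL_DEVICETYPE and HOUDINI_OCL_DEVICENUMBER are real Houdini environment variables for pinning the OpenCL device, and the sketch assumes each branch's jobs inherit the environment they're launched with:

```python
import os

# Sketch of the per-branch GPU split: each sim branch launches its jobs in
# a child process whose environment pins one OpenCL device. Branch labels
# and device indices are hypothetical.

def branch_env(device_index):
    """Build the environment for one simulation branch."""
    env = os.environ.copy()
    env["HOUDINI_OCL_DEVICETYPE"] = "GPU"
    env["HOUDINI_OCL_DEVICENUMBER"] = str(device_index)
    return env

sim_branches = {
    "sim_rtx6000": branch_env(0),  # first sim branch -> GPU 0
    "sim_p6000": branch_env(1),    # second sim branch -> GPU 1
}
```

In the actual graph this was done with Environment Edit nodes rather than hand-rolled dictionaries, but the effect is the same: each branch's jobs wake up seeing only the card you gave them.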

PilotPDG wasn't without problems, however. It was a lot buggier than Houdini, and I found that swapping between multiple cooking graphs, or generally just interacting with the UI, had a non-zero chance of blowing it up. It also doesn't really feel like its own program; it's really just Houdini with most of its features stripped out. In practice that clearly wasn't an issue, but it felt off. Opening a new network window would also default to the OBJ context, despite that context not actually existing in the program.

Cloud Rendering

We soon came to the realisation that our render power wasn't quite enough for what we wanted to do, and we were faced with the reality that cloud rendering was the best way forward. Well, it would've been, but in hindsight anything internet-related in Australia rarely goes to plan.

We found a good (and cheap) cloud rendering platform based in Vietnam. They had weirdly good support (always checking in on us over WhatsApp) and, most importantly, cloud rendering servers set up with 6 x RTX 3090 GPUs!

We thought we'd hit the jackpot, and a quick test render proved that these servers were godly. But here's the issue: even though we could finish rendering every shot we wanted in less than 3 hours, uploading and downloading was the killer. We had 200 GB of cache files to upload, and I shit you not, it took multiple days. Not to mention the time it took to download the finished renders! In the end, it was still faster than rendering locally (mainly because we were rendering other shots locally at the same time).
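The back-of-envelope maths checks out, too. Assuming a fairly typical Australian upload speed of around 5 Mbit/s (an assumption, not our measured rate):

```python
# Rough upload-time arithmetic for 200 GB of cache files.
# The 5 Mbit/s upload speed is an assumed figure, not a measurement.
cache_gb = 200
upload_mbps = 5                                  # megabits per second
seconds = cache_gb * 8 * 1000 / upload_mbps      # GB -> megabits, then / rate
days = seconds / (60 * 60 * 24)
print(round(days, 1))                            # 3.7 (days)
```

Roughly three and a half days of solid uploading, before a single frame comes back down.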

It was a taste of what could be, and it left us wishing for a service like that based in Australia. If there was one, we might even be able to work directly off the cloud machines and forget about our local machines altogether!

Having said that, I've recently found out there might actually be similar services in the country after all. I think I've heard Digistor does something similar. If you know of any others, please let me know!

What I would do differently

The way I used PilotPDG was cool and all, but it was high-maintenance and error-prone. I kept finding I'd accidentally rendered the wrong file or frame range, or made some other silly mistake, all because the system was too complicated and not automated enough.

I've started looking into Deadline as an alternative to PilotPDG. Even on a single machine, I think it could be useful, as it solves most of the problems I was trying to solve with PilotPDG while being simpler in practice. Another benefit of Deadline is that it's scalable: if we need more render power, we just add a couple of licences and bring more nodes online. That could also negate the need for cloud rendering.

What's up next?

Next steps for me are Unreal virtual production and learning Blender! I'm picking up Blender as a replacement for Maya in my workflow. Maya is too expensive considering how rarely I use it anymore, and it's less versatile than Blender. The main benefit Maya has over Blender, in my eyes, is its animation tools, and I don't really do that anymore. If I wanted to, I'd check out Houdini, since they've recently started adding some interesting animation tools to the package.

As a bit of a sneak peek into what I'll be posting next, over the past couple of weeks I've been working on a janky virtual production setup using HTC Vive trackers and nDisplay. What I have working currently is two computers networked together: one machine provides tracking data, two outputs, the multi-user server, and VR scouting; the other is a render node, purely there to run four more outputs. I'm happy to say it's all working, and I'll share the details next time I post!

Resources