Monte Carlo Estimation
Say we have some function \(f(x)\) that we wish to integrate. If we cannot integrate \(f(x)\) analytically, we must resort to numerical methods. We might be inclined to try numerical quadrature, performing the integration as a finite sum of small slices of the function. This works for simple functions, but for high-dimensional integrands (as is common in path tracing) the number of evaluations required grows exponentially with dimension, and it quickly becomes infeasible.
Instead of looking at every possible point on the function, we could instead randomly sample the function and then combine these samples to form an estimate of what the integral should be. This approach is known as Monte Carlo estimation, and is loosely formalized as:

\[
\int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}
\]
In plain English, this tells us that we can approximate the integral of \(f(x)\) by computing a weighted average of \(N\) random samples, where each sample \(x_i\) is weighted by the reciprocal of its probability density function (PDF), \(p(x_i)\).
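This weighted average can be sketched directly in Python. The helper below is a minimal illustration (the names `mc_estimate`, `f`, `p`, and `sample` are my own, not from the article), shown here estimating \(\int_0^1 x \, dx = 0.5\) with uniform samples:

```python
import random

def mc_estimate(f, p, sample, n):
    """Basic Monte Carlo estimator: average f(x_i) / p(x_i) over n samples."""
    return sum(f(x) / p(x) for x in (sample() for _ in range(n))) / n

# Example: estimate the integral of f(x) = x over [0, 1), whose true value
# is 0.5. Uniform samples on [0, 1) have the constant PDF p(x) = 1.
random.seed(0)  # seeded only for reproducibility
estimate = mc_estimate(lambda x: x, lambda x: 1.0, random.random, 100_000)
```

With enough samples the estimate converges toward the true value, with error shrinking like \(1/\sqrt{N}\).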
Uniform Sampling Example
Let's consider a scenario where we wish to integrate the following function:

\[
f(x) = e^x - 1
\]
between 0 and 2. Obviously, we can very trivially find the antiderivative of this function to be:

\[
F(x) = e^x - x
\]
but let's assume for a moment that we are unable to compute this, and wish to use a Monte Carlo estimator instead.
To begin, let's uniformly sample values \(x\) from the domain \([0, 2)\). We know that the probability that we'll select some number within the range 0 to 2 is 100% (because that is the domain we are sampling from!). We can write that as:

\[
\int_0^2 p(x) \, dx = 1
\]
Additionally, because we are uniformly sampling the domain of \([0,2)\), we know that the selection of any \(x_i\) is just as likely as the selection of any other \(x_j\), which means that the PDF is constant (\(p(x) = c\)). Therefore we can solve for what the probability density must be:

\[
\int_0^2 c \, dx = 2c = 1 \quad\Rightarrow\quad p(x) = \frac{1}{2}
\]
Plugging this constant probability into Eq. (4) gives us the following basic Monte Carlo estimator:

\[
\frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{1/2} = \frac{2}{N} \sum_{i=1}^{N} f(x_i)
\]
We can now run this for \(N=100\) samples, and we obtain the value 4.5707. This is only about 4% off the analytically known true value of \(e^2 - 3 \approx 4.389\)!
Importance Sampling
In the previous example, notice that the function \(f(x) = e^x - 1\) is near 0 when \(x\) is small, and grows exponentially as \(x\) increases. This poses a problem for our Monte Carlo estimator: since we are uniformly sampling the domain, we waste many samples at points where \(f(x)\) is small. We should instead sample more where we know \(f(x)\) is larger, as those points contribute more to the final integral. But how can we do that?
We don't need to uniformly sample the domain; we can sample from any distribution we like! But sampling from a function with a shape similar to \(f(x)\) will lower the variance of our estimator. Given that \(f(x)\) is small when \(x=0\) and grows larger as \(x\) increases, let's try to sample from a function proportional to \(x^2\), as this exhibits the same qualitative behavior.
But how do we sample from an arbitrary function? The key is to utilize inverse transform sampling. We know that we want a PDF \(p(x) \propto x^2\), and we can find this function by simply finding a normalization factor \(c\) that ensures it integrates to 1 over our domain:

\[
\int_0^2 c x^2 \, dx = \frac{8c}{3} = 1 \quad\Rightarrow\quad c = \frac{3}{8}, \qquad p(x) = \frac{3}{8} x^2
\]
Now that we have a PDF, the first step in inverse transform sampling is to calculate a Cumulative Distribution Function (CDF). We solve for the CDF \(P(x)\) as follows:

\[
P(x) = \int_0^x \frac{3}{8} t^2 \, dt = \frac{x^3}{8}
\]
Now the final step in inverse transform sampling is to invert the CDF, so that we can feed it uniformly random samples \(\zeta \in [0, 1)\) and obtain numbers which are distributed according to our originally desired PDF. To do this, we simply solve \(\zeta = P(x)\) for \(x\):

\[
\zeta = \frac{x^3}{8} \quad\Rightarrow\quad x = P^{-1}(\zeta) = 2\sqrt[3]{\zeta}
\]
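The inverse transform above is one line of code, and we can sanity-check it empirically: under \(p(x) = \frac{3}{8}x^2\), the expected value of \(x\) is \(\int_0^2 x \cdot \frac{3}{8}x^2 \, dx = \frac{3}{2}\), so the sample mean should land near 1.5. A minimal sketch (the helper name `sample_x` is my own):

```python
import random

random.seed(0)  # seeded only for reproducibility

def sample_x():
    """Draw x with PDF p(x) = (3/8) x^2 on [0, 2) via inverse transform."""
    zeta = random.random()            # uniform zeta in [0, 1)
    return 2.0 * zeta ** (1.0 / 3.0)  # x = P^{-1}(zeta) = 2 * cbrt(zeta)

# Empirical check: the sample mean should approach E[x] = 3/2.
n = 100_000
mean = sum(sample_x() for _ in range(n)) / n
```

Histogramming these samples would likewise show the \(x^2\) shape: few samples near 0, many near 2 — exactly the bias we wanted.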
Now, by uniformly sampling \(\zeta\), we can generate new sample points \(x\) with a PDF of \(p(x) = \frac{3}{8}x^2\). We can now use this to determine our new Monte Carlo estimator:

\[
\frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{\frac{3}{8} x_i^2}
\]
where \(x_i\) is no longer uniformly sampled, but is instead drawn from our PDF that more closely matches the shape of \(f(x)\). If we evaluate this for \(N=10\) samples (10 times fewer than we used when uniformly sampling \(x\)), we obtain an estimate of 4.4331, which is only about 1% off the true value!
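Putting the pieces together, the importance-sampled estimator looks like this. As before, this sketch uses a larger \(N\) than the article's 10 samples so the result is stable rather than dependent on a lucky draw:

```python
import math
import random

random.seed(0)  # seeded only for reproducibility

N = 10_000
total = 0.0
for _ in range(N):
    zeta = random.random()
    x = 2.0 * zeta ** (1.0 / 3.0)    # inverse transform: x ~ p(x) = (3/8) x^2
    fx = math.exp(x) - 1.0           # f(x) = e^x - 1
    px = (3.0 / 8.0) * x * x         # the PDF we actually sampled from
    total += fx / px                 # weight each sample by 1 / p(x_i)
estimate = total / N

true_value = math.exp(2) - 3
```

Because \(p(x)\) roughly tracks the shape of \(f(x)\), each sample's weighted contribution \(f(x_i)/p(x_i)\) stays close to the true integral, so the estimator's variance is much lower than in the uniform case.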
Importance sampling requires more work, but it can yield more accurate results with fewer samples, so long as you choose a sampling distribution that decently approximates the function you're integrating. Look back at the integral in the rendering equation presented in (3):

\[
L_o(\vec{x}, \omega_o) = L_e(\vec{x}, \omega_o) + \int_{\Omega} f_r(\vec{x}, \omega_i, \omega_o) \, L_i(\vec{x}, \omega_i) \, (\omega_i \cdot \vec{n}) \, d\omega_i
\]
Consider what would happen if we had a small light source present in our scene. If we only sampled rays by uniformly sampling the hemisphere above an intersection point \(\vec{x}\), we'd need a truly enormous number of rays to have any decent chance of sampling rays which intersect the light. But we know the light is extremely important, as it is what contributes radiance to the scene! Hence, we can apply this importance sampling method to directly sample the light (which we know will greatly contribute to the overall radiance we're calculating), while still accounting for the samples in a statistically rigorous way.