Daniele Panozzo: Geometry processing in the era of parallel computing

Improvements in modern processors come mainly from their parallel capabilities, in the form of vectorized instructions and additional processing cores. Unfortunately, their performance on serial tasks is improving very slowly, and it is unlikely that this trend will change in the future.
The majority of geometry processing algorithms are, by design, unable to use this new hardware efficiently: they were invented to minimize the number of operations, since floating-point computation was historically the major computational bottleneck. This is no longer true: the bottleneck now lies in incoherent memory accesses, while computation is cheap because it can be executed in parallel.
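As a hypothetical illustration (not taken from the talk), the gap between operation count and memory access pattern can be sketched in a few lines of Python with NumPy: summing an array involves exactly the same arithmetic whether elements are visited in memory order or through a random permutation, yet the scattered variant stresses the cache hierarchy far more.

```python
import numpy as np

# Hypothetical illustration: identical floating-point work,
# very different memory access patterns.
rng = np.random.default_rng(0)
data = rng.random(1_000_000)

# Coherent access: elements are read in contiguous memory order.
coherent_sum = data.sum()

# Incoherent access: a random permutation forces scattered reads,
# even though the arithmetic (one addition per element) is unchanged.
perm = rng.permutation(data.size)
incoherent_sum = data[perm].sum()

# Both compute the same value; on large arrays the scattered
# gather is typically far slower on modern hardware.
assert np.isclose(coherent_sum, incoherent_sum)
```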
This is a major paradigm change that will require our community to revise many of our algorithmic foundations, from the data structures we use to encode meshes to the avoidance of sparse linear algebra operations, which cannot fully benefit from parallel architectures.
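To make the data-structure point concrete, here is a minimal sketch (my own example, not the talk's method): encoding a mesh as flat, contiguous arrays of vertices `V` and face indices `F`, rather than a pointer-based structure, lets per-face quantities such as normals be computed in bulk with vectorized gathers instead of per-triangle pointer chasing.

```python
import numpy as np

# Hypothetical sketch: a flat array-based mesh encoding.
# V holds vertex positions, F holds per-face vertex indices.
V = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
F = np.array([[0, 1, 2],
              [1, 3, 2]])

# Gather the three corners of every face at once, then take a
# vectorized cross product: no explicit loop over triangles.
a, b, c = V[F[:, 0]], V[F[:, 1]], V[F[:, 2]]
n = np.cross(b - a, c - a)
normals = n / np.linalg.norm(n, axis=1, keepdims=True)

# Both triangles lie in the z = 0 plane, so both normals are +z.
assert np.allclose(normals, [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
```

The same layout extends naturally to multi-core execution, since each face's computation is independent and reads from predictable, contiguous memory.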
In this talk, I will present a brief overview of modern processing primitives from a geometry processing point of view, describing the constraints that must be respected to fully utilize vectorized instructions and multi-core parallelization, and their relation to cache coherence. As a practical example, I will demonstrate how to adapt well-known global parametrization-based remeshing approaches, which rely on sparse linear algebra, into a simple and fully parallelizable algorithm that can convert a mesh with 372 million triangles into a coarse quad mesh in less than 10 minutes on a parallel workstation with 16 cores. Surprisingly, working under these new hardware constraints led us to a simple method that can be implemented in fewer than 100 lines of self-contained Python code, compared to the hundreds of thousands of lines required by competing methods that rely on complex optimization.