I am a postdoc researcher and Junior Professor working at the Institute of Applied Mathematics, TU Dortmund, Germany. I obtained my PhD in spring 2010 with a thesis titled Fast and accurate finite element multigrid solvers for PDE simulations on GPU clusters. In August 2011 I was appointed Junior Professor for "Hardware-oriented numerics for large systems" (Hardware-orientierte Numerik für große Systeme).
Broadly speaking, my research interests lie in the field of scientific computing for PDEs; more specifically, I work on the parallelisation of modern numerical solution techniques for PDEs, in particular multigrid methods and finite element discretisations, that scale well in all aspects and across a wide range of heterogeneous hardware platforms. Challenges in this field are characterised by the need to carefully balance inherent trade-offs between various efficiency aspects, in particular between optimal numerical properties, relaxations enforced by parallelisation, robustness, and good exploitation of hardware capabilities. In our group, this research area is termed hardware-oriented numerics. Application domains that motivate my work, and to which it is applied in collaborations, include fluid dynamics, solid mechanics and geophysics. Some of the areas I work and have worked in are listed below, in no particular order; details can be found on my projects and publications pages.
I am an early adopter of GPUs for PDE problems, having started in this research area back in 2004. Together with Robert Strzodka and the FEAST group here in Dortmund, we have pioneered several innovative techniques for using GPUs for the fast and accurate solution of ill-conditioned problems with geometric multigrid solvers and finite element discretisations: Our work on mixed and emulated precision schemes overcame the initial drawback of GPUs in scientific computing, namely being limited to single precision. We also developed a minimally invasive integration technique to use GPUs in large-scale MPI-based software packages, and we were among the first groups to demonstrate complex finite element multigrid driven simulations with more than one billion unknowns on large GPU clusters, with applications in solid mechanics and fluid dynamics.
More recent work focuses on mapping irregular computations onto GPUs, which includes techniques to extract sufficient fine-grained parallelism from seemingly sequential operations. Examples include sparse matrix-vector multiplication, tridiagonal solves, and the analysis of parallelisation techniques for preconditioners and multigrid smoothers with good numerical properties.
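To illustrate the kind of fine-grained parallelism such kernels exploit, here is a minimal sparse matrix-vector product in CSR format, written in plain Python/NumPy (a didactic sketch, not our actual GPU code): every row's dot product is independent of the others, which is exactly the parallelism a GPU kernel would assign to individual threads or warps.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR format.

    Each row's dot product is independent of all others -- on a GPU,
    one thread (or warp) per row exploits this fine-grained parallelism.
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):  # rows are independent -> parallel on a GPU
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# 3x3 example matrix [[4, 1, 0], [1, 4, 1], [0, 1, 4]] in CSR storage
values  = np.array([4., 1., 1., 4., 1., 1., 4.])
col_idx = np.array([0, 1, 0, 1, 2, 1, 2])
row_ptr = np.array([0, 2, 5, 7])
x = np.ones(3)
print(spmv_csr(values, col_idx, row_ptr, x))  # [5. 6. 5.]
```

The irregularity lies in the indirect access `x[col_idx[...]]` and in varying row lengths, which is what makes load balancing and memory coalescing on GPUs non-trivial.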
I am always keen to advocate the use of GPUs in all areas of scientific computing, including application domains that I do not pursue research in myself. I regularly serve as a reviewer for GPU-related articles and conference proceedings, as a member of programme committees, and as an assistant editor of the community web site gpgpu.org. I have been honoured to give several invited conference tutorials and introductory talks on the topic. A collection of beginner-level sample code that I wrote is available at gpgpu.org and on my GPGPU page.
Scalable and Robust Multigrid Solvers on Heterogeneous Hardware
I am convinced that heterogeneous designs will be at the core of future peta- and exascale systems: The era of MPI-only parallelisation has more or less come to an end, and due to the power wall, we can no longer ride the GHz scaling curve and benefit from automatic performance improvements with each new hardware generation. Instead, further performance improvements will stem from concurrency, i.e., strong scaling within cluster nodes. This poses a host of challenges in designing novel numerical solution techniques (solvers, discretisations) that are aware of the hardware and expose a sufficient amount of parallelism on all levels, from the MPI level between cluster nodes, to many cores and accelerators within nodes, to vector units inside the processing units.
We are working on a non-overlapping hierarchical multigrid / multilevel domain decomposition method called ScaRC (Scalable Recursive Clustering). ScaRC works on globally unstructured, locally structured grids, and has been coupled with several enhancements such as GPU acceleration and adaptivity via mesh deformation techniques. The ScaRC solver concept is part of our next-generation FEM toolkit FEAST. The idea is to obtain good scalability, robustness and global coupling through a global multilevel solver, and to exploit local structure extensively in the design of local multigrid solvers that act on subdomains and serve as preconditioners for the global scheme. Exploiting local structure as much as possible is also very important in view of the memory wall problem, which will only get worse over time and already limits the performance of finite element codes by the available memory bandwidth rather than the computational peak capabilities of modern chips.
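The interplay between a global scheme and independent local solvers can be sketched in a few lines. The following toy analogue (plain NumPy, dense matrices, hypothetical helper names, none of FEAST's actual machinery) preconditions a global Richardson iteration with exact solves on non-overlapping index blocks, the blocks standing in for subdomains:

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Poisson matrix (tridiagonal -1, 2, -1): a toy global problem."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def block_preconditioned_richardson(A, b, blocks, iters=200):
    """Global Richardson iteration, preconditioned by independent local solves.

    'blocks' is a list of index arrays, one per non-overlapping subdomain.
    The local solves are mutually independent and could run in parallel,
    mimicking local solvers acting as preconditioners for a global scheme.
    """
    x = np.zeros_like(b)
    # Pre-factorise each local problem (here: simply store the dense inverse)
    local_inv = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in blocks]
    for _ in range(iters):
        r = b - A @ x                      # global residual
        for idx, Ainv in zip(blocks, local_inv):
            x[idx] += Ainv @ r[idx]        # independent local correction
    return x

n = 16
A = poisson_1d(n)
x_exact = np.ones(n)
b = A @ x_exact
x = block_preconditioned_richardson(A, b, [np.arange(8), np.arange(8, 16)])
print(np.linalg.norm(x - x_exact))  # small after convergence
```

In the real method the global scheme is a multilevel solver rather than plain Richardson, and the local solvers are multigrid cycles tuned to the locally structured grids; the sketch only conveys the decomposition of global work into strong, independent local solves.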
We recently started to look into resilient schemes, and into schemes that provide a sufficient amount of asynchronicity on future exa-like hardware. Also, we are redefining the notion of subdomains by no longer interpreting local patches as MPI ranks.
The Partnership for Advanced Computing in Europe (PRACE) recognised our efforts in 2008, and we were awarded the first PRACE award at ISC'08.
This research avenue started as a fun evening side project and has since evolved into a serious long-term collaboration, including my part-time digression into (seismic) wave propagation modelling and spectral element methods, and into problems where ultra-scalability is not just a fun exercise and a workhorse for future applications (2D Poisson problems with billions of unknowns :)), but genuinely needed by domain scientists. I have been honoured to contribute to the GPU-accelerated version of the SPECFEM3D software, which is widely used and internationally recognised, e.g. as part of the PRACE petascale benchmark suite.
One aspect of parallel computing that is often overlooked by practitioners and domain scientists is the growing importance of energy efficiency. It is commonly agreed that at the exascale level, everything will be judged by its energy efficiency. The implications include all sorts of ecological and societal aspects, but also critical economic questions: At current scales, the initial acquisition cost of a large machine can be outweighed by the electricity bill to operate it over its lifetime. Recently, together with colleagues, we surveyed the field and ported three applications to an ARM-based architecture. Many challenging questions arise in this field, e.g. whether the longer execution time on such slow architectures still results in a net gain in the ultimate energy-to-solution metric, or whether gains remain possible when many more of these devices are plugged together to compensate for slow execution.
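The core trade-off can be made concrete with back-of-the-envelope arithmetic; the figures below are purely illustrative and are not measurements from our study:

```python
def energy_to_solution(power_watts, time_seconds):
    """Energy-to-solution in joules: average power draw times run time."""
    return power_watts * time_seconds

# Illustrative comparison: a commodity HPC node vs. a low-power ARM-based
# node that runs the same job four times slower at a tenth of the power.
e_hpc = energy_to_solution(power_watts=300.0, time_seconds=100.0)  # 30000 J
e_arm = energy_to_solution(power_watts=30.0, time_seconds=400.0)   # 12000 J
print(e_arm < e_hpc)  # True: slower, yet cheaper in energy-to-solution
```

Whether the inequality holds in practice depends on how much the power draw actually drops and how much the run time actually grows, which is precisely what has to be measured per application.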
Mixed Precision Methods
The relation between computational precision and result accuracy is highly non-monotonic. My work has focused on emulation techniques and especially mixed precision approaches, and we have devised schemes that allow up to 95% of the computations to be performed in low precision without sacrificing the high accuracy of the results. In contrast to previous research (dating back to the 1960s), we have concentrated on the performance aspects of such schemes, especially for solvers of multigrid type. Such mixed precision schemes are also very hardware-efficient, and are becoming more and more important due to the memory wall problem: They provide a generic way to halve bandwidth requirements, and thus deliver performance improvements on almost all hardware architectures.
Other Research Interests
Other research interests include GPU-accelerated techniques in physically-based modelling: Interactive simulation and rendering of ODE- and PDE-based effects such as water waves, cloth, particle systems, fire and smoke etc. Together with two students, I have also worked on parallelising Lattice Boltzmann methods on GPUs, the Cell BE and on multicores. Historically, I am also interested in hierarchical 3D hexahedral mesh generation for FEM multigrid solvers, and the geometry-mesh interface of CAD software and FEM solvers.
Over the years, I have collaborated with many people:
- The FEAST Group at TU Dortmund: Christian Becker, Sven H.M. Buijssen, Markus Geveler, Matthias Grajewski, Dirk Ribbrock, Thomas Rohkämper, Stefan Turek, Hilmar Wobker and Peter Zajac.
- Peter Bastian, Olaf Ippisch (Heidelberg), Christian Engwer, Mario Ohlberger (Münster), Oleg Iliev (Kaiserslautern): Multigrid-FEM-PDE solvers at exascale, EXA-DUNE project as part of the DFG SPP 1648: SPPEXA.
- Robert Strzodka: Mixed precision algorithms, GPU programming for PDEs, hardware-oriented numerics, scalable large-scale HW/SW interaction, and much more.
- Dimitri Komatitsch (University of Aix-Marseille), Gordon Erlebacher (Florida State University) and David Michéa (BRGM, France): GPU cluster computing for seismic wave propagation using high-order spectral elements.
- Alex Ramirez, Nikola Rajovic and Nikola Puzovic (Barcelona Supercomputing Center): Green Computing.
- Jan-Philipp Weiß (KIT, Karlsruhe): Preconditioners and multigrid smoothers on GPUs.
- Patrick McCormick and Jamaludin Mohd-Yusof (Los Alamos National Laboratory): GPU cluster computing for FEM.
Research in our group is (and has been) supported by the following grants:
- NVIDIA CUDA Teaching Center Program
- DFG SPP 1648: SPPEXA: EXA-DUNE - Flexible PDE Solvers, Numerical Methods, and Applications
- Mercur Research Center: Asynchrone und fehlertolerante parallele Mehrgitterverfahren für zukünftige HPC-Rechner (asynchronous and fault-tolerant parallel multigrid methods)
- DFG Paketantrag: Skalierbare, rekursiv konfigurierbare, massiv-parallele FEM-Mehrgitterlöser für heterogene Rechnerarchitekturen (scalable recursively configurable massively parallel FEM-multigrid for heterogeneous architectures)