Heidelberg University

Small Scale project, 2020 Jove Matrix Performance

Scenario

Researcher

Benjamin Thomitzni, PhD Student in the Theoretical and Computational Chemistry Group

Initial Problem

The matrix multiplication code has poor single-core performance
Performance also does not scale beyond 4 threads

Outcome

What we did

Exploit symmetry to reduce required operations
Change data layout and re-order loops to avoid cache misses
Use a specialized linear algebra library to generate optimized code
Add thread-safe and performant parallelism using OpenMP
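
The steps above can be illustrated with a minimal sketch in C. The project's actual code and matrix structure are not shown on this page, so everything here is an assumption made for illustration: the function name symmetric_product is hypothetical, and the example assumes the product has the form C = A * A^T, which is one common case where the result is symmetric. Because C[i][j] equals C[j][i], only the lower triangle is computed and then mirrored, which roughly halves the arithmetic. Storing A row by row lets the innermost loop read two contiguous rows, which avoids strided memory access. The OpenMP directive distributes rows across threads; no two iterations write the same element of C, so no locking is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the project's code.
 * Computes C = A * A^T, where A is n x k and C is n x n,
 * both stored row-major in flat arrays. */
void symmetric_product(const double *a, double *c, int n, int k)
{
    assert(a != NULL && c != NULL && n > 0 && k > 0);

    /* Rows are independent, so they can be shared among threads.
     * Row i needs i + 1 dot products, so the work per row is uneven;
     * a dynamic schedule helps balance it. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        const double *row_i = a + (size_t)i * k;

        /* Symmetry: visit only j <= i (lower triangle). */
        for (int j = 0; j <= i; ++j) {
            const double *row_j = a + (size_t)j * k;
            double sum = 0.0;

            /* Both rows are contiguous in memory, so this loop
             * walks through the cache linearly. */
            for (int p = 0; p < k; ++p)
                sum += row_i[p] * row_j[p];

            c[(size_t)i * n + j] = sum;  /* lower triangle */
            c[(size_t)j * n + i] = sum;  /* mirrored upper triangle */
        }
    }
}
```

With GCC or Clang, the -fopenmp flag enables the directive; without it the pragma is ignored and the same code runs serially, which makes it easy to compare single-core and multi-threaded timings. A tuned linear algebra library would typically replace the hand-written inner loops; this sketch only shows the ideas behind the first, second and fourth steps.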

Results

The optimized code runs more than 4x faster on a single core with the small test dataset
With a larger dataset, the single-core speedup increases to 8x
Near-perfect parallel scaling on a 12-core machine with the small test dataset
Near-perfect parallel scaling on a 56-core machine with the larger dataset
