Optimization | Mathalope

I wrote an article on implementing a toy O(n^2) N-body simulation algorithm using High Performance Computing (HPC) techniques on the Intel Xeon Phi architecture. Part 1 describes the code optimization journey that boosts performance from 3.2 to 2831 GFLOPS on a single node. Part 2 distributes the workload across 16 cluster nodes, boosting performance further to 33208 GFLOPS. End result: over 1 trillion (1,099,510,579,200) particle-to-particle interactions per time-step in under a second (~662 ms).
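As a rough illustration of what "O(n^2)" means here, the Python sketch below advances a toy N-body system by one time-step using direct summation: every particle accumulates a force contribution from every other particle. It is not the article's code and makes no attempt at the optimizations described there. The names `interaction_count` and `step`, the unit masses, the gravitational constant of 1, and the softening term are all assumptions made for the example. The interaction total quoted above equals n(n-1) for n = 2^20 = 1,048,576, which is consistent with that particle count, although this excerpt does not state it.

```python
import math


def interaction_count(n):
    """Ordered particle-to-particle interactions in one direct-sum time-step."""
    return n * (n - 1)


def step(pos, vel, dt, softening=1e-3):
    """Advance a toy N-body system by one time-step (direct O(n^2) summation).

    pos, vel: lists of [x, y, z] lists, updated in place.
    Assumptions for this sketch: unit masses, G = 1, and a small softening
    term added to r^2 so that close encounters do not divide by zero.
    """
    n = len(pos)
    # First pass: accumulate accelerations from all pairs and update velocities.
    # Positions are left untouched here so every pair sees the old positions.
    for i in range(n):
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += dx * inv_r3
            ay += dy * inv_r3
            az += dz * inv_r3
        vel[i][0] += ax * dt
        vel[i][1] += ay * dt
        vel[i][2] += az * dt
    # Second pass: move every particle with its updated velocity.
    for i in range(n):
        for k in range(3):
            pos[i][k] += vel[i][k] * dt


# Interaction count for 2**20 particles (assumed particle count, see above).
print(interaction_count(2 ** 20))

# Two particles at rest attract each other along the x axis.
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
step(pos, vel, dt=0.1)
print(vel)
```

The doubly nested loop is the whole cost of the method: doubling the particle count roughly quadruples the work per time-step, which is why the per-step interaction count reaches the trillions at about a million particles.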



Tag Archives: Optimization

How to setup Tensorflow Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

How to setup PyTorch Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

High Performance Computing (HPC) Running Intel Xeon Phi: N-body Simulation Example

Intel Colfax Cluster – Targeting a Specific Instruction Set / Intel Processor Architecture

Intel Colfax Cluster – Optimize a Numerical Integration Implementation with Parallel Programming and Distributed Computing

A Scientific Programming Sketchbook