HPC | Mathalope

I wrote an article on implementing a toy O(n^2) N-body simulation with High Performance Computing (HPC) on the Intel Xeon Phi architecture. Part 1 describes the code optimization journey that boosts performance from 3.2 to 2831 GFLOPS on a single node. Part 2 distributes the workload across 16 cluster nodes to boost performance further, to 33208 GFLOPS. End result: over 1 trillion (1,099,510,579,200) particle-to-particle interactions per time-step in under a second (~662 ms).
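
To make the interaction count concrete, here is a minimal Python sketch of a direct O(n^2) step. It is not the article's optimized code: the function names, the softening term, and the unit gravitational constant are illustrative assumptions. The particle count of 2^20 is also an inference, chosen because n * (n - 1) then matches the quoted 1,099,510,579,200 exactly.

```python
def pair_interactions(n):
    """Ordered particle-to-particle interactions in one direct O(n^2) step."""
    return n * (n - 1)


def accelerations(pos, mass, softening=1e-9):
    """Direct-summation accelerations with G = 1 (illustrative assumption).

    Each pass through the inner loop body is one particle-to-particle
    interaction, so the loop nest performs n * (n - 1) of them.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle does not act on itself
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            # softening keeps the distance away from zero (assumed value)
            r2 = sum(d * d for d in dx) + softening
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc


n = 2 ** 20  # assumed particle count, inferred from the quoted total
interactions = pair_interactions(n)
rate = interactions / 0.662  # interactions per second at ~662 ms per step
print(f"{interactions:,} interactions per step")
print(f"about {rate:.2e} interactions per second")
```

The pure-Python loop is only practical for a handful of particles; it is meant to show what is being counted, not how the quoted throughput is reached.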

Tag Archives: HPC

How to setup Tensorflow Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

How to setup PyTorch Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

High Performance Computing (HPC) Running Intel Xeon Phi: N-body Simulation Example

Intel Colfax Cluster – Estimate Theoretical Peak FLOPS for Intel Xeon Phi Processors

Intel Colfax Cluster – Optimize a Numerical Integration Implementation with Parallel Programming and Distributed Computing

Intel Colfax Cluster – Perform 18 billion billion operations on Xeon Phi (Knights Landing) Cluster Node in Sub-millisecond

Intel Colfax Cluster – How to visualize Knights Landing (knl) NUMA Nodes and High Bandwidth Memory modes

Intel Colfax Cluster – How to compile codes on a Coprocessor

How to interactively submit qsub job on a Xeon-phi (Knights Landing) enabled Cluster Node?

Intel Colfax Cluster – How to compile (C, C++, Fortran) codes with Intel Parallel Studio XE

A Scientific Programming Sketchbook