Mathalope

Written an article on implementing a toy O(n^2) N-body simulation algorithm with High Performance Computing (HPC) and Intel Xeon Phi Architecture. In Part 1, we describe the code optimization journey to boost performance from 3.2 to 2831 GFLOPS on a single node. In Part 2, we distribute workload across 16 cluster nodes to further boost performance to 33208 GFLOPS. End result: capable of performing over 1 trillion (1,099,510,579,200) particle-to-particle interactions per time-step at sub-second level (~662 ms).

M	T	W	T	F	S	S
« Jan
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Mathalope

3.5 years later… 140,000 visitors from 180+ countries. 2017 Year-end Reflection

atlas7.githib.io

JavaScript – Bind Intuition – Devil taking over Jim example

How to setup Tensorflow Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

How to setup PyTorch Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

Initialize Numpy Arrays with Tuple Unpacking Technique – np.random.rand and np.zeros examples

3.5 years later… 140,000 visitors from 180+ countries. 2017 Year-end Reflection

atlas7.githib.io

JavaScript – Bind Intuition – Devil taking over Jim example

How to setup Tensorflow Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

How to setup PyTorch Jupyter Notebook on Intel Nervana AI Cluster (Colfax) For Deep Learning

Initialize Numpy Arrays with Tuple Unpacking Technique – np.random.rand and np.zeros examples

High Performance Computing (HPC) Running Intel Xeon Phi: N-body Simulation Example

Intel Colfax Cluster – Targeting a Specific Instruction Set / Intel Processor Architecture

Intel Colfax Cluster – Estimate Theoretical Peak FLOPS for Intel Xeon Phi Processors

Intel Colfax Cluster – Optimize a Numerical Integration Implementation with Parallel Programming and Distributed Computing

A Scientific Programming Sketchbook