
MPI testing benchmarks

Intel MPI Benchmarks (IMB)

To test MPI performance on our CEs we use the Intel® MPI Benchmarks (IMB) suite, specifically version 3.2. The suite provides a set of benchmarks that measure the performance of the most important MPI functions.

The benchmarks are the following (a sample invocation is shown after the list):

  • PingPong: the classical pattern for measuring the startup and throughput of a single message sent between two processes.
  • PingPing: measures the startup and throughput of single messages, with the crucial difference that messages are obstructed by oncoming messages.
  • Sendrecv: the processes form a periodic communication chain. Each process sends to the right and receives from the left neighbor in the chain. The turnover count is 2 messages per sample (1 in, 1 out) for each process.
  • Exchange: a communication pattern that often occurs in grid-splitting algorithms (boundary exchanges). The group of processes is seen as a periodic chain, and each process exchanges data with both its left and right neighbor in the chain.
  • Reduce: Benchmark for the MPI_Reduce function. It reduces a vector of length L = X/sizeof(float) float items.
  • Reduce_scatter: Benchmark for the MPI_Reduce_scatter function. It reduces a vector of length L = X/sizeof(float) float items.
  • Allreduce: Benchmark for the MPI_Allreduce function. It reduces a vector of length L = X/sizeof(float) float items.
  • Allgather: Benchmark for the MPI_Allgather function. Every process inputs X bytes and receives the gathered X*(#processes) bytes.
  • Allgatherv: Functionally the same as Allgather. However, with the MPI_Allgatherv function it shows whether MPI produces overhead due to the more complicated situation as compared to MPI_Allgather.
  • Scatter: Benchmark for the MPI_Scatter function. The root process inputs X*(#processes) bytes (X for each process); all processes receive X bytes.
  • Scatterv: Benchmark for the MPI_Scatterv function. The root process inputs X*(#processes) bytes (X for each process); all processes receive X bytes.
  • Gather: Benchmark for the MPI_Gather function. All processes input X bytes, and the root process receives X*(#processes) bytes (X from each process).
  • Gatherv: Benchmark for the MPI_Gatherv function. All processes input X bytes, and the root process receives X*(#processes) bytes (X from each process).
  • Alltoall: Benchmark for the MPI_Alltoall function. Every process inputs X*(#processes) bytes (X for each process) and receives X*(#processes) bytes (X from each process).
  • Alltoallv: Benchmark for the MPI_Alltoallv function. Every process inputs X*(#processes) bytes (X for each process) and receives X*(#processes) bytes (X from each process).
  • Bcast: Benchmark for the MPI_Bcast function. A root process broadcasts X bytes to all.
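
By default IMB-MPI1 runs the whole set above; individual benchmarks can also be selected by passing their names on the command line. As an illustration (assuming a local MPI installation and the IMB-MPI1 binary in the current directory):

# Run only the point-to-point benchmarks on 2 processes (illustrative).
mpirun -np 2 ./IMB-MPI1 PingPong PingPing

# Run a single collective benchmark on 4 processes.
mpirun -np 4 ./IMB-MPI1 Allgather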

Running the IMB benchmark on a CE

To run the benchmarks on the CE, we submit a job with the following .jdl:

JobType = "normal";VirtualOrganisation = "ific";NodeNumber = 64;Executable = "mpi-start-wrapper.sh";Arguments = "IMB-MPI1 OPENMPI";StdOutput = "IMB-MPI1.out";StdError = "IMB-MPI1.err";InputSandbox = {"IMB-MPI1", "mpi-start-wrapper.sh" };OutputSandbox = {"IMB-MPI1.out", "IMB-MPI1.err"};Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)

Here NodeNumber is varied to set the number of nodes used (up to a maximum of 64).
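
The job executable is mpi-start-wrapper.sh, the usual glue script between the batch system and mpi-start. A minimal sketch of such a wrapper, assuming the standard mpi-start environment variables (I2G_MPI_APPLICATION, I2G_MPI_TYPE, I2G_MPI_START), would be:

#!/bin/bash
# Arguments as passed from the JDL: executable name and MPI flavour.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Make sure the binary is executable before mpi-start distributes it.
chmod +x $MY_EXECUTABLE

# Tell mpi-start what to run and with which MPI flavour.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER

# Hand control over to mpi-start.
$I2G_MPI_START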

The IMB-MPI1 executable is built from the IMB suite sources.
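
As a sketch, assuming the IMB 3.2 sources and an MPI compiler wrapper on the build machine (the tarball name and the MPI_HOME path are illustrative):

tar xzf IMB_3.2.tgz
cd imb_3.2/src
# make_mpich picks up the compiler from $MPI_HOME/bin/mpicc.
make -f make_mpich IMB-MPI1 MPI_HOME=/usr/lib64/openmpi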

Once the job finishes, we retrieve its output; IMB-MPI1.out contains the benchmark results.
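
For reference, a typical submit/poll/retrieve cycle with the gLite WMS command-line tools would look like this (file and directory names are illustrative):

# Submit the job described above, saving the job identifier.
glite-wms-job-submit -a -o jobid.txt IMB-MPI1.jdl

# Poll until the job reaches the Done status.
glite-wms-job-status -i jobid.txt

# Retrieve the output sandbox (IMB-MPI1.out, IMB-MPI1.err).
glite-wms-job-output --dir ./results -i jobid.txt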

MPI performance analysis 2010-2013

Starting from the results obtained in 2010, and using a script that generates comparative plots, we re-ran the benchmark in January 2013 on the machine ce02.ific.uv.es and obtained the following results. Performance is now considerably lower than in 2010, and it degrades further as the number of nodes increases. Below are some plots for the Allgather benchmark that show this problem for different numbers of nodes (2013 in red, 2010 in green):
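
The comparison script itself is not reproduced here; an illustrative sketch along the same lines, assuming the per-year Allgather timings have been extracted into two data files (the file names and the choice of the t_avg column are assumptions about the IMB output tables):

gnuplot <<'EOF'
set logscale xy
set xlabel "message size (bytes)"
set ylabel "t_avg (usec)"
set title "Allgather"
# Column 1 is #bytes, column 5 is t_avg[usec] in the IMB output tables.
plot "allgather_2013.dat" using 1:5 with linespoints lc rgb "red"   title "2013", \
     "allgather_2010.dat" using 1:5 with linespoints lc rgb "green" title "2010"
EOF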

  • 4 nodes:

4nodos.png

  • 16 nodes:

16nodos.png

  • 32 nodes:

32nodos.png

  • 64 nodes:

64nodos.png

-- AlvaroFernandez - 16 Jan 2013
