17 May 2017

papi_avail output for Sandy Bridge E5-2680

Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 5.4.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (45)
CPU Revision             : 6.000000
CPUID Info               : Family: 6  Model: 45  Stepping: 6
CPU Max Megahertz        : 2700
CPU Min Megahertz        : 1200
Hdw Threads per core     : 2
Cores per Socket         : 8
Sockets                  : 2
NUMA Nodes               : 2
CPUs per Node            : 16
Total CPUs               : 32
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 32
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  No   Level 2 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  No   Level 3 cache misses
PAPI_TLB_DM  0x80000014  Yes  Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  No   Instruction translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  No   Level 1 load misses
PAPI_L1_STM  0x80000018  No   Level 1 store misses
PAPI_L2_STM  0x8000001a  No   Level 2 store misses
PAPI_STL_ICY 0x80000025  No   Cycles with no instruction issue
PAPI_BR_UCN  0x8000002a  Yes  Unconditional branch instructions
PAPI_BR_CN   0x8000002b  No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes  Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes  Conditional branch instructions correctly predicted
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_FP_INS  0x80000034  Yes  Floating point instructions
PAPI_LD_INS  0x80000035  No   Load instructions
PAPI_SR_INS  0x80000036  No   Store instructions
PAPI_BR_INS  0x80000037  No   Branch instructions
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_L2_DCH  0x8000003f  Yes  Level 2 data cache hits
PAPI_L2_DCA  0x80000041  No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes  Level 3 data cache accesses
PAPI_L2_DCR  0x80000044  No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  No   Level 3 data cache reads
PAPI_L2_DCW  0x80000047  No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  No   Level 3 data cache writes
PAPI_L2_ICH  0x8000004a  No   Level 2 instruction cache hits
PAPI_L2_ICA  0x8000004d  No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  No   Level 3 instruction cache accesses
PAPI_L2_ICR  0x80000050  No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  No   Level 3 instruction cache reads
PAPI_L2_TCA  0x80000059  Yes  Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  No   Level 3 total cache accesses
PAPI_L2_TCR  0x8000005c  Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes  Level 3 total cache reads
PAPI_L2_TCW  0x8000005f  No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  No   Level 3 total cache writes
PAPI_FDV_INS 0x80000063  No   Floating point divide instructions
PAPI_FP_OPS  0x80000066  Yes  Floating point operations
PAPI_SP_OPS  0x80000067  Yes  Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  Yes  Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  Yes  Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes  Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  No   Reference clock cycles

================================================================================
  User Defined Events
================================================================================
    Name        Code    Deriv Description (Note)
--------------------------------------------------------------------------------
Of 50 available events, 17 are derived.

avail.c    
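
The Code column above can be read as a flag plus an index. As far as I know, the PAPI headers define each preset as its table index OR-ed with PAPI_PRESET_MASK (0x80000000). The sketch below assumes exactly that; PRESET_MASK, is_preset_code and preset_index are hypothetical names of mine, not part of the PAPI API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same value as PAPI_PRESET_MASK in papi.h. */
#define PRESET_MASK 0x80000000u

/* Returns 1 when the code has the preset flag (top bit) set. */
int is_preset_code(uint32_t code) {
    return (code & PRESET_MASK) != 0;
}

/* Strips the preset flag and returns the index into the preset table. */
uint32_t preset_index(uint32_t code) {
    assert(is_preset_code(code));
    return code & ~PRESET_MASK;
}
```

For example, preset_index(0x80000032u) is 50 for PAPI_TOT_INS. The gaps in the listing (0x80000003 is followed by 0x80000006) are presets that this CPU does not provide.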

5 July 2016

Basic Intel SIMD assembly notation

 
Loading
  • movupd xmm0 ... (SSE: move unaligned packed double into a 128-bit register)
  • vmovaps ymm0 ... (AVX: move aligned packed single into a 256-bit register)

Operating
  • vaddpd ymm1 ymm2 (AVX: add packed double, 256-bit)
  • addsd (SSE: add scalar double; an SSE instruction, but NOT a vector op!)

KEY
  • v = AVX
  • p, s = packed, scalar
  • u, a = unaligned, aligned
  • s, d = single, double


Source: http://www.cac.cornell.edu/education/training/ParallelFall2012/Vectorization.pdf
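
To see the packed/scalar difference from C, here is a minimal sketch using SSE2 intrinsics. The function names add_packed and add_scalar are mine, and the mnemonics in the comments are what a compiler typically emits for these intrinsics, not a guarantee (with AVX enabled it may emit the v-prefixed forms instead). On a target without SSE2 the code falls back to plain C with the same result.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Packed add: both 64-bit lanes are summed. */
void add_packed(const double a[2], const double b[2], double out[2]) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);            /* unaligned packed load (movupd) */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));  /* packed double add (addpd) */
#else
    out[0] = a[0] + b[0];                    /* portable fallback */
    out[1] = a[1] + b[1];
#endif
}

/* Scalar add: only the low lane is summed; the high lane is copied from a. */
void add_scalar(const double a[2], const double b[2], double out[2]) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_sd(va, vb));  /* scalar double add (addsd) */
#else
    out[0] = a[0] + b[0];                    /* portable fallback */
    out[1] = a[1];
#endif
}
```

With a = {1, 2} and b = {10, 20}, add_packed gives {11, 22} while add_scalar gives {11, 2}: the second lane is left untouched, which is why addsd does not count as a vector operation.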

4 July 2016

Intel Data Alignment




  • SSE2: 16 bytes
  • AVX: 32 bytes
  • Xeon Phi: 64 bytes


Alignment increases the efficiency of data loads and stores to and from the processor. When targeting Intel® Streaming SIMD Extensions 2 (Intel® SSE2) platforms, use 16-byte alignment, which allows the aligned SSE load instructions to be used. When targeting the Intel® Advanced Vector Extensions (Intel® AVX) instruction set, try to align data on a 32-byte boundary. (See Improving Performance by Aligning Data.) For Intel® Xeon Phi™ coprocessors, memory movement is optimal on 64-byte boundaries. (See Data Alignment to Assist Vectorization.)


https://software.intel.com/en-us/articles/explicit-vector-programming-best-known-methods
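
One way to get such boundaries for heap data is C11 aligned_alloc. The sketch below is only an illustration under that assumption: the enum names and the helpers alloc_aligned_doubles and is_aligned are hypothetical, and compilers also offer their own attributes and allocators for the same purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment targets from the list above, in bytes. */
enum { ALIGN_SSE2 = 16, ALIGN_AVX = 32, ALIGN_XEON_PHI = 64 };

/* Hypothetical helper: allocate n doubles on an `alignment`-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up. Release the memory with free(). */
double *alloc_aligned_doubles(size_t n, size_t alignment) {
    /* The alignment must be a non-zero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

/* Returns 1 when p sits on an `alignment`-byte boundary. */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

A buffer obtained with alloc_aligned_doubles(1024, ALIGN_AVX) can then be used with aligned 256-bit loads and stores; is_aligned is handy as a debug check before doing so.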

19 December 2013

Python 2.7 Compile

How to compile Python with static and dynamic libraries in a custom folder with UCS-4 (4-byte) Unicode enabled. The --with-pydebug flag additionally produces a debug build.

./configure --prefix=$HOME/usr/local --enable-shared --enable-unicode=ucs4 --with-pydebug
 
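Otherwise, importing extension modules built with a different Unicode width can fail with an "undefined symbol: PyUnicodeUCS2..." error, as explained in the Python FAQ: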

http://docs.python.org/2.7/faq/extending.html#when-importing-module-x-why-do-i-get-undefined-symbol-pyunicodeucs2

25 October 2013

netCDF 4.3.0 with HDF4, HDF5 and parallel

The following configure arguments build netCDF with support for HDF4, HDF5 and parallel I/O. HDF4 and HDF5 are compiled first, since netCDF is built against them.

Compile HDF4 - 4.2.9

./configure --enable-shared --disable-netcdf --disable-fortran --prefix=/usr/local
 
Compile HDF5 - 1.8.11

CC=mpicc ./configure --enable-parallel --prefix=/usr/local --with-zlib=/usr/include --enable-hl --enable-shared

Compile netCDF - 4.3.0 with HDF4 and HDF5 with parallel

CPPFLAGS="-I/usr/local/include -I/usr/include/hdf" CXXFLAGS="-I/usr/local/include -I/usr/include/hdf" FFLAGS="-I/usr/local/include -I/usr/include/hdf" FCFLAGS="-I/usr/local/include -I/usr/include/hdf" LDFLAGS="-L/usr/local/lib" FC=mpif90 CXX=mpicxx CC=mpicc ./configure --enable-hdf4 --enable-netcdf4 --enable-shared --enable-dap

11 April 2012

Prolog Random

A short note on how to call the random function in Prolog:

A is random(23).

Link to the documentation