  1. May 25, 2016
  2. May 24, 2016
  3. May 23, 2016
  4. May 20, 2016
    • Fix bug: static link to MKL failed. · 95010297
      Shiyin Kang authored
      $ ./configure --mkl-root=/opt/intel/mkl --static-math=yes
      ...
      Configuring MKL library directory: Found: /opt/intel/mkl/lib/intel64
      MKL configured with threading: sequential, libs:  -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/mkl/lib/intel64/libmkl_core.a /opt/intel/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group
      MKL include directory configured as: /opt/intel/mkl/include
      Configuring MKL threading as sequential
      MKL threading libraries configured as   -lpthread -lm
      Using Intel MKL as the linear algebra library.
      /opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_set_memory_limit':
      mkl_memory.c:(.text+0x49c): undefined reference to `dlsym'
      mkl_memory.c:(.text+0x4b2): undefined reference to `dlsym'
      mkl_memory.c:(.text+0x4c8): undefined reference to `dlsym'
      /opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_allocate':
      mkl_memory.c:(.text+0x1251): undefined reference to `dlsym'
      mkl_memory.c:(.text+0x1267): undefined reference to `dlsym'
      ...
    • Add missing dependencies to Makefiles · 5b3fccd5
      Pavel Denisov authored
  5. May 19, 2016
    • Daniel Povey's avatar
    • Shiyin Kang's avatar
      1fdfed4c
    • Two CUDA kernels for TraceMatMat, with and without transpose, for all matrix sizes. · 70df8813
      Shiyin Kang authored
      New:
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 10.1076 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 11.8711 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 7.10019 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 7.81977 gigaflops.
      
      Old:
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 4.57783 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 7.96795 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 3.61182 gigaflops.
      LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 6.39571 gigaflops.
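For readers unfamiliar with the operation being benchmarked, TraceMatMat computes trace(A·B) without forming the product: it reduces to a sum of elementwise products, trace(A·B) = Σ A(i,j)·B(j,i), and for the transposed case trace(A·Bᵀ) = Σ A(i,j)·B(i,j). The following is a minimal CPU reference sketch of that identity, not the CUDA kernels from this commit. The `Mat` struct and the function names are hypothetical and are not Kaldi's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix used only for this sketch
// (not Kaldi's CuMatrix).
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data;
  double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// trace(A * B) = sum over i, j of A(i, j) * B(j, i).
// A is rows x cols, B must be cols x rows.
double TraceMatMatRef(const Mat &a, const Mat &b) {
  assert(a.rows == b.cols && a.cols == b.rows);
  double sum = 0.0;
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < a.cols; ++j)
      sum += a.at(i, j) * b.at(j, i);  // B is read with a stride here
  return sum;
}

// trace(A * B^T) = sum over i, j of A(i, j) * B(i, j).
// A and B must have the same shape.
double TraceMatMatTransRef(const Mat &a, const Mat &b) {
  assert(a.rows == b.rows && a.cols == b.cols);
  double sum = 0.0;
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < a.cols; ++j)
      sum += a.at(i, j) * b.at(i, j);  // both operands are read contiguously
  return sum;
}
```

In the transposed form both matrices are traversed in the same order, whereas the non-transposed form reads B with a stride. That difference in access pattern is one plausible reason the two cases are benchmarked separately, though the commit message itself does not say so.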
    • Shiyin Kang's avatar
      9af66530
  6. May 18, 2016
    • A new copy-transpose kernel with the same performance as a plain copy. · 13792af4
      Shiyin Kang authored
      LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 14.0498 gigaflops.
      LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 16.845 gigaflops.
      LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 14.2464 gigaflops.
      LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 10.4523 gigaflops.
      LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 9.65529 gigaflops.
      LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 8.52148 gigaflops.
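The commit message does not describe how the new kernel works. A common way to make a transpose run close to plain-copy speed is to process the matrix in small square tiles, so that both reads and writes stay local. The sketch below shows that idea on the CPU, under the assumption that the kernel uses some form of tiling. `TransposeTiled` and its `tile` parameter are hypothetical names, not taken from Kaldi.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Transposes a rows x cols row-major matrix into a cols x rows row-major
// matrix, one tile at a time. The tile size is an illustrative choice.
std::vector<float> TransposeTiled(const std::vector<float> &src,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t tile = 32) {
  std::vector<float> dst(rows * cols);
  for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
    for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
      // Clamp the tile at the matrix edges so any size is handled.
      const std::size_t r1 = std::min(r0 + tile, rows);
      const std::size_t c1 = std::min(c0 + tile, cols);
      for (std::size_t r = r0; r < r1; ++r)
        for (std::size_t c = c0; c < c1; ++c)
          dst[c * rows + r] = src[r * cols + c];
    }
  }
  return dst;
}
```

On a GPU the same structure is usually realized by staging each tile in shared memory so that both the global-memory read and the global-memory write are coalesced. Whether this commit does exactly that is not stated in the source.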
    • Add code for CuMatrix copy-transpose benchmark · c765ba6e
      Shiyin Kang authored
      Add barrier for correct timing.
      
      Original performance:
      LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 4.26727 gigaflops.
      LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 5.97203 gigaflops.
      LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 3.0816 gigaflops.
      LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 3.95059 gigaflops.
      LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 4.36189 gigaflops.
      LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 2.39275 gigaflops.
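The "barrier for correct timing" matters because GPU kernel launches return before the work has finished, so a timer stopped right after the launch measures only the launch overhead. The sketch below reproduces the effect with standard C++ threads instead of CUDA. `LaunchAsyncWork` and `TimeWork` are hypothetical helpers, and the sleep stands in for the kernel.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Stand-in for an asynchronous kernel launch: returns immediately while
// the "work" (a sleep) continues on another thread.
std::future<void> LaunchAsyncWork(std::chrono::milliseconds duration) {
  return std::async(std::launch::async,
                    [duration] { std::this_thread::sleep_for(duration); });
}

// Returns the measured time in seconds. With wait_before_stop == false the
// timer is stopped right after the launch, which under-reports the cost.
double TimeWork(std::chrono::milliseconds duration, bool wait_before_stop) {
  const auto start = std::chrono::steady_clock::now();
  std::future<void> work = LaunchAsyncWork(duration);
  if (wait_before_stop) work.wait();  // the "barrier"
  const auto stop = std::chrono::steady_clock::now();
  work.wait();  // always let the work finish before returning
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling `TimeWork(std::chrono::milliseconds(100), false)` typically reports far less than 0.1 s, while passing `true` reports at least the full duration. In CUDA code the equivalent barrier is usually a device synchronization call placed before the timer is read.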
    • correct the usage description · 0731b8f6
      wan guanglu authored
    • add barrier for correct timing. · 24b886a2
      Shiyin Kang authored
    • a few more comments · a829139c
      kangshiyin authored
    • kangshiyin's avatar
    • A new CUDA kernel for CuMatrixBase<Real>::FindRowMaxId. · 074e0053
      sykang@sepc83 authored
      Old:
      LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 3.99218 gigaflops.
      LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 3.46283 gigaflops.
      
      New:
      LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 66.2965 gigaflops.
      LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 58.442 gigaflops.
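As context for the numbers above, FindRowMaxId returns, for each row of a matrix, the column index of that row's largest element. A minimal CPU reference sketch follows. `FindRowMaxIdRef` is a hypothetical name, and breaking ties in favor of the lowest column index is an assumption that the commit message does not confirm.

```cpp
#include <cstddef>
#include <vector>

// For each row of a rows x cols row-major matrix, returns the column index
// of the maximum element. Ties go to the lowest index (an assumption).
// Rows of an empty (cols == 0) matrix are reported as -1.
std::vector<int> FindRowMaxIdRef(const std::vector<float> &m,
                                 std::size_t rows, std::size_t cols) {
  std::vector<int> ids(rows, -1);
  if (cols == 0) return ids;
  for (std::size_t r = 0; r < rows; ++r) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < cols; ++c)
      if (m[r * cols + c] > m[r * cols + best]) best = c;
    ids[r] = static_cast<int>(best);
  }
  return ids;
}
```

Each row is an independent reduction, which is why the operation parallelizes well on a GPU: rows can be assigned to separate thread blocks, with the threads of a block cooperating on one row.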
  7. May 17, 2016
  8. May 16, 2016
  9. May 15, 2016
  10. May 13, 2016
  11. May 12, 2016
  12. May 11, 2016
  13. May 10, 2016
    • vesis84's avatar
      a4fff0d5
    • base/kaldi_error: refactoring the logging code · 24bef8dc
      vesis84 authored
      - Some TODOs are still to be decided:
        - Can we remove 'IsKaldiError()'? (It is a very 'dirty' function, and it is used only in the table I/O to suppress printing 'what' messages from KALDI_ERR. IMHO, suppressing these may not be a good idea.)
        - With Kirill's log handler, the log is sent and then there is no abort() for errors/asserts (this seems like a bad idea, but it is how it worked previously).
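The second TODO concerns what should happen after a custom log handler has received an error message. One option, sketched below, is to keep message delivery and failure signalling separate, so that an installed handler changes only where the text goes and never whether an error stops execution. Every name here (`Severity`, `LogHandler`, `SetHandler`, `Emit`) is hypothetical and is not Kaldi's actual interface. The sketch throws an exception rather than calling abort() purely to keep it testable.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

enum class Severity { kInfo, kWarning, kError };

// Hypothetical handler type: receives every message that is emitted.
using LogHandler = void (*)(Severity, const std::string &);

static LogHandler g_handler = nullptr;

// Installs a handler and returns the previous one (nullptr means stderr).
LogHandler SetHandler(LogHandler handler) {
  LogHandler old = g_handler;
  g_handler = handler;
  return old;
}

// Delivers the message, then signals failure for errors regardless of
// whether a custom handler is installed.
void Emit(Severity severity, const std::string &message) {
  if (g_handler != nullptr) {
    g_handler(severity, message);
  } else {
    std::cerr << message << '\n';
  }
  if (severity == Severity::kError) throw std::runtime_error(message);
}
```

With this layout, code that installs a handler to redirect logs cannot accidentally turn a fatal error into a silently ignored one, which is the concern the TODO raises.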
  14. May 09, 2016
  15. May 07, 2016
  16. May 05, 2016
  17. May 04, 2016
  18. May 03, 2016