- May 23, 2016
- May 20, 2016
-
-
Daniel Povey authored
Fix bug: static link to MKL 11.3.2 failed.
-
Daniel Povey authored
Add dimension check in online-nnet3 decoding code, so we get more mea…
-
Daniel Povey authored
Add missing dependencies to Makefiles
-
Shiyin Kang authored
$ ./configure --mkl-root=/opt/intel/mkl --static-math=yes
...
Configuring MKL library directory: Found: /opt/intel/mkl/lib/intel64
MKL configured with threading: sequential, libs: -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/mkl/lib/intel64/libmkl_core.a /opt/intel/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group
MKL include directory configured as: /opt/intel/mkl/include
Configuring MKL threading as sequential
MKL threading libraries configured as -lpthread -lm
Using Intel MKL as the linear algebra library.
/opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_set_memory_limit':
mkl_memory.c:(.text+0x49c): undefined reference to `dlsym'
mkl_memory.c:(.text+0x4b2): undefined reference to `dlsym'
mkl_memory.c:(.text+0x4c8): undefined reference to `dlsym'
/opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_allocate':
mkl_memory.c:(.text+0x1251): undefined reference to `dlsym'
mkl_memory.c:(.text+0x1267): undefined reference to `dlsym'
...
-
Pavel Denisov authored
-
- May 19, 2016
-
-
Daniel Povey authored
-
Daniel Povey authored
-
Daniel Povey authored
added utils/combine_ali_dirs.sh (fixes #553).
-
Daniel Povey authored
some cosmetic changes: add comments to RNNLM rescoring utilities to r…
-
Daniel Povey authored
Speed up CuMatrix<Real>::Transpose() and transposed copy from matrix
-
Daniel Povey authored
smbr: Fixed minor bug in generating diagnostics egs
-
xiaohui-zhang authored
-
Shiyin Kang authored
-
Daniel Povey authored
2 CUDA kernels for TraceMatMat with/without transpose for all matrix sizes.
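For reference, TraceMatMat returns the trace of a matrix product without forming the product itself. The sketch below is a plain-Python illustration of that definition, not the CUDA kernel from this commit. The function name `trace_mat_mat` and the `transpose_b` flag are hypothetical names chosen for the example.

```python
def trace_mat_mat(a, b, transpose_b=False):
    """Return trace(A * B), or trace(A * B^T) when transpose_b is True.

    Matrices are lists of equal-length rows. This is an illustrative
    reference only; names and signature are assumptions, not Kaldi's API.
    """
    rows, cols = len(a), len(a[0])
    if transpose_b:
        # trace(A B^T) = sum over i, j of A[i][j] * B[i][j],
        # so B must have the same shape as A.
        assert len(b) == rows and len(b[0]) == cols
        return sum(a[i][j] * b[i][j] for i in range(rows) for j in range(cols))
    # trace(A B) = sum over i, j of A[i][j] * B[j][i],
    # so B must have the shape of A transposed.
    assert len(b) == cols and len(b[0]) == rows
    return sum(a[i][j] * b[j][i] for i in range(rows) for j in range(cols))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(trace_mat_mat(a, b))                    # trace(A B)
print(trace_mat_mat(a, b, transpose_b=True))  # trace(A B^T)
```

Both cases reduce to a single sum over all element pairs. The only difference is whether B is read in the same order as A or in transposed order.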
-
Shiyin Kang authored
New:
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 10.1076 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 11.8711 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 7.10019 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 7.81977 gigaflops.
Old:
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 4.57783 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 7.96795 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 3.61182 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 6.39571 gigaflops.
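The old and new figures in this entry are easier to compare as ratios. The following is a small sketch that extracts the speeds from log lines in this format and computes new/old per benchmark. The regular expression and the helper name `parse_speeds` are assumptions made for this example and are not part of Kaldi.

```python
import re

# Benchmark figures copied from the log lines of this entry.
NEW_LOG = """
For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 10.1076 gigaflops.
For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 11.8711 gigaflops.
For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 7.10019 gigaflops.
For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 7.81977 gigaflops.
"""
OLD_LOG = """
For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 4.57783 gigaflops.
For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 7.96795 gigaflops.
For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 3.61182 gigaflops.
For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 6.39571 gigaflops.
"""

# Captures the benchmark name (including any "[transposed]" tag) and the speed.
LINE = re.compile(r"For (.+?), for dim = \d+, speed was ([\d.]+) gigaflops")


def parse_speeds(log):
    """Map each benchmark name found in the log text to its speed in gigaflops."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(log)}


old, new = parse_speeds(OLD_LOG), parse_speeds(NEW_LOG)
speedups = {name: new[name] / old[name] for name in new}

for name, ratio in speedups.items():
    print(f"{name}: {old[name]} -> {new[name]} gigaflops ({ratio:.2f}x)")
```

On these numbers the non-transposed cases gain the most, roughly 2.2x for float and just under 2x for double, while the transposed cases improve by about 1.5x and 1.2x.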
-
Shiyin Kang authored
-
- May 18, 2016
-
-
Daniel Povey authored
Add new results for the multi-splice version of the Librispeech online recipe, including results on the test set.
-
Shiyin Kang authored
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 14.0498 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 16.845 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 14.2464 gigaflops.
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 10.4523 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 9.65529 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 8.52148 gigaflops.
-
Shiyin Kang authored
Add barrier for correct timing.
Original performance:
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 4.26727 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 5.97203 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 3.0816 gigaflops.
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 3.95059 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 4.36189 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 2.39275 gigaflops.
-
Daniel Povey authored
-
Daniel Povey authored
base/kaldi_error: error messages are no longer printed twice
-
Daniel Povey authored
A new CUDA kernel for CuMatrixBase<Real>::FindRowMaxId;
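FindRowMaxId reports, for each row of a matrix, the column index of that row's largest element. The following plain-Python sketch illustrates the operation only and is not the CUDA kernel from this commit. The name `find_row_max_id` is hypothetical, and returning the first index on ties is an assumption of this example rather than documented behaviour.

```python
def find_row_max_id(matrix):
    """Return, for each row, the column index of its largest element.

    Illustrative reference only. Ties resolve to the lowest column index,
    which is an assumption of this sketch.
    """
    ids = []
    for row in matrix:
        best = 0
        for j in range(1, len(row)):
            # Strict comparison keeps the earliest index when values tie.
            if row[j] > row[best]:
                best = j
        ids.append(best)
    return ids


scores = [
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.3, 0.3, 0.4],
]
print(find_row_max_id(scores))
```

Each row is independent of the others, which is what makes the operation a natural fit for a parallel kernel.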
-
vesis84 authored
- the binary can be replaced (so we could eventually append posteriors, features, etc.)
-
Jan "yenda" Trmal authored
align-equal-compiled.cc: correct the usage description
-
wan guanglu authored
-
Shiyin Kang authored
-
kangshiyin authored
-
kangshiyin authored
-
sykang@sepc83 authored
Old:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 3.99218 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 3.46283 gigaflops.
New:
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<float>, for dim = 1024, speed was 66.2965 gigaflops.
LOG (TestCuFindRowMaxId():cu-matrix-speed-test.cc:264) For CuMatrix::FindRowMaxId<double>, for dim = 1024, speed was 58.442 gigaflops.
-
- May 17, 2016
-
-
freewym authored
-
Daniel Povey authored
Update nnet1-to-raw-nnet.cc
-
Daniel Povey authored
Update perturb_data_dir_volume.sh
-
Vimal Manohar authored
Add seed for random number generator in utils/data/perturb_data_dir.sh
-
vesis84 authored
-
tal1974 authored
GetParams now requires an already-allocated vector. See the changes in nnet\nnet-various.h from Apr 21, 2016.
-
vesis84 authored
-
Yiming Wang authored
-
- May 16, 2016
-
-
Daniel Povey authored
Improve speed of split_data.sh; includes a change to filter_scps.pl (thanks to Remi Francis for noticing the issue)
-