- Sep 20, 2016
-
-
Jan "yenda" Trmal authored
-
ling0322 authored
-
Daniel Povey authored
Check that the `which` command exists
-
- Sep 19, 2016
-
-
Albert Vernon authored
check_dependencies.sh depends on `which`, but some distributions, such as those intended for use with Docker, do not include it. Check to see if it is installed.
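A minimal sketch of such a check, assuming a POSIX-style shell. The helper name `have_command` is hypothetical and not taken from check_dependencies.sh; it relies on the shell builtin `command -v`, so the check itself does not need `which`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from check_dependencies.sh): succeeds if the named
# command can be found, using the builtin `command -v` rather than `which`.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Warn early if `which` is missing, since later checks would depend on it.
if ! have_command which; then
  echo "$0: 'which' is not installed; please install it first." >&2
fi
```

Using a builtin for the probe avoids the circular problem of needing `which` to find out whether `which` exists.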
-
- Sep 17, 2016
-
-
Daniel Povey authored
-
Daniel Povey authored
-
Daniel Povey authored
Revert "introduce a new splice configuration for tdnn+xent on swbd as default…"
-
Daniel Povey authored
-
Daniel Povey authored
introduce a new splice configuration for tdnn+xent on swbd as default…
-
- Sep 16, 2016
-
-
Daniel Povey authored
fixed the usage of lmrescore_const_arpa.sh
-
薛丞宏 authored
-
- Sep 14, 2016
-
-
Daniel Povey authored
Speed up LogSoftmaxComponent::Backprop
-
Shiyin Kang authored
bench result: speed of CuMatrix::DiffLogSoftmaxPerRow, in gigaflops

CuMatrix::DiffLogSoftmaxPerRow<float>
  dim     New          Old
  16      0.0152883    0.00217375
  32      0.0577221    0.00867094
  64      0.267811     0.035306
  128     0.878541     0.134737
  256     2.8799       0.491975
  512     6.20522      1.34159
  1024    10.4197      2.4438
  2048    10.5138      2.97796
  4096    10.3679      3.25972

CuMatrix::DiffLogSoftmaxPerRow<double>
  dim     New          Old
  16      0.0139596    0.00193458
  32      0.0573372    0.0073193
  64      0.197072     0.0282332
  128     0.751801     0.111315
  256     2.43203      0.394491
  512     4.53031      0.930698
  1024    5.43358      1.52317
  2048    5.47013      1.84648
  4096    5.23873      1.87967

Conflicts:
  src/cudamatrix/cu-kernels-ansi.h
  src/cudamatrix/cu-kernels.h
naming of diff log softmax
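For context, the per-row quantity that a log-softmax backprop routine computes can be written as below. This is the standard identity, given here as an assumption about what DiffLogSoftmaxPerRow evaluates and not quoted from the commit. The symbols x (input row), y (output row), e (derivative with respect to the output) and d (derivative with respect to the input) are introduced only for illustration.

```latex
% Forward pass for one row: y = log softmax(x)
y_i = x_i - \log \sum_{k} \exp(x_k)

% Backward pass for the same row, with e_i = \partial L / \partial y_i
% and d_i = \partial L / \partial x_i:
d_i = \sum_{j} e_j \frac{\partial y_j}{\partial x_i}
    = e_i - \exp(y_i) \sum_{j} e_j
```

Each row needs one sum over e followed by an elementwise update, so the work per row scales with dim.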
-
Shiyin Kang authored
-
Shiyin Kang authored
-
Daniel Povey authored
Replace implementation of atomic addition.
-
Daniel Galvez authored
The old version was based on atomicExch(), while this version uses CUDA's built-in atomicAdd(), added in SM 2.0. When tested in isolation (test code not provided in this commit), the built-in atomicAdd() is two times faster than the old version of atomic_add() on a K10 (Kepler), and three times faster on a 950M (Maxwell).

The speedup to forward-backward, however, is marginal for an nnet3-chain-train call on the TEDLIUM version 1 dataset. Times were reported on a K10. Note the speedup in BetaDashGeneralFrame(), which is the only code calling the atomic add function.

New code:
[cudevice profile]
AddRows 0.468516s
AddVecVec 0.553152s
MulRowsVec 0.614542s
CuMatrix::SetZero 0.649105s
CopyRows 0.748831s
TraceMatMat 0.777907s
AddVecToRows 0.780592s
CuMatrix::Resize 0.850884s
AddMat 1.23867s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.04559s
AddDiagMatMat 2.18652s
AddMatVec 3.67839s
AlphaGeneralFrame 6.42574s
BetaDashGeneralFrame 8.69981s
AddMatMat 29.9714s
Total GPU time: 63.8273s (may involve some double-counting)
-----
Old code:
[cudevice profile]
AddRows 0.469031s
AddVecVec 0.553298s
MulRowsVec 0.615624s
CuMatrix::SetZero 0.658105s
CopyRows 0.750856s
AddVecToRows 0.782937s
TraceMatMat 0.786361s
CuMatrix::Resize 0.91639s
AddMat 1.23964s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.05253s
AddDiagMatMat 2.18863s
AddMatVec 3.68707s
AlphaGeneralFrame 6.42885s
BetaDashGeneralFrame 9.03617s
AddMatMat 29.9942s
Total GPU time: 64.3928s (may involve some double-counting)
-----
-
- Sep 13, 2016
-
-
Daniel Povey authored
-
- Sep 11, 2016
-
-
Daniel Povey authored
Updates to SRE08 example
-
- Sep 08, 2016
-
-
Daniel Povey authored
-
Daniel Povey authored
nnet1: adding <ParametricRelu> component,
-
vesis84 authored
-
Daniel Povey authored
nnet3/report : Added plotting capability for parameter differences.
-
- Sep 07, 2016
-
-
Vijayaditya Peddinti authored
-
- Sep 06, 2016
-
-
Daniel Povey authored
nnet1: minor cosmetic change,
-
vesis84 authored
-
- Sep 05, 2016
-
-
Daniel Povey authored
WIP: Multi-database English LVCSR recipe
-
Korbinian Riedhammer authored
-
Allen Guo authored
-
Allen Guo authored
-
Daniel Povey authored
Removing little-used feature: time-reversed, and fwd-bkwd, decoding.
-
Daniel Povey authored
-
- Sep 02, 2016
-
-
David Snyder authored
-
David Snyder authored
-
David Snyder authored
-
- Sep 01, 2016
-
-
Daniel Povey authored
Various unrelated fixes: add --iter options to TIMIT sclite scoring; improve how syncfiles are removed in queue.pl; minor cosmetic and efficiency improvements in nnet3 code.
-
- Aug 31, 2016
-
-
Daniel Povey authored
Extract wav - perturb_data_dir_speed.sh implementation
-
Peter Smit authored
In scripts such as perturb-speed and perturb-volume, scp lines are transformed into piped commands with the appropriate sox command. The case where the scp file has file offsets was not handled. This commit both generalizes the wav-copy command to also work on xfilenames and fixes the two perturb scripts to use this command when file offsets are present.
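The transformation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `perturb_scp_lines`, the sample paths, and the exact sox and wav-copy argument layout are all assumed. Only the three cases (existing pipe, archive offset, plain file) follow the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rewrite "<utt-id> <source>" scp lines read from stdin
# into piped commands that end in a sox speed perturbation.
# $1 is the speed factor. The command layouts below are assumptions.
perturb_scp_lines() {
  awk -v factor="$1" '{
    id = $1
    src = $0
    sub(/^[^ ]+ +/, "", src)   # drop the utterance id, keep the source part
    if (src ~ /\|$/) {
      # Source is already a pipe: append the sox stage to it.
      print id " " src " sox -t wav - -t wav - speed " factor " |"
    } else if (src ~ /:[0-9]+$/) {
      # Source carries a file offset: extract the wav with wav-copy first.
      print id " wav-copy " src " - | sox -t wav - -t wav - speed " factor " |"
    } else {
      # Plain wav file: sox can read it directly.
      print id " sox -t wav " src " -t wav - speed " factor " |"
    }
  }'
}
```

For example, `echo "utt1 /data/wav.ark:1234" | perturb_scp_lines 1.1` yields a line that pipes the extracted wav through sox, while a plain path goes to sox directly.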
-
- Aug 30, 2016
-
-
Daniel Povey authored
Speed up log softmax
-
Shiyin Kang authored
-