- Sep 14, 2016
-
-
Shiyin Kang authored
-
Shiyin Kang authored
-
Daniel Povey authored
Replace implementation of atomic addition.
-
Daniel Galvez authored
Old version was based on atomicExch(), while this version uses CUDA's built-in atomicAdd(), added in SM 2.0. When tested in isolation (test code not provided in this commit), the built-in atomicAdd() is two times faster than the old version of atomic_add() on a K10 (Kepler), and three times faster on a 950M (Maxwell).

The speedup to forward-backward, however, is marginal for an nnet3-chain-train call on the TEDLIUM version 1 dataset. Times below were reported on a K10. Note the speedup in BetaDashGeneralFrame(), which is the only code calling the atomic add function.

New code:
[cudevice profile]
AddRows 0.468516s
AddVecVec 0.553152s
MulRowsVec 0.614542s
CuMatrix::SetZero 0.649105s
CopyRows 0.748831s
TraceMatMat 0.777907s
AddVecToRows 0.780592s
CuMatrix::Resize 0.850884s
AddMat 1.23867s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.04559s
AddDiagMatMat 2.18652s
AddMatVec 3.67839s
AlphaGeneralFrame 6.42574s
BetaDashGeneralFrame 8.69981s
AddMatMat 29.9714s
Total GPU time: 63.8273s (may involve some double-counting)
-----
Old code:
[cudevice profile]
AddRows 0.469031s
AddVecVec 0.553298s
MulRowsVec 0.615624s
CuMatrix::SetZero 0.658105s
CopyRows 0.750856s
AddVecToRows 0.782937s
TraceMatMat 0.786361s
CuMatrix::Resize 0.91639s
AddMat 1.23964s
CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.05253s
AddDiagMatMat 2.18863s
AddMatVec 3.68707s
AlphaGeneralFrame 6.42885s
BetaDashGeneralFrame 9.03617s
AddMatMat 29.9942s
Total GPU time: 64.3928s (may involve some double-counting)
-----
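The commit itself does not include the device code, so the following is only a host-side C++ sketch of the two styles of atomic addition it contrasts. It assumes an exchange-only scheme that uses 0.0f as an "empty slot" marker, which is one common way to build an add out of an exchange primitive; whether the replaced atomic_add() worked exactly this way is an assumption. The function names ExchangeBasedAdd, CasAdd and SelfCheck are hypothetical, and CasAdd is merely a stand-in for a hardware-provided add such as CUDA's atomicAdd().

```cpp
#include <atomic>
#include <cassert>

// Exchange-only accumulation (hypothetical name), in the spirit of an
// atomicExch()-based add. Assumption: 0.0f marks the slot as "empty".
inline void ExchangeBasedAdd(std::atomic<float>& slot, float value) {
  // Take whatever is in the slot, leaving the empty marker behind.
  float pending = slot.exchange(0.0f) + value;
  // Try to deposit. If a non-zero value is displaced, another writer got in
  // between: pick the deposit back up, merge the two, and retry.
  float displaced;
  while ((displaced = slot.exchange(pending)) != 0.0f) {
    pending = slot.exchange(0.0f) + displaced;
  }
}

// Single read-modify-write add (hypothetical name). This compare-and-swap
// loop is a portable stand-in for a native atomic add instruction.
inline void CasAdd(std::atomic<float>& slot, float value) {
  float seen = slot.load();
  // On failure, compare_exchange_weak refreshes `seen` with the current
  // contents, so the next attempt uses an up-to-date sum.
  while (!slot.compare_exchange_weak(seen, seen + value)) {
  }
}

// Single-threaded sanity check: both routines must produce the same sum.
inline void SelfCheck() {
  std::atomic<float> a(1.5f);
  std::atomic<float> b(1.5f);
  ExchangeBasedAdd(a, 2.0f);
  CasAdd(b, 2.0f);
  assert(a.load() == b.load());
}
```

The exchange-based variant needs at least two exchanges per add even without contention, and more when writers collide, which is one plausible reason a single native add would be faster in an isolated test.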
-
- Sep 13, 2016
-
-
Daniel Povey authored
-
- Sep 11, 2016
-
-
Daniel Povey authored
Updates to SRE08 example
-
- Sep 08, 2016
-
-
Daniel Povey authored
-
Daniel Povey authored
nnet1: adding <ParametricRelu> component,
-
vesis84 authored
-
Daniel Povey authored
nnet3/report : Added plotting capability for parameter differences.
-
- Sep 07, 2016
-
-
Vijayaditya Peddinti authored
-
- Sep 06, 2016
-
-
Daniel Povey authored
nnet1: minor cosmetic change,
-
vesis84 authored
-
- Sep 05, 2016
-
-
Daniel Povey authored
WIP: Multi-database English LVCSR recipe
-
Korbinian Riedhammer authored
-
Allen Guo authored
-
Allen Guo authored
-
Daniel Povey authored
Removing little-used feature: time-reversed, and fwd-bkwd, decoding.
-
Daniel Povey authored
-
- Sep 02, 2016
-
-
David Snyder authored
-
David Snyder authored
-
David Snyder authored
-
- Sep 01, 2016
-
-
Daniel Povey authored
Various unrelated fixes: add --iter options to TIMIT sclite scoring; improve how syncfiles are removed in queue.pl; minor cosmetic and efficiency improvements in nnet3 code.
-
- Aug 31, 2016
-
-
Daniel Povey authored
Extract wav - perturb_data_dir_speed.sh implementation
-
Peter Smit authored
In scripts such as perturb-speed and perturb-volume, scp lines are transformed into piped commands with the appropriate sox command. The case where the scp file has file offsets was not handled. This commit both generalizes the wav-copy command to work on xfilenames as well and fixes the two perturb scripts to use this command in case of file offsets.
-
- Aug 30, 2016
-
-
Daniel Povey authored
Speed up log softmax
-
Shiyin Kang authored
-
Shiyin Kang authored
Speed of CuMatrix::LogSoftmax in gigaflops, new vs. old code:

type     dim    New         Old
float    16     0.0138019   0.0133804
float    32     0.056202    0.052121
float    64     0.227829    0.186255
float    128    0.65638     0.65072
float    256    2.15268     1.64888
float    512    5.1179      3.85136
float    1024   10.8209     6.76963
double   16     0.0133584   0.011373
double   32     0.0533796   0.0528196
double   64     0.202721    0.170107
double   128    0.627234    0.722198
double   256    1.89987     1.44478
double   512    4.14807     3.37973
double   1024   6.70849     4.96657
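For readers unfamiliar with the operation being benchmarked, here is a minimal CPU sketch of a per-row log-softmax using the standard max-subtraction trick for numerical stability. It is not the CUDA kernel from this commit, which is not shown in the log; the function name LogSoftmaxRow is hypothetical, and treating the operation as row-wise is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stable log-softmax of one row (hypothetical helper):
//   y[i] = x[i] - max - log(sum_j exp(x[j] - max))
// Subtracting the row maximum keeps exp() from overflowing.
inline std::vector<double> LogSoftmaxRow(const std::vector<double>& x) {
  assert(!x.empty());
  const double max_val = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - max_val);
  const double log_sum = std::log(sum);
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = x[i] - max_val - log_sum;
  }
  return y;
}
```

Each row needs a max reduction, a sum reduction, and an elementwise pass, so a GPU version's speed depends heavily on how those reductions are parallelized. That may be why the gains in the figures above vary with dim.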
-
Daniel Povey authored
nnet3: lstm/make_configs.py : Removed a bug where label_delay was not…
-
Vijayaditya Peddinti authored
nnet3: lstm/make_configs.py : Removed a bug where label_delay was not being added to the xentropy branch in chain models.
-
- Aug 29, 2016
-
-
Daniel Povey authored
add frame_shift_opts to score
-
Xingyu Na authored
-
- Aug 27, 2016
-
-
Daniel Povey authored
Fix for online-decoder crash reported in PR #993 by @fanskyer
-
Daniel Povey authored
Add unit test for CuMatrix::AddMatMatDivMat
-
Daniel Povey authored
Fix oracle calculation for nnet3 scripts
-
Danijel Koržinek authored
-
Danijel Koržinek authored
- removed --per-utt from nnet3/decode.sh
- removed lstm/decode.sh from local/nnet3/run_lstm.sh
- reverted adding oracle.sh to nnet3
-
Danijel Koržinek authored
-
Shiyin Kang authored
-
- Aug 26, 2016
-
-
Daniel Povey authored
Exit on incorrect num arguments in validate_data_dir.sh
-