- Sep 20, 2016
Joachim authored
Using --additive-signals without also setting --snrs and --start-times produces a segmentation fault when AddNoise is called (snr_vector and start_time_vector are empty). This change checks that these arguments are set. Alternatively, more sensible defaults could be chosen for --snrs and --start-times, e.g. 0 for all additive signals in both.
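
A minimal sketch of the validation the message describes, assuming vector-valued options parsed into additive_signals, snr_vector, and start_time_vector (those names come from the message; the surrounding code is hypothetical, not the actual wav-reverberate source):

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical check: if --additive-signals is given, --snrs and
// --start-times must be given too, with one entry per signal.
void CheckNoiseOptions(const std::vector<std::string> &additive_signals,
                       const std::vector<double> &snr_vector,
                       const std::vector<double> &start_time_vector) {
  if (additive_signals.empty()) return;  // nothing to validate
  if (snr_vector.size() != additive_signals.size() ||
      start_time_vector.size() != additive_signals.size()) {
    // Kaldi code would use KALDI_ERR here instead of fprintf/exit.
    std::fprintf(stderr, "--additive-signals requires --snrs and "
                         "--start-times, one entry per signal.\n");
    std::exit(1);
  }
}
```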
Jan "yenda" Trmal authored
- Sep 14, 2016
Shiyin Kang authored
bench result for CuMatrix::DiffLogSoftmaxPerRow (speed in gigaflops, new vs. old):

  dim    float new   float old    double new   double old
  16     0.0152883   0.00217375   0.0139596    0.00193458
  32     0.0577221   0.00867094   0.0573372    0.0073193
  64     0.267811    0.035306     0.197072     0.0282332
  128    0.878541    0.134737     0.751801     0.111315
  256    2.8799      0.491975     2.43203      0.394491
  512    6.20522     1.34159      4.53031      0.930698
  1024   10.4197     2.4438       5.43358      1.52317
  2048   10.5138     2.97796      5.47013      1.84648
  4096   10.3679     3.25972      5.23873      1.87967

Conflicts:
  src/cudamatrix/cu-kernels-ansi.h
  src/cudamatrix/cu-kernels.h

naming of diff log softmax
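
For context, DiffLogSoftmaxPerRow backpropagates through a per-row log-softmax; the derivative it evaluates per row is the standard identity below (a textbook result, not quoted from the commit):

```latex
% Forward, per row: y_j = x_j - \log \sum_k e^{x_k}.
% Backward: given dL/dy, the input gradient is
\[
\frac{\partial L}{\partial x_j}
  = \frac{\partial L}{\partial y_j}
  - e^{y_j} \sum_{k} \frac{\partial L}{\partial y_k}.
\]
```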
Shiyin Kang authored
Shiyin Kang authored
Daniel Galvez authored
The old version was based on atomicExch(), while this version uses CUDA's built-in atomicAdd(), added in SM 2.0. When tested in isolation (test code not provided in this commit), the built-in atomicAdd() is two times faster than the old atomic_add() here on a K10 (Kepler), and three times faster on a 950M (Maxwell). The speedup to forward-backward is marginal, however, for an nnet3-chain-train call on the TEDLIUM version 1 dataset. Times below were reported on a K10; note the speedup in BetaDashGeneralFrame(), which is the only code calling the atomic add function.

New code: [cudevice profile]
  AddRows 0.468516s
  AddVecVec 0.553152s
  MulRowsVec 0.614542s
  CuMatrix::SetZero 0.649105s
  CopyRows 0.748831s
  TraceMatMat 0.777907s
  AddVecToRows 0.780592s
  CuMatrix::Resize 0.850884s
  AddMat 1.23867s
  CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.04559s
  AddDiagMatMat 2.18652s
  AddMatVec 3.67839s
  AlphaGeneralFrame 6.42574s
  BetaDashGeneralFrame 8.69981s
  AddMatMat 29.9714s
  Total GPU time: 63.8273s (may involve some double-counting)

Old code: [cudevice profile]
  AddRows 0.469031s
  AddVecVec 0.553298s
  MulRowsVec 0.615624s
  CuMatrix::SetZero 0.658105s
  CopyRows 0.750856s
  AddVecToRows 0.782937s
  TraceMatMat 0.786361s
  CuMatrix::Resize 0.91639s
  AddMat 1.23964s
  CuMatrixBase::CopyFromMat(from other CuMatrixBase) 2.05253s
  AddDiagMatMat 2.18863s
  AddMatVec 3.68707s
  AlphaGeneralFrame 6.42885s
  BetaDashGeneralFrame 9.03617s
  AddMatMat 29.9942s
  Total GPU time: 64.3928s (may involve some double-counting)
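
For reference, the atomicExch()-based emulation that older CUDA code commonly used, contrasted with the built-in, looks roughly like this; this is a sketch of the general pattern, not the exact kernel that was replaced:

```cuda
// Old style: emulate a float atomic add using atomicExch(). Each thread
// swaps the accumulator out, adds its contribution, and swaps the sum back;
// if another thread deposited a nonzero value in between, it retries.
__device__ void atomic_add_exch(float *address, float value) {
  float old = value;
  while ((old = atomicExch(address,
                           atomicExch(address, 0.0f) + old)) != 0.0f) {
  }
}

// New style: the hardware float atomicAdd(), available since SM 2.0.
// (Double-precision atomicAdd() only arrived in SM 6.0, so doubles still
// need a CAS loop on older GPUs.)
__device__ void atomic_add_builtin(float *address, float value) {
  atomicAdd(address, value);
}
```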
- Sep 13, 2016
Daniel Povey authored
- Sep 08, 2016
vesis84 authored
- Sep 06, 2016
vesis84 authored
- Sep 05, 2016
Daniel Povey authored
- Sep 01, 2016
Daniel Povey authored
Various unrelated fixes: add --iter options to TIMIT sclite scoring; improve how syncfiles are removed in queue.pl; minor cosmetic and efficiency improvements in nnet3 code.
- Aug 31, 2016
Peter Smit authored
In scripts such as perturb-speed and perturb-volume, scp lines are transformed into piped commands with the appropriate sox command. The case where the scp file has file offsets was not handled. This commit both generalizes the wav-copy command to also work on xfilenames and fixes the two perturb scripts to use this command in the case of file offsets.
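
For example (hypothetical utterance ID, path, offset, and speed factor), an offset entry like `utt1 /data/wav.ark:12345` would now become a pipe along the lines of `utt1 wav-copy /data/wav.ark:12345 - | sox -t wav - -t wav - speed 1.1 |`, so that sox never has to understand the raw offset.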
- Aug 30, 2016
Shiyin Kang authored
Shiyin Kang authored
New vs. old speed of CuMatrix::LogSoftmax (gigaflops):

  dim    float new   float old   double new   double old
  16     0.0138019   0.0133804   0.0133584    0.011373
  32     0.056202    0.052121    0.0533796    0.0528196
  64     0.227829    0.186255    0.202721     0.170107
  128    0.65638     0.65072     0.627234     0.722198
  256    2.15268     1.64888     1.89987      1.44478
  512    5.1179      3.85136     4.14807      3.37973
  1024   10.8209     6.76963     6.70849      4.96657
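
For reference, the per-row operation being benchmarked is the standard log-softmax; a numerically stable evaluation (a standard identity, not part of the commit) subtracts the row maximum first:

```latex
% Per-row log-softmax, computed stably with m = \max_k x_k:
\[
y_j = x_j - \log\sum_k e^{x_k}
    = (x_j - m) - \log\sum_k e^{x_k - m}.
\]
```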
- Aug 27, 2016
Shiyin Kang authored
- Aug 26, 2016
Shiyin Kang authored
Shiyin Kang authored
- Aug 24, 2016
Shiyin Kang authored
Shiyin Kang authored
Shiyin Kang authored
- Aug 23, 2016
Dan Povey authored
Shiyin Kang authored
Shiyin Kang authored
- Aug 21, 2016
Daniel Povey authored
- Aug 17, 2016
freewym authored
- Aug 12, 2016
Daniel Povey authored
Tom Ko authored
- Aug 11, 2016
Daniel Povey authored
vesis84 authored
- Aug 10, 2016
Daniel Povey authored
Tom Ko authored
- Aug 09, 2016
Daniel Povey authored
vesis84 authored
- using GetVerboseLevel(),
- avoiding 'WriteIntegerVector' for writing to KALDI_LOG, by introducing 'operator<< (std::ostream, std::vector<T>)' in kaldi-error.h
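
A minimal sketch of what such an operator can look like (the exact formatting and signature in kaldi-error.h may differ):

```cpp
#include <iostream>
#include <vector>

// Stream a vector as "[ e1 e2 ... ]" so it can be logged directly.
template <typename T>
std::ostream &operator<<(std::ostream &os, const std::vector<T> &v) {
  os << "[ ";
  for (const T &e : v) os << e << ' ';
  return os << ']';
}

int main() {
  std::vector<int> ali = {1, 1, 2, 5};
  std::cout << "alignment: " << ali << '\n';  // prints: alignment: [ 1 1 2 5 ]
  return 0;
}
```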
- Aug 08, 2016
Daniel Povey authored
Adding nnet-latgen-faster-parallel program for multi-threaded decoding with nnet3; refactoring the nnet3 decodable code.
vesis84 authored
vesis84 authored
vesis84 authored
- Aug 07, 2016
Dan Povey authored
- Aug 05, 2016
Daniel Povey authored
vesis84 authored
- introducing the interface 'MultistreamComponent', which handles stream lengths and stream resets (see the sketch below),
- rewriting most of the training tools: 'nnet-train-lstm-streams', 'nnet-train-blstm-streams',
- introducing 'RecurrentComponent' with simple forward recurrence,
- the LSTM/BLSTM components have the clipping presets we recently found helpful for a BLSTM-CTC system,
- renaming tools and components (removing 'streams' from the names),
- updating the scripts for generating lstm/blstm prototypes,
- updating the 'rm' lstm/blstm examples.
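
A purely illustrative sketch of what a multi-stream interface of this kind can look like (class shape and signature are assumptions, not the actual nnet1 code):

```cpp
#include <vector>

// A component processing several utterance streams in parallel is told the
// length of each stream per mini-batch, so it can reset its recurrent state
// at stream boundaries.
class MultistreamComponent {
 public:
  // One length per parallel stream; a zero can mark an inactive stream.
  virtual void SetSeqLengths(const std::vector<int> &sequence_lengths) {
    sequence_lengths_ = sequence_lengths;
  }
  virtual ~MultistreamComponent() {}

 protected:
  std::vector<int> sequence_lengths_;  // per-stream lengths in frames
};
```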