  1. Sep 20, 2016
  2. Sep 14, 2016
    • single-kernel impl for diff log softmax · bc79ed49
      Shiyin Kang authored
      bench result:
      CuMatrix::DiffLogSoftmaxPerRow speed in gigaflops, old vs. new:

      float:
        dim   old         new
        16    0.00217375  0.0152883
        32    0.00867094  0.0577221
        64    0.035306    0.267811
        128   0.134737    0.878541
        256   0.491975    2.8799
        512   1.34159     6.20522
        1024  2.4438      10.4197
        2048  2.97796     10.5138
        4096  3.25972     10.3679

      double:
        dim   old         new
        16    0.00193458  0.0139596
        32    0.0073193   0.0573372
        64    0.0282332   0.197072
        128   0.111315    0.751801
        256   0.394491    2.43203
        512   0.930698    4.53031
        1024  1.52317     5.43358
        2048  1.84648     5.47013
        4096  1.87967     5.23873
      
      Conflicts:
      	src/cudamatrix/cu-kernels-ansi.h
      	src/cudamatrix/cu-kernels.h
      
      naming of diff log softmax
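The per-row computation the single kernel fuses can be sketched on the CPU as follows (an illustrative analogue, not the Kaldi kernel itself; names and types are made up): for y = LogSoftmax(x) and output derivative dy, the input derivative is dx_i = dy_i - exp(y_i) * sum_j dy_j, so each row needs only one sum reduction plus an elementwise pass, which is why a single fused kernel pays off.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Backprop through log-softmax for one row:
//   in_deriv[i] = out_deriv[i] - exp(out[i]) * sum_j(out_deriv[j])
// where `out` is the log-softmax output and `out_deriv` is dL/d(out).
std::vector<float> diff_log_softmax_row(const std::vector<float>& out,
                                        const std::vector<float>& out_deriv) {
  float sum = 0.0f;
  for (float d : out_deriv) sum += d;  // single reduction per row
  std::vector<float> in_deriv(out.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    in_deriv[i] = out_deriv[i] - std::exp(out[i]) * sum;
  return in_deriv;
}
```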
    • b885535e
    • mv diff log softmax code to CuMatrix · 7a525668
      Shiyin Kang authored
    • Replace implementation of atomic addition. · 6f20b397
      Daniel Galvez authored
      The old version was based on atomicExch(), while this version uses
      CUDA's built-in atomicAdd(), added in SM 2.0. When tested in
      isolation (test code not provided in this commit), the built-in
      atomicAdd() is two times faster than the old atomic_add() on a K10
      (Kepler), and three times faster on a 950M (Maxwell).
      
      The speedup to forward-backward, however, is marginal for an
      nnet3-chain-train call on the TEDLIUM version 1 dataset:
      
      Times were reported on a K10. Note the speedup in
      BetaDashGeneralFrame(), the only code that calls the atomic add
      function.
      
      New code:
      
      [cudevice profile]
      AddRows	0.468516s
      AddVecVec	0.553152s
      MulRowsVec	0.614542s
      CuMatrix::SetZero	0.649105s
      CopyRows	0.748831s
      TraceMatMat	0.777907s
      AddVecToRows	0.780592s
      CuMatrix::Resize	0.850884s
      AddMat	1.23867s
      CuMatrixBase::CopyFromMat(from other CuMatrixBase)	2.04559s
      AddDiagMatMat	2.18652s
      AddMatVec	3.67839s
      AlphaGeneralFrame	6.42574s
      BetaDashGeneralFrame	8.69981s
      AddMatMat	29.9714s
      Total GPU time:	63.8273s (may involve some double-counting)
      -----
      
      Old code:
      
      [cudevice profile]
      AddRows	0.469031s
      AddVecVec	0.553298s
      MulRowsVec	0.615624s
      CuMatrix::SetZero	0.658105s
      CopyRows	0.750856s
      AddVecToRows	0.782937s
      TraceMatMat	0.786361s
      CuMatrix::Resize	0.91639s
      AddMat	1.23964s
      CuMatrixBase::CopyFromMat(from other CuMatrixBase)	2.05253s
      AddDiagMatMat	2.18863s
      AddMatVec	3.68707s
      AlphaGeneralFrame	6.42885s
      BetaDashGeneralFrame	9.03617s
      AddMatMat	29.9942s
      Total GPU time:	64.3928s (may involve some double-counting)
      -----
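For context, the exchange-based style of float atomic add that the built-in atomicAdd() replaces can be sketched on the host with a compare-and-swap loop over the value's bit pattern. This is an illustrative analogue of the pattern, not the removed Kaldi code; the extra loop and double memory traffic are what make it slower than a hardware atomic add.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulated float atomic add: retry a CAS on the 32-bit pattern until no
// other thread has changed the word between our read and our write.
float cas_atomic_add(std::atomic<uint32_t>* addr, float val) {
  uint32_t old_bits = addr->load();
  for (;;) {
    float old_f;
    std::memcpy(&old_f, &old_bits, sizeof(float));
    float new_f = old_f + val;
    uint32_t new_bits;
    std::memcpy(&new_bits, &new_f, sizeof(float));
    // On failure, compare_exchange_weak reloads old_bits and we retry.
    if (addr->compare_exchange_weak(old_bits, new_bits))
      return old_f;  // like CUDA's atomicAdd(), return the prior value
  }
}
```

A hardware atomicAdd() performs the read-modify-write in one step, so there is no retry loop at all, which is where the 2-3x win in isolation comes from.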
  3. Sep 13, 2016
  4. Sep 08, 2016
  5. Sep 06, 2016
  6. Sep 05, 2016
  7. Sep 01, 2016
  8. Aug 31, 2016
    • Make wav-copy accept both xspecifiers and xfilenames · 278fcbe8
      Peter Smit authored
      In scripts such as perturb-speed and perturb-volume, scp lines are
      transformed into piped commands with the appropriate sox command.
      The case where the scp file has file offsets was not handled. This
      commit both generalizes the wav-copy command to also work on
      xfilenames and fixes the two perturb scripts to use this command in
      the case of file offsets.
  9. Aug 30, 2016
    • comment about aliasing in AddMatMatDivMat. · 81e20c4c
      Shiyin Kang authored
    • reimpl log softmax · 4c1a86d8
      Shiyin Kang authored
      CuMatrix::LogSoftmax speed in gigaflops, old vs. new:

      float:
        dim   old        new
        16    0.0133804  0.0138019
        32    0.052121   0.056202
        64    0.186255   0.227829
        128   0.65072    0.65638
        256   1.64888    2.15268
        512   3.85136    5.1179
        1024  6.76963    10.8209

      double:
        dim   old        new
        16    0.011373   0.0133584
        32    0.0528196  0.0533796
        64    0.170107   0.202721
        128   0.722198   0.627234
        256   1.44478    1.89987
        512   3.37973    4.14807
        1024  4.96657    6.70849
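The quantity being benchmarked is the numerically stable per-row log-softmax, y_i = x_i - m - log(sum_j exp(x_j - m)) with m = max_j x_j, which requires a max reduction and a sum reduction per row. A small CPU sketch of this computation (illustrative only, not the CUDA kernel):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable log-softmax of one row: subtracting the row max
// before exponentiating avoids overflow without changing the result.
std::vector<float> log_softmax_row(const std::vector<float>& x) {
  float m = *std::max_element(x.begin(), x.end());  // max reduction
  float sum = 0.0f;
  for (float v : x) sum += std::exp(v - m);         // sum reduction
  float lse = m + std::log(sum);                    // log-sum-exp
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] - lse;
  return y;
}
```

By construction the exponentials of a row of the output sum to one, which is a convenient invariant to check.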
  10. Aug 27, 2016
  11. Aug 26, 2016
  12. Aug 24, 2016
  13. Aug 23, 2016
  14. Aug 21, 2016
  15. Aug 17, 2016
  16. Aug 12, 2016
  17. Aug 11, 2016
  18. Aug 10, 2016
  19. Aug 09, 2016
  20. Aug 08, 2016
  21. Aug 07, 2016
  22. Aug 05, 2016
    • Daniel Povey
    • nnet1: redesigning LSTM, BLSTM code · e2247f32
      vesis84 authored
      - introducing the interface 'MultistreamComponent', which handles
        stream lengths and stream resets,
      - rewriting most of the training tools 'nnet-train-lstm-streams'
        and 'nnet-train-blstm-streams',
      - introducing 'RecurrentComponent' with simple forward recurrence,
      - the LSTM/BLSTM components have clipping presets we recently
        found helpful for the BLSTM-CTC system,
      - renaming tools and components (removing 'streams' from the names),
      - updating the scripts for generating lstm/blstm prototypes,
      - updating the 'rm' lstm/blstm examples.