  1. Sep 14, 2016
    • Shiyin Kang · b885535e
    • mv diff log softmax code to CuMatrix · 7a525668
      Shiyin Kang authored
    • Merge pull request #1025 from galv/atomic-add · 13e5cc8f
      Daniel Povey authored
      Replace implementation of atomic addition.
    • Replace implementation of atomic addition. · 6f20b397
      Daniel Galvez authored
      The old version was based on atomicExch(), while this version uses CUDA's
      built-in atomicAdd(), added in SM 2.0. When tested in isolation (test
      code not provided in this commit), the built-in atomicAdd() is two times
      faster than the old atomic_add() here on a K10 (Kepler), and three times
      faster on a 950M (Maxwell).

      The speedup to forward-backward, however, is marginal for an
      nnet3-chain-train call on the TEDLIUM version 1 dataset.

      Times reported on a K10. Note the speedup in BetaDashGeneralFrame(),
      which is the only code calling the atomic add function.
      
      New code:
      
      [cudevice profile]
      AddRows	0.468516s
      AddVecVec	0.553152s
      MulRowsVec	0.614542s
      CuMatrix::SetZero	0.649105s
      CopyRows	0.748831s
      TraceMatMat	0.777907s
      AddVecToRows	0.780592s
      CuMatrix::Resize	0.850884s
      AddMat	1.23867s
      CuMatrixBase::CopyFromMat(from other CuMatrixBase)	2.04559s
      AddDiagMatMat	2.18652s
      AddMatVec	3.67839s
      AlphaGeneralFrame	6.42574s
      BetaDashGeneralFrame	8.69981s
      AddMatMat	29.9714s
      Total GPU time:	63.8273s (may involve some double-counting)
      -----
      
      Old code:
      
      [cudevice profile]
      AddRows	0.469031s
      AddVecVec	0.553298s
      MulRowsVec	0.615624s
      CuMatrix::SetZero	0.658105s
      CopyRows	0.750856s
      AddVecToRows	0.782937s
      TraceMatMat	0.786361s
      CuMatrix::Resize	0.91639s
      AddMat	1.23964s
      CuMatrixBase::CopyFromMat(from other CuMatrixBase)	2.05253s
      AddDiagMatMat	2.18863s
      AddMatVec	3.68707s
      AlphaGeneralFrame	6.42885s
      BetaDashGeneralFrame	9.03617s
      AddMatMat	29.9942s
      Total GPU time:	64.3928s (may involve some double-counting)
      -----
  2. Sep 13, 2016
  3. Sep 11, 2016
  4. Sep 08, 2016
  5. Sep 07, 2016
  6. Sep 06, 2016
  7. Sep 05, 2016
  8. Sep 02, 2016
  9. Sep 01, 2016
  10. Aug 31, 2016
  11. Aug 30, 2016
    • Merge pull request #1013 from kangshiyin/log-softmax · b2c8497b
      Daniel Povey authored
      Speed up log softmax
    • comment about aliasing in AddMatMatDivMat. · 81e20c4c
      Shiyin Kang authored
    • reimpl log softmax · 4c1a86d8
      Shiyin Kang authored
      Benchmark of CuMatrix::LogSoftmax, new vs. old implementation
      (speeds in gigaflops):

      dim	float (new)	float (old)	double (new)	double (old)
      16	0.0138019	0.0133804	0.0133584	0.011373
      32	0.056202	0.052121	0.0533796	0.0528196
      64	0.227829	0.186255	0.202721	0.170107
      128	0.65638	0.65072	0.627234	0.722198
      256	2.15268	1.64888	1.89987	1.44478
      512	5.1179	3.85136	4.14807	3.37973
      1024	10.8209	6.76963	6.70849	4.96657
    • Merge pull request #1012 from vijayaditya/lstm_config_bugfix · 38d4c2af
      Daniel Povey authored
      nnet3: lstm/make_configs.py : Removed a bug where label_delay was not…
    • nnet3: lstm/make_configs.py : Removed a bug where label_delay was not being added to the xentropy branch in chain models. · f4b4e250
      Vijayaditya Peddinti authored
  12. Aug 29, 2016
  13. Aug 27, 2016
  14. Aug 26, 2016