Commit 25000ee7 authored by Dan Povey

Committing documentation changes RE neural nets.

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2612 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 2507afbc
@@ -617,10 +617,14 @@ a file called <DFN>lexicon.txt</DFN> which has the format
\endverbatim
Note: <DFN>lexicon.txt</DFN> will contain repeated entries for the same word,
on separate lines,
if we have multiple pronunciations for it. The current scripts currently don't
support pronunciation probabilities, but it wouldn't
be that hard to add that support (it's just a question of modifying the
scripts that produce <DFN>L.fst</DFN>).
if we have multiple pronunciations for it. If you want to use pronunciation
probabilities, instead of creating the file <DFN>lexicon.txt</DFN>, create a file
called <DFN>lexiconp.txt</DFN> that has the probability as the second field.
Note that it is common practice to normalize the pronunciation probabilities so that,
instead of summing to one, the most probable pronunciation of each word has probability one. This
tends to give better results. For a top-level script that runs with
pronunciation probabilities, search for <DFN>pp</DFN> in <DFN>egs/wsj/s5/run.sh</DFN>.
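For example, a few lines of a <DFN>lexiconp.txt</DFN> prepared in this way might look as follows
(the words, probabilities and pronunciations are made up for illustration; with the
max-normalization described above, the most probable pronunciation of each word gets probability 1.0):
\verbatim
the 1.0 dh ax
the 0.5 dh iy
data 1.0 d ey t ax
data 0.7 d ae t ax
\endverbatim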
Notice that in this input there is no notion of word-position dependency,
i.e. no suffixes like <DFN>_B</DFN> and <DFN>_E</DFN>. This is because it is the
script <DFN>prepare_lang.sh</DFN> that adds those suffixes.
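As an illustration (the word and phones are made up, but the suffixes are the ones the scripts use),
an entry written in the lexicon as
\verbatim
eight ey t
\endverbatim
comes out of <DFN>prepare_lang.sh</DFN> with word-position-dependent phones:
\verbatim
eight ey_B t_E
\endverbatim
where <DFN>_B</DFN>, <DFN>_I</DFN>, <DFN>_E</DFN> and <DFN>_S</DFN> mark word-beginning,
word-internal, word-ending and singleton phones respectively.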
// doc/dnn.dox
// Copyright 2013 Johns Hopkins University (author: Daniel Povey)
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
namespace kaldi {
/**
\page dnn Deep Neural Networks in Kaldi
\section dnn_intro Introduction
Deep Neural Networks (DNNs) are the latest hot topic in speech recognition.
Since around 2010 many papers have been published in this area, and some of
the largest companies (e.g. Google, Microsoft) are starting to use DNNs in their
production systems.
An active area of research like this is difficult for a toolkit like Kaldi to
support well, because the state of the art is constantly changing, which means that
code changes are required to keep up and architectural decisions may need
to be rethought.
We currently have two separate codebases for deep neural nets in Kaldi. One
is located in code subdirectories nnet/ and nnetbin/, and is primarily maintained
by Karel Vesely. The other is located in code subdirectories nnet-cpu/ and nnet-cpubin/,
and is primarily maintained by Daniel Povey (this code was originally based on an
earlier version of Karel's code, but has been extensively rewritten). Neither codebase
is more ``official'' than the other. Both are still being developed and we cannot
predict the long-term future, but in the immediate future our aim is to both have
the freedom to work on our respective codebases, and to borrow ideas from each other.
Neural net example scripts can be found in the example directories such as egs/wsj/s5/,
egs/rm/s5 and egs/swbd/s5. Karel's example scripts are local/run_dnn.sh
or local/run_nnet.sh, and Dan's example script is local/run_nnet_cpu.sh.
Before running those scripts, the first stages of ``run.sh'' in those directories must
be run in order to build the systems used for alignment.
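For orientation, here is a minimal sketch of the overall flow from one of the example
directories (illustrative only: the exact scripts present, and the stage at which the
alignment systems become available, differ between recipes):
\verbatim
cd egs/rm/s5
./run.sh                # first stages build the systems used for alignment
local/run_dnn.sh        # Karel's GPU-based setup
local/run_nnet_cpu.sh   # Dan's CPU-based setup
\endverbatim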
We will soon have detailed documentation pages on the two neural net setups.
For now, we summarize some of the most important differences:
- Karel's code was written for single-threaded SGD training accelerated with a GPU;
Dan's code uses multiple CPUs each with multiple threads.
- Karel's code already supports discriminative training; Dan's code does not yet.
Aside from these, there are many minor differences in architecture.
We hope to soon add more documentation for these libraries.
Karel's version of the code has some slightly out-of-date documentation available at \ref nnet1.
*/
}
// doc/dnn_dan.dox
// Copyright 2013 Johns Hopkins University (author: Daniel Povey)
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
namespace kaldi {
/**
\page dnn_dan Deep Neural Networks in Kaldi (Dan's setup)
\section dnn_dan_intro Introduction
This documentation covers Dan Povey's version of the deep neural network code in Kaldi.
For an overview of all deep neural network code in Kaldi, see \ref dnn.
\section dnn_dan_code Code
This page is under construction!
\section dnn_dan_algo Algorithms
\section dnn_dan_scripts Scripts
*/
}
@@ -67,6 +67,7 @@
- \ref model
- \ref transform
- \ref feat
- \ref dnn
- \ref online_programs
- \ref tools
// doc/nnet.dox
// doc/nnet1.dox
// Copyright 2012 Karel Vesely
@@ -18,9 +18,10 @@
namespace kaldi {
/**
\page nnet the Neural Network library
\page nnet1 The Neural Network library
\section nnet1_nnet Overview
\section nnet
Kaldi has support for acoustic modelling with Multi-Layer Perceptrons (MLPs).
The folder ^/trunk/src/nnet contains the neural network classes
and some additional classes that are used during training.
@@ -40,7 +41,8 @@ namespace kaldi {
the Bernoulli and Gaussian units.
\section design of nnet library
\section nnet1_design Design of the library
The key aspect of the design is ``as simple extensibility as possible''. Exactly for this reason, two interfaces have been created:
- \ref Component : interface for a general building block used to compose a neural network, which does not contain trainable parameters
- \ref UpdatableComponent : this interface extends \ref Component by adding the methods needed to train the parameters
@@ -81,7 +83,8 @@ namespace kaldi {
- 3) add a dynamic constructor call to the factory-like function \ref Component::Read
- 4) implement the interface \ref Component:: or \ref UpdatableComponent::
\section nnetbin
\section nnet1_nnetbin Binary level tools
The training tools are located in nnetbin/; the most important tools are:
- nnet-train-xent-hardlab-frmshuff : this tool is used during training; it optimizes the cross-entropy between the network posteriors and the targets, where Viterbi-path hard labels are used as 1-of-M encoded targets, and frame-level shuffling is done between the feature transform and the trained neural network.
- nnet-forward : this tool is used during decoding; it propagates the features through the neural network and writes them to the output. This tool is often used in the feature-extraction pipeline of the decoder. If the machine where the decoder is run is not equipped with a suitable GPU, the CPU is used instead.
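As a rough sketch of how nnet-forward typically appears in a decoding feature pipeline
(the file names here are hypothetical, and real scripts pass extra options such as a
feature transform; see the decoding scripts under egs/ for actual usage):
\verbatim
nnet-forward final.nnet \
  "ark:copy-feats scp:data/test/feats.scp ark:- |" ark:nnet_out.ark
\endverbatim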