Commit 25000ee7 authored by Dan Povey

Committing documentation changes RE neural nets.

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2612 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 2507afbc
@@ -617,10 +617,14 @@ a file called <DFN>lexicon.txt</DFN> which has the format
\endverbatim
Note: <DFN>lexicon.txt</DFN> will contain repeated entries for the same word,
on separate lines,
if we have multiple pronunciations for it. The current scripts currently don't
support pronunciation probabilities, but it wouldn't
be that hard to add that support (it's just a question of modifying the
scripts that produce <DFN>L.fst</DFN>).
if we have multiple pronunciations for it. If you want to use pronunciation
probabilities, instead of creating the file <DFN>lexicon.txt</DFN>, create a file
called <DFN>lexiconp.txt</DFN> that has the probability as the second field.
Note that it is common practice to normalize the pronunciation probabilities so that,
instead of summing to one, the most probable pronunciation of each word has probability one. This
tends to give better results. For a top-level script that runs with
pronunciation probabilities, search for <DFN>pp</DFN> in <DFN>egs/wsj/s5/run.sh</DFN>.
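For example, a few lines of a <DFN>lexiconp.txt</DFN> prepared in this way might look as follows
(the words, probabilities and pronunciations are made up for illustration; with the
max-normalization described above, the most probable pronunciation of each word gets probability 1.0):
\verbatim
the 1.0 dh ax
the 0.5 dh iy
data 1.0 d ey t ax
data 0.7 d ae t ax
\endverbatim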
Notice that in this input there is no notion of word-position dependency,
i.e. no suffixes like <DFN>_B</DFN> and <DFN>_E</DFN>. This is because it is the
script <DFN>prepare_lang.sh</DFN> that adds those suffixes.
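As an illustration (the word and phones are made up, but the suffixes are the ones the scripts use),
an entry written in the lexicon as
\verbatim
eight ey t
\endverbatim
comes out of <DFN>prepare_lang.sh</DFN> with word-position-dependent phones:
\verbatim
eight ey_B t_E
\endverbatim
where <DFN>_B</DFN>, <DFN>_I</DFN>, <DFN>_E</DFN> and <DFN>_S</DFN> mark word-beginning,
word-internal, word-ending and singleton phones respectively.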
// doc/dnn.dox
// Copyright 2013 Johns Hopkins University (author: Daniel Povey)
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
namespace kaldi {
/**
\page dnn Deep Neural Networks in Kaldi
\section dnn_intro Introduction
Deep Neural Networks (DNNs) are the latest hot topic in speech recognition.
Since around 2010 many papers have been published in this area, and some of
the largest companies (e.g. Google, Microsoft) are starting to use DNNs in their
production systems.
An active area of research like this is difficult for a toolkit like Kaldi to
support well, because the state of the art is constantly changing, which means that
code changes are required to keep up and architectural decisions may need
to be rethought.
We currently have two separate codebases for deep neural nets in Kaldi. One
is located in code subdirectories nnet/ and nnetbin/, and is primarily maintained
by Karel Vesely. The other is located in code subdirectories nnet-cpu/ and nnet-cpubin/,
and is primarily maintained by Daniel Povey (this code was originally based on an
earlier version of Karel's code, but has been extensively rewritten). Neither codebase
is more ``official'' than the other. Both are still being developed and we cannot
predict the long-term future, but in the immediate future our aim is to both have
the freedom to work on our respective codebases, and to borrow ideas from each other.
Neural net example scripts can be found in the example directories such as egs/wsj/s5/,
egs/rm/s5 and egs/swbd/s5. Karel's example scripts are local/run_dnn.sh
or local/run_nnet.sh, and Dan's example script is local/run_nnet_cpu.sh.
Before running those scripts, the first stages of ``run.sh'' in those directories must
be run in order to build the systems used for alignment.
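For orientation, here is a minimal sketch of the overall flow from one of the example
directories (illustrative only: the exact scripts present, and the stage at which the
alignment systems become available, differ between recipes):
\verbatim
cd egs/rm/s5
./run.sh                # first stages build the systems used for alignment
local/run_dnn.sh        # Karel's GPU-based setup
local/run_nnet_cpu.sh   # Dan's CPU-based setup
\endverbatim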
We will soon have detailed documentation pages on the two neural net setups.
For now, we summarize some of the most important differences:
- Karel's code was written for single-threaded SGD training accelerated with a GPU;
Dan's code uses multiple CPUs each with multiple threads.
- Karel's code already supports discriminative training; Dan's code does not yet.
Aside from these, there are many minor differences in architecture.
We hope to soon add more documentation for these libraries.
Karel's version of the code has some slightly out-of-date documentation available at \ref nnet1.
*/
}
// doc/dnn_dan.dox
// Copyright 2013 Johns Hopkins University (author: Daniel Povey)
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
namespace kaldi {
/**
\page dnn_dan Deep Neural Networks in Kaldi (Dan's setup)
\section dnn_dan_intro Introduction
This documentation covers Dan Povey's version of the deep neural network code in Kaldi.
For an overview of all deep neural network code in Kaldi, see \ref dnn.
\section dnn_dan_code Code
This page is under construction!
\section dnn_dan_algo Algorithms
\section dnn_dan_scripts Scripts
*/
}
@@ -67,6 +67,7 @@
- \ref model
- \ref transform
- \ref feat
- \ref dnn
- \ref online_programs
- \ref tools
// doc/nnet.dox
// doc/nnet1.dox
// Copyright 2012 Karel Vesely
@@ -18,9 +18,10 @@
namespace kaldi {
/**
\page nnet the Neural Network library
\page nnet1 The Neural Network library
\section nnet1_nnet Overview
\section nnet
Kaldi has support for acoustic modelling with Multi-Layer Perceptrons (MLPs).
The folder ^/trunk/src/nnet contains the neural network classes
and some additional classes that are used during training.
@@ -40,7 +41,8 @@ namespace kaldi {
the Bernoulli and Gaussian units.
\section design of nnet library
\section nnet1_design Design of the library
The key aspect of the design is ``as simple extensibility as possible''. Exactly for this reason, two interfaces have been created:
- \ref Component : interface for a general building block used to compose a neural network, which does not contain trainable parameters
- \ref UpdatableComponent : this interface extends \ref Component by adding the methods needed to train the parameters
@@ -81,7 +83,8 @@ namespace kaldi {
- 3) add a dynamic constructor call to the factory-like function \ref Component::Read
- 4) implement the interface \ref Component:: or \ref UpdatableComponent::
\section nnetbin
\section nnet1_nnetbin Binary level tools
The training tools are located in nnetbin/; the most important tools are:
- nnet-train-xent-hardlab-frmshuff : this tool is used during training; it optimizes the cross-entropy between the network posteriors and the targets, where Viterbi-path hard labels are used as 1-of-M encoded targets, and frame-level shuffling is done between the feature transform and the trained neural network.
- nnet-forward : this tool is used during decoding; it propagates the features through the neural network and writes them to the output. This tool is often used in the feature-extraction pipeline of the decoder. If the machine where the decoder is run is not equipped with a suitable GPU, the CPU is used instead.
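As a rough sketch of how nnet-forward typically appears in a decoding feature pipeline
(the file names here are hypothetical, and real scripts pass extra options such as a
feature transform; see the decoding scripts under egs/ for actual usage):
\verbatim
nnet-forward final.nnet \
  "ark:copy-feats scp:data/test/feats.scp ark:- |" ark:nnet_out.ark
\endverbatim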