https://wiki.cl.uni-heidelberg.de/bin/view/Main/Resources/WaCkypedia
A 2009 dump of the English Wikipedia (about 800 million tokens), in the same format as PukWaC, including POS/lemma information, as well as a full dependency parse (parsing performed with the MaltParser).
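A minimal sketch of how such a file could be read, assuming the usual WaCky vertical layout (one token per line, tab-separated word/POS/lemma/index/head/deprel fields, with <s>...</s> marking sentence boundaries); the exact column order and the file name are assumptions, so check them against the actual release:

# Assumed WaCky-style column order; adjust if the release differs.
COLUMNS = ("word", "pos", "lemma", "index", "head", "deprel")

def read_wacky(path):
    """Yield one sentence at a time as a list of token dicts."""
    sentence = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("<"):            # <text>, <s>, </s> markup lines
                if line.startswith("</s") and sentence:
                    yield sentence
                    sentence = []
                continue
            fields = line.split("\t")
            if len(fields) >= len(COLUMNS):
                sentence.append(dict(zip(COLUMNS, fields)))
    if sentence:                                 # file may end without a closing </s>
        yield sentence

if __name__ == "__main__":
    # "wackypedia_en.txt" is a hypothetical file name for illustration.
    for sent in read_wacky("wackypedia_en.txt"):
        print(" ".join(tok["word"] for tok in sent))
        break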
@toyota I like your proposal of using the Wikipedia dump.
( https://wiki.cl.uni-heidelberg.de/bin/view/Main/Resources/WaCkypedia )
We could also use English Gigaword, which is even larger than the Wikipedia dump, or the British National Corpus (100 million words), but the diversity of the Wikipedia dump should be higher.
We could also use the dataset from SemEval-2013 Task 11, which we are already working on. It is already labelled and includes gold-standard annotations for Web search results: