# LDA
> [Latent Dirichlet Allocation][lda] via collapsed Gibbs sampling.
## Usage
```javascript
var lda = require( '@stdlib/nlp/lda' );
```
#### lda( docs, K\[, options] )
[Latent Dirichlet Allocation][lda] via collapsed Gibbs sampling. To create a model, call the `lda` function with an `array` of `strings` (the documents) and the number of topics `K` to be identified.
```javascript
var model;
var docs;

docs = [
    'I loved you first',
    'For one is both and both are one in love',
    'You never see my pain',
    'My love is such that rivers cannot quench',
    'See a lot of pain, a lot of tears'
];

model = lda( docs, 2 );
// returns <Object>
```
After initialization, model parameters are estimated by calling the `.fit()` method, which performs collapsed Gibbs sampling.
The model object contains the following methods:
#### model.fit( iter, burnin, thin )
```javascript
model.fit( 1000, 100, 10 );
```
The `iter` parameter denotes the number of sampling iterations. While one thousand iterations is a common choice, it might not be appropriate for every corpus; empirical diagnostics can be used to assess whether the constructed Markov chain has converged. `burnin` denotes the number of initial estimates that are discarded, as early samples are still heavily influenced by the starting values. `thin` controls the number of estimates discarded between consecutive retained samples, which reduces the autocorrelation of the chain.
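To see how the three parameters interact, the following sketch computes the number of posterior draws retained under the settings above (assuming, as is conventional for thinning, that every `thin`-th estimate after the burn-in period is kept):

```javascript
var iter = 1000;  // total number of sampling iterations
var burnin = 100; // initial estimates discarded
var thin = 10;    // keep every tenth estimate thereafter

// Number of retained posterior draws:
var retained = ( iter - burnin ) / thin;
// returns 90
```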
#### model.getTerms( k\[, no = 10] )
Returns the `no` terms with the highest probabilities for the chosen topic `k`.
```javascript
var words = model.getTerms( 0, 3 );
/* returns
    [
        { 'word': 'both', 'prob': 0.06315008476532499 },
        { 'word': 'pain', 'prob': 0.05515729517235543 },
        { 'word': 'one', 'prob': 0.05486669737616135 }
    ]
*/
```
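Because the topic index is the first argument, the top terms for every topic can be collected in a single loop. A minimal sketch, assuming the two-topic model created above:

```javascript
var topics;
var terms;
var i;
var k;

topics = [];
for ( k = 0; k < 2; k++ ) { // `2` matches the number of topics `K` used above
    terms = model.getTerms( k, 3 );
    for ( i = 0; i < terms.length; i++ ) {
        terms[ i ] = terms[ i ].word;
    }
    topics.push( terms );
}
console.log( topics );
// prints an array containing the three highest-probability words for each topic
```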
## Examples
```javascript
var sotu = require( '@stdlib/datasets/sotu' );
var roundn = require( '@stdlib/math/base/special/roundn' );
var stopwords = require( '@stdlib/datasets/stopwords-en' );
var lowercase = require( '@stdlib/string/lowercase' );
var lda = require( '@stdlib/nlp/lda' );

var speeches;
var words;
var terms;
var model;
var str;
var i;
var j;

// Compile a regular expression for each English stop word:
words = stopwords();
for ( i = 0; i < words.length; i++ ) {
    words[ i ] = new RegExp( '\\b'+words[ i ]+'\\b', 'gi' );
}

// Load the State of the Union addresses delivered between 1930 and 2010:
speeches = sotu({
    'range': [ 1930, 2010 ]
});

// Convert each speech to lowercase and strip stop words:
for ( i = 0; i < speeches.length; i++ ) {
    str = lowercase( speeches[ i ].text );
    for ( j = 0; j < words.length; j++ ) {
        str = str.replace( words[ j ], '' );
    }
    speeches[ i ] = str;
}

// Create a three-topic model and estimate the model parameters:
model = lda( speeches, 3 );
model.fit( 1000, 100, 10 );

// Print the average topic proportions for each speech:
for ( i = 0; i <= 80; i++ ) {
    str = 'Year: ' + (1930+i) + '\t';
    str += 'Topic 1: ' + roundn( model.avgTheta.get( i, 0 ), -3 ) + '\t';
    str += 'Topic 2: ' + roundn( model.avgTheta.get( i, 1 ), -3 ) + '\t';
    str += 'Topic 3: ' + roundn( model.avgTheta.get( i, 2 ), -3 );
    console.log( str );
}

// Print the terms most strongly associated with each of the three topics:
terms = model.getTerms( 0, 20 );
for ( i = 0; i < terms.length; i++ ) {
    terms[ i ] = terms[ i ].word;
}
console.log( 'Words most associated with the first topic:\n ' + terms.join( ', ' ) );

terms = model.getTerms( 1, 20 );
for ( i = 0; i < terms.length; i++ ) {
    terms[ i ] = terms[ i ].word;
}
console.log( 'Words most associated with the second topic:\n ' + terms.join( ', ' ) );

terms = model.getTerms( 2, 20 );
for ( i = 0; i < terms.length; i++ ) {
    terms[ i ] = terms[ i ].word;
}
console.log( 'Words most associated with the third topic:\n ' + terms.join( ', ' ) );
```
[lda]: https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation