Automatic labeling of topic models.
The alogirithm is described in Automatic Labeling of Multinomial Topic Models
We model the abstracts of
NIPS 2014(NIPS abstracts from 2008 to 2014 is available under
Meanwhile, we contrain the labels to be tagged as
JJ,NN and use the top 200 most informative labels.
>>> python label_topic.py --line_corpus_path datasets/nips-2014.dat --preprocessing wordlen tag --label_tags NN,NN JJ,NN --n_cand_labels 200 ... Topical words: -------------------- Topic 0: model data framework clustering information distributions two number world propose noise real work small Topic 1: learning algorithm time problem online regret information decision conditional new stochastic algorithms selection problems Topic 2: algorithm algorithms problem results learning optimal show function class functions graph bounds based general Topic 3: learning training networks data tasks features neural kernel performance classification model datasets feature deep Topic 4: matrix method sparse convex problems methods dimensional problem rank analysis propose regression norm gradient Topic 5: model models inference approach data linear based gaussian method methods process sampling structure time Topical labels: -------------------- Topic labels: Topic 0: neural population, inference algorithm, likelihood estimator, stochastic optimization, matrix recovery, paper develop, empirical study, covariance matrix Topic 1: bandit problem, near-optimal regret, function approximation, paper consider, general class, multi-armed bandit, value function, statistical learning Topic 2: logarithmic factor, statistical learning, convergence rate, communication cost, other hand, main result, solution quality, function approximation Topic 3: pascal voc, major challenge, natural language, paper introduce, object recognition, policy search, empirical study, classification accuracy Topic 4: low-rank tensor, low-rank matrix, matrix recovery, coordinate descent, problem finding, direction method, statistical learning, risk minimization Topic 5: inference algorithm, introduce novel, exponential family, probabilistic inference, neural population, value function, policy search, other hand
>>> python label_topic.py --line_corpus_path datasets/nips-2014.dat --preprocess wordlen tag --label_tags NN,NN
For more details:
>>> python label_topic.py --help
Please refer to
The current version goes through the following steps
intra topical coverageheuristic(which is now disabled)