January | 2008

Archive for January, 2008

About Hidden Topic Markov Models (HTMM)

Published January 4, 2008 Uncategorized 3 Comments

One of the techniques that researchers in AI use to make inferences from input such as text or speech is the Hidden Markov Model (HMM) – “hidden” because the state of the process is not directly visible, so state transitions need to be inferred from how they influence the observed variables. HMMs are basically simple Bayesian dynamic networks. Markov chains are formed by parsing input such as text into paired sets, then making predictions about the likely sets that will follow, based on probability analysis. Amit Gruber and his colleagues have taken this idea and applied it to the idea of topic searches within a search engine.

The algorithm is called Hidden Topic Markov Models (HTMM) and the approach is to model the “topics of words in the document as a Markov chain”, according to Gruber. He explains that the algorithm assumes that all words in the same sentence have the same topic, and that successive sentences are more likely to have the same topics. This approach produces “better perplexity in modeling documents with only a modest increase in learning and inference complexity.” Amit Gruber is currently engaged in a research internship with Google Research, where he has produced a new implementation of the algorithm in C++.

See an explanation of the HTMM approach and a video of Amit’s seminar here.

T3

Archive for January, 2008

About Hidden Topic Markov Models (HTMM)

Follow Blog via Email

Archives

Pages

Blogroll

Blog Stats

Tag Cloud for T3