Latent Semantic Indexing – Program Creek

Latent Semantic Indexing (LSI) is a common technique in natural language processing. This article explains how LSI works by comparing it with pure keyword-based search.

What is LSI?

Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. (Source: Wikipedia)
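To make the SVD step concrete, here is a minimal sketch using NumPy. The five tiny "documents" and the four-term vocabulary are made up for illustration, and keeping k = 2 latent dimensions is an arbitrary choice for this toy corpus, not a recommended setting.

```python
import numpy as np

# Hypothetical toy corpus: rows of A are terms, columns are documents.
terms = ["program", "code", "tiger", "golf"]
docs = ["program code", "code", "program", "tiger golf", "golf"]
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# SVD factors the matrix as A = U @ diag(s) @ Vt, with the singular
# values in s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (the strongest "concepts")
# and rebuild a rank-k approximation of the original matrix.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_k, 2))
```

In the original matrix, the entry for "program" in the document "code" is 0. In the rank-2 approximation it becomes roughly 0.5, because "program" and "code" co-occur elsewhere in the collection. That filled-in weight is the kind of pattern the definition above refers to.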

For example, when “Paris” and “Hilton” appear together, they are associated with a person rather than a city and a hotel. Likewise, “Tiger” and “Woods” together are associated with golf.

Regular Keyword Search vs. LSI

With a regular keyword search, a document either contains the given word or it does not. There is no middle ground.
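A plain keyword search can be sketched in a few lines. The document ids and texts here are hypothetical, and the function name keyword_search is ours.

```python
# Hypothetical mini-collection of documents, keyed by an id.
docs = {
    "d1": "how to write a program in java",
    "d2": "sample code for sorting an array",
    "d3": "program code examples for beginners",
}

def keyword_search(query, docs):
    """Return the ids of documents that contain the query word exactly."""
    return [doc_id for doc_id, text in docs.items() if query in text.split()]

print(keyword_search("program", docs))   # ['d1', 'd3']
print(keyword_search("software", docs))  # []
```

Document d2 is about code but never uses the word "program", so the exact-match test excludes it.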

LSI adds an important step to the document indexing process. In addition to recording which keywords a document contains, it examines the whole collection to see which other documents contain some of the same words. LSI considers documents that have many words in common to be semantically close, and ones with fewer words in common to be less close.

When you search an LSI-indexed database, the search engine looks at similarity values it has calculated for every content word, and returns the documents that it thinks best fit the query. Because two documents may be semantically very close even if they do not share a particular keyword, LSI does not require an exact match to return useful results. Where a plain keyword search will fail if there is no exact match, LSI will often return relevant documents that don’t contain the keyword at all.
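The query step described above can be sketched as follows. This assumes one common convention: the query and every document are projected into the same k-dimensional latent space with the truncated matrix U_k, then compared by cosine similarity. The corpus, the helper names (to_latent, cosine, lsi_search), and k = 2 are all illustrative choices.

```python
import numpy as np

# Hypothetical toy corpus: rows of A are terms, columns are documents.
terms = ["program", "code", "tiger", "golf"]
docs = ["program code", "code", "program", "tiger golf", "golf"]
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]  # term-to-concept mapping, truncated to k concepts

def to_latent(term_counts):
    """Project a term-count vector into the k-dimensional latent space."""
    return U_k.T @ term_counts

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is (near) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0

def lsi_search(query):
    """Rank all documents by latent-space similarity to the query."""
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    q_hat = to_latent(q)
    scores = [cosine(q_hat, to_latent(A[:, j])) for j in range(len(docs))]
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)

for doc, score in lsi_search("program"):
    print(f"{score:5.2f}  {doc}")
```

On this corpus the document "code" scores close to 1 for the query "program" even though it does not contain that word, while the golf documents score close to 0.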

An LSI Example

If we use LSI to index a collection of articles and the words “program” and “code” appear together frequently enough, the search algorithm will notice that the two terms are semantically close. A search for “program” will therefore return a set of articles containing that word, but also articles that contain just the word “code”. LSI does not understand what the words mean, but by examining a sufficient number of documents, it learns that the two terms are related. It then uses that information to provide an expanded set of results with better recall than a plain keyword search.
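The relatedness of two terms can be checked directly by comparing their vectors in the latent space. This is a sketch under assumptions: the toy corpus is made up, k = 2 is arbitrary, and scaling the term vectors by the singular values is one common convention rather than the only one.

```python
import numpy as np

# Hypothetical toy corpus: rows of A are terms, columns are documents.
terms = ["program", "code", "tiger", "golf"]
docs = ["program code", "code", "program", "tiger golf", "golf"]
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# Each row is one term's coordinates in the k-dimensional latent space,
# with each column scaled by its singular value.
term_vecs = U[:, :k] * s[:k]

def term_similarity(t1, t2):
    """Cosine similarity between two vocabulary terms in latent space."""
    a = term_vecs[terms.index(t1)]
    b = term_vecs[terms.index(t2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(term_similarity("program", "code"), 2))
print(round(term_similarity("program", "golf"), 2))
```

Here "program" and "code" end up pointing in nearly the same direction, while "program" and "golf" are nearly orthogonal. The model has no notion of meaning. The result comes only from which documents the terms share.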

The diagram below illustrates the difference between LSI and keyword search. W stands for a document.

[Figure: Latent Semantic Indexing]


2 thoughts on “Latent Semantic Indexing”

Nikolay Stoyanov

April 8, 2016 at 8:06 am

Nice article. There are some indications that the RankBrain algorithm will eliminate the need for LSI keywords. What is your view on that?
