Lesk algorithm

The Lesk algorithm is a word sense disambiguation algorithm proposed by Michael Lesk in 1986.[1]

Overview

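The Lesk algorithm rests on the assumption that a word shares a common topic with the words around it. The simplified version of the algorithm compares each dictionary definition of an ambiguous word with the word's context. An adapted version has been applied to WordNet.[2] An implementation might proceed as follows: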

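  1. For each sense of the ambiguous word, count the number of words that appear both in the context and in the dictionary definition of that sense.
  2. Choose the sense with the highest count.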
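
The two steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the names `overlap` and `simplified_lesk` are hypothetical, the senses are passed in as a plain dictionary, and splitting on whitespace with no stop-word removal or stemming is a simplifying assumption.

```python
def overlap(definition, context_words):
    """Step 1: count distinct words shared by a definition and the context."""
    return len(set(definition.lower().split()) & context_words)


def simplified_lesk(senses, context):
    """Step 2: return the key of the sense with the highest overlap count.

    `senses` maps a sense label to its dictionary definition (assumed format).
    On a tie, the first sense in the mapping is returned.
    """
    context_words = set(context.lower().split())
    return max(senses, key=lambda label: overlap(senses[label], context_words))
```

Calling `simplified_lesk(senses, sentence)` with the definitions of a word and the sentence it occurs in returns the label of the chosen sense. A practical version would normally filter out function words first, since words such as "of" or "the" inflate the counts without carrying any topical information.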

A common example used to illustrate the algorithm is the phrase "pine cone", with the following dictionary definitions:

PINE 
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE 
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees

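The largest overlap is clearly Pine#1 ∩ Cone#3 = 2: the two definitions share "evergreen" and "tree" (counting "trees" as a match), while no other pair of senses shares a content word.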
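
The pairwise comparison for "pine cone" can be reproduced with a short Python sketch. The definitions are copied from the list above; the stop-word list, the naive stripping of a trailing "s", and the names `tokens` and `best_pair` are assumptions made for this illustration and are not part of Lesk's original description.

```python
from itertools import product

# Function words ignored when comparing definitions (an assumed, ad hoc list).
STOPWORDS = {"a", "of", "or", "this", "through", "to", "which", "whether", "with"}

PINE = {
    1: "kinds of evergreen tree with needle-shaped leaves",
    2: "waste away through sorrow or illness",
}
CONE = {
    1: "solid body which narrows to a point",
    2: "something of this shape whether solid or hollow",
    3: "fruit of certain evergreen trees",
}


def tokens(definition):
    """Lowercase, drop stop words, and strip one trailing 's' (crude stemming)."""
    words = set()
    for word in definition.lower().split():
        if word in STOPWORDS:
            continue
        words.add(word[:-1] if word.endswith("s") else word)
    return words


def best_pair(senses_a, senses_b):
    """Return the pair of sense numbers with the largest overlap, and its size."""
    scores = {
        (a, b): len(tokens(senses_a[a]) & tokens(senses_b[b]))
        for a, b in product(senses_a, senses_b)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


print(best_pair(PINE, CONE))  # ((1, 3), 2)
```

Under these assumptions every other pair scores 0, so Pine#1 and Cone#3 are selected with an overlap of 2.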

References

  1. Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th annual international conference on Systems documentation, pages 24-26, New York, NY, USA. ACM.
  2. Banerjee, S. and Pedersen, T. (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. Lecture Notes in Computer Science, Vol. 2276, pages 136-145. ISBN 3-540-43219-1.