Based on IK Analyzer release 5.0. Optimizes the original IKAnalyzer to support Lucene 4.0 - 4.6. Later versions are untested; feedback is welcome.

JunfengYang/IKAnalyzerOptimize

IKAnalyzerOptimize

Based on IK Analyzer release 5.0. This project optimizes the original IKAnalyzer to support Lucene 4.0 - 4.6. Later versions have not been tested; please send feedback if you try them.

I mainly use it with Solr 4.4 and 4.10.

New features:

  1. An IKAnalyzerSolrFactory that supports Solr versions higher than 4.0. It extends TokenizerFactory, so filter factories can be used together with it.
  2. Support for both Simplified Chinese and Traditional Chinese.
  3. A ChineseSpaceFilterFactory that removes the spaces added by "ShingleFilterFactory". For example, shingling "计算机算法导论" for auto-complete generates the terms "计算机", "计算机 算法", "计算机 算法 导论", "算法", "算法 导论" and "导论". If we then input a prefix of the whole input term, such as "计算机算", we want it to return "计算机算法" and "计算机算法导论", which contain no space between phrases.
  4. English word segmentation for the possessive case: "apple's" becomes "apple's" and "apple"; "its'" becomes "its".
  5. For Solr versions above 4.8, the class "org.apache.lucene.util.AttributeSource.AttributeFactory" is changed to "org.apache.lucene.util.AttributeFactory".
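
The idea behind feature 3 can be sketched in plain Java, without Lucene or Solr on the classpath. This is only an illustration under stated assumptions, not the code of ChineseSpaceFilterFactory: the class name `ShingleSpaceSketch` and its methods are hypothetical, the shingles are built by hand rather than by ShingleFilterFactory, and every space is removed, which may differ from what the real filter does (for example with English terms).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: mimics shingling followed by space removal.
class ShingleSpaceSketch {

    // Builds shingles of 1..maxSize consecutive tokens, joined by one space.
    public static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Assumption: drop every space; the real filter may be more selective.
    public static String removeSpaces(String term) {
        return term.replace(" ", "");
    }

    // Returns the space-free terms that strictly extend the typed prefix.
    public static List<String> complete(List<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            String joined = removeSpaces(term);
            if (joined.startsWith(prefix) && joined.length() > prefix.length()) {
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = shingles(Arrays.asList("计算机", "算法", "导论"), 3);
        System.out.println(terms);
        System.out.println(complete(terms, "计算机算"));
    }
}
```

Without the space removal, the prefix "计算机算" matches none of the shingled terms, because the stored term is "计算机 算法" with a space after "计算机".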

Example for Solr's schema.xml:

<fieldType name="text_ik" class="solr.TextField" positionIncrementGap="100" >
  <analyzer type="index" >
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false" />
  </analyzer>
  <analyzer type="query" >
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true" />
  </analyzer>
</fieldType>
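
The possessive-case rule from the feature list (item 4) can also be illustrated outside Solr. The sketch below is a minimal plain-Java approximation, assuming the rule only looks at a trailing "'s" or a trailing "'"; the class `PossessiveSketch` and its `expand` method are hypothetical and are not part of IKAnalyzerOptimize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the possessive-case behaviour described above.
class PossessiveSketch {

    // Returns the terms that one English token would produce.
    public static List<String> expand(String word) {
        List<String> out = new ArrayList<>();
        if (word.length() > 2 && word.endsWith("'s")) {
            // "apple's" keeps the original form and adds the bare word.
            out.add(word);
            out.add(word.substring(0, word.length() - 2));
        } else if (word.length() > 1 && word.endsWith("'")) {
            // "its'" keeps only the word without the trailing apostrophe.
            out.add(word.substring(0, word.length() - 1));
        } else {
            out.add(word);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("apple's")); // prints [apple's, apple]
        System.out.println(expand("its'"));    // prints [its]
    }
}
```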
