IKAnalyzerOptimize

Based on IK Analyzer release 5.0. This project optimizes the original IKAnalyzer to support Lucene 4.0 - 4.6. Later versions have not been tested; feedback is welcome.

I mainly use it with Solr 4.4 and 4.10.

New features:

An IKAnalyzerSolrFactory that supports Solr versions higher than 4.0. It extends TokenizerFactory, so filter factories can be used with it.
Support for both Simplified Chinese and Traditional Chinese.
A ChineseSpaceFilterFactory to remove spaces, applied after "ShingleFilterFactory". For example, when shingling "计算机算法导论" for auto-complete, it generates the terms "计算机", "计算机算法", "计算机算法导论", "算法", "算法导论", "导论". So if we input a prefix of the whole term, such as "计算机算", it returns "计算机算法" and "计算机算法导论", which contain no spaces between phrases.
English word segmentation for the possessive case: "apple's" becomes "apple's" and "apple"; "its'" becomes "its".
For Solr versions higher than 4.8, the class "org.apache.lucene.util.AttributeSource.AttributeFactory" is replaced by "org.apache.lucene.util.AttributeFactory".
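
The shingle example above can be sketched in plain Java. This is only an illustration of the idea, not the project's ChineseSpaceFilterFactory or Lucene's ShingleFilter: the class `ShingleSpaceSketch` and its methods are hypothetical, the input is assumed to be already segmented into "计算机", "算法", "导论", and shingles are assumed to be joined with a single space before the spaces are removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of IKAnalyzerOptimize or Lucene.
public class ShingleSpaceSketch {

    // Builds shingles (runs of 1..maxSize adjacent tokens), joined by a single space.
    public static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Removes the separator spaces so that Chinese phrases become contiguous terms.
    public static List<String> removeSpaces(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(term.replace(" ", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed segmentation of "计算机算法导论".
        List<String> tokens = Arrays.asList("计算机", "算法", "导论");
        List<String> withSpaces = shingles(tokens, 3);
        // withSpaces contains terms such as "计算机 算法", which a prefix like "计算机算" cannot match.
        System.out.println(removeSpaces(withSpaces));
        // prints [计算机, 计算机算法, 计算机算法导论, 算法, 算法导论, 导论]
    }
}
```

Without the space removal step, the term "计算机 算法" would not start with the prefix "计算机算", which is why the filter is placed after shingling.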

Example in Solr schema.xml:

<fieldType name="text_ik" class="solr.TextField" positionIncrementGap="100" >
  <analyzer type="index" >
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false" />
  </analyzer>
  <analyzer type="query" >
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true" />
  </analyzer>
</fieldType>
