Apache Lucene Indexer Search with CJKAnalyzer


I am using Apache Lucene to index and search text, with the CJKAnalyzer. It matches the search term
character by character: if I search for the Japanese word "ぁxまn", it returns every
word that contains any single character of that word.
That is not what I want. I want to match the whole word, or words
that contain the whole search term.



For example, suppose I have indexed three words: "ぁxまn", "ぁxま" and "まn".


case 1: If I search for "ぁxまn", it should return only one result.
case 2: If I search for "ぁx", it should return two results.
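To make the two expected cases concrete, here is a minimal plain-Java model of the desired behaviour. It does not use Lucene, and the class and method names are hypothetical: a stored word counts as a hit only if it contains the whole search term as a contiguous substring.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not Lucene code) of the desired matching rule:
// a stored word is a hit only if it contains the entire query string.
public class ContainsMatch {

    // Returns every stored word that contains the query as a substring.
    static List<String> search(List<String> words, String query) {
        List<String> hits = new ArrayList<>();
        for (String word : words) {
            if (word.contains(query)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("ぁxまn", "ぁxま", "まn");
        System.out.println(search(indexed, "ぁxまn")); // case 1: [ぁxまn]
        System.out.println(search(indexed, "ぁx"));   // case 2: [ぁxまn, ぁxま]
    }
}
```

This only pins down the expected results; it says nothing about how to get Lucene to produce them.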



But currently, if I search for "ぁxまn", it returns all three words, which is wrong.
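The behaviour described at the top, where a word matches if it shares any single character with the search term, can be modelled in plain Java as below. This is a hypothetical illustration of the observed symptom (an OR over characters), not Lucene's actual query logic; with the three indexed words it returns all three for "ぁxまn".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not Lucene code) of the reported behaviour:
// a stored word matches if it shares at least one character with the query.
public class AnyCharMatch {

    // True if any code point of the query also occurs in the word.
    static boolean sharesAnyChar(String word, String query) {
        return query.codePoints().anyMatch(cp -> word.indexOf(cp) >= 0);
    }

    static List<String> search(List<String> words, String query) {
        List<String> hits = new ArrayList<>();
        for (String word : words) {
            if (sharesAnyChar(word, query)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("ぁxまn", "ぁxま", "まn");
        System.out.println(search(indexed, "ぁxまn").size()); // 3
    }
}
```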



------------------- Indexing code ---------------------------------


// Open a writer (configured with CJKAnalyzer in createWriter below) and index one document
IndexWriter writer = createWriter();
Document document1 = createDocument(1, "ぁxまn", "Richard");
writer.addDocument(document1);
writer.commit();



private static Document createDocument(Integer id, String firstName, String lastName)
{
    Document document = new Document();
    // StringField: indexed as a single, untokenized term
    document.add(new StringField("id", id.toString(), Field.Store.YES));
    // TextField: tokenized by the writer's analyzer (CJKAnalyzer)
    document.add(new TextField("firstName", firstName, Field.Store.YES));
    document.add(new TextField("lastName", lastName, Field.Store.YES));
    return document;
}


private static IndexWriter createWriter() throws IOException
{
    FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR).toFile());
    // Lucene 4.4 API: both the config and the analyzer take a Version argument
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, new CJKAnalyzer(Version.LUCENE_44));
    IndexWriter writer = new IndexWriter(dir, config);
    return writer;
}



--------call to Search ------


TopDocs foundDocs2 = searchByFirstName("*ぁxまn*", searcher);
-------------------------------------------------------------
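private static TopDocs searchByFirstName(String firstName, IndexSearcher searcher) throws Exception
{
    // Parse the query with the same analyzer that was used for indexing
    MultiFieldQueryParser mqp = new MultiFieldQueryParser(Version.LUCENE_44, new String[]{"firstName"}, new CJKAnalyzer(Version.LUCENE_44));
    // Required because the query string starts with "*"
    mqp.setAllowLeadingWildcard(true);
    Query q = mqp.parse(firstName);
    TopDocs hits = searcher.search(q, 10);
    return hits;
}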
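The query string "*ぁxまn*" passed to searchByFirstName is a wildcard query. As a rough illustration, the plain-Java sketch below models what such a pattern matches when compared against one complete term: "*" matches any run of characters and "?" exactly one. The class and method names are hypothetical, escaping is ignored, and it assumes each firstName value is a single untokenized term. That is not what the TextField/CJKAnalyzer setup above does: Lucene matches a wildcard against each indexed term separately, so the real result depends on how the analyzer split the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java model (not Lucene code) of a wildcard pattern matched against a whole term.
// "*" = any run of characters, "?" = exactly one character; escaping is not handled.
public class WildcardModel {

    // Translates the wildcard pattern into an equivalent java.util.regex pattern.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                // Flush pending literal text, quoted so it is matched verbatim
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Returns the terms that the whole wildcard pattern matches.
    static List<String> matchingTerms(List<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: each firstName value is stored as one untokenized term
        List<String> terms = List.of("ぁxまn", "ぁxま", "まn");
        System.out.println(matchingTerms(terms, "*ぁxまn*")); // [ぁxまn]
        System.out.println(matchingTerms(terms, "*ぁx*"));    // [ぁxまn, ぁxま]
    }
}
```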





Can you add your indexing code? What kind of fields do you use?
– dom
Jul 2 at 8:03





@dom I have added indexing code.
– OnkarG
Jul 3 at 4:18





Alright, and your search code? Do you use the same analyzer for searching too?
– dom
Jul 3 at 6:59





@dom Yes, I am using the same analyzer for searching.
– OnkarG
Jul 3 at 7:31





Hmm, two things: try to analyse your index with Luke (github.com/DmitryKey/luke). And why are you using a MultiFieldQueryParser?
– dom
Jul 3 at 7:36








