Description
For this simple Spanish query ("How old are you?"), passing an empty stopword list (the result is the same with a non-empty one):
rake.generate("Cuantos años tienes?", {stopwords: []})
I get the error:
TypeError: Cannot read property 'forEach' of null
at phraseList.forEach
at Array.forEach
at Rake.calculatePhraseScores
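A minimal sketch of how this stack trace can arise. It assumes, hypothetically, that `calculatePhraseScores` extracts each phrase's words with `String.prototype.match` and a global ASCII-only pattern. The library source is not quoted here, so `countWords`, `countWordsSafe` and `asciiWord` are illustrative names only. The key fact is that `match` returns `null`, not an empty array, when nothing matches, so a phrase consisting only of "ñ" makes the following `forEach` throw a `TypeError`.

```javascript
// Hypothetical ASCII-only word pattern, mirroring the reporter's reading of wordList.
const asciiWord = /[0-9a-z]+/g;

// Hypothetical stand-in for the per-phrase loop inside calculatePhraseScores.
function countWords(phraseList, wordPattern) {
  const counts = {};
  phraseList.forEach((phrase) => {
    // String.prototype.match returns null (not []) when there is no match,
    // so this line throws a TypeError for a phrase like "ñ".
    phrase.match(wordPattern).forEach((word) => {
      counts[word] = (counts[word] || 0) + 1;
    });
  });
  return counts;
}

// Same loop with a null guard: a phrase with no matches contributes nothing.
function countWordsSafe(phraseList, wordPattern) {
  const counts = {};
  phraseList.forEach((phrase) => {
    (phrase.match(wordPattern) || []).forEach((word) => {
      counts[word] = (counts[word] || 0) + 1;
    });
  });
  return counts;
}

try {
  countWords(["ñ", "tienes"], asciiWord);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(countWordsSafe(["ñ", "tienes"], asciiWord)); // { tienes: 1 }
```

The `|| []` guard only stops the crash. The word is still lost, so it does not fix the underlying tokenization problem.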
If I omit the stopwords option, no error is thrown, but the word "años" is split incorrectly:
rake.generate("Cuantos años tienes?")
=> [ 'ños tienes', 'Cuantos' ]
I think the code treats ñ as a word-break character. That would explain both results: in the second example "años" gets split, and in the first example the lone character ñ ends up as a whole phrase in calculatePhraseScores, which triggers the error. The wordList regex seems to accept only 0-9a-z as word characters, which is incomplete for Spanish and for any other language whose letters fall outside a-z.
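To illustrate the diagnosis, the sketch below compares an ASCII-only word pattern with a Unicode-aware one on the same query. It assumes an ES2018+ runtime (for example a current Node.js), where the `u` flag enables Unicode property escapes such as `\p{L}` (any letter), `\p{M}` (combining marks) and `\p{N}` (any number). `tokenize` and the pattern names are hypothetical, this is not the library's actual `wordList` code, and it does not try to reproduce the library's exact output shown above. Lowercasing before matching is also an assumption.

```javascript
// ASCII-only: letters outside a-z (ñ, á, é, ...) act as separators.
const asciiWordPattern = /[0-9a-z]+/g;

// Unicode-aware: any letter, combining mark, or number counts as a word character.
// \p{M} keeps a decomposed "n" + U+0303 together as part of one word.
const unicodeWordPattern = /[\p{L}\p{M}\p{N}]+/gu;

function tokenize(text, wordPattern) {
  // Lowercase first (an assumption here), then pull out runs of word characters.
  // `|| []` guards against match() returning null when nothing matches.
  return text.toLowerCase().match(wordPattern) || [];
}

const query = "Cuantos años tienes?";
console.log(tokenize(query, asciiWordPattern));   // [ 'cuantos', 'a', 'os', 'tienes' ]
console.log(tokenize(query, unicodeWordPattern)); // [ 'cuantos', 'años', 'tienes' ]
```

With the ASCII pattern, "años" falls apart into "a" and "os". The property-escape version keeps it whole without having to list every accented letter by hand.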