Abstract: We are witnessing an explosion of weighted symbolic data, in which items carry quantitative and/or qualitative measures. Exploring such data is hard because of their structure and volume. Patterns are often evaluated with an aggregation function (product or sum) to find the subsets that best represent the behavior of the data. In data mining, most state-of-the-art methods address this problem by extracting high utility itemsets, and therefore rely on the sum of utilities. Utility itemset extraction algorithms discover knowledge in databases whose items are weighted, and their usefulness has been widely demonstrated in many real-world applications. Traditional algorithms return all patterns whose utility exceeds a minimum utility threshold, which is difficult to set, while top-k algorithms tend to produce patterns that lack diversity. In this paper, we consider both the sum and the product of the items' weights to evaluate the utility of a pattern in each transaction that contains it. We propose a generic algorithm named HiWIS that samples itemsets, where each itemset is drawn with a probability proportional to its aggregate utility in the database, under length constraints that avoid long, rare itemsets made of low-weight items. The originality of our method is that it combines length constraints with qualitative and quantitative utilities. Experiments show that HiWIS extracts thousands of high aggregate utility patterns in a few seconds from different databases.
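To make the sampling objective in the abstract concrete, the sketch below enumerates every itemset within the length bounds, computes its aggregate utility (sum or product of item weights, added up over the transactions that contain it), and draws itemsets with probability proportional to that value. This is a brute-force illustration only, not the HiWIS algorithm: it is exponential in transaction size. The names `pattern_utility`, `aggregate_utilities`, `sample_patterns`, and `TOY_DATABASE`, the dict-based transaction format, and the assumption of strictly positive weights are all hypothetical choices made for this example.

```python
import itertools
import math
import random
from collections import defaultdict

# Hypothetical toy database: each transaction maps an item to its weight.
# Weights are assumed to be strictly positive.
TOY_DATABASE = [
    {"a": 2.0, "b": 3.0},
    {"a": 1.0, "b": 4.0, "c": 0.5},
]


def pattern_utility(pattern, transaction, aggregate="sum"):
    """Utility of `pattern` in one transaction: sum or product of its item weights."""
    weights = [transaction[item] for item in pattern]
    if aggregate == "sum":
        return sum(weights)
    if aggregate == "product":
        return math.prod(weights)
    raise ValueError(f"unknown aggregate: {aggregate!r}")


def aggregate_utilities(database, min_len, max_len, aggregate="sum"):
    """Total utility of every itemset whose length lies in [min_len, max_len].

    Brute-force enumeration, so only suitable for tiny examples.
    """
    totals = defaultdict(float)
    for transaction in database:
        items = sorted(transaction)
        longest = min(max_len, len(items))
        for length in range(min_len, longest + 1):
            for pattern in itertools.combinations(items, length):
                totals[pattern] += pattern_utility(pattern, transaction, aggregate)
    return dict(totals)


def sample_patterns(database, n, min_len=1, max_len=2, aggregate="sum", seed=None):
    """Draw n itemsets (with replacement), each with probability
    proportional to its aggregate utility in the database."""
    totals = aggregate_utilities(database, min_len, max_len, aggregate)
    patterns = list(totals)
    rng = random.Random(seed)
    return rng.choices(patterns, weights=[totals[p] for p in patterns], k=n)
```

With `TOY_DATABASE`, the itemset `("a", "b")` has aggregate utility 10.0 under both aggregations (5 + 5 for the sum, 6 + 4 for the product), while `("b", "c")` scores 4.5 under the sum and 2.0 under the product, so the choice of aggregation changes how often each itemset is drawn.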
A HAISampler extension to take into account the product utility
HAISampler/HiWIS