HAISampler/HiWIS

High Weighted Itemset Sampling

Abstract: Weighted symbolic data, in which items carry quantitative and/or qualitative measures, are growing explosively. Exploring such data is hard because of their structure and volume. Subsets of items are often evaluated with an aggregation function (product or sum) to find those that best represent the behavior of the data. In data mining, most state-of-the-art methods address this problem by extracting high utility itemsets, and therefore rely on the sum of utilities. High utility itemset mining algorithms discover knowledge in a database whose items are weighted, and their usefulness has been widely demonstrated in real-world applications. However, traditional algorithms return every pattern whose utility exceeds a minimum utility threshold, which is difficult to set, while top-k algorithms tend to produce patterns that lack diversity. In this paper, we consider both the sum and the product of item weights to evaluate the utility of a pattern in each transaction that contains it. We propose a generic algorithm named HiWIS that samples itemsets so that each itemset is drawn with a probability proportional to its aggregate utility in the database, under length constraints that avoid long, rare itemsets made of low-weight items. The originality of our method is that it combines length constraints with qualitative and quantitative utilities. Experiments show that HiWIS extracts thousands of high aggregate utility patterns in a few seconds from different databases.
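
To make the sampling goal in the abstract concrete, here is a minimal Python sketch. It is not the HiWIS algorithm: it enumerates every itemset within the length bounds, which is only feasible on toy data, and then draws itemsets with probability proportional to their aggregate utility. The function names (`aggregate_utilities`, `sample_itemsets`), the dictionary-based transaction format, and the toy database are all assumptions made for illustration, and item weights are assumed to be non-negative.

```python
import random
from itertools import combinations
from math import prod


def aggregate_utilities(database, min_len, max_len, aggregate="sum"):
    """Return {itemset: utility summed over all transactions containing it}.

    Each transaction is a dict mapping an item to its (non-negative) weight
    in that transaction. The utility of an itemset inside one transaction is
    the sum or the product of its item weights (illustrative assumption).
    """
    agg = sum if aggregate == "sum" else prod
    utilities = {}
    for transaction in database:
        items = sorted(transaction)
        # Only itemsets whose length respects the constraints are considered.
        for k in range(min_len, min(max_len, len(items)) + 1):
            for itemset in combinations(items, k):
                u = agg(transaction[i] for i in itemset)
                utilities[itemset] = utilities.get(itemset, 0) + u
    return utilities


def sample_itemsets(database, n, min_len=1, max_len=2, aggregate="sum", seed=None):
    """Draw n itemsets (with replacement), each with probability
    proportional to its aggregate utility in the database."""
    utilities = aggregate_utilities(database, min_len, max_len, aggregate)
    patterns = list(utilities)
    weights = [utilities[p] for p in patterns]
    rng = random.Random(seed)
    return rng.choices(patterns, weights=weights, k=n)


# Hypothetical toy database: two transactions with per-transaction weights.
toy_db = [
    {"a": 2, "b": 3},
    {"a": 1, "b": 4, "c": 5},
]

sum_util = aggregate_utilities(toy_db, 1, 2, "sum")
prod_util = aggregate_utilities(toy_db, 1, 2, "product")
samples = sample_itemsets(toy_db, n=5, aggregate="product", seed=0)
```

On this toy database, the itemset `("b", "c")` has a sum utility of 9 and a product utility of 20, so switching the aggregation function changes which patterns are most likely to be drawn. Exhaustive enumeration grows exponentially with transaction length, which is the cost a dedicated sampler is meant to avoid.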

About

An extension of HAISampler that also takes product utility into account
