High Weighted Itemset Sampling

Abstract: Weighted symbolic data, in which items carry quantitative and/or qualitative measures, are growing rapidly. Exploring such data is hard because of their structure and volume. Such data are often evaluated with an aggregation function (product or sum) to find all subsets that are representative of the behavior of the data. In data mining, most state-of-the-art methods address this problem by extracting high utility itemsets, and therefore rely on the sum of utilities. Utility itemset extraction algorithms discover knowledge in databases where the items are weighted, and their usefulness has been widely demonstrated in many real-world applications. Traditional algorithms return every pattern whose utility exceeds a minimum utility threshold, which is difficult to set, while top-k algorithms tend to produce patterns that lack diversity. In this paper, we consider both the sum and the product of the item weights to evaluate the utility of a pattern in each transaction that contains it. We propose a generic algorithm named HiWIS that samples itemsets so that each itemset is drawn with a probability proportional to its aggregate utility in the database, under length constraints that avoid long and rare itemsets made of low-weight items. The originality of our method lies in combining length constraints with qualitative and quantitative utilities. Experiments show that HiWIS extracts thousands of high aggregate utility patterns in a few seconds from different databases.
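
To make the sampling target concrete, here is a minimal brute-force sketch of drawing itemsets with probability proportional to their aggregate utility under length constraints. It is not the HiWIS algorithm and does not use this repository's code: it enumerates every candidate itemset, which is only feasible on toy data. The function names (`aggregate_utilities`, `sample_itemsets`), the dictionary-based transaction format, and the toy database are assumptions made for illustration, and weights are assumed to be positive.

```python
import random
from itertools import combinations
from math import prod


def aggregate_utilities(database, min_len=1, max_len=2, agg=sum):
    """Map each itemset (a sorted tuple) to its total utility over the database.

    `database` is a list of transactions, each a dict {item: weight}
    (hypothetical format). `agg` is `sum` or `math.prod`, applied to the
    weights of the itemset's items inside one transaction.
    """
    utilities = {}
    for transaction in database:
        items = sorted(transaction)
        # Length constraint: only itemsets with min_len <= size <= max_len.
        for size in range(min_len, min(max_len, len(items)) + 1):
            for itemset in combinations(items, size):
                u = agg(transaction[i] for i in itemset)
                utilities[itemset] = utilities.get(itemset, 0) + u
    return utilities


def sample_itemsets(database, k, min_len=1, max_len=2, agg=sum, seed=None):
    """Draw k itemsets (with replacement), each proportionally to its utility."""
    rng = random.Random(seed)
    utilities = aggregate_utilities(database, min_len, max_len, agg)
    itemsets = list(utilities)
    weights = [utilities[x] for x in itemsets]  # assumed positive
    return rng.choices(itemsets, weights=weights, k=k)


# Toy weighted database (illustrative values only).
toy_db = [
    {"a": 2, "b": 3},
    {"a": 1, "c": 4},
    {"b": 5, "c": 1, "d": 2},
]

if __name__ == "__main__":
    print(aggregate_utilities(toy_db, 2, 2, agg=prod))
    print(sample_itemsets(toy_db, k=5, agg=sum, seed=0))
```

With `agg=sum`, the itemset `("b", "d")` appears only in the third transaction and gets utility 5 + 2 = 7; with `agg=prod` it gets 5 × 2 = 10. Setting `max_len` bounds the size of the itemsets that can be drawn, which is the role the length constraints play in the abstract.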

HAISampler/HiWIS

Folders and files

Latest commit

History

Repository files navigation

High Weighted Itemset Sampling

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages