Initial attempt for a recommendation service based on the frequently purchased items by different users. #6
Open
sayoojbk
wants to merge
14
commits into
fcgl:master
from
sayoojbk:vowpal-wabbit
14 commits
01d79fb
Merge pull request #1 from fcgl/master
sayoojbk b978caf
Add pymongo, add database models for tables, add dev directory with e…
1jeanpaul1 a6c7f75
Add missing tables, add missing keys
1jeanpaul1 abcb807
Merge branch 'implement-databaseCollections' of https://github.com/1j…
5faf336
Merge pull request #2 from fcgl/master
sayoojbk 06b9ae7
Merge branch 'master' of https://github.com/sayoojbk/recommendation i…
373989b
implemented vowpal wabbit code to be refactored for deployment
fc4a8c0
Merge pull request #3 from fcgl/master
sayoojbk 24d8e2b
Merge pull request #4 from fcgl/master
sayoojbk 5367ed0
added a recommendation service based on the frequently purhcased items.
3df222d
removed unwanted links.md file
34ee59d
Merge branch 'vowpal-wabbit' of https://github.com/sayoojbk/recommend…
5ae3eba
deleted some unnecessarily committed files.
977ba82
made changes to build fail issues.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| - The /toggle endpoint updates each user's recommendations based on the most popular items in the city. | ||
| The /toggle endpoint is what runs the process that generates the user recommendations. | ||
|
|
||
| - There is a boolean switch that changes the process that is run (popular/machine learning). | ||
| We need your recommendation algorithm implementation to run every time that endpoint is called. | ||
|
|
||
|
|
||
| ### So we need the following: | ||
|
|
||
| 1. Functions to query all the data you need for your algorithm. For this to work in production we need to be careful with memory: does your algorithm work in chunks, for example by processing 10,000 rows of data at a time, or does it need all the data at once? Another solution would be to use a cluster computing framework, which might be the better way to go about it. | ||
|
|
||
| 2. A function that takes in the queried data and runs the recommendation algorithm you've made. | ||
|
|
||
| 3. A function that populates the UserRecommendation table and associates each UserRecommendation ID with its corresponding Users. When I was doing research, it mentioned that some users are likely to get very similar recommendations, so to save data in production the same recommendations would be given to three users who are very similar. In the implementation, the UserRecommendation ID can therefore be associated with more than one user. This doesn't have to be the case if you don't want it; we can have a one-to-one relationship between UserRecommendation and User. The popular recommendation currently works like this: it creates one UserRecommendation object and gives that ID to every User in our database. |
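
The three functions requested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the service's implementation: the names `iter_chunks`, `most_purchased_items`, and `assign_shared_recommendation` are hypothetical, purchases are assumed to arrive as `(user_id, item_id)` pairs, and plain dictionaries stand in for the UserRecommendation table and the user-to-recommendation link. It counts purchases one chunk at a time, so only a single chunk and the running counts are held in memory.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: (user_id, item_id). The real schema may differ.
Purchase = Tuple[str, str]


def iter_chunks(rows: Iterable[Purchase], chunk_size: int = 10_000) -> Iterator[List[Purchase]]:
    """Yield successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def most_purchased_items(rows: Iterable[Purchase], top_n: int = 3, chunk_size: int = 10_000) -> List[str]:
    """Count purchases chunk by chunk and return the top_n item IDs."""
    counts: Counter = Counter()
    for chunk in iter_chunks(rows, chunk_size):
        counts.update(item_id for _, item_id in chunk)
    return [item_id for item_id, _ in counts.most_common(top_n)]


def assign_shared_recommendation(
    user_ids: Iterable[str],
    item_ids: List[str],
    recommendations: Dict[int, List[str]],
    user_to_recommendation: Dict[str, int],
) -> int:
    """Store one recommendation and point every given user at its ID.

    The two dictionaries are in-memory stand-ins for the database tables.
    """
    rec_id = len(recommendations) + 1
    recommendations[rec_id] = list(item_ids)
    for user_id in user_ids:
        user_to_recommendation[user_id] = rec_id
    return rec_id


purchases = [
    ("u1", "milk"), ("u2", "milk"), ("u3", "milk"),
    ("u1", "bread"), ("u2", "bread"), ("u3", "eggs"),
]
top_items = most_purchased_items(purchases, top_n=2, chunk_size=2)
recs: Dict[int, List[str]] = {}
links: Dict[str, int] = {}
assign_shared_recommendation(["u1", "u2", "u3"], top_items, recs, links)
print(top_items)  # ['milk', 'bread']
print(links)      # {'u1': 1, 'u2': 1, 'u3': 1}
```

Sharing one recommendation ID across several users keeps the table small, as in the current popular mode; a one-to-one mapping would simply call the last function once per user.
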
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,266 @@ | ||
| # Copyright (c) Microsoft Corporation. All rights reserved. | ||
| # Licensed under the MIT License. | ||
|
|
||
| """ | ||
| This file provides a wrapper to run Vowpal Wabbit from the command line through Python. | ||
|
||
| It is not recommended to use this approach in production; there are Python bindings that can be installed from the | ||
| repository or pip, or the command line can be used directly. This is merely to demonstrate vw usage in the example notebooks. | ||
| """ | ||
|
|
||
| import os | ||
| from subprocess import run | ||
| from tempfile import TemporaryDirectory | ||
| import pandas as pd | ||
|
|
||
| from reco_utils.common.constants import ( | ||
| DEFAULT_USER_COL, | ||
| DEFAULT_ITEM_COL, | ||
| DEFAULT_RATING_COL, | ||
| DEFAULT_TIMESTAMP_COL, | ||
| DEFAULT_PREDICTION_COL, | ||
| ) | ||
|
|
||
|
|
||
| class VW: | ||
| """Vowpal Wabbit Class""" | ||
|
|
||
| def __init__( | ||
| self, | ||
| col_user=DEFAULT_USER_COL, | ||
| col_item=DEFAULT_ITEM_COL, | ||
| col_rating=DEFAULT_RATING_COL, | ||
| col_timestamp=DEFAULT_TIMESTAMP_COL, | ||
| col_prediction=DEFAULT_PREDICTION_COL, | ||
| **kwargs, | ||
| ): | ||
| """Initialize model parameters | ||
|
|
||
| Args: | ||
| col_user (str): user column name | ||
| col_item (str): item column name | ||
| col_rating (str): rating column name | ||
| col_timestamp (str): timestamp column name | ||
| col_prediction (str): prediction column name | ||
| """ | ||
|
|
||
| # create temporary files | ||
| self.tempdir = TemporaryDirectory() | ||
| self.train_file = os.path.join(self.tempdir.name, "train.dat") | ||
| self.test_file = os.path.join(self.tempdir.name, "test.dat") | ||
| self.model_file = os.path.join(self.tempdir.name, "vw.model") | ||
| self.prediction_file = os.path.join(self.tempdir.name, "prediction.dat") | ||
|
|
||
| # set DataFrame columns | ||
| self.col_user = col_user | ||
| self.col_item = col_item | ||
| self.col_rating = col_rating | ||
| self.col_timestamp = col_timestamp | ||
| self.col_prediction = col_prediction | ||
|
|
||
| self.logistic = "logistic" in kwargs.values() | ||
| self.train_cmd = self.parse_train_params(params=kwargs) | ||
| self.test_cmd = self.parse_test_params(params=kwargs) | ||
|
|
||
| @staticmethod | ||
| def to_vw_cmd(params): | ||
| """Convert dictionary of parameters to vw command line. | ||
|
|
||
| Args: | ||
| params (dict): key = parameter, value = value (use True if parameter is just a flag) | ||
|
|
||
| Returns: | ||
| list[str]: vw command line parameters as list of strings | ||
| """ | ||
|
|
||
| cmd = ["vw"] | ||
| for k, v in params.items(): | ||
| if v is False: | ||
| # don't add parameters with a value == False | ||
| continue | ||
|
|
||
| # add the correct hyphen to the parameter | ||
| cmd.append(f"-{k}" if len(k) == 1 else f"--{k}") | ||
| if v is not True: | ||
| # don't add an argument for parameters with value == True | ||
| cmd.append("{}".format(v)) | ||
|
|
||
| return cmd | ||
|
|
||
| def parse_train_params(self, params): | ||
| """Parse input hyper-parameters to build vw train commands | ||
|
|
||
| Args: | ||
| params (dict): key = parameter, value = value (use True if parameter is just a flag) | ||
|
|
||
| Returns: | ||
| list[str]: vw command line parameters as list of strings | ||
| """ | ||
|
|
||
| # make a copy of the original hyper parameters | ||
| train_params = params.copy() | ||
|
|
||
| # remove options that are handled internally, not supported, or test only parameters | ||
| invalid = [ | ||
| "data", | ||
| "final_regressor", | ||
| "invert_hash", | ||
| "readable_model", | ||
| "t", | ||
| "testonly", | ||
| "i", | ||
| "initial_regressor", | ||
| "link", | ||
| ] | ||
|
|
||
| for option in invalid: | ||
| if option in train_params: | ||
| del train_params[option] | ||
|
|
||
| train_params.update( | ||
| { | ||
| "d": self.train_file, | ||
| "f": self.model_file, | ||
| "quiet": params.get("quiet", True), | ||
| } | ||
| ) | ||
| return self.to_vw_cmd(params=train_params) | ||
|
|
||
| def parse_test_params(self, params): | ||
| """Parse input hyper-parameters to build vw test commands | ||
|
|
||
| Args: | ||
| params (dict): key = parameter, value = value (use True if parameter is just a flag) | ||
|
|
||
| Returns: | ||
| list[str]: vw command line parameters as list of strings | ||
| """ | ||
|
|
||
| # make a copy of the original hyper parameters | ||
| test_params = params.copy() | ||
|
|
||
| # remove options that are handled internally, not supported, or train only parameters | ||
| invalid = [ | ||
| "data", | ||
| "f", | ||
| "final_regressor", | ||
| "initial_regressor", | ||
| "testonly", | ||
| "invert_hash", | ||
| "readable_model", | ||
| "b", | ||
| "bit_precision", | ||
| "holdout_off", | ||
| "c", | ||
| "cache", | ||
| "k", | ||
| "kill_cache", | ||
| "l", | ||
| "learning_rate", | ||
| "l1", | ||
| "l2", | ||
| "initial_t", | ||
| "power_t", | ||
| "decay_learning_rate", | ||
| "q", | ||
| "quadratic", | ||
| "cubic", | ||
| "i", | ||
| "interactions", | ||
| "rank", | ||
| "lrq", | ||
| "lrqdropout", | ||
| "oaa", | ||
| ] | ||
| for option in invalid: | ||
| if option in test_params: | ||
| del test_params[option] | ||
|
|
||
| test_params.update( | ||
| { | ||
| "d": self.test_file, | ||
| "i": self.model_file, | ||
| "quiet": params.get("quiet", True), | ||
| "p": self.prediction_file, | ||
| "t": True, | ||
| } | ||
| ) | ||
| return self.to_vw_cmd(params=test_params) | ||
|
|
||
| def to_vw_file(self, df, train=True): | ||
| """Convert Pandas DataFrame to vw input format file | ||
|
|
||
| Args: | ||
| df (pd.DataFrame): input DataFrame | ||
| train (bool): flag for train mode (or test mode if False) | ||
| """ | ||
|
|
||
| output = self.train_file if train else self.test_file | ||
| with open(output, "w") as f: | ||
| # extract columns and create a new dataframe | ||
| tmp = df[[self.col_rating, self.col_user, self.col_item]].reset_index() | ||
|
|
||
| if train: | ||
| # we need to reset the rating type to an integer to simplify the vw formatting | ||
| tmp[self.col_rating] = tmp[self.col_rating].astype("int64") | ||
|
|
||
| # convert rating to binary value | ||
| if self.logistic: | ||
| max_value = tmp[self.col_rating].max() | ||
| tmp[self.col_rating] = tmp[self.col_rating].apply( | ||
| lambda x: 2 * round(x / max_value) - 1 | ||
| ) | ||
| else: | ||
| tmp[self.col_rating] = "" | ||
|
|
||
| # convert each row to VW input format (https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format) | ||
| # [label] [tag]|[user namespace] [user id feature] |[item namespace] [movie id feature] | ||
| # label is the true rating, tag is a unique id for the example just used to link predictions to truth | ||
| # user and item namespaces separate features to support interaction features through command line options | ||
| for _, row in tmp.iterrows(): | ||
| f.write( | ||
| "{rating} {index}|user {userID} |item {itemID}\n".format( | ||
| rating=row[self.col_rating], | ||
| index=row["index"], | ||
| userID=row[self.col_user], | ||
| itemID=row[self.col_item], | ||
| ) | ||
| ) | ||
|
|
||
| def fit(self, df): | ||
| """Train model | ||
|
|
||
| Args: | ||
| df (pd.DataFrame): input training data | ||
| """ | ||
|
|
||
| # write dataframe to disk in vw format | ||
| self.to_vw_file(df=df) | ||
|
|
||
| # train model | ||
| run(self.train_cmd, check=True) | ||
|
|
||
| def predict(self, df): | ||
| """Predict results | ||
|
|
||
| Args: | ||
| df (pd.DataFrame): input test data | ||
| """ | ||
|
|
||
| # write dataframe to disk in vw format | ||
| self.to_vw_file(df=df, train=False) | ||
|
|
||
| # generate predictions | ||
| run(self.test_cmd, check=True) | ||
|
|
||
| # read predictions | ||
| return df.join( | ||
| pd.read_csv( | ||
| self.prediction_file, | ||
| sep=r"\s+", | ||
| names=[self.col_prediction], | ||
| index_col=1, | ||
| ) | ||
| ) | ||
|
|
||
| def __del__(self): | ||
| self.tempdir.cleanup() | ||
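
The flag conversion in `to_vw_cmd` and the row format written by `to_vw_file` can be checked without installing Vowpal Wabbit or `reco_utils`. The sketch below copies that logic into standalone functions; `format_vw_example` is a hypothetical helper name that does not exist in the file, and the parameter values and IDs are made up for illustration.

```python
def to_vw_cmd(params):
    """Standalone copy of VW.to_vw_cmd: turn a dict into a vw argument list."""
    cmd = ["vw"]
    for key, value in params.items():
        if value is False:
            # parameters set to False are dropped entirely
            continue
        # single-letter options take one hyphen, long options take two
        cmd.append(f"-{key}" if len(key) == 1 else f"--{key}")
        if value is not True:
            # True marks a bare flag, so only other values are appended
            cmd.append(str(value))
    return cmd


def format_vw_example(rating, index, user_id, item_id):
    """Hypothetical helper: one row in the layout written by VW.to_vw_file."""
    return f"{rating} {index}|user {user_id} |item {item_id}"


params = {"loss_function": "logistic", "l": 0.1, "quiet": True, "holdout_off": False}
print(to_vw_cmd(params))
# ['vw', '--loss_function', 'logistic', '-l', '0.1', '--quiet']
print(format_vw_example(4, 0, 196, 242))
# 4 0|user 196 |item 242
```

Because the command is built as a list of strings rather than one shell string, it can be passed to `subprocess.run` without shell quoting.
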