I'm having an issue where the scraper only pulls down the first 200 games and formats things a little oddly. Also, it's not pulling down any of the regional sales, just total sales. Example output: Rank,Name,Genre,Platform,Publisher,Developer,Vgchartz_Score,Critic_Score,User_Score,Total_Shipped,Total_Sales,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Release_Date,Last_Update

Hi @rodri270, I'm speaking from memory, but I seem to recall that there is a JSON configuration file for the request size and paging (number of results x number of pages) and the fields that are retrieved. Have a look at: https://github.com/hechmik/vgchartzScrape/blob/master/cfg/resources.json The output format is the one the original software had (CSV), but as the code uses pandas it should be easy to export it another way.

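Since the data ends up in a pandas DataFrame before it is written out, switching the export format is essentially a one-line change. A minimal sketch (the column names mirror the example output above; the `df` variable and file names are illustrative, not the repo's actual code):

```python
import pandas as pd

# Illustrative DataFrame with a subset of the scraper's columns.
df = pd.DataFrame({
    "Rank": [1, 2],
    "Name": ["Wii Sports", "Super Mario Bros."],
    "Total_Sales": [82.9, 40.24],
})

# CSV is the format the original software produced...
df.to_csv("vgsales.csv", index=False)
# ...but pandas can emit other formats just as easily:
df.to_json("vgsales.json", orient="records")
```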
I appreciate the heads up, thanks a lot. I was able to find that and make the changes from there. Now the issue I'm having is that I'm getting "Unexpected error: (<class 'urllib.error.HTTPError'>, <HTTPError 429: 'Too Many Requests'>, <traceback object at 0x118b2d500>)", so I just have to keep testing. Thanks again for the fast reply!

HTTP 429 is the server telling you it is receiving too many requests: by issuing a large number of requests in a short period we are affecting its capacity and/or performance, so it rate-limits us. To work around this in the request loop, you can introduce some artificial latency between requests so you don't saturate the server and trigger the error. Another option is to add this delay/sleep only when the error is caught, in an exception handler that momentarily pauses traffic when the problem occurs. If you go with either option, feel free to work on this same repository and submit it in a PR.
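A minimal sketch of the second option, catching the 429 and sleeping before retrying, with the delay doubling on each attempt (the function name and backoff constants are illustrative, not part of the repo; the `opener` parameter is injectable only so the logic can be exercised without real traffic):

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, max_retries=5, base_delay=1.0,
                     opener=urllib.request.urlopen):
    """Fetch `url`, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only retry on "Too Many Requests"
            # Sleep 1 s, 2 s, 4 s, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")
```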
Thanks @imirkin, with your fix the lambda function is working fine.
I've just run it with Python 3.7 (Miniconda) and it runs smoothly. @huertaj2