
FIX Thanks to @imirkin #2

Open
Pelirrojo wants to merge 1 commit into hechmik:master from Machine-Learning-Labs:master

Conversation

@Pelirrojo

Thanks @imirkin, with your fix the lambda function is working fine.
I've just run it with Python 3.7 (MiniConda, and it runs smoothly). @huertaj2

@rodri270

I'm having an issue where the scraper only pulls down the first 200 games and formats things a little oddly. Also, it's not pulling down any of the regional sales, only total sales.

Example output:

Rank,Name,Genre,Platform,Publisher,Developer,Vgchartz_Score,Critic_Score,User_Score,Total_Shipped,Total_Sales,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Release_Date,Last_Update
1,Tetris,,Series,Nintendo,Alexey Pajitnov,,,,496400000.0,,,,,,1989-07-31,2020-02-27 00:00:00
2,Pokemon,,Series,Nintendo,Game Freak,,,,402220000.0,,,,,,1998-09-28,2020-02-03 00:00:00
3,Call of Duty,,Series,Activision,Infinity Ward,,,,400000000.0,,,,,,2003-10-29,2020-02-03 00:00:00
4,Super Mario,,Series,Nintendo,Nintendo,,,,391450000.0,,,,,,1983-07-20,2020-02-20 00:00:00
5,Grand Theft Auto,,Series,Rockstar Games,Rockstar North,,,,370000000.0,,,,,,1998-03-27,2020-02-03 00:00:00
6,FIFA,,Series,EA Sports,Extended Play Productions (1991-1997),,,,325000000.0,,,,,,1993-12-15,2020-02-03 00:00:00

@Pelirrojo
Author

Hi @rodri270

I'm speaking from memory, but I seem to recall there is a JSON configuration file where you can set the request size and paging (number of results × number of pages) and the fields that are retrieved. Have a look at: https://github.com/hechmik/vgchartzScrape/blob/master/cfg/resources.json

The output format is CSV, the same as the original software, but since the code uses pandas it should be easy to export it in another format.
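Since the data already lives in a pandas DataFrame, re-exporting it in another format is a one-liner. A minimal sketch (the sample rows are copied from the output above; file names and the reduced column set are illustrative, not the scraper's actual code):

```python
import pandas as pd

# Tiny sample in the scraper's CSV schema (values from the output above).
df = pd.DataFrame(
    [
        {"Rank": 1, "Name": "Tetris", "Publisher": "Nintendo", "Total_Shipped": 496400000.0},
        {"Rank": 2, "Name": "Pokemon", "Publisher": "Nintendo", "Total_Shipped": 402220000.0},
    ]
)

# pandas can re-export the same data in other formats:
df.to_csv("vgsales.csv", index=False)         # the original CSV output
df.to_json("vgsales.json", orient="records")  # JSON instead
```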

@rodri270

I appreciate the heads up, thanks a lot. I was able to find that and make the changes from there. Now the issue I'm having is that I'm getting "Unexpected error: (<class 'urllib.error.HTTPError'>, <HTTPError 429: 'Too Many Requests'>, <traceback object at 0x118b2d500>)", so I just gotta keep testing. Thanks again for the fast reply!

@Pelirrojo
Author

429 is an HTTP status code ("Too Many Requests"): by running very large batches of requests over time we are probably affecting the server's capacity and/or performance, which is why it returns this warning.

To work around it, you can introduce some artificial latency between requests inside the loop, so you don't saturate the server and don't receive the error.

Another option is to introduce this delay/sleep only when the error is caught, in the exception handler, momentarily stopping traffic when the problem occurs.

If you end up implementing either option, you can work on this same repository and include it in this same PR.
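Both ideas can be sketched in a few lines with the same `urllib` module that appears in the error message. This is only an illustration (the function name and retry parameters are my own, not from the repo):

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_retries=5, base_delay=2.0):
    """Fetch a URL, sleeping and retrying when the server answers 429."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # a different HTTP error: let it propagate
            # Exponential backoff: 2 s, 4 s, 8 s, ...
            delay = base_delay * (2 ** attempt)
            print(f"HTTP 429, sleeping {delay:.0f}s before retrying")
            time.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")


# For the first option (fixed latency inside the scraping loop),
# a plain time.sleep(1.0) between consecutive page requests is enough.
```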
