
Bug: WWE and Other Leagues do not scrape correctly anymore - Only Takes Odds


Posted: 20 Apr 2022 14:03
JayBird
Posts: 59
Joined: 2022-01-23

1) Create a folder called Sports > WWE.
2) Inside that folder, add a "Season 2006" folder.
3) Put the event files inside the Season 2006 folder. (Note: you can mock these with text files by changing the extension to .mp4.)
4) Load Kodi.
5) Install the Kodi add-on for TheSportsDB.
6) Add the Sports folder from step 1 as a source.
7) Set the content type to TV shows and the scraper to TheSportsDB.
8) Scrape.
9) Look at TV Shows > WWE > Season 2006.
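
For anyone who wants to reproduce steps 1 to 3 quickly, here is a minimal Python sketch that builds the mock folder with empty .mp4 files. The helper name `make_mock_library` is made up for this post, and the demo writes into a temporary directory; point `root` at a real location if you want Kodi to scan it. The filenames are the ones mentioned in this thread.

```python
import tempfile
from pathlib import Path


def make_mock_library(root, league, season, filenames):
    """Create <root>/Sports/<league>/Season <season>/ with empty placeholder files."""
    season_dir = Path(root) / "Sports" / league / f"Season {season}"
    season_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in filenames:
        path = season_dir / name
        path.touch()  # empty file; Kodi only needs the name to attempt a scrape
        created.append(path)
    return created


# Filenames taken from this thread.
MOCK_FILES = [
    "WWE.2006-05-01.RAW.#675.mp4",
    "WWE.2006-05-12.SmackDown.#352.mp4",
    "WWE.2006-08-14.RAW.#690.mp4",
]

# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created_paths = make_mock_library(tmp, "WWE", "2006", MOCK_FILES)
    created_names = sorted(p.name for p in created_paths)
    all_exist = all(p.is_file() for p in created_paths)
    season_folder = created_paths[0].parent.name
```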

It will only show odd-numbered episodes, and it does not link the correct episode file to the episode displayed.

A look at the Kodi database file inside the userdata folder shows that all files have been picked up (see the files table). However, the episodes table shows:

- Only odd-numbered episodes have been inserted
- The filename does not match the show title

idEpisode idFile c00 c01 c02 c03 c04 c05 c06 c07 c08 c09 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 idShow userrating idSeason

9316 9641 SmackDown #351 Singles Match 8972 2006-05-05 https://www.thesportsdb.com/images/media/event/thumb/dggg3p1650121193.jpg 5496 2006 41 -1 -1 -1 /Sports/WWE/Season 2006/WWE.2006-05-01.RAW.#675.mp4 1030 -1 135 1096
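
A rough way to list rows like this one is to query the database with Python's sqlite3 module. This is only a sketch under some assumptions: the table is called `episode`, `c00` holds the title and `c18` holds the file path (inferred from the row above), and the "first word of the title should appear in the filename" check is a crude heuristic made up for this post. The demo uses an in-memory database with the row above plus one hypothetical correct row (ids 9317/9642 are invented); for a real check, open the MyVideos database file from your userdata folder instead.

```python
import sqlite3
from pathlib import PurePosixPath


def find_mismatched_episodes(conn):
    """Return (idEpisode, title, filename) where the title's first word is not in the filename."""
    rows = conn.execute("SELECT idEpisode, c00, c18 FROM episode").fetchall()
    mismatched = []
    for id_episode, title, path in rows:
        if not title or not path:
            continue  # nothing to compare
        brand = title.split()[0].lower()  # e.g. "smackdown" or "raw"
        filename = PurePosixPath(path).name
        if brand not in filename.lower():
            mismatched.append((id_episode, title, filename))
    return mismatched


# In-memory stand-in for the Kodi video database (only the columns used here).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE episode (idEpisode INTEGER, idFile INTEGER, c00 TEXT, c18 TEXT)")
demo.executemany(
    "INSERT INTO episode VALUES (?, ?, ?, ?)",
    [
        # Row from this thread: a SmackDown title linked to a RAW file.
        (9316, 9641, "SmackDown #351", "/Sports/WWE/Season 2006/WWE.2006-05-01.RAW.#675.mp4"),
        # Hypothetical correctly linked row for comparison.
        (9317, 9642, "SmackDown #352", "/Sports/WWE/Season 2006/WWE.2006-05-12.SmackDown.#352.mp4"),
    ],
)
mismatches = find_mismatched_episodes(demo)
for row in mismatches:
    print(row)
```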

Kodi View: https://ibb.co/P6kJytv

Paste BIN: https://pastebin.com/MDqVnC4a

WWE.2006-08-14.RAW.#690.mp4 > Won't work
WWE.2006-05-12.SmackDown.#352.mp4 > Will work

Posted: 20 Apr 2022 20:26
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 20 Apr 2022 20:53
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 21 Apr 2022 10:56
JayBird
Posts: 59
Joined: 2022-01-23

.

zag
Posted: 21 Apr 2022 21:08

zag
Posts: 3,329
Joined: 2020-03-23

Yes, that simply means no match was found in our DB.

The best thing to do is post one example filename that scrapes successfully and one that does not.

A Kodi debug log would also be useful.

Posted: 21 Apr 2022 21:47
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 21 Apr 2022 21:48
curswine
Posts: 868
Joined: 2020-06-17

I've been trying to help resolve this using a dummy library, and nothing has been linking successfully ever since you made these changes:

Today I fixed the eventseason api method which was limited to 1,500 events. This meant that only recent events would scrape with leagues that have lots of episodes. I have made it unlimited.

I also reduced the size of the returned data by 90% which means that the scraper should be a lot faster.


Before this nothing pre-2015 in WWE could be scraped, but post-2015 was all accurate and worked fine.

Now scraping is like 50% successful for everything, but not accurate, in the sense that a file for RAW will be linked to a SmackDown show etc.

JayBird
Posted: 22 Apr 2022 16:05
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 22 Apr 2022 19:59

zag
Posts: 3,329
Joined: 2020-03-23

Please post a single file that is mismatched.

Big log files are not useful to me in diagnosing the issue.

Posted: 22 Apr 2022 20:42
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 22 Apr 2022 20:43
JayBird
Posts: 59
Joined: 2022-01-23

Full file sent in Discord

zag
Posted: 25 Apr 2022 13:53
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 25 Apr 2022 13:53
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 25 Apr 2022 13:55
JayBird
Posts: 59
Joined: 2022-01-23

.

Posted: 25 Apr 2022 21:48

zag
Posts: 3,329
Joined: 2020-03-23

I've spent a couple of hours debugging this tonight and think it has to do with the order in which the JSON is returned.

The change I made the other day meant that only some fields were returned rather than the full JSON episode list (3,500 items, each with full event data; it crashed my browser just testing the API). So the change meant 90% less (unneeded) data and much faster scraping. All good, a welcome change, and certainly needed now that we store a lot of data for WWE and other sports!

But...

These were the correct fields, but, importantly, in a slightly different order. Looking at the scraper source code, I can see it reads these fields using regex, which is notoriously fragile. I believe the XML scraper is stupid enough to need them in the correct order.

I've now re-ordered the JSON data for that API key. Let's see if that makes any difference... I have tested it and it worked for me.

Posted: 25 Apr 2022 21:54

zag
Posts: 3,329
Joined: 2020-03-23

For anyone interested in the details, on line 32 of the scraper (tsdb.xml) you can see the scraper requesting a list of all episodes in the WWE league "TVShow". The regex then reads each data point in order:


"idEvent": (?:"|null,)?([0-9]*).*?
"strEvent": (?:"|null,)?([^"]*).*?
"idLeague": (?:"|null,)?([0-9]*).*?
"strSeason": (?:"|null,)?([0-9]*).*?
"intRound": (?:"|null,)?([0-9]*).*?
"dateEvent": (?:"|null,)?([^"]*).*?


So it is looking for idEvent, strEvent, idLeague, strSeason, intRound, dateEvent (in that order).

Our API returned the correct JSON but in a slightly different order, therefore messing up the matching.
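
To make the failure mode concrete, here is a small Python sketch of the same idea. It is not the scraper's code: the pattern is only modelled on the fragments above (with `\s*` added so it tolerates the spacing produced by `json.dumps`), the event ids, league id and round values are made up, and the "reordered" key order is an assumption, since the exact order the API returned is not given in this thread. With these sample events, the order-dependent regex stitches fields from two neighbouring events into one match and returns only half of them, while parsing the JSON by key is unaffected.

```python
import json
import re

# Order-dependent pattern modelled on the fragments quoted above.
ORDERED_PATTERN = re.compile(
    r'"idEvent":\s*(?:"|null,)?([0-9]*).*?'
    r'"strEvent":\s*(?:"|null,)?([^"]*).*?'
    r'"idLeague":\s*(?:"|null,)?([0-9]*).*?'
    r'"strSeason":\s*(?:"|null,)?([0-9]*).*?'
    r'"intRound":\s*(?:"|null,)?([0-9]*).*?'
    r'"dateEvent":\s*(?:"|null,)?([^"]*)',
    re.DOTALL,
)

EXPECTED_ORDER = ["idEvent", "strEvent", "idLeague", "strSeason", "intRound", "dateEvent"]
# Assumed reordering for illustration: strEvent moved to the end of each object.
REORDERED = ["idEvent", "idLeague", "strSeason", "intRound", "dateEvent", "strEvent"]

# Hypothetical sample events (ids are invented).
EVENTS = [
    ("1001", "RAW #675", "2006-05-01"),
    ("1002", "SmackDown #351", "2006-05-05"),
    ("1003", "RAW #676", "2006-05-08"),
    ("1004", "SmackDown #352", "2006-05-12"),
]


def build_payload(key_order):
    """Serialise the sample events with object keys in the given order."""
    rows = []
    for event_id, name, date in EVENTS:
        full = {
            "idEvent": event_id, "strEvent": name, "idLeague": "4444",
            "strSeason": "2006", "intRound": "0", "dateEvent": date,
        }
        rows.append({key: full[key] for key in key_order})
    return json.dumps({"events": rows})


def parse_with_json(text):
    """Key-based parsing: independent of the order of fields."""
    return [(e["idEvent"], e["strEvent"], e["dateEvent"]) for e in json.loads(text)["events"]]


in_order_matches = ORDERED_PATTERN.findall(build_payload(EXPECTED_ORDER))
reordered_matches = ORDERED_PATTERN.findall(build_payload(REORDERED))
json_rows = parse_with_json(build_payload(REORDERED))

print(len(in_order_matches), len(reordered_matches), len(json_rows))
for match in reordered_matches:
    print(match[0], match[1], match[5])  # id and title from one event, date from the next
```

In this sketch every second event is swallowed by the previous match, which looks a lot like the "only odd episodes, wrong file" symptom reported at the top of the thread, although the real reordering may have differed.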

In conclusion... I hate REGEX and someone really needs to write a python scraper

I'm pretty confident this is now fixed, can you test it out with a full scrape?


EDIT: There is still an issue with the event description data truncating. I will need to look at that another night.

Posted: 26 Apr 2022 09:09
JayBird
Posts: 59
Joined: 2022-01-23

Tested, working > You're awesome

Posted: 26 Apr 2022 09:10

zag
Posts: 3,329
Joined: 2020-03-23

I've managed to fix the description text being cut off.

The only remaining issue is the event/episode thumbnail not showing. Are you seeing this as well?

Posted: 18 Dec 2022 18:33
JayBird
Posts: 59
Joined: 2022-01-23

This is happening again!


/Wrestling/Sports/WWE/Season 2022/WWE.2022-10-08.Extreme.Rules.S2022E0.mkv
/Wrestling/Sports/WWE/Season 2022/WWE.2022-11-05.Crown.Jewel.S2022E0.mp4
/Wrestling/Sports/WWE/Season 2022/WWE.2022-11-26.Survivor.Series.S2022E0.mp4

ALL come back as NXT Level Up 46

Posted: 18 Dec 2022 18:41
JayBird
Posts: 59
Joined: 2022-01-23

Also Same issue for AEW

/Wrestling/Sports/AEW/Season 2022/AEW.2022-11-19.Full.Gear.S2022E0.mp4

is shown in Kodi as Rampage 74

Posted: 18 Dec 2022 19:58
curswine
Posts: 868
Joined: 2020-06-17

It's probably because they've both surpassed 200 rounds.

I don't think any more WWE/AEW events need to be added this year, so if Zag could bump the number of rounds up to 300, that should fix it.

Posted: 31 Dec 2022 17:59
curswine
Posts: 868
Joined: 2020-06-17

Try again now, the number of rounds has been bumped up to 300.

