
Game Statistics


Posted: 06 Oct 2020 19:59
pauld0051
Posts: 16
Joined: 2020-10-06

Hey there,

I am very new to development, and on my journey I have discovered web scraping. But I doubt that what I am doing is entirely legal under the terms and conditions of scoreboard.com.

I am not doing anything for profit - I just intend to scrape the outcomes of various sporting events. For now, I went with something popular and chose the English Premier League (thinking this would have the most data available). I am happy to keep scraping away, as I doubt I am causing scoreboard.com any issues. But I figured there may be a more legal avenue, and that is how I stumbled upon TheSportsDB.

The type of data I am looking for is up to 20 statistics at the end of the game and could include:
Goals (home)
Goals (away)
Possession
Goal Attempts
Shots on Goal
Shots off Goal
Blocked Shots
Free Kicks
Corner Kicks
Offsides
Goalkeeper Saves
Fouls
Red Cards
Yellow Cards
Total Passes
Tackles
Attacks
Dangerous Attacks
Goals Against

This is stuff I've easily scraped - but if I can get this data by other, legal means, I'd be happy to learn how.

Is there any such thing here? Or that you could direct me to?

As this is not for profit (it's a project for the course I am on), I am on an extreme budget of £0 (which is €0 as well as $0). Maybe that changes at a later stage, who knows. But right now, that is my financial situation.

Happy to hear if you know of something that could suit my needs. Thanks kindly.

Posted: 06 Oct 2020 20:14

zag
Posts: 3,311
Joined: 2020-03-23

Welcome,

We don't offer those stats yet for soccer, but they are easy enough to add (and on the todo list), and we could probably add them to the free tier. Let me have a look at our source APIs.

I don't think there's anything wrong with scraping personally, but an API is more solid as it should never change.

Posted: 06 Oct 2020 20:16

zag
Posts: 3,311
Joined: 2020-03-23

Something like this?



Posted: 06 Oct 2020 20:20
pauld0051
Posts: 16
Joined: 2020-10-06

Yes, exactly like that one

Posted: 06 Oct 2020 20:26
pauld0051
Posts: 16
Joined: 2020-10-06

The only thing with scraping is that it may go against the T&Cs of the site, for example those of scoreboard.com.

Also, their headers change extremely regularly - making scraping a manual exercise sometimes.

Posted: 06 Oct 2020 21:50

zag
Posts: 3,311
Joined: 2020-03-23

OK since this was already on the todo list and I like new users requesting things (in the hope they stick around and help out) I've made some progress:

Event Example
https://www.thesportsdb.com/event/1032723

API Example
https://www.thesportsdb.com/api/v1/json/1/lookupeventstats.php?id=1032723

NOTE: For Moderators, you can manually sync the event using the sync icon next to the Event Statistics header on the event page for Soccer.

This is all still alpha and subject to change if needed, but it looks good to me.

Posted: 07 Oct 2020 07:13
pauld0051
Posts: 16
Joined: 2020-10-06

Great! Thank you for this! I will try and get this integrated as soon as I can (will aim for before this weekend's matches).

I will definitely try to stick around, but as I am quite new to this journey of coding, I might not be much help to begin with. Would love to learn though!

Thanks again - looking forward to checking this out.

Posted: 07 Oct 2020 07:32
pauld0051
Posts: 16
Joined: 2020-10-06

As an aside, and this is where my noobyness comes in, I'm not quite sure how I am meant to get the data. I have two ways in mind. First, I can just manually hit the "world" button by each of the events and scrape from that (this is fine for me, I like this idea, it's a little manual, but at least I am sitting within copyright rules). Or, what is option B?

I have to be honest, I have done very little with APIs outside of Google Maps, and that was being led like a baby the whole way. If I went with the scraping idea, is this OK to begin with? Honestly, I will get better.

Posted: 07 Oct 2020 08:19
cydalby
Posts: 335
Joined: 2020-06-16


What language are you using? Be happy to give some pointers for api fetching

Posted: 07 Oct 2020 08:36
cydalby
Posts: 335
Joined: 2020-06-16

OK since this was already on the todo list and I like new users requesting things (in the hope they stick around and help out) I've made some progress:

Event Example
https://www.thesportsdb.com/event/1032723

API Example
https://www.thesportsdb.com/api/v1/json/1/lookupeventstats.php?id=1032723

NOTE: For Moderators, you can manually sync the event using the world icon next to the Event Statistics header on the event page for Soccer.

This is all still alpha and subject to change if needed, but it looks good to me.


Amazing, been wanting this for ages!

Doesn't look to be working with my apiKey, is this because it's in alpha and locked down to "1"?


Posted: 07 Oct 2020 08:53

zag
Posts: 3,311
Joined: 2020-03-23

Yes only test key "1" until the data is finalized.

Normal users such as yourself @pauld0051 cannot force a manual event stats sync, only moderators can. I'll try to remember to do the premiership matches this weekend but others can also do it.

I'll also see if I can write a script to do previous events.

Depending on what language you use, it should have a JSON decode function, which is a one-liner to turn the data into a normal array that you can process. In PHP it is json_decode, and in Python you need to 'import json' and then use the json.loads function. It should be pretty easy, much easier than scraping anyway.
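To make the Python half of that concrete, here is a minimal sketch of json.loads turning a response body into ordinary dicts and lists. The sample text is made up for illustration: the 'eventstats', 'intHome' and 'intAway' keys appear later in this thread, while the 'strStat' key and the quoted number values are assumptions about the response shape.

```python
import json

# Made-up, trimmed response body; 'strStat' and the string values are assumptions.
raw = '{"eventstats": [{"strStat": "Shots on Goal", "intHome": "11", "intAway": "8"}]}'

data = json.loads(raw)            # JSON text -> Python dict
first = data["eventstats"][0]     # the value is a list of per-statistic dicts
print(first["strStat"], first["intHome"], first["intAway"])
```

If the data is fetched with the requests library, as in the later posts, calling .json() on the response does the same decoding step.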

Posted: 07 Oct 2020 08:58
pauld0051
Posts: 16
Joined: 2020-10-06

Great. I will look up how to do this. I am going to use Python for this. If it is a one-liner, that will be easier than what I was planning to do.

Unless you feel like giving me a hint on how to do this in Python (which, I realise I am asking too much already).

My last call for API was in JS - so this will be my first in Python. Will be happy to try and test this for you too

Posted: 07 Oct 2020 09:07
pauld0051
Posts: 16
Joined: 2020-10-06


What language are you using? Be happy to give some pointers for api fetching


Sorry... I didn't see this message here. Oh, yes, I would LOVE some pointers!

I am going to use Python - coding on VS Code. So I don't actually know what modules or libraries I need yet. Haven't looked anything up at all. But very keen to get this started for the coming weekend's games as a test run.


Posted: 07 Oct 2020 09:17
cydalby
Posts: 335
Joined: 2020-06-16

https://repl.it/repls/SpecializedDevotedApplet

I've made a quick sample, at the link above, of how to get the JSON response from that API in Python. Let me know if that's enough to get you started!

Posted: 07 Oct 2020 09:50
pauld0051
Posts: 16
Joined: 2020-10-06


Oh, that's really nice and simple.

So - when it comes to the weekend's games how do I get all 10 games after the last one finishes? Do I do a separate page for each game and use the unique ID? Or is there a set way that would acquire all the IDs of the games and it would just find the data for each?

I have yet to learn how to use the data I'm getting in JSON format - at the moment I am not even sure how I plan on displaying this data. But getting each stat into a value would be good. Such as:

aston_villa_shots_on_goal = 11

For example.

Then I can print that to the data for the team along with the other stats (will also need goals for and against too which isn't on this particular list).


Posted: 07 Oct 2020 10:04
cydalby
Posts: 335
Joined: 2020-06-16

You'd first need to pull the list of events from that day from the https://www.thesportsdb.com/api/v1/json/1/eventsday.php?d=2020-10-10 api.

Then filter that json object so you've just got the idEvent left over. Then create a for loop, and loop over the new stats api passing in a different id each time.

Some useful links below:
https://www.w3schools.com/python/numpy_array_filter.asp
https://www.w3schools.com/python/python_for_loops.asp

Feel free to dm me on twitter (@cydalby) if you need any more help!
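A rough sketch of those steps in Python, using only the standard library (the rest of the thread uses requests, which works just as well). The function names are hypothetical, and the handling of a null 'events' value is an assumption about how the API reports a day with no events.

```python
import json
from urllib.request import urlopen

BASE = "https://www.thesportsdb.com/api/v1/json/1"  # test key "1", as used in this thread


def fetch_json(url):
    """Download a URL and decode its JSON body into Python objects."""
    with urlopen(url) as resp:
        return json.load(resp)


def event_ids(day_payload):
    """Keep only the idEvent values from an eventsday.php response."""
    events = day_payload.get("events") or []  # assumption: may be null when empty
    return [event["idEvent"] for event in events]


def stats_for_day(date, fetch=fetch_json):
    """Map each event id on the given date to its lookupeventstats.php response."""
    day = fetch(f"{BASE}/eventsday.php?d={date}")
    return {
        eid: fetch(f"{BASE}/lookupeventstats.php?id={eid}")
        for eid in event_ids(day)
    }
```

Calling stats_for_day("2020-10-10") would make one request for the day plus one per event. The fetch argument is only there so the loop can be tried out with a stand-in function before hitting the real API.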

Posted: 07 Oct 2020 10:21
pauld0051
Posts: 16
Joined: 2020-10-06


Thanks... I'm not much of a Twitter buff, to be fair. But I will look at those links and see what I can do there. This is really awesome stuff.

Do you discord?

paulyd#7399

Posted: 07 Oct 2020 18:05
pauld0051
Posts: 16
Joined: 2020-10-06

So I tried a few things. And a couple have worked and this one hasn't (as of yet).

event_day = requests.get(
    f'https://www.thesportsdb.com/api/v1/json/{apiKey}/eventsday.php?id={date}')

I have apiKey = 1 and date = 2020-10-10 but I cannot seem to get this to print out.

Any ideas?
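One likely cause, going by the URL cydalby posted earlier: eventsday.php takes the date in a `d` parameter, while the request above sends it as `id`. It is also worth checking that `date` is a string; written without quotes, 2020-10-10 is plain subtraction in Python and evaluates to 2000. A minimal sketch of building the URL, where `events_day_url` is a hypothetical helper name rather than anything from the API:

```python
from urllib.parse import urlencode

BASE = "https://www.thesportsdb.com/api/v1/json"


def events_day_url(api_key, date):
    """Build the eventsday.php URL with the date under the `d` key."""
    query = urlencode({"d": date})  # date must be a string such as "2020-10-10"
    return f"{BASE}/{api_key}/eventsday.php?{query}"


print(events_day_url("1", "2020-10-10"))
# -> https://www.thesportsdb.com/api/v1/json/1/eventsday.php?d=2020-10-10
```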

The other ones work (for a given ID):

r = requests.get(
    f'https://www.thesportsdb.com/api/v1/json/{apiKey}/lookupeventstats.php?id={id}')

arr = np.array([r.json()])

Posted: 07 Oct 2020 18:12
pauld0051
Posts: 16
Joined: 2020-10-06

My second issue is the massive amount of data - I just need to sort out the bits I want. I have done a little in the past on sorting JSON files in JS, but never really in Python. And all the W3Schools examples seem to be un-nested arrays. But I am sure it is pretty simple.

So for example, nested here I have 'strEvent': 'Aston Villa vs Liverpool'

So I'd like to take that, split it so Aston Villa gives the variable home_team = "Aston Villa" and away_team = "Liverpool"

This seems pretty straightforward. I can probably build the rest if I get a push in that direction.
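For the split itself, a small sketch using str.split, assuming the event name always has the form 'Home vs Away' (`split_event` is a hypothetical helper name). The event lookup response also carries separate home and away team fields, which would avoid the string handling altogether.

```python
def split_event(name):
    """Split 'Home vs Away' into (home, away) on the first ' vs '."""
    home, away = name.split(" vs ", 1)
    return home, away


home_team, away_team = split_event("Aston Villa vs Liverpool")
print(home_team, "/", away_team)
```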


Posted: 07 Oct 2020 18:22
GOAviator
Posts: 50
Joined: 2020-08-24


If you look further down the event JSON reply, you will find the teams already split into home team and away team.

Posted: 07 Oct 2020 18:25
pauld0051
Posts: 16
Joined: 2020-10-06


Is that the one I am currently having troubles getting?


Posted: 07 Oct 2020 19:08
pauld0051
Posts: 16
Joined: 2020-10-06

So it is not the one I am having trouble with, I can see home and away team, but I don't know the code to extract that.

I experimented with this:

r2 = requests.get(
    f'https://www.thesportsdb.com/api/v1/json/{apiKey}/lookupevent.php?id={id}')

arr2 = np.array([r2.json()])

#print(arr2)

home_team = arr2['events'][0][1]['strHomeTeam']
print("Home team is " + home_team)

But it really didn't work - I kept getting:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Any advice?
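The IndexError most likely comes from NumPy rather than from the API: np.array([r2.json()]) wraps the decoded dict in a one-element array, and a plain array only accepts integer positions, not string keys like 'events'. The extra [1] would also fail, because events[0] is already the event dict. A sketch without NumPy, using a made-up payload as a stand-in for r2.json() (the key names are the ones from the post above; the values are illustrative):

```python
# Stand-in for r2.json(): a trimmed, made-up lookupevent.php response.
payload = {
    "events": [
        {
            "strEvent": "Aston Villa vs Liverpool",
            "strHomeTeam": "Aston Villa",
            "strAwayTeam": "Liverpool",
        }
    ]
}

event = payload["events"][0]      # dict -> list of events -> first event dict
home_team = event["strHomeTeam"]
print("Home team is " + home_team)
```

If the NumPy wrapper is kept, indexing position 0 first (arr2[0]['events'][0]['strHomeTeam']) gets back to the dict.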


Posted: 08 Oct 2020 14:43
pauld0051
Posts: 16
Joined: 2020-10-06

Yay, I got it! But the way I am doing it feels long-winded. Is there a way to take this and make it a bit shorter?
match_status = arr_events[0]['events'][0]['strStatus']
event_id = arr_events[0]['events'][0]['idEvent']
home_team = arr_events[0]['events'][0]['strHomeTeam']
away_team = arr_events[0]['events'][0]['strAwayTeam']
home_score = arr_events[0]['events'][0]['intHomeScore']
away_score = arr_events[0]['events'][0]['intAwayScore']
home_shots_on_goal = arr_stats[0]['eventstats'][0]['intHome']
away_shots_on_goal = arr_stats[0]['eventstats'][0]['intAway']
home_shots_off_goal = arr_stats[0]['eventstats'][1]['intHome']
away_shots_off_goal = arr_stats[0]['eventstats'][1]['intAway']
home_total_shots = arr_stats[0]['eventstats'][2]['intHome']
away_total_shots = arr_stats[0]['eventstats'][2]['intAway']
home_blocked_shots = arr_stats[0]['eventstats'][3]['intHome']
away_blocked_shots = arr_stats[0]['eventstats'][3]['intAway']
home_shots_inside_box = arr_stats[0]['eventstats'][4]['intHome']
away_shots_inside_box = arr_stats[0]['eventstats'][4]['intAway']
home_outside_box = arr_stats[0]['eventstats'][5]['intHome']
away_outside_box = arr_stats[0]['eventstats'][5]['intAway']
home_corners = arr_stats[0]['eventstats'][6]['intHome']
away_corners = arr_stats[0]['eventstats'][6]['intAway']
home_offsides = arr_stats[0]['eventstats'][7]['intHome']
away_offsides = arr_stats[0]['eventstats'][7]['intAway']
home_possession = arr_stats[0]['eventstats'][8]['intHome']
away_possession = arr_stats[0]['eventstats'][8]['intAway']
home_yellow_cards = arr_stats[0]['eventstats'][9]['intHome']
away_yellow_cards = arr_stats[0]['eventstats'][9]['intAway']
home_red_cards = arr_stats[0]['eventstats'][10]['intHome']
away_red_cards = arr_stats[0]['eventstats'][10]['intAway']
home_saves = arr_stats[0]['eventstats'][11]['intHome']
away_saves = arr_stats[0]['eventstats'][11]['intAway']
home_total_passes = arr_stats[0]['eventstats'][12]['intHome']
away_total_passes = arr_stats[0]['eventstats'][12]['intAway']
home_accurate_passes = arr_stats[0]['eventstats'][13]['intHome']
away_accurate_passes = arr_stats[0]['eventstats'][13]['intAway']
home_passes_percent = arr_stats[0]['eventstats'][14]['intHome']
away_passes_percent = arr_stats[0]['eventstats'][14]['intAway']
home_fouls = arr_stats[0]['eventstats'][15]['intHome']
away_fouls = arr_stats[0]['eventstats'][15]['intAway']


That's pretty much all the data I need.
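One way this could be shortened is to loop over the 'eventstats' list once and build a dict, instead of writing one line per statistic. The sketch below assumes each entry names its statistic in a 'strStat' field, which is not shown in this thread, and uses made-up sample values; `to_key` is a hypothetical helper.

```python
# Made-up sample in the shape used above; 'strStat' is an assumed field name.
stats_payload = {
    "eventstats": [
        {"strStat": "Shots on Goal", "intHome": "11", "intAway": "8"},
        {"strStat": "Corner Kicks", "intHome": "6", "intAway": "4"},
    ]
}


def to_key(label):
    """Turn a label such as 'Shots on Goal' into 'shots_on_goal'."""
    return label.lower().replace(" ", "_")


stats = {}
for row in stats_payload["eventstats"]:
    key = to_key(row["strStat"])
    stats["home_" + key] = row["intHome"]
    stats["away_" + key] = row["intAway"]

print(stats["home_shots_on_goal"], stats["away_corner_kicks"])
```

Keying by name rather than by list position also means the code would not depend on the statistics arriving in a fixed order.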

Posted: 08 Oct 2020 20:07
GOAviator
Posts: 50
Joined: 2020-08-24

I am sorry I can’t help with python. I would like to learn it.

Posted: 08 Oct 2020 21:13
pauld0051
Posts: 16
Joined: 2020-10-06

You're welcome to copy and paste - the rest of the code is further up. Happy to share.

