Sep 8, 2010, 12:25 AM   #18
twilyth
Guest
 

I'm not sure this is the best place to put this since it's really stats related, but mods, feel free to move and/or repost it elsewhere.

I've been corresponding with Bok (Phil) at Free-DC about the upcoming TPU WCG team ad and he gave me some information about the mechanics behind gathering and calculating the stats for BOINC projects and some specific info on WCG. It's not stuff you really need to know as a user, but I think it's really interesting. It also makes you appreciate the dedication of guys like Phil.

Quote:
I had a quick look over at your forum and thought I might share some other links with you regarding the stats updates

The way I do the stats is actually a little different to most of the other sites, but we all struggle with the sheer amount of data we are given when you combine all the BOINC projects together. I also do stats on most of the non-BOINC projects, which adds to the complexity.

I have two scripts which are running constantly. The first one polls each project for a very small file called tables.xml in their /stats directory. It checks the update time on this file and, if it is different to the one held in the database here, it will go ahead and download the new files for team/user/host data for that project. Most of these aren't too big but Seti/Climate Prediction/World Community Grid can be fairly large. WCG's host file compressed is 162Mb. Seti's is quite a bit larger. The second script is constantly checking for newer downloaded files, and once it finds them it will uncompress them and start parsing the data into a database.
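(Illustration only, not Phil's actual script.) The "has the update time changed?" check from the paragraph above might look roughly like this in Python. It assumes tables.xml carries its timestamp in an `<update_time>` element; the function names and the dictionaries standing in for the database and the downloaded files are made up for the example.

```python
import xml.etree.ElementTree as ET


def read_update_time(tables_xml):
    """Pull the update timestamp out of a tables.xml document.

    Assumption: the timestamp lives in an <update_time> child of the root.
    """
    root = ET.fromstring(tables_xml)
    node = root.find("update_time")
    if node is None or node.text is None:
        raise ValueError("no <update_time> element found")
    return int(node.text.strip())


def projects_to_download(stored_times, fetched_xml):
    """Return the projects whose update time differs from the stored one.

    stored_times: {project: last update time we recorded} (stand-in for the DB)
    fetched_xml:  {project: freshly fetched tables.xml text}
    """
    stale = []
    for project, xml_text in fetched_xml.items():
        # A project we have never seen has no stored time, so it also counts as new.
        if read_update_time(xml_text) != stored_times.get(project):
            stale.append(project)
    return stale


if __name__ == "__main__":
    stored = {"wcg": 100, "seti": 100}
    fetched = {
        "wcg": "<tables><update_time>200</update_time></tables>",
        "seti": "<tables><update_time>100</update_time></tables>",
    }
    print(projects_to_download(stored, fetched))
```

Only the tiny tables.xml gets fetched on every pass; the big team/user/host files are downloaded just for the projects this check flags.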

It throws everything in and computes ranks and such across the project, within team, country, etc. Then it checks for new milestones and movement. All in all, for WCG alone this can be well over 1 million SQL statements, and this is actually without the initial inserts as I do them via fast load into MySQL. Add those and it would be upwards of 5 million SQL statements.
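(Illustration only.) The real ranking is done with SQL inside the database, but the idea of ranking the same rows once across the project and again within each team can be sketched in plain Python. The row layout (`name`, `team`, `credit`) and the function name are assumptions for the example, and ties are simply broken by input order.

```python
from collections import defaultdict


def assign_ranks(rows, group_key=None):
    """Return {name: rank}, ranking by credit from highest to lowest.

    With group_key=None everyone is ranked together (project-wide).
    With group_key="team" the ranking restarts at 1 inside each team.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key] if group_key else None].append(row)

    ranks = {}
    for members in groups.values():
        # sorted() is stable, so rows with equal credit keep their input order.
        ordered = sorted(members, key=lambda r: r["credit"], reverse=True)
        for position, row in enumerate(ordered, start=1):
            ranks[row["name"]] = position
    return ranks


if __name__ == "__main__":
    users = [
        {"name": "a", "team": "TPU", "credit": 500},
        {"name": "b", "team": "TPU", "credit": 900},
        {"name": "c", "team": "Other", "credit": 700},
    ]
    print(assign_ranks(users))           # project-wide ranks
    print(assign_ranks(users, "team"))   # ranks within each team
```

Every extra grouping (team, country, and so on) is another pass over the same rows, which gives a feel for why the statement count climbs so quickly.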

You can see a rudimentary log here

http://stats6.free-dc.org/stats.php?page=logs&proj=bwcg (this is the website running at my home rather than the official one)

Once the script has done a round of projects, it then goes on to do the combined updates to team/user/country and their ranks and such. On completion it runs some replication tasks to copy the raw data over to the web-facing database (so that the website is never locked up whilst the data is being parsed and calculated).

If the script finishes a loop and realizes that it has crossed midnight, it will start all of its daily rollover tasks and take backups and such.
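(Illustration only.) One simple way to notice that a loop has crossed midnight is to compare the calendar date when the loop started with the date when it finished. The function name is made up, and this is just a guess at the kind of check involved, not the actual script.

```python
from datetime import datetime


def crossed_midnight(loop_started, loop_finished):
    """True if the loop ended on a later calendar day than it began."""
    return loop_finished.date() > loop_started.date()


if __name__ == "__main__":
    started = datetime(2010, 9, 7, 23, 50)
    finished = datetime(2010, 9, 8, 0, 5)
    if crossed_midnight(started, finished):
        print("run daily rollover tasks and backups")
```

Checking at the end of a loop, rather than on a timer, means the rollover never starts while a project is only half parsed.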

Hope this explains the delay between getting new data and showing it.

This page is not really advertised but I use it as a brief check of when data was updated. You can take out the proj=bwcg part to show all projects.

http://stats.free-dc.org/stats.php?p...srun&proj=bwcg

All in all it does somewhere between 500 million and 1 billion SQL statements in a given day.

Phil
 