Beta testing the new C@H
Author | Message |
---|---|
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Hi all, Today is an exciting day, as it's a big step in the process of completely revamping the C@H server. After a lot of work over the last couple of months, we're ready to open up the new server to public beta testing. It is currently up and running at beta.cosmologyathome.org. If you would like to help, please point your clients to this server, run a few jobs, then report back in this thread about your experience. On Aug 17 we dumped the user database into this server, so if you registered before that, your regular C@H login will work there. Otherwise you will need to create a new account. Once beta testing is done we will attempt to transfer over any credits you earned on the beta server; however, please note this will be on a best-effort basis with no guarantee. All other aspects of the beta server, including message boards, etc., will be deleted. The new server is beta software and may crash, be reset without notice, etc. However, if you are excited or curious about trying out new software, we could use your help ironing out all the bugs! So what's new?
|
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
1. For this beta test, please send only the new application: camb_boinc2docker.
2. Too little disk space is requested. The slot needs more than 1000000000 bytes.
cosmohome 25 Aug 09:10:48 Aborting task camb_boinc2docker_3266_1440455267.684832_1: exceeded disk limit: 1135.79MB > 953.67MB
cosmohome 25 Aug 10:00:52 Aborting task camb_boinc2docker_3776_1440459354.579906_0: exceeded disk limit: 1162.56MB > 953.67MB
Task aborted at about 80% done.
3. Don't allocate all available threads, but the maximum minus 1. VBoxHeadless.exe always runs at standard priority, which can be lowered neither by vboxwrapper nor by the user, due to Oracle's security policy concerning VirtualBox. |
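The two figures in the abort messages above fit together if the client reports sizes in units of 1024 × 1024 bytes. That unit is an assumption here, not something stated in the thread. A minimal Python sketch of the arithmetic, with the helper name `bytes_to_mb` made up for illustration:

```python
# Assumption: "MB" in the client log means 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB as (assumed) shown in the client log."""
    return n_bytes / BYTES_PER_MB


# Disk bound quoted in the post, and the two peak usages from the log lines.
disk_bound_bytes = 1_000_000_000
observed_mb = [1135.79, 1162.56]

limit_mb = bytes_to_mb(disk_bound_bytes)
print(f"limit: {limit_mb:.2f} MB")
for used in observed_mb:
    print(f"{used:.2f} MB used, {used - limit_mb:.2f} MB over the limit")
```

Under that assumption the 1000000000-byte bound corresponds to the 953.67MB in the log, and both aborted tasks were well above it.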
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
Hi Marius, The disk limit is exceeded because you are using snapshots. The snapshot alone needs almost 1 GB on disk. The development of vboxwrapper led to the conclusion that working with VMs gives fewer problems when snapshots are not used. For jobs lasting very long (days and days), snapshots are useful, but for your very short-running tasks you would be better off without them. You could use the following camb_boinc2docker_vbox_job.xml to achieve that:

    <vbox_job>
        <!-- Set as desired -->
        <memory_size_mb>2048</memory_size_mb>
        <!-- This is the VBox guest OS, not the host OS, so it stays this for all app_versions. -->
        <os_name>Linux26_64</os_name>
        <!-- These are all needed for boinc2docker -->
        <enable_isocontextualization>1</enable_isocontextualization>
        <enable_cache_disk>1</enable_cache_disk>
        <enable_shared_directory/>
        <enable_scratch_directory/>
        <enable_network/>
        <completion_trigger_file>completion_trigger_file</completion_trigger_file>
        <!-- -->
        <fraction_done_filename>results/progress</fraction_done_filename>
        <minimum_checkpoint_interval>60</minimum_checkpoint_interval>
        <enable_vm_savestate_usage/>
        <disable_automatic_checkpoints/>
    </vbox_job>

You use vbox_job.xml twice, named vbox_job.xml and camb_boinc2docker_vbox_job.xml. The vbox_job.xml can be deleted. It's confusing because it isn't used. |
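As a quick sanity check, a job file like the one in the post above can be inspected to see whether both save-state tags are present. The Python sketch below uses only the standard library. The helper names are hypothetical, `JOB_XML` is an abbreviated copy of the posted file, and the rule that an empty tag means "on" while a value of 0 means "off" is an assumption about how the wrapper reads flags.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the job file from the post; only a few tags are kept.
JOB_XML = """<vbox_job>
  <memory_size_mb>2048</memory_size_mb>
  <minimum_checkpoint_interval>60</minimum_checkpoint_interval>
  <enable_vm_savestate_usage/>
  <disable_automatic_checkpoints/>
</vbox_job>"""


def flag_set(root: ET.Element, tag: str) -> bool:
    """True if the tag is present and not explicitly 0 (assumed convention)."""
    element = root.find(tag)
    if element is None:
        return False
    return (element.text or "").strip() != "0"


def uses_save_state(xml_text: str) -> bool:
    """True if the job file asks for save-state and disables snapshots."""
    root = ET.fromstring(xml_text)
    return (flag_set(root, "enable_vm_savestate_usage")
            and flag_set(root, "disable_automatic_checkpoints"))


print(uses_save_state(JOB_XML))
```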
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
When using the 'save state' method, the task still requires more than the 1000000000 bytes you allow. When a user suspends the task with 'leave application in memory' unticked, or when BOINC is stopped, a save file of the VM, also almost 1 GB in size, is temporarily created in the slot directory. |
Joined: 26 Mar 08 Posts: 9 Credit: 2,267,539 RAC: 0 |
After setting "Request tasks to checkpoint at most every xx seconds" to a value higher than the runtime of the task, I'm able to complete tasks on Win7 x64 with vbox 5.0.0. So the 'maximum disk limit exceeded' error is definitely caused by the checkpoint snapshots. I agree with Crystal Pellet's point that a vbox_mt app should not use all available CPU cores, because normal priority can cause problems for other processes (e.g. GPU tasks) running in parallel. Unless you need fast results or plan to send out very large tasks which would take days on a single core, I don't see any advantage in a multi-threaded app at all. Another remark: If there is any good measure for the actual computing power used to complete a task, you may consider using it for granting credit instead of CreditNew. With any credit system, there may be complaints that credits are too low, too high or both compared to project X (I personally don't care; cross-project comparison will always be apples and oranges, because the projects are calculating different things), but CreditNew has often led to a credit lottery, leaving everybody who cares about his stats unhappy. |
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
MT-tasks with 8 threads: Elapsed time avg: 947.76 sec - CPU used 6700.41 seconds on average.
MT-tasks with 7 threads: Elapsed time avg: 1013.74 sec - CPU used 6371.32 seconds on average, with a camb_legacy v2.16 task running on the 8th thread. |
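One way to read the averages above is to compare the CPU seconds actually used with the thread-seconds that were allocated. The Python sketch below does this for both runs. The function name `cpu_efficiency` and the idea of using this ratio as the comparison are illustrative choices, not something proposed in the thread.

```python
def cpu_efficiency(elapsed_s: float, cpu_s: float, threads: int) -> float:
    """Fraction of the allocated thread-seconds actually spent on CPU."""
    return cpu_s / (elapsed_s * threads)


# Averages quoted in the post above, keyed by thread count.
runs = {
    8: {"elapsed_s": 947.76, "cpu_s": 6700.41},
    7: {"elapsed_s": 1013.74, "cpu_s": 6371.32},
}

for threads, run in runs.items():
    eff = cpu_efficiency(run["elapsed_s"], run["cpu_s"], threads)
    print(f"{threads} threads: {eff:.1%} of allocated CPU time used")

# Extra wall-clock time of the 7-thread run relative to the 8-thread run.
slowdown = runs[7]["elapsed_s"] / runs[8]["elapsed_s"] - 1
print(f"7-thread run takes {slowdown:.1%} longer")
```

With these numbers both runs use roughly 88% to 90% of their allocated CPU time, and the 7-thread run is about 7% slower in wall-clock time while leaving one thread free for another task.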
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Awesome, thanks for the useful comments! Replies below.
|
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
> * Multi threaded: By default BOINC is going to allocate all free CPUs to the job. If you have 4 CPUs and in your computing preferences you tell BOINC to use 50% CPU time, it'll run it as a 2 CPU job. Is this a solution to what you guys are talking about, or am I misunderstanding?

1. The problem is that VBoxHeadless.exe runs at 'normal' priority, whereas normal BOINC tasks run at the lowest 'idle' priority. So your task is competing with the user himself. Setting the CPUs to e.g. 50% is only a partial solution, because most crunchers want to use all cores, but at the lowest priority for BOINC. There is a cmdline parameter --nthreads. Maybe you could use that, taking ncpus - 1 for --nthreads.
2. When your mt-task starts, it pushes all other already running BOINC tasks into a waiting state. They may even lose a lot of computing time when 'Leave application in memory' is not set, or be swapped to disk when "LAIM" is set but the system is low on memory. Your VM needs about 1.5GB RAM.

> * Crystal Pellet: Thanks good catch, there's an unnecessary vbox_job.xml in there. Btw, what is the <enable_vm_savestate_usage> tag, I'm not seeing that in the docs for vboxwrapper?

If you set that tag in your *job.xml file, together with the also undocumented disable_automatic_checkpoints tag, the VM will save its state immediately when a user suspends the task (LAIM off) or BOINC stops. The VM is saved and not powered off (although of course it is no longer running). After a resume nothing is lost, because it restores from the very last point where the user suspended it. In your setup the whole task could be lost when no checkpoint was made, or at least the time since the last checkpoint. That is also why my setup sets the checkpoint interval to 60 seconds: no checkpoints are needed, but the checkpoint file is now updated more regularly. That file is also used for restoring the CPU seconds after a task resume. |
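The "ncpus - 1" suggestion above comes down to a one-line rule. The Python sketch below shows it with a floor of one thread. The helper name `threads_for_vm` is hypothetical, and how the resulting value would actually be handed to the application via --nthreads is not shown in the thread and is not modelled here.

```python
import os


def threads_for_vm(ncpus: int) -> int:
    """Leave one logical CPU free, but never go below one thread."""
    return max(1, ncpus - 1)


# os.cpu_count() can return None when the count cannot be determined.
detected = os.cpu_count() or 1
print(f"--nthreads {threads_for_vm(detected)}")
```

On an 8-thread host this gives 7, matching the 7-thread runs reported earlier in the thread.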
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
I upped the job disk bound to 3 GB; let me know if anyone still sees the disk errors. The 3 GB should be an overestimate, and I will work on tweaking the exact space / memory requirements. |
Rapture Joined: 27 Oct 07 Posts: 85 Credit: 661,330 RAC: 0 |
Thanks for the long-awaited update! I was wondering what had happened with this great project. I am looking forward to the changes being planned. Keep up the good work! |
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
You removed a needed file from the download directory:
cosmohome 27 Aug 11:34:07 Giving up on download of camb_boinc2docker_boinc_app: permanent HTTP error |
STE\/E Volunteer tester Joined: 12 Jun 07 Posts: 375 Credit: 16,539,257 RAC: 0 |
WUs seem to stop after 10 minutes with this message:
PBT99
6005 cosmohome 8/27/2015 5:13:37 AM task postponed 86400.000000 sec: VM Hypervisor failed to enter an online state in a timely fashion.
6006 cosmohome 8/27/2015 5:13:37 AM Starting task camb_boinc2docker_26880_1440646584.135051_0
Then another WU starts, runs for 10 minutes, then stops, and so on. I have vbox 4.3.12 installed on a Win 8 laptop. All WUs that ran for 10 minutes just stay suspended, waiting to run ... |
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Crystal Pellet: Yea, I noticed the file was gone and re-added it. It might have gotten deleted again at some other points too; I'll look into why the file deleter is getting it. Btw, your suggestion with the vm_save_state looks really great, I'm testing it now. Thanks!

STEVE: In the DB I see a number of failed jobs from you due to the same deleted-file error; the file is now hopefully back. That doesn't sound like the error you're describing though. I also see several in progress, not sure if those are this 10 min thing? Can you hit Update to see if it sends back any error logs that might help debug the problem? |
STE\/E Volunteer tester Joined: 12 Jun 07 Posts: 375 Credit: 16,539,257 RAC: 0 |
I've updated the project several times now ... The deleted WUs were actually download errors. I've returned no successful WUs yet, as they all hang/suspend themselves after 10 minutes ... |
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
> Crystal Pellet: Yea, I noticed the file was gone and readded it. It might have been gotten deleted again at some other points too, I'll look into why the file deleter is getting it. Btw, your suggestion with the vm_save_state looks really great, I'm testing it now. Thanks!

Hi Marius, That camb_boinc2docker_boinc_app file is gone again. At least it's not in the download dir. I've successfully tested an option that lets the user himself reduce the number of cores for the virtual machine. You don't have to do anything when the user places the following file, named app_config.xml, in his project directory:

    <app_config>
        <project_max_concurrent>1</project_max_concurrent>
        <app>
            <name>camb_boinc2docker</name>
            <max_concurrent>1</max_concurrent>
        </app>
        <app_version>
            <app_name>camb_boinc2docker</app_name>
            <plan_class>vbox64_mt</plan_class>
            <avg_ncpus>7.000000</avg_ncpus>
            <max_ncpus>7.000000</max_ncpus>
        </app_version>
    </app_config>

In the example I've reduced the number of cores to 7 on my 8-threaded machines. The VM is created and runs with 7 cores.
Results with 6 cores: http://beta.cosmologyathome.org/result.php?resultid=1951 http://beta.cosmologyathome.org/result.php?resultid=1939
Result with 7 cores: http://beta.cosmologyathome.org/result.php?resultid=1897 |
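For anyone who wants a different core count, the app_config.xml above can be generated instead of edited by hand. The Python sketch below builds the same structure with the standard library. The function name `build_app_config` is hypothetical, the app name and plan class are copied from the post, and writing the result into the project directory is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_app_config(ncpus: int, app: str = "camb_boinc2docker",
                     plan_class: str = "vbox64_mt") -> str:
    """Return app_config.xml text limiting the app's VM to `ncpus` cores."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = "1"

    app_el = ET.SubElement(root, "app")
    ET.SubElement(app_el, "name").text = app
    ET.SubElement(app_el, "max_concurrent").text = "1"

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app
    ET.SubElement(version, "plan_class").text = plan_class
    # Same six-decimal formatting as in the posted example.
    ET.SubElement(version, "avg_ncpus").text = f"{ncpus:.6f}"
    ET.SubElement(version, "max_ncpus").text = f"{ncpus:.6f}"

    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


print(build_app_config(7))
```

Calling `build_app_config(6)` would give the 6-core variant used for the first two linked results.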
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
STEVE: That camb_boinc2docker_boinc_app file (http://beta.cosmologyathome.org/download/2b0/camb_boinc2docker_boinc_app) is and should have been present for at least the last four hours. But I do still see your client giving errors downloading it. Can you try a project reset? Maybe a remove / add too? Other clients have been able to complete the exact same workunit after your client gave a download error on them, so my guess is the workunits / files are fine on the server.

Crystal Pellet: Very useful, thanks. To make sure I understand, the difference between this, and say, just lowering the "Use at most" CPU time option is that this targets camb_boinc2docker specifically, leaving other apps to use that last 8th core? |
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 363,354 RAC: 0 |
> STEVE: That camb_boinc2docker_boinc_app file (http://beta.cosmologyathome.org/download/2b0/camb_boinc2docker_boinc_app) is and should have been present for at least the last four hours.

Shouldn't that file be one directory higher: in the download dir itself and not in /2b0/? It looks like it is deleted from the user's machine after every task.

> Crystal Pellet: Very useful, thanks. To make sure I understand, the difference between this, and say, just lowering the "Use at most" CPU time option is that this targets camb_boinc2docker specifically, leaving other apps to use that last 8th core?

That's correct! This last core could be left free for GPU-task support, or another single-core CPU task could use it. That app_config.xml should be placed in the Cosmology project directory on the user's machine (for now, of course, the beta directory). |
STE\/E Volunteer tester Joined: 12 Jun 07 Posts: 375 Credit: 16,539,257 RAC: 0 |
> STEVE: That camb_boinc2docker_boinc_app file (http://beta.cosmologyathome.org/download/2b0/camb_boinc2docker_boinc_app) is and should have been present for at least the last four hours. But I do still see your client giving errors downloading it. Can you try a project reset? Maybe a remove / add too? Other clients have been able to complete the exact same workunit after your client gave a download error on them, so my guess is the workunits / files are fine on the server

I haven't been able to get the camb_boinc2docker_boinc_app to run for more than 10 minutes before it suspends and another task starts on the Win 8 laptop. I did get it to run on another PC though, which has Win 7 Pro installed ... Question: are the camb_legacy WUs I'm getting multi-threaded too? They only run one at a time. |
fzs600 Joined: 7 May 10 Posts: 2 Credit: 1,049,413 RAC: 5 |
Hello, is it possible to choose one's application in the account preferences, i.e. to choose between camb_legacy and camb_boinc2docker? Thank you |
Project administrator Project developer Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Steve: Yes, camb_legacy is single threaded. We have no plans to modify this app in the future, so it will stay single threaded, but if you've got the RAM, it's just as efficient to run multiple copies of it as if it were multithreaded.

fzs600: Ah, it should be possible by going to Your Account -> Cosmology@Home preferences, but currently it's not. I'll work on fixing this and post back here when it's live. |