Cloud Performance Observations


    #16
    I'm not willing to drop the cache on the vmware Cloud: it's too busy with paying clients/customers to jeopardise anything. I'm on thin ice with a big client just now due to sporadic "white page" issues. Grr.

    Have dropped the caches on both xen VMs (the ones closest in spec.); below are the figures taken immediately afterwards, eNlight being the second one:
    Code:
                 total       used       free     shared    buffers     cached
    Mem:        524288     502076      22212          0        860      74452
    -/+ buffers/cache:     426764      97524
    Swap:       522104      77588     444516
    Total:     1046392     579664     466728
    
    --------------------//-----------------------
                 total       used       free     shared    buffers     cached
    Mem:        524288     468668      55620          0        316      21908
    -/+ buffers/cache:     446444      77844
    Swap:      2129912     307208    1822704
    Total:     2654200     775876    1878324
    Now we have a point of reference.
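
    For anyone reading the figures above: in this older `free` layout, the "-/+ buffers/cache" row is derived from the "Mem:" row by treating buffers and cache as reclaimable. A minimal sketch of that arithmetic follows; the function name `buffers_cache_row` is made up for illustration and is not part of any tool.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return (used, free) as shown on the '-/+ buffers/cache' line.

    Buffers and cache are counted as reclaimable, so they are
    subtracted from 'used' and added to 'free'.
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# 'Mem:' rows from the two listings above (values in KiB)
first_vm = buffers_cache_row(used=502076, free=22212, buffers=860, cached=74452)
enlight = buffers_cache_row(used=468668, free=55620, buffers=316, cached=21908)

print(first_vm)  # (426764, 97524)
print(enlight)   # (446444, 77844)
```

    Both results match the "-/+ buffers/cache" rows printed in the listings, so on this measure the applications on eNlight hold slightly more RAM (446444 vs 426764) while very little is left over for cache.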

    Now for the disc mappings; the differences are rather interesting:
    Code:
    /dev/xvda3 on / type ext3 (rw,noatime,usrquota)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    /dev/xvda1 on /boot type ext2 (rw,noatime)
    tmpfs on /dev/shm type tmpfs (rw,noexec,nosuid)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    /usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
    /tmp on /var/tmp type none (rw,noexec,nosuid,bind)
    
    ----------------------//------------------------
    /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,usrquota)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    /dev/xvda1 on /boot type ext3 (rw)
    tmpfs on /dev/shm type tmpfs (rw,noexec,nosuid)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    /usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
    /tmp on /var/tmp type none (rw,noexec,nosuid,bind)
    I haven't been told what the underlying disc structure is on 'Cloud #2', but it is apparent that eNlight uses LVM of some sort. Not logging access time (noatime) should theoretically speed up disc writes, if nothing else, but has other implications. Having /boot as ext3 is a waste of time, IMO, though once the server is underway it should have little bearing.
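
    The option differences between the two listings can also be picked out mechanically. The sketch below assumes the `device on mountpoint type fstype (options)` line format shown above; `parse_mount_line` and `option_diff` are hypothetical helper names, not part of any existing tool.

```python
import re

# Matches lines such as: /dev/xvda3 on / type ext3 (rw,noatime,usrquota)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def parse_mount_line(line):
    """Split one line of `mount` output into its parts."""
    match = MOUNT_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised mount line: %r" % line)
    device, mount_point, fs_type, options = match.groups()
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": set(options.split(",")),
    }


def option_diff(line_a, line_b):
    """Return the options only in line_a and the options only in line_b."""
    a = parse_mount_line(line_a)["options"]
    b = parse_mount_line(line_b)["options"]
    return sorted(a - b), sorted(b - a)


# Root filesystem lines copied from the two listings above
first_root = "/dev/xvda3 on / type ext3 (rw,noatime,usrquota)"
enlight_root = "/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,usrquota)"

print(option_diff(first_root, enlight_root))  # (['noatime'], [])
```

    For the root filesystem the only difference in options is `noatime`, which is set on the first VM and absent on eNlight.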

    [Will PM the eNlight login details (and 'Cloud #2' if wished) - both VMs are used for live sites, albeit not "crucial". By their nature they serve up different traffic at present and one is 'forced' to use FCGI (yuk!) until its software is migrated/updated.]

    The saga continues...

    EJ
    Last edited by ejsolutions; 09-02-2012, 16:32. Reason: typos (again!)

    Comment


      #17
      Hello,


      Swap:       522104      77588     444516
      Total:     1046392     579664     466728

      --------------------------------------- vs -----------------------------------------

      Swap:      2129912     307208    1822704
      Total:     2654200     775876    1878324
      As you can see, the overall memory pressure is higher on eNlight in terms of total used (RAM plus swap): 579M vs 775M.
      You can see the swap difference as well (77M vs 307M), which again points to RAM pressure.


                   total       used       free     shared    buffers     cached
      Mem:        524288     502076      22212          0        860      74452

      --------------------------------------- vs -----------------------------------------

                   total       used       free     shared    buffers     cached
      Mem:        524288     468668      55620          0        316      21908
      - The lower cache on eNlight (74M vs 21M) suggests that you have more dynamic content there, which does not benefit from caching.

      Usage of swap indicates that the system does not have enough RAM for the required data processing and is swapping a lot, hence generating %iowait.
      It could be that the OS is swapping more because of the larger available swap space (eNlight needs changes), but the figures are still cloudy, as swap usage is 77M/512M vs 307M/2048M, which is roughly 15% in both cases.
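
      The "roughly 15% in both cases" figure can be checked directly from the "Swap:" rows. A minimal sketch, assuming the `free` layout quoted in this thread; `parse_free_swap` and `swap_used_percent` are illustrative names only.

```python
def parse_free_swap(text):
    """Return (total, used, free) from the 'Swap:' row of `free` output."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            total, used, free = (int(value) for value in fields[1:4])
            return total, used, free
    raise ValueError("no 'Swap:' row found")


def swap_used_percent(text):
    """Swap in use as a percentage of total swap."""
    total, used, _ = parse_free_swap(text)
    return 100.0 * used / total


# 'Swap:' rows quoted above (values in KiB)
first_vm = "Swap:       522104      77588     444516"
enlight = "Swap:      2129912     307208    1822704"

print(round(swap_used_percent(first_vm), 1))  # 14.9
print(round(swap_used_percent(enlight), 1))   # 14.4
```

      In proportional terms the two VMs are nearly identical, even though eNlight has about four times as much swap in use in absolute terms.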

      Let's wait for 24 hours to have the Munin graphs; then, if you agree, we will change MAX RAM (768M) and MAX CPU (2) and observe for the next 24 hours.
      I probably don't need to mention it, but you won't face any downtime during these auto upgrades because of eNlight's design :smile:

      Thanks for the cooperation; it's really helpful for everyone!

      Comment


        #18
        Yes, dynamic content is the name of the game; I'm a specialist ( & part of the Dev. Team) for an osCommerce-derived package. However the 'low order' Cloud is for running informational content, a demo and a relatively small Joomla site. The bread & butter UK Cloud is the vmware one.

        The intention is for eNlight to replace 'Cloud #2', which costs a flat rate of 15/month (at current resource levels), including cPanel. This is for an "unmanaged service", however. At this moment in time the 'pressure' on both servers is split, due to this testing phase. Previously, 'Cloud #2' coped perfectly fine serving up what is now on eNlight, in addition to its current Joomla site.

        Hopefully from this, you can see that raising the resources to what you assess as being appropriate is just not going to happen.

        Unlike in Windoze, swapping is a natural occurrence for a *nix environment and is actually healthy. Were the OS not to associate some RAM to disc buffers, slab etc., then there would be more free RAM for applications, reducing the need to swap. That is not how a *nix OS operates under normal circumstances.
        This can be readily seen on the MUCH busier vmware Cloud that has 1Gb RAM - very little above 512Mb RAM is utilised, even though mySQL has been allocated large cache etc. The disc buffers are proportionally higher, along with other memory management allocations.
        The dual processor, 2Gb RAM USA-based Cloud, which has a 57k product ecommerce site and mediocre levels of traffic (apart from search bots), barely touches the upper 1Gb and I had considered reducing it back to 1Gb but decided to retain it for 'bursty moments'.

        EJ
        Last edited by ejsolutions; 09-02-2012, 23:02. Reason: A touch more info.

        Comment


          #19
          To respond directly to your swap comments:
          I actually think a reduction to the allocated swap space is a worthwhile 'experiment', even as far as 512Mb, matching currently allocated RAM and that of 'Cloud #2'. If that appears to be too extreme, then I could live with 1Gb/768Mb.

          Comment


            #20
            Over 24 hours since the cleared cache operations, so here's the latest "Scores on the boards": eNlight being shown as the second set again.
            Code:
                         total       used       free     shared    buffers     cached
            Mem:        524288     511728      12560          0       4960     107960
            -/+ buffers/cache:     398808     125480
            Swap:       522104     155632     366472
            Total:     1046392     667360     379032
            -------------------//-------------------------
                         total       used       free     shared    buffers     cached
            Mem:        524288     469620      54668          0        356      20712
            -/+ buffers/cache:     448552      75736
            Swap:      2129912     356624    1773288
            Total:     2654200     826244    1827956
            To illustrate the effect of this, a small Softaculous update made eNlight "throw a wobbly", with a Load exceeding 9 (Munin gave up on it).

            Any scope to reduce this excessive swap space? I'm sure a reduction will cause the OS algorithms to better manage memory usage (buffers in particular).
            Code:
            /dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0
            A remount on LogVol00 is likely to be counter-productive
            Regards,
            EJ
            Last edited by ejsolutions; 10-02-2012, 17:37.

            Comment


              #21
              Hello,

              I agree that the %iowait issue is rearing its head because of the larger swap, while the OS tries to balance RAM and swap space. Digging further into that aspect, I found:
              * "Swap should be 2x RAM" is an old rule now and goes against today's advanced memory management.
              * Swap should be there, but in the case of eNlight 512MB could be ideal, as it can give more RAM as and when needed.

              eNlight also keeps an address buffer for 16GB RAM scalability so that you can go from 512MB to 16GB uninterrupted. On checking a few more VMs with higher RAM, the observation was that they have very low %iowait, sometimes 0 for 2-3 minutes under good traffic, so hopefully they are utilizing the RAM well.

              One can check the disk load with dstat or vmstat 1; in your case dstat gives more insight. Any improvement in iowait since yesterday?
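
              As a rough illustration of where a %iowait figure comes from, the sketch below computes it from two samples of the aggregate `cpu` line in /proc/stat on Linux, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal. The helper names and the sample values are invented for illustration; this is not how dstat, vmstat or Munin are implemented.

```python
def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    # Use only user..steal; guest time is already included in user/nice.
    deltas = [n - o for o, n in zip(old[:8], new[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only; not called below)."""
    with open(path) as handle:
        return handle.readline()


# Invented sample values, one interval apart
sample_1 = "cpu 100 0 50 800 50 0 0 0"
sample_2 = "cpu 150 0 70 1000 230 0 0 0"

print(iowait_percent(sample_1, sample_2))  # 40.0
```

              On a live host the two samples would be taken with `read_cpu_line()` a second or so apart, which is roughly what `vmstat 1` reports in its `wa` column.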

              BTW, I really wonder what the Softaculous update was that caused a load of 9?

              I'm considering reducing the swap from 2Gig to 512M on new templates so that customers will use more RAM and pay more and will have less iowait.

              And better, I could switch the swap to ext instead of LVM, which could further reduce overheads.

              Comment


                #22
                I'm glad that your findings are matching what I have been advocating; I've been doing this type of thing for far too many years.

                As a Logical volume 'sits above' the filesystem, it shouldn't have any bearing - a swap filesystem will generally outperform a swapfile on ext2 (don't even think about ext3).

                Since your "exploration" of the eNlight Cloud yesterday there has been no improvement in I/O wait, in fact more like marginally worse.
                Dstat and vmstat are very much "a blast from the past" for me - haven't used those tools for more than a decade.

                In creating a 'blanket' template, I suggest that 768Mb might be a sensible compromise (though I'd prefer 512Mb). I'd have thought that the majority of users will have in excess of 512Mb RAM. Unless you wish to cater for us spendthrift users that tweak to the nth degree, for a cheap Cloud. Where RAM is beyond, say 1.6Gb, I hazard a guess that swap is barely used in any case.

                If you can take a snapshot of my current VM, then you're more than welcome to try it with a lower swap template.

                [I've since deleted the Softaculous notification email, though I'll have a quick dig for the log, to see what was done.]

                EJ

                Comment


                  #23
                  Your experience will be really helpful and we are certainly open to improvements.

                  We will be planning and making the changes in the coming few days, after our standard testing.
                  Thanks, and we welcome more suggestions and performance metrics in order to improve.

                  This is indeed a new system which does automatic RAM and CPU upgrades/downgrades. If you can, I would like you to set the MAX CPU count to 24; I am eager to see how far your VM goes, as it can help us establish the relation of swap to CPU. It will not much affect your USED CPU, as there are few CPU-intensive tasks and little traffic.

                  Comment


                    #24
                    Originally posted by eUK-Rishi View Post
                    If you can, I would like you to set the MAX CPU count to 24; I am eager to see how far your VM goes, as it can help us establish the relation of swap to CPU. It will not much affect your USED CPU, as there are few CPU-intensive tasks and little traffic.
                    Your wish is my command - done. Given that I have a recorded spike of 15, one wonders if/how often we'll see a switch to multi-processors.
                    After a reboot, (just to be certain):
                    Code:
                                 total       used       free     shared    buffers     cached
                    Mem:        524288     469300      54988          0        236      15024
                    -/+ buffers/cache:     454040      70248
                    Swap:      2129912     286412    1843500
                    Total:     2654200     755712    1898488

                    Comment


                      #25
                      Your wish is my command - done. Given that I have a recorded spike of 15, one wonders if/how often we'll see a switch to multi-processors
                      I see only one spike in the whole day for 9 CPUs, but that's normal as your USED was just 0.83. I wish everyone on the eNlight cloud would keep their MAX CPU count at 2+. If the OS has at least 2 CPUs then it will surely perform a lot better with its schedulers than with 1.

                      Or maybe I can do some trick so that VMs won't go below 2 CPUs in any case.

                      After a reboot, (just to be certain):
                      If I may ask, what was the reason to reboot ?

                      Comment


                        #26
                        Originally posted by eUK-Rishi View Post
                        I see only one spike in the whole day for 9 CPUs, but that's normal as your USED was just 0.83.
                        All is not well currently - WHM unresponsive, so can't check current status.

                        Time: Sat Feb 11 13:12:10 2012 +0000
                        1 Min Load Avg: 29.16
                        5 Min Load Avg: 13.50
                        15 Min Load Avg: 5.89
                        Running/Total Processes: 3/257

                        Time: Sat Feb 11 16:08:13 2012 +0000
                        1 Min Load Avg: 14.61
                        5 Min Load Avg: 6.03
                        15 Min Load Avg: 3.65
                        Running/Total Processes: 2/231


                        Originally posted by eUK-Rishi View Post
                        If I may ask, what was the reason to reboot ?
                        1. To see if WHM would indicate more than one processor - it didn't
                        2. to see if it would clear a tailwatchd error: http://www.eukhost.com/forums/f45/ta...566/#post96719 - it didn't

                        Comment


                          #27
                          This is bad.
                          Daily % I/O Wait
                          Current: 90.8 min: 3.68 Avg: 20.8 Max: 94.78

                          There have been 8 instances of http load time being greater than 3 seconds - max. 5

                          In the same time frame, Load peaked over 20 and current was 18. It has now dropped back, so I'm unable to determine the root cause of the load.

                          EJ

                          Comment


                            #28
                            Code:
                            [email protected] [~]# free -m
                                         total       used       free     shared    buffers     cached
                            Mem:          1024        951         72          0          2        376
                            -/+ buffers/cache:        571        452
                            Swap:         2079          0       2079
                            This is the stat from a similar high disk load server with MAX RAM as 1Gig & in performance mode. The Swap usage is 0. %iowait is again near 5-10 with 1 CPU.

                            I think you should really switch MAX RAM to 1Gig for a day to see if there are any improvements; there must be some reason for this high load.

                            Meanwhile I'll ask our cPanel admin to check on the tailwatchd error.

                            Comment


                              #29
                              Originally posted by eUK-Rishi View Post
                              Code:
                              [email protected] [~]# free -m
                                           total       used       free     shared    buffers     cached
                              Mem:          1024        951         72          0          2        376
                              -/+ buffers/cache:        571        452
                              Swap:         2079          0       2079
                              This is the stat from a similar high disk load server with MAX RAM as 1Gig & in performance mode. The Swap usage is 0. %iowait is again near 5-10 with 1 CPU.
                              Your reading of those stats is obviously different from mine. What I see is a vindication of my suggestion that swap is set too large and a part consequence is that buffers have been poorly allocated. This is having a detrimental effect on disc I/O. Reduce the swap size and see if some swap gets used along with RAM being allocated to disc buffers. I (notionally) bet the I/O wait drops.

                              Originally posted by eUK-Rishi View Post
                              I think you should really switch MAX RAM to 1Gig for a day to see if there are any improvements; there must be some reason for this high load.
                              This is counter-productive to the intended use of the eNlight Cloud, for me. It would push up the monthly cost to make it an unviable alternative to 'Cloud #2'.

                              [An off-topic comment: From looking back at emails, it appears that the load was caused by a few site scrapers (since blocked). A reduced I/O wait would help to reduce the Load. Also, for some reason, Apache Status displays connections that should have long since cleared - the timeout/keepalive settings match those of other VMs, which don't exhibit the same behaviour. Weird.]

                              Comment


                                #30
                                Forgot to add:
                                Given that the dynamic CPU has now been set to 24, I hope that you agree that Load is not being affected by a lack of processor power. The high Loads should have triggered additional CPUs to be used, rather than a highly noticeable I/O wait time.
                                Time to put the VM back to single processor mode, methinks. Let's keep the variables to a minimum.

                                Comment
