blog/Why is My Laptop CPU So Slow

    Recently, my laptop, which has an Intel i7-4700MQ CPU and runs Debian Linux, began running very slowly ("very" is relative; I didn’t notice for months…) after I changed from the then-current stable Debian Linux kernel 3.16 to the 4.4 kernel available in Debian backports. I investigated and noticed that the CPU was often running at only 600-700 MHz! A simple command to show the speed of all of your cores is watch -n 1 'for i in /sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_cur_freq ; do cat $i ; done'. Initially I assumed there was some misconfiguration of cpufreq, the Linux subsystem that manages CPU speed. However, changing cpufreq settings, such as setting the governor to performance or raising the minimum speed, had no effect at all. There are several other systems which may slow down the CPU:

    In the end, I discovered that on Linux 4.4, something unknown would set /sys/devices/virtual/thermal/cooling_device0/cur_state to 4 whenever I plugged or unplugged my laptop's power cable, resulting in 600-700 MHz operation. At the same time, the CPU reported seeing the PROCHOT or FORCEPR lines asserted as the power cable was plugged or unplugged. It’s my guess that this is the result of some interaction between the kernel, ACPI, and an external chip on the system. This hasn’t occurred since upgrading to Linux 4.5, but the behavior can be half-reproduced by writing 4 to /sys/devices/virtual/thermal/cooling_device0/cur_state.
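To check whether a cooling device is throttling the CPU, the state files can be read directly. The following is a minimal sketch, assuming the standard thermal sysfs layout in which each cooling_device directory holds type, cur_state, and max_state files. The function name show_cooling_state and its optional base-directory argument are illustrative choices, not part of any existing tool.

```shell
# Print "<device> <type> <cur_state>/<max_state>" for each cooling device.
# The base directory can be overridden (hypothetical parameter, handy for testing).
show_cooling_state() {
    base=${1:-/sys/devices/virtual/thermal}
    for d in "$base"/cooling_device*; do
        # Skip entries that do not expose a readable state.
        [ -r "$d/cur_state" ] || continue
        echo "${d##*/} $(cat "$d/type") $(cat "$d/cur_state")/$(cat "$d/max_state")"
    done
}
```

Running show_cooling_state with no arguments before and after plugging in the power cable should make a jump in cur_state visible, if the same problem is present.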

    I have developed a script to detect some possible reasons why my CPU might be running slowly. It’s available at: https://github.com/orezpraw/scripts/blob/master/whyslow.pl.

    blog/How To Disable CPU Power Management on Linux for Low Latency Operation

      Disable C-states

      C-states (available on all modern AMD and Intel CPUs) power down parts of the chip to save power. However, that incurs a latency penalty when a core must wake up to service a hardware request. Minimal latency is often desirable for certain tasks, such as audio processing. This is especially true when using advanced Linux audio systems like JACK. You can instruct your kernel to minimize latency by running the following command in a terminal: cat >/dev/cpu_dma_latency, then typing 0x00000000 and pressing Enter. It cannot be shortened to 0x0 or 0, as this will not have the same effect. After that, the kernel will attempt to maintain minimum latency until the cat command is closed with Control-D.
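The same request can be scripted so that it lasts exactly as long as one command runs. This is a sketch only: the function name with_low_latency is made up for illustration, and it assumes, as described above, that the request stays active for as long as the file is held open. Writing to /dev/cpu_dma_latency usually needs root.

```shell
# Hold a zero-latency request open while running a command.
# $1 is the device path (normally /dev/cpu_dma_latency); the rest is the command.
with_low_latency() {
    dev=$1
    shift
    (
        # Open the device on file descriptor 3; it stays open for the
        # lifetime of this subshell and of the command run inside it.
        exec 3>"$dev" || exit 1
        # Write the full-width hex value, as the text above requires.
        printf '0x00000000\n' >&3
        "$@"
    )
}
```

For example, with_low_latency /dev/cpu_dma_latency sleep 60 would hold the request for one minute and release it when sleep exits.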

      Disable Intel Turbo Boost

      Intel turbo boost is, in Intel’s own words:

      Intel® Turbo Boost Technology 2.0 accelerates processor and graphics performance for peak loads, automatically allowing processor cores to run faster than the rated operating frequency if they’re operating below power, current, and temperature specification limits.

      Essentially, it operates by changing the clock speed of your processor depending on the load. Unfortunately, it may change the clock speed quite frequently, and changing the clock speed also incurs a latency penalty. You can disable Intel Turbo Boost on recent machines and kernels with the command: echo 1 >>/sys/devices/system/cpu/intel_pstate/no_turbo. I recommend doing this if you disable C-states with the above method, to prevent your CPU from overheating. Most systems will not overheat, but disabling turbo forces the CPU to run at its rated speed rather than any turbo speed. That makes it less prone to overheating and crashing, or to overheating and then throttling itself, which could impact your low-latency tasks! For example, I have a 2.4 GHz i7 CPU with a maximum turbo speed of 3.4 GHz. If I disable C-states but do not disable turbo, the CPU runs at 3.1 GHz and causes my laptop fans to spin up. However, if I disable turbo, the CPU runs at 2.4 GHz and remains fairly cool.
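The no_turbo file only exists when the intel_pstate driver is in use, so a script may want to check for it first. A small sketch of that check follows; the function name disable_turbo and its optional path argument are illustrative assumptions.

```shell
# Disable turbo via intel_pstate if the control file is present and writable.
# The path can be overridden (hypothetical parameter, handy for testing).
disable_turbo() {
    knob=${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}
    if [ ! -w "$knob" ]; then
        echo "no writable $knob (intel_pstate not in use, or not root?)" >&2
        return 1
    fi
    echo 1 >"$knob"
    # Read the value back to confirm what was stored.
    cat "$knob"
}
```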

      Set Frequency to Maximum

      Linux contains a system called cpufreq to adjust the CPU frequency based on a number of policies, called governors. Select the “performance” governor to have the kernel run your CPU at its maximum frequency with the following command: for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance >$i ; cat $i ; done. Depending on your system, this might have no effect, because many CPUs (including my Haswell mobile i7-4700MQ) will manage their own clock speed automatically, leaving the OS with limited control.
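Since the setting may have no visible effect, it helps to look at what each core actually reports. Here is a minimal sketch, assuming the usual cpufreq sysfs files scaling_governor and scaling_cur_freq (the latter in kHz); the function name show_cpufreq and its base-directory argument are illustrative.

```shell
# Print "<cpu> <governor> <frequency>MHz" for every core that exposes cpufreq.
# The base directory can be overridden (hypothetical parameter, handy for testing).
show_cpufreq() {
    base=${1:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -r "$d/scaling_governor" ] || continue
        khz=$(cat "$d/scaling_cur_freq")
        cpu=${d%/cpufreq}
        # scaling_cur_freq is reported in kHz; convert to MHz.
        echo "${cpu##*/} $(cat "$d/scaling_governor") $((khz / 1000))MHz"
    done
}
```

Wrapping the call in watch -n 1 gives a live view similar to the cpuinfo_cur_freq one-liner from the first post.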

      blog/The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication

        Check out my new paper with Eddie Antonio Santos and Abram Hindle. Here’s the abstract:

        Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of software systems available with Ubuntu. A variety of crash report bucketing methods are evaluated using data collected by Ubuntu’s Apport automated crash reporting system. The trade-off between precision and recall of numerous scalable crash deduplication techniques is explored. A set of criteria that a crash deduplication method must meet is presented and several methods that meet these criteria are evaluated on a new dataset. The evaluations presented in this paper show that using off-the-shelf information retrieval techniques, that were not designed to be used with crash reports, outperform other techniques which are specifically designed for the task of crash bucketing at realistic industrial scales. This research indicates that automated crash bucketing still has a lot of room for improvement, especially in terms of identifier tokenization.

        You can read the preprint!

        blog/Ruby 2.1 Garbage Collector

          While building my twitter bot I ran into a problem with very poor performance. It turned out that this was caused by Ruby’s garbage collector, which by default runs after every 8 to 32MB of malloc() calls. Since I was dealing with single strings that were ~400MB in length, this was not appropriate. So, I discovered that I could alter this behavior by setting some environment variables:

          export RUBY_GC_OLDMALLOC_LIMIT=1503238553
          export RUBY_GC_OLDMALLOC_LIMIT_MAX=1503238553
          export RUBY_GC_MALLOC_LIMIT=1503238553
          export RUBY_GC_MALLOC_LIMIT_MAX=1503238553
          export RUBY_GC_HEAP_INIT_SLOTS=200000000
          export RUBY_GC_HEAP_FREE_SLOTS=2000000

          which improved performance by an order of magnitude. With these settings, Ruby’s GC runs after every 1.5 GB of malloc()s. However, you cannot set these limits any higher than about 1.5 GB, because the GC uses a signed 32-bit int counter internally: it grows the limit with a formula like limit = limit * 1.4 and only then checks whether limit > max. So, above 1.5 GB, limit will overflow and become negative!
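The arithmetic behind that ceiling can be checked by hand. The sketch below follows the growth formula as described above (limit * 1.4, approximated with integer math) and compares the result with the largest signed 32-bit value. It models the description in this post, not Ruby's actual source; the function name grows_past_int32 is made up for illustration, and it assumes a shell with 64-bit arithmetic.

```shell
# Largest value a signed 32-bit int can hold.
INT32_MAX=2147483647

# Succeeds (exit status 0) if growing the limit by 1.4x would exceed INT32_MAX.
grows_past_int32() {
    limit=$1
    # Integer stand-in for limit * 1.4.
    grown=$(( limit * 14 / 10 ))
    [ "$grown" -gt "$INT32_MAX" ]
}
```

With the value used above, 1503238553 grows to 2104533974, which still fits. A limit of 1600000000 would grow to 2240000000, which does not fit in 32 bits and would wrap to a negative number.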

          blog/Twitter Bot

            I made a twitter bot based on the code of mispy’s twitter_ebooks using the MSR Challenge 2015 data available at http://2015.msrconf.org/challenge_data/. You can check it out here: https://twitter.com/horse_overflow.

            blog/Heuristic Search and Software Engineering

              I recently gave a presentation about Heuristic Search, existing Search-based Software Engineering, and future directions and opportunities in combining the fields of Software Engineering and Heuristic Search. The presentation was given to the Heuristic Search group at the Department of Computing Science at the University of Alberta. If you are interested in these topics, please take a look at the pdf of the slides.

              blog/How To Get Cabal Working with PaX

                Unfortunately, modern Haskell seems to generate code that needs special exceptions granted to it by the PaX kernel patch using paxctl. Since cabal compilation often involves building temporary executables and then running them immediately, this is impossible to achieve when running paxctl by hand. So, I came up with the following solution.

                First, I created a shim script which will take the place of the normal linker (gcc):

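                #!/bin/sh
                # Run the real linker with all of the original arguments; stop if it fails.
                gcc "$@" || exit $?
                # Walk the arguments and mark every output file named by -o.
                while [ $# -gt 0 ]
                do
                        if [ "$1" = "-o" ] && [ -n "$2" ]
                        then    paxctl -cm "$2"
                        fi
                        shift
                done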

                This script simply runs the linker as normal by passing it all of the options that the script was given. Then it finds the -o options that name the linker's output files and runs paxctl on them. Finally, when running cabal install, we can pass the following option so that ghc calls our shim (saved as linkpax somewhere on the PATH) instead of the usual linker:

                --ghc-options="-pgml linkpax"