The earlier post on Linux memory management discussed how Linux uses available DRAM as buffer/cache to improve overall system performance. That is certainly a good thing. However, it can have side effects.

Linux dd cache

What is the issue?

Last week I spent a lot of time figuring out why the kernel locks up during IR829 bundle image installation. One step of the installation extracts the Guest OS disk image from the bundle image and writes it to the Guest OS disk using the Linux utility “dd”.

bash# gzip -d -c $GOS_IMAGE.gz | dd of=/dev/sda bs=1M

From time to time the kernel locks up. While this step was running, I ran “top” to monitor the overall state of the box and saw free memory drop sharply while the memory consumed by buffers and cache increased dramatically. That is expected, since dd is writing the disk image to /dev/sda and the Linux kernel uses available DRAM to cache/buffer the data. The cache/buffer space is supposed to be reclaimed by the kernel thread “kswapd” when the kernel sees demand from other processes. That all sounds very nice, but the reality is not so nice, especially on the kernel version running here, 2.6.35. Sometimes “kswapd” does not do its job properly and locks up the box.
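To put numbers on what top shows, the buffer/cache figures can also be read directly from /proc/meminfo. Below is a minimal C sketch, assuming the usual "Key:   value kB" line layout. The name meminfo_kb is a hypothetical helper, and the path is a parameter so the function can also be pointed at a saved copy of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value in kB of field KEY (for example
   "Cached", "Buffers" or "Dirty") from a meminfo-style file at PATH,
   or -1 if the file cannot be opened or the field is missing.  */
long
meminfo_kb (const char *path, const char *key)
{
  FILE *f = fopen (path, "r");
  if (!f)
    return -1;

  char line[256];
  size_t klen = strlen (key);
  long value = -1;
  while (fgets (line, sizeof line, f))
    {
      /* Require KEY at the start of the line followed by ':' so that
         "Cached" does not match "SwapCached".  */
      if (strncmp (line, key, klen) == 0 && line[klen] == ':')
        {
          if (sscanf (line + klen + 1, "%ld", &value) != 1)
            value = -1;
          break;
        }
    }
  fclose (f);
  return value;
}
```

Calling meminfo_kb ("/proc/meminfo", "Cached") and meminfo_kb ("/proc/meminfo", "Dirty") once a second while dd runs should show the same growth that top reports.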

What is the solution?

The solution is to avoid filling the cache when the Guest OS disk image is copied to disk. The latest dd manual indicates that there are options such as iflag and oflag, but it does not say what the possible flags/values are. I then chased down the GNU coreutils documentation for dd: https://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html. It mentions some very interesting flags such as “nocache”, “dsync” and “direct”, which got me really excited. My first try was “dd iflag=nocache oflag=nocache”, but the option was not accepted. When I checked the version with “dd --version”, I found that the running dd was version 8.5, which is really old compared to the latest version, 8.25, released on January 20, 2016. See http://ftp.gnu.org/gnu/coreutils/.

So I ended up downloading the latest version of coreutils, compiling it, and packaging it into my Linux ramdisk.

According to the dd nocache unit test case, the flags are used like this:

  # Stream data just using readahead cache
  dd if=ifile of=ofile iflag=nocache oflag=nocache || fail=1

However, that did not work for me. It works well once I add the “dsync” flag to oflag, as below:

bash# gzip -d -c $GOS_IMAGE.gz | dd of=/dev/sda iflag=nocache oflag=nocache,dsync bs=1M

With these new options, there is no noticeable increase in cache/buffer memory consumption during the installation. The price is a longer time to complete the whole operation, which is expected.

How does it work?

I then checked the source code to see how it works. It turns out that dd.c implements a function, invalidate_cache, which tells the kernel that a range of cached file data is no longer needed through posix_fadvise(fd, …, POSIX_FADV_DONTNEED).

/* Discard the cache from the current offset of either
   STDIN_FILENO or STDOUT_FILENO.
   Return true on success.  */

static bool
invalidate_cache (int fd, off_t len)
{
  int adv_ret = -1;

  /* Minimize syscalls.  */
  off_t clen = cache_round (fd, len);
  if (len && !clen)
    return true; /* Don't advise this time.  */
  if (!len && !clen && max_records)
    return true; /* Nothing pending.  */
  off_t pending = len ? cache_round (fd, 0) : 0;

  if (fd == STDIN_FILENO)
    {
      if (input_seekable)
        {
          /* Note we're being careful here to only invalidate what
             we've read, so as not to dump any read ahead cache.  */
#if HAVE_POSIX_FADVISE
            adv_ret = posix_fadvise (fd, input_offset - clen - pending, clen,
                                     POSIX_FADV_DONTNEED);
#else
            errno = ENOTSUP;
#endif
        }
      else
        errno = ESPIPE;
    }
  else if (fd == STDOUT_FILENO)
    {
      static off_t output_offset = -2;

      if (output_offset != -1)
        {
          if (0 > output_offset)
            {
              output_offset = lseek (fd, 0, SEEK_CUR);
              output_offset -= clen + pending;
            }
          if (0 <= output_offset)
            {
#if HAVE_POSIX_FADVISE
              adv_ret = posix_fadvise (fd, output_offset, clen,
                                       POSIX_FADV_DONTNEED);
#else
              errno = ENOTSUP;
#endif
              output_offset += clen + pending;
            }
        }
    }

  return adv_ret != -1 ? true : false;
}
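The same idea can be tried outside dd. The sketch below is not taken from coreutils, and write_uncached is a hypothetical helper name. It writes a buffer, flushes it with fdatasync, and then advises the kernel to drop the cached pages for that range. The explicit flush reflects a likely explanation for why “dsync” made the difference, though this is an assumption rather than something verified on the 2.6.35 box: on Linux, POSIX_FADV_DONTNEED leaves pages that have not yet been written out untouched, so without a synchronous write the freshly written data would stay in the cache.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 600      /* for posix_fadvise and fdatasync */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of coreutils: write LEN bytes from BUF
   to FD at its current offset, flush them to the device, then ask the
   kernel to drop the cached pages for that range.
   Returns 0 on success or an errno-style error number.  */
int
write_uncached (int fd, const void *buf, size_t len)
{
  off_t start = lseek (fd, 0, SEEK_CUR);
  if (start == (off_t) -1)
    return errno;               /* e.g. ESPIPE when FD is a pipe */

  const char *p = buf;
  size_t left = len;
  while (left > 0)
    {
      ssize_t n = write (fd, p, left);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted, retry */
          return errno;
        }
      p += n;
      left -= (size_t) n;
    }

  /* DONTNEED leaves dirty pages alone, so flush them first.  */
  if (fdatasync (fd) != 0)
    return errno;

  /* posix_fadvise returns the error number directly instead of
     setting errno.  */
  return posix_fadvise (fd, start, (off_t) len, POSIX_FADV_DONTNEED);
}
```

Flushing after every chunk is what makes this slow, which matches the longer installation time seen with oflag=nocache,dsync.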