Research talk:Measuring edit productivity/Work log/2015-09-18

From Meta, a Wikimedia project coordination wiki

Friday, September 18, 2015

OK! So I'm running a job on stat1003 and I've learned about two issues.

  1. The output queue used in para to parallelize the processing work needs a fixed size, or memory is going to become a huge issue. When I run the job on a single file (no output queue), memory usage is minimal.
  2. This problem implies that the mappers can produce output far faster than the bzip2 stream can write it. That means we need to parallelize the bzip2 compression across multiple processes. I filed a feature request to add that. I'll be digging into that primarily today.
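
A minimal sketch of both ideas, assuming nothing about how para is actually implemented: a fixed-size queue makes the mapper block when the writer falls behind, and chunks of output are compressed concurrently and written in order as separate bzip2 streams (a file of concatenated bzip2 streams can still be decompressed as a whole). All names here (`mapper`, `chunks`, `run`, and the size parameters) are hypothetical. A thread pool keeps the sketch self-contained; a process pool would be the closer analogue of multiprocess compression.

```python
import bz2
import io
import queue
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # marks the end of mapper output


def mapper(n_rows, out_q):
    """Hypothetical mapper: emits rows, blocking whenever the queue is full."""
    for i in range(n_rows):
        out_q.put(f"row {i}\n".encode())
    out_q.put(SENTINEL)


def chunks(out_q, rows_per_chunk):
    """Group queued rows into larger byte blocks worth compressing."""
    buf = []
    while True:
        row = out_q.get()
        if row is SENTINEL:
            break
        buf.append(row)
        if len(buf) >= rows_per_chunk:
            yield b"".join(buf)
            buf = []
    if buf:
        yield b"".join(buf)


def run(n_rows=1000, max_queue=50, rows_per_chunk=100, workers=4, max_pending=8):
    out_q = queue.Queue(maxsize=max_queue)  # fixed size caps buffered rows
    sink = io.BytesIO()  # stands in for the output file (assumption)
    producer = threading.Thread(target=mapper, args=(n_rows, out_q))
    producer.start()
    pending = deque()  # compression jobs, kept in submission order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks(out_q, rows_per_chunk):
            pending.append(pool.submit(bz2.compress, chunk))
            # Cap in-flight jobs so compressed output cannot pile up either.
            if len(pending) >= max_pending:
                sink.write(pending.popleft().result())
        while pending:
            sink.write(pending.popleft().result())
    producer.join()
    return sink.getvalue()
```

Calling `bz2.decompress(run())` should give back the rows in their original order, since each chunk is written as its own stream in the order it was submitted.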

--Halfak (WMF) (talk) 14:31, 18 September 2015 (UTC)