I have a directory of <20MB PDF files (each PDF represents an ad) on an AWS EC2 large instance. I'm trying to upload each PDF file to S3 using Ruby and DM-Paperclip.

Most files upload successfully, but some seem to take hours with the CPU pegged at 100%. I located the line of code that causes the issue by printing debug statements in the relevant section.

 # Takes an array of PDF file paths and uploads each to S3 using dm-paperclip
 def save_pdfs(pdf_files)
   pdf_files.each do |path|
     pdf = File.open(path)
     ad = Ad.new
     ad.pdf.assign(pdf) # <= Last debug statement is printed before this line
     begin
       ad.save
     rescue => e
       # log error
     ensure
       pdf.close
     end
   end
 end

To help troubleshoot the issue I attached strace to the process while it was stuck at 100%. The result was hundreds of thousands of lines like this:

 ...
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
 ... 500K lines

Followed by a few thousand:

 ...
 brk(0x1224d0000)                        = 0x1224d0000
 brk(0x1224f3000)                        = 0x1224f3000
 brk(0x122514000)                        = 0x122514000
 ...

During an upload that doesn't hang, strace looks like this:

 ...
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 fstat(12, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
 fcntl(12, F_GETFL)                      = 0x2 (flags O_RDWR)
 write(12, "%PDF-1.3\n%\342\343\317\323\n8 0 obj\n<</Filter"..., 4096) = 4096
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 write(12, "S\34\367\23~\277u\272,h\204_\35\215\35\341\347\324\310\307u\370#\364\315\t~^\352\272\26\374"..., 4096) = 4096
 ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
 write(12, "\216%\267\2454`\350\177\4\36\315\211\7B\217g\33\217!e\347\207\256\264\245vy\377\304\256\307\375"..., 4096) = 4096
 ...

The PDF files that cause this issue seem random. They are all valid PDF files, and they are all relatively small, ranging from ~100KB to ~50MB.

Is the strace with the seemingly excessive stat system calls related to my issue?

  • Your ensure block only covers ad.save, so it is not executed when an exception is raised anywhere else. In this case, ad.pdf.assign(pdf) might be raising an exception, and the file would not be closed. That may have happened several hundred times before the file that pegs the CPU at 100%, leaving you with hundreds of open file handles. If you wrap everything in a block and pass it to File.open, you can be sure the file is always closed when the block exits. Depending on how many files you are dealing with, that may improve performance significantly. May 5, 2016 at 13:33
  • Maybe related: serverfault.com/a/562148 Feb 27, 2018 at 0:11
  • For all download/upload CPU-hang or out-of-memory issues, I strongly recommend setting the <attachment>_file_size parameter (in HTTP: the Content-Length header).
    – xerx593
    Feb 10, 2019 at 23:49
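
The first comment's suggestion (pass a block to File.open so the file is closed no matter where an exception is raised) could look roughly like the sketch below. It is only a sketch under a few assumptions: `ad_class` is a hypothetical parameter standing in for the `Ad` model from the question, `warn` stands in for the question's "# log error" placeholder, and the "rb" mode is added here because PDFs are binary. It addresses the unclosed file handles only and is not claimed to fix the 100% CPU hang.

```ruby
# Takes an array of PDF file paths and uploads each one via ad_class.
# ad_class is a hypothetical parameter; in the question it would be Ad,
# whose `pdf` attachment is provided by dm-paperclip.
def save_pdfs(pdf_files, ad_class)
  pdf_files.each do |path|
    # The block form of File.open closes the file when the block exits,
    # including when assign or save raises.
    File.open(path, "rb") do |pdf|
      begin
        ad = ad_class.new
        ad.pdf.assign(pdf)
        ad.save
      rescue => e
        # Stand-in for real logging; the loop continues with the next file.
        warn "Failed to upload #{path}: #{e.message}"
      end
    end
  end
end
```

It would be called with the list of paths and the model class, e.g. `save_pdfs(paths, Ad)`. Because `assign` and `save` both run inside the block, an exception from either one is rescued per file and the handle is still released.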
