Potential issue with pdf files and git repos

Matey Chopov matey.chopov at ca.abb.com
Wed Jun 13 22:15:56 UTC 2018


Hi Thomas,

Here is a version similar to the original .pdf I was talking about: http://www.gimpel.com/html/manual.pdf

I think you are right about the issue being with file size instead of file content.

When I converted the original pdf file to text and back to pdf, its size changed drastically. The original is about 4.5 MB, while the converted one is around 1 MB.

I then pushed a bigger .pdf file (around 11.1 MB, with no encryption) and got the same error trace, so apparently it isn't about whether the file is encrypted.
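
To narrow down the size at which a push starts failing, independent of PDF content, one could generate dummy files of a few sizes and push each in turn. This is a minimal sketch, assuming random bytes are an acceptable stand-in for PDF content; the helper name, file names, and sizes (picked around the 1 MB, 4.5 MB, and 11.1 MB figures above) are illustrative only:

```python
import os
import tempfile


def make_dummy_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path and return the resulting size."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            # Random data does not compress, so the pushed size stays close
            # to the size on disk.
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical output directory; copy the files into the test repo afterwards.
    out_dir = tempfile.mkdtemp(prefix="push_size_test_")
    for mb in (1, 4, 5, 11):
        path = os.path.join(out_dir, "dummy_%dMB.bin" % mb)
        size = make_dummy_file(path, mb * 1024 * 1024)
        print("%s: %d bytes" % (path, size))
```

Committing and pushing these files one at a time should show whether the failure tracks file size rather than anything PDF-specific.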

I am getting a similar trace when using uwsgi. I played around with the buffer size, but to no avail.
Here is another sample log from when I try to push a big .pdf file to a git repo on Kallithea 0.3.5 using uwsgi as the web server:

https://pastebin.com/Y1piX0vX

Tomorrow, I will try to test with the default branch instead of 0.3.5.

Thanks,

Mat

From: Thomas De Schampheleire [mailto:patrickdepinguin at gmail.com]
Sent: Wednesday, June 13, 2018 4:52 PM
To: Matey Chopov <matey.chopov at ca.abb.com>
Cc: kallithea-general at sfconservancy.org
Subject: Re: Potential issue with pdf files and git repos

On Wed, Jun 13, 2018, 22:29 Thomas De Schampheleire <patrickdepinguin at gmail.com> wrote:
2018-06-13 20:39 GMT+02:00 Matey Chopov <matey.chopov at ca.abb.com>:
> Hi,
>
> It looks like it happens with a specific .pdf manual. I tested with another .pdf file and the exception didn't occur; the file got pushed correctly.
>
> Here's the line in the trace I think is the most interesting:
>
> DatabaseError: (DatabaseError) file is encrypted or is not a database u'SELECT ui.ui_id AS ui_ui_id, ui.ui_section AS ui_ui_section, ui.ui_key AS ui_ui_key, ui.ui_value AS ui_ui_value, ui.ui_active AS ui_ui_active \nFROM ui \nWHERE ui.ui_key = ?' ('push_ssl',)
> 2018-06-13 11:07:04.946 ERROR [waitress] Exception when servicing <waitress.channel.HTTPChannel connected 0.0.0.0:12756 at 0x7f06c5923e90>
>
> I have uploaded the surrounding log on pastebin: https://pastebin.com/n9fY4xae
>
> So the problematic pdf that I thought wasn't encrypted was actually encrypted with RC4. The weird thing is that in Git Extensions you can still see the file contents in the "diff" section.
>
> Apparently, pdf readers automatically decrypt such files if there is no password (which is the current case).
>
> I used qpdf to decrypt the file (with no password) which gave another valid .pdf file with no encryption (at least that's what I get when I analyze the file with pdfinfo).
>
> Tried pushing that file too, but it still failed.
>
> I played with the pdf headers, changed the Creator and Producer values to the ones of a .pdf file I know could be uploaded. Same error.
>
> I tried converting the file from pdf1.3 to pdf1.4, same issue.
>
> So, what finally worked for me was converting from pdf to ps, then to text, then from the text file back to ps, and then to pdf. The indexing table got screwed up, but that doesn't really bother me. Finally, I pushed the new pdf file to the git repo with success.
>
> Commands:
>
> pdftops test.pdf test.ps
> ps2txt test.ps test.txt
>
> enscript -B --margins=10:10 -o test.ps -f Courier@7.3/1 test.txt
> ps2pdf test.ps test_last.pdf
>

Your log also shows:

Traceback (most recent call last):
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/task.py", line 74, in handler_thread
    task.service()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/channel.py", line 368, in service
    request._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/parser.py", line 249, in _close
    body_rcv.getbuf()._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 303, in _close
    buf._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 110, in _close
    self.file.close()
IOError: [Errno 9] Bad file descriptor


which reminds me of the following two open issues:

https://bitbucket.org/conservancy/kallithea/issues/219/waitress-exception-when-serving-file
https://bitbucket.org/conservancy/kallithea/issues/229/bad-file-descriptor



Is the PDF on which you see the issue something you could share?
Or could you create another PDF with dummy data that also exhibits the issue?

If at all possible, could you test with the default branch
of Kallithea instead of 0.3.5?

Note that it doesn't make sense to me that the contents of the file would matter. I think it is more likely about the file size. Could you check the file sizes of the different files you tested with?
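
One way to collect those sizes in a single pass is sketched below. It is only an illustration: the helper names are made up for this example, and the file paths are whatever gets passed on the command line.

```python
import os
import sys


def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024.0 or unit == "GB":
            return "%.1f %s" % (size, unit)
        size /= 1024.0


def report_sizes(paths):
    """Return (path, size_in_bytes) pairs for existing files, smallest first."""
    sizes = [(p, os.path.getsize(p)) for p in paths if os.path.isfile(p)]
    return sorted(sizes, key=lambda item: item[1])


if __name__ == "__main__":
    # Usage (script name is hypothetical): python sizes.py good.pdf bad.pdf ...
    for path, size in report_sizes(sys.argv[1:]):
        print("%-40s %12d bytes  (%s)" % (path, size, human_size(size)))
```

Listing the files that pushed successfully next to the ones that failed, sorted by size, makes any size threshold easy to spot.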

Also, the reporter of issue #219 reported back that his issue was gone after switching from waitress to another web server, in his case uwsgi. Could you try that too?

Thanks,
Thomas