Potential issue with pdf files and git repos
Matey Chopov
matey.chopov at ca.abb.com
Thu Jun 14 16:32:20 UTC 2018
Ok, finally got it.
I built Kallithea from the source repo. I changed waitress for uwsgi and I also changed the sqlite db for a postgres one.
Had to readjust my git client http post buffer:
git config http.postBuffer 524288000
After that, I got an 11 MB PDF file into the git repo with no problems.
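
For reference, http.postBuffer is specified in bytes, and 524288000 works out to 500 MiB. The sketch below shows one way to apply the setting and read it back. It assumes a throwaway repository created with mktemp so that no real checkout is touched; the names `repo`, `buffer_bytes` and `configured` are illustrative and not part of the original setup.

```shell
# Create a scratch repository (assumption: only used for this demonstration).
repo=$(mktemp -d)
git init -q "$repo"

# 500 MiB expressed in bytes: 500 * 1024 * 1024 = 524288000.
buffer_bytes=$((500 * 1024 * 1024))

# Set the client-side HTTP post buffer for this repository only.
git -C "$repo" config http.postBuffer "$buffer_bytes"

# Read the value back to confirm that it was stored.
configured=$(git -C "$repo" config --get http.postBuffer)
echo "http.postBuffer = $configured bytes"
```

In an existing checkout the same two git commands can be run without `-C "$repo"`, which is equivalent to the command quoted above.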
Thanks,
Mat
From: Matey Chopov
Sent: Wednesday, June 13, 2018 6:16 PM
To: Thomas De Schampheleire <patrickdepinguin at gmail.com>
Cc: kallithea-general at sfconservancy.org
Subject: RE: Potential issue with pdf files and git repos
Hi Thomas,
Here is a similar version to the original .pdf I was talking about here: http://www.gimpel.com/html/manual.pdf
I think you are right about the issue being with file size instead of file content.
When I converted the original pdf file to text and back to pdf it drastically changed size. The original is about 4.5MB while the converted one was around 1MB.
I proceeded to push a bigger .pdf file (around 11.1 MB size with no encryption) and I got the same trace error, so apparently it isn’t about the file being encrypted or not.
I am getting a similar trace when using uwsgi. I played around with the buffer size, but to no avail.
Here is another sample log file when I try to push a big .pdf file to a git repo on Kallithea 0.3.5 using uwsgi as a web server.
https://pastebin.com/Y1piX0vX
Tomorrow, I will try to test with the default branch instead of 0.3.5.
Thanks,
Mat
From: Thomas De Schampheleire [mailto:patrickdepinguin at gmail.com]
Sent: Wednesday, June 13, 2018 4:52 PM
To: Matey Chopov <matey.chopov at ca.abb.com>
Cc: kallithea-general at sfconservancy.org
Subject: Re: Potential issue with pdf files and git repos
On Wed, Jun 13, 2018, 22:29 Thomas De Schampheleire <patrickdepinguin at gmail.com> wrote:
2018-06-13 20:39 GMT+02:00 Matey Chopov <matey.chopov at ca.abb.com>:
> Hi,
>
> It looks like it happens with a specific .pdf manual, I tested it with another .pdf file and the exception didn't occur, the file got pushed correctly.
>
> Here's the line in the trace I think is the most interesting:
>
> DatabaseError: (DatabaseError) file is encrypted or is not a database u'SELECT ui.ui_id AS ui_ui_id, ui.ui_section AS ui_ui_section, ui.ui_key AS ui_ui_key, ui.ui_value AS ui_ui_value, ui.ui_active AS ui_ui_active \nFROM ui \nWHERE ui.ui_key = ?' ('push_ssl',)
> 2018-06-13 11:07:04.946 ERROR [waitress] Exception when servicing <waitress.channel.HTTPChannel connected 0.0.0.0:12756 at 0x7f06c5923e90>
>
> I have uploaded the surrounding log on pastebin: https://pastebin.com/n9fY4xae
>
> So the problematic pdf that I thought wasn't encrypted was actually encrypted with RC4. The weird thing is that in Git Extensions you can still see the file contents in the "diff" section.
>
> Apparently, pdf readers automatically decrypt such files if there is no password (which is the current case).
>
> I used qpdf to decrypt the file (with no password) which gave another valid .pdf file with no encryption (at least that's what I get when I analyze the file with pdfinfo).
>
> Tried pushing that file too, but it still failed.
>
> I played with the pdf headers, changed the Creator and Producer values to the ones of a .pdf file I know could be uploaded. Same error.
>
> I tried converting the file from pdf1.3 to pdf1.4, same issue.
>
> So, what finally worked for me was converting from pdf to ps, then to text, then from the text file, to ps, and then to pdf. The indexing table got screwed, but that doesn't really bother me. Finally, pushed the new pdf file to the git repo with success.
>
> Commands:
>
> pdftops test.pdf test.ps
> ps2txt test.ps test.txt
>
> enscript -B --margins=10:10 -o test.ps -f Courier@7.3/1 test.txt
> ps2pdf test.ps test_last.pdf
>
Your log also shows:
Traceback (most recent call last):
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/task.py", line 74, in handler_thread
    task.service()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/channel.py", line 368, in service
    request._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/parser.py", line 249, in _close
    body_rcv.getbuf()._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 303, in _close
    buf._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 110, in _close
    self.file.close()
IOError: [Errno 9] Bad file descriptor
which reminds me of the following two open issues:
https://bitbucket.org/conservancy/kallithea/issues/219/waitress-exception-when-serving-file
https://bitbucket.org/conservancy/kallithea/issues/229/bad-file-descriptor
Is the PDF on which you see the issue something you could share?
Or could you create another PDF with dummy data that also exhibits the issue?
If at all possible, could you test with the default branch
of Kallithea instead of 0.3.5?
Note that it doesn't make sense to me that the contents of the file would matter. I think it is more likely about the file size. Could you check the file sizes of the different files you tested with?
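
One way to gather those sizes is sketched below. It is only an illustration: `report_sizes` is a hypothetical helper, and the two generated files (1 MiB and 4.5 MiB of zero bytes) merely stand in for the real PDFs, which are not reproduced here.

```shell
# Print "<size in bytes><TAB><path>" for every file passed as an argument.
report_sizes() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # wc -c counts bytes; tr strips the padding some wc versions add.
            printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
        else
            printf 'missing\t%s\n' "$f" >&2
        fi
    done
}

# Stand-in files (assumption: placeholders, not the PDFs from this thread).
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/small.pdf" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="$demo_dir/large.pdf" bs=1024 count=4608 2>/dev/null

report_sizes "$demo_dir/small.pdf" "$demo_dir/large.pdf"
```

Running the helper on the files that pushed successfully and on the ones that failed would show whether the failures cluster above some size.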
Also, the reporter of issue #219 reported back that his issue was gone after switching from waitress to another web server, in his case uwsgi. Could you try that too?
Thanks,
Thomas