Made fixes for really broken but readable files. #32
Open
mlaukala wants to merge 5 commits into empira:master
ScanNextToken() will usually crash if it finds itself inside a stream, because of the unexpected characters there. Currently, the function I added, IsValidXref(), uses ScanNextToken() to determine whether the next token is an 'xref' token, which in turn decides whether a fix needs to be made. However, if startxref is wrong, _lexer.Position could point inside a stream, which will more than likely cause an exception.
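One way to sidestep the tokenizer here is to compare raw bytes at the candidate offset instead of scanning a token. The following is a minimal Python sketch of that idea, not PDFsharp code: `looks_like_xref` is a hypothetical helper, and the extra checks (rejecting the tail of `startxref`, requiring whitespace after the keyword) are assumptions made for illustration.

```python
# PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def looks_like_xref(data: bytes, offset: int) -> bool:
    """Return True if the keyword 'xref' starts at `offset`.

    Leading whitespace at the offset is skipped. Only raw bytes are
    compared, so arbitrary stream content cannot raise an exception.
    """
    if offset < 0 or offset >= len(data):
        return False
    pos = offset
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    if data[pos:pos + 4] != b"xref":
        return False
    # Assumption: reject the 'xref' that is the tail of 'startxref'.
    if data[max(0, pos - 5):pos] == b"start":
        return False
    # Assumption: the keyword must be followed by whitespace.
    following = data[pos + 4:pos + 5]
    return following != b"" and following[0] in PDF_WHITESPACE
```

A check like this returns False when the offset lands inside binary stream data, so the caller can fall back to the repair path without first catching an exception.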
Can now open and parse files with an incorrect startxref and incorrect stream lengths. Note that Adobe Acrobat X 10.0.0 will open these files and prompt to save when they are closed.
When the startxref is incorrect, the parser looks for the 'trailer' symbol and uses that trailer; otherwise the xref table is not rebuilt. Once the trailer is found, it parses through the entire file, records the location of each object, and places a new PdfReference for it inside the PdfCrossReferenceTable.
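The rebuild described above can be sketched roughly as follows. This is an illustrative Python sketch, not the C# in this pull request: `rebuild_object_offsets` and `OBJ_HEADER` are hypothetical names, taking the last 'trailer' in the file is an assumption, and a plain regex scan can be fooled by `N G obj` text that happens to appear inside stream data.

```python
import re

# Matches an object header such as "12 0 obj". The lookbehind keeps
# the match from starting in the middle of a longer number.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)[ \t\r\n]+(\d+)[ \t\r\n]+obj\b")


def rebuild_object_offsets(data: bytes):
    """Locate the trailer and map (object number, generation) -> offset.

    Returns None when no 'trailer' keyword exists, mirroring the case
    where the table is left alone.
    """
    # Assumption: use the last 'trailer' keyword in the file.
    trailer_pos = data.rfind(b"trailer")
    if trailer_pos == -1:
        return None

    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition of the same object overwrites an earlier one.
        offsets[key] = match.start(1)
    return trailer_pos, offsets
```

Each entry in `offsets` plays the role of one rebuilt cross-reference entry: an object identifier paired with the byte position of its header.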
It will also attempt to correct invalid stream lengths. After the stream length is read from the object, we first check whether the 'endstream' symbol is where that length says it should be. If it is not, we look for the next valid 'endstream' symbol after the 'stream' symbol and use its index to set the length of the stream.
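A minimal Python sketch of this length correction is shown below, assuming the whole file is available as a byte string. `corrected_stream_length` is a hypothetical helper, and trimming one end-of-line marker before 'endstream' is an assumption of the sketch rather than something stated in this pull request.

```python
def corrected_stream_length(data: bytes, stream_start: int,
                            declared_length: int) -> int:
    """Return a usable stream length.

    `stream_start` is the offset of the first byte of stream data,
    that is, just past the 'stream' keyword and its end-of-line.
    """
    if declared_length >= 0:
        # Step 1: is 'endstream' where the declared length puts it?
        pos = stream_start + declared_length
        while pos < len(data) and data[pos] in b"\r\n":
            pos += 1
        if data[pos:pos + 9] == b"endstream":
            return declared_length

    # Step 2: fall back to the next 'endstream' after the stream start.
    end = data.find(b"endstream", stream_start)
    if end == -1:
        raise ValueError("no 'endstream' keyword after stream start")

    # Assumption: one EOL before 'endstream' is not part of the data.
    if end > stream_start and data[end - 1:end] == b"\n":
        end -= 1
    if end > stream_start and data[end - 1:end] == b"\r":
        end -= 1
    return end - stream_start
```

Taking the first 'endstream' found is the simplest choice; binary stream data that happens to contain those bytes would defeat it, which is presumably why the description speaks of the next valid symbol.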
Note 1: A PDF file with a compressed trailer object is not handled yet.
Note 2: Not tested with versioned files, which will probably still fail.
Note 3: On an invalid stream length, the search should probably check 1k chunks of data for 'endstream'. Currently it checks within the invalid length and, if the symbol is not found, loads the rest of the file and checks again.
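The chunked search suggested in Note 3 could look something like the sketch below. It is written in Python for illustration only; `find_endstream_chunked` is a hypothetical name. The detail worth showing is the overlap carried between chunks, which keeps a keyword that straddles a chunk boundary from being missed.

```python
import io
from typing import BinaryIO


def find_endstream_chunked(f: BinaryIO, start: int,
                           chunk_size: int = 1024) -> int:
    """Return the file offset of the next 'endstream' at or after
    `start`, or -1 if there is none, reading `chunk_size` bytes at a time.
    """
    keyword = b"endstream"
    overlap = len(keyword) - 1  # longest partial keyword worth keeping
    f.seek(start)
    base = start  # file offset of buf[0]
    buf = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1
        buf += chunk
        idx = buf.find(keyword)
        if idx != -1:
            return base + idx
        # Keep only the tail that could begin a split keyword.
        drop = max(0, len(buf) - overlap)
        base += drop
        buf = buf[drop:]


if __name__ == "__main__":
    # The keyword starts at offset 1020 and crosses the 1024-byte boundary.
    blob = b"x" * 1020 + b"endstream" + b"y" * 50
    print(find_endstream_chunked(io.BytesIO(blob), 0))
```

Compared with loading the rest of the file, this reads only as far as the first match and holds at most one chunk plus eight bytes in memory.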