I need to extract text from a PDF file using C on Linux.
I find each "stream" and "endstream" keyword, send the bytes between them to zlib, and decompress the contents. From the decompressed output I can analyse the pattern and extract English characters correctly. The problem is that I don't know how to handle Chinese characters: I have tried many character sets, but I can't tell which one the PDF uses.
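To make the first step concrete, here is a minimal sketch of how the bytes between "stream" and "endstream" can be located before they are passed to zlib. The names `find_bytes` and `find_stream` are hypothetical helpers made up for this illustration, not part of any library. The sketch assumes the buffer starts before the "stream" keyword and that the compressed data does not itself contain the bytes "endstream"; a more robust version would read the /Length entry of the stream dictionary instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of a keyword in a
 * binary buffer. strstr() is not usable here because PDF data may
 * contain NUL bytes. */
static const unsigned char *find_bytes(const unsigned char *hay,
                                       size_t hay_len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n)
        return NULL;
    for (size_t i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0)
            return hay + i;
    }
    return NULL;
}

/* Hypothetical helper: locate the data of the first stream in buf.
 * Returns 1 and sets *start and *len on success, 0 if no complete
 * "stream" ... "endstream" pair is found.
 * Assumption: the returned span may still include the end-of-line
 * bytes that usually precede "endstream". */
int find_stream(const unsigned char *buf, size_t buf_len,
                const unsigned char **start, size_t *len)
{
    const unsigned char *end_buf = buf + buf_len;
    const unsigned char *s = find_bytes(buf, buf_len, "stream");
    if (s == NULL)
        return 0;
    s += strlen("stream");

    /* The "stream" keyword is followed by CRLF or LF before the data. */
    if (s < end_buf && *s == '\r')
        s++;
    if (s < end_buf && *s == '\n')
        s++;

    const unsigned char *e = find_bytes(s, (size_t)(end_buf - s), "endstream");
    if (e == NULL)
        return 0;

    *start = s;
    *len = (size_t)(e - s);
    return 1;
}
```

The span returned in `start` and `len` is what I then hand to zlib for decompression.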
So I need your help with this, for example details of the PDF format, or which character set it uses.
Thank you!!
Please forgive my terrible English.