We continue our analysis of text formats. So, here is the promised PDF. At first glance the Portable Document Format is not as simple as DOCX or ODT, but it is still a text format rather than a binary one. Surprised? Let's look inside.
Is this really text?
As you may have noticed, what we have in front of us is a very "text" document with binary data scattered through it. Of course, you cannot read a PDF book in Notepad, but it is quite possible to work out what is written in it and what is shown on screen. Let me say up front that the goal of this article is not to describe the data format but to answer the question "where can I find the text?".
PDF data types
PDF supports several basic data types (eight, to be precise). The ones we need are strings, arrays, dictionaries and streams, plus the objects that hold them. Let's go through them.
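Strings
PDF strings are inherited from PostScript, so a string in a .pdf file is a sequence of 8-bit characters enclosed in parentheses. A backslash at the end of a line continues the string on the next line (the backslash itself is not part of the string); the backslash is also used to escape special characters:
(First line\
First line\nSecond line with brackets \(\))
As a result, two lines are output:
First lineFirst line
Second line with brackets ()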
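To make these escape rules concrete, here is a minimal PHP sketch that decodes the body of a literal string (the bytes between its outer parentheses). The function name decodeLiteralString is hypothetical and is not part of the article's code. It covers the single-character escapes, the backslash line continuation and the octal codes such as \053 described just below, but it does not check that parentheses are balanced.

```php
<?php
// Hypothetical helper (not from the article's source): decode the body of a
// PDF literal string, i.e. the bytes between its outer parentheses.
function decodeLiteralString(string $body): string
{
    // Single-character escapes defined for literal strings.
    $simple = [
        'n' => "\n", 'r' => "\r", 't' => "\t", 'b' => "\x08", 'f' => "\x0C",
        '(' => '(', ')' => ')', '\\' => '\\',
    ];
    $out = '';
    $len = strlen($body);
    for ($i = 0; $i < $len; $i++) {
        $c = $body[$i];
        if ($c !== '\\' || $i + 1 >= $len) {
            $out .= $c;                     // ordinary byte
            continue;
        }
        $next = $body[++$i];
        if (isset($simple[$next])) {
            $out .= $simple[$next];
        } elseif ($next === "\r" || $next === "\n") {
            // Backslash at the end of a line: continuation, nothing is emitted.
            if ($next === "\r" && $i + 1 < $len && $body[$i + 1] === "\n") {
                $i++;
            }
        } elseif (strpos('01234567', $next) !== false) {
            // One to three octal digits, e.g. \053 is '+'.
            $oct = $next;
            while (strlen($oct) < 3 && $i + 1 < $len && strpos('01234567', $body[$i + 1]) !== false) {
                $oct .= $body[++$i];
            }
            $out .= chr(octdec($oct) & 0xFF);
        } else {
            $out .= $next;                  // unknown escape: drop the backslash
        }
    }
    return $out;
}
```

Fed the body of the example above, the function returns the two output lines shown, joined by a newline.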
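Besides plain 8-bit characters, PDF has other ways to put data into a string, for example to carry Unicode-encoded text. You can use hexadecimal notation, either a single byte written as two hex digits (<2B>) or a whole sequence of them (<54776F20>), or octal character codes inside a literal string (\053). For example, the following strings are equivalent:
(Two + two = four)
(Two \053 two \075 four)
<54776F202B2074776F203D20666F7572>
A hexadecimal string is a separate string object, so it cannot be placed inside the parentheses of a literal string; the two kinds can, however, be mixed as elements of an array, for example [(Two )<2B>( two )<3D>( four)].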
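A hexadecimal string is decoded by dropping the angle brackets and the whitespace and reading the digits in pairs. The sketch below uses the hypothetical name decodeHexString; following the PDF rules, an odd final digit is treated as if it were followed by 0.

```php
<?php
// Hypothetical helper: decode a PDF hexadecimal string such as <54776F20>.
function decodeHexString(string $token): string
{
    // Drop the brackets, then whitespace (and, in this sketch, any other
    // non-hex character).
    $hex = preg_replace('/[^0-9A-Fa-f]/', '', trim($token, '<>'));
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';                        // odd final digit: assume a trailing 0
    }
    return (string) hex2bin($hex);
}

// The hex form and the plain literal form carry the same bytes.
var_dump(decodeHexString('<54776F202B2074776F203D20666F7572>') === 'Two + two = four'); // bool(true)
```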
Later on, it is in strings that we will look for the text data of a PDF document.
Arrays
PDF arrays are simply sequences of objects grouped in square brackets. Example:
[(Hello,)10(world!)]
Arrays can contain text strings.
Dictionaries
Dictionaries are key-value pairs enclosed in << and >>. Most often a dictionary is used to give an object the properties described in it. This data helps, for example, to determine how to decode a stream and how long it is, or, conversely, to discard the current object as uninteresting (as with images). A typical PDF dictionary looks like this:
<<
/Length 681
/Filter /FlateDecode
>>
After reading it, my code represents it as follows:
$dictionary = array(
    "Length" => "681",
    "Filter" => true,
    "FlateDecode" => true
);
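That representation can be reproduced in a few lines of PHP. This is a minimal sketch with the hypothetical name parseSimpleDictionary, not the article's own getObjectOptions(). It assumes a flat dictionary whose values are numbers, names or indirect references, and it deliberately ignores nested dictionaries, arrays and strings.

```php
<?php
// Hypothetical helper mirroring the array shown above: "/Key value" becomes
// "Key" => "value", and a name with no value after it becomes Name => true.
function parseSimpleDictionary(string $dict): array
{
    $result = [];
    if (!preg_match('/<<(.*)>>/s', $dict, $m)) {
        return $result;
    }
    // Every entry starts with "/", so splitting on it yields the entries.
    foreach (explode('/', $m[1]) as $entry) {
        $entry = trim(preg_replace('/\s+/', ' ', $entry));
        if ($entry === '') {
            continue;
        }
        // Keep everything after the key, so "9 2 R" survives as one value.
        $parts = explode(' ', $entry, 2);
        $result[$parts[0]] = $parts[1] ?? true;
    }
    return $result;
}
```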
Streams
A stream is a sequence of 8-bit data between the keywords stream and endstream. Binary data, whether compressed text, images or embedded fonts, is stored as streams. A stream is always located inside an object (see below) and is described at least by its length (the dictionary entry /Length N) and often by a compression method (/Filter /FlateDecode). PDF supports quite a few filters (including encryption, /Crypt), but we are interested in only three: the most commonly used Flate (zlib/Deflate compression), and the rarer ASCII Hex (data written as a hexadecimal string terminated by the > character) and ASCII base-85 (an encoding in which every 4 consecutive bytes of the source data are written as 5 characters from ! to u of the ASCII table).
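Of the three filters, ASCII base-85 is the least obvious, so here is a minimal PHP sketch of the decoding side. The name ascii85Decode is hypothetical. The sketch assumes 64-bit PHP integers and well-formed input, and it also accepts the "<~" prefix that some encoders emit outside PDF.

```php
<?php
// Hypothetical helper: decode ASCII base-85 data as used by the ASCII85Decode
// filter. Every 5 characters from '!' (0) to 'u' (84) give 4 bytes, 'z'
// stands for four zero bytes, and "~>" marks the end of the data.
function ascii85Decode(string $data): string
{
    $data = preg_replace('/\s+/', '', $data);      // whitespace is ignored
    if (substr($data, 0, 2) === '<~') {
        $data = substr($data, 2);                    // optional non-PDF prefix
    }
    $end = strpos($data, '~>');
    if ($end !== false) {
        $data = substr($data, 0, $end);
    }

    // Turn up to five base-85 digits into a big-endian 32-bit value.
    $flush = function (array $digits): string {
        $value = 0;
        foreach ($digits as $d) {
            $value = $value * 85 + $d;
        }
        return pack('N', $value);
    };

    $out = '';
    $group = [];
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        if ($data[$i] === 'z' && !$group) {
            $out .= "\0\0\0\0";
            continue;
        }
        $group[] = ord($data[$i]) - 33;
        if (count($group) === 5) {
            $out .= $flush($group);
            $group = [];
        }
    }
    // A final group of k characters (2 <= k <= 4) encodes k - 1 bytes:
    // pad it with 'u' (84) and keep only the leading bytes.
    $k = count($group);
    if ($k > 1) {
        $out .= substr($flush(array_pad($group, 5, 84)), 0, $k - 1);
    }
    return $out;
}
```

For example, ascii85Decode('9jqo^~>') yields the four bytes "Man ".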
It is in streams that we will look for the text of the PDF document. An example of a stream can be seen in the second half of the picture at the beginning of this article.
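Objects
An object is the largest structure we will work with. An object is enclosed between the keywords obj and endobj and can contain any other data type, from plain numbers to streams. Each object has its own identifier within the document and can be referenced by it: the two numbers before obj (for example 2 0) are the object number and the generation number, and a reference to that object is written as 2 0 R. We are interested in objects that contain a stream (remember the main task!); in most cases such objects also carry an additional set of options in the form of a dictionary. Here is a typical example of an object in a PDF file (with an uncompressed stream):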
2 0 obj
<<
/Length 9 2 R
>>
stream
BT
/F1 12 Tf
72 712 Td (A short text stream) Tj
ET
endstream
endobj
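Objects can be located with a regular expression, which is also how the article's code starts out (it matches obj ... endobj). The minimal sketch below, with the hypothetical name listObjects, additionally keeps the object number and generation as the array key. It does not handle cross-reference tables, compressed object streams, or binary stream data that happens to contain the bytes endobj.

```php
<?php
// Hypothetical helper: list the indirect objects of a PDF as
// "number generation" => object body.
function listObjects(string $pdf): array
{
    preg_match_all('/(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b/s', $pdf, $matches, PREG_SET_ORDER);
    $objects = [];
    foreach ($matches as $m) {
        $objects["$m[1] $m[2]"] = trim($m[3]);   // e.g. "2 0" => "<< ... endstream"
    }
    return $objects;
}
```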
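That concludes the introduction to the internal representation of data. All that remains is to get the text out of the streams and to obtain the dictionary of internal character conversion (something I have not yet seen implemented anywhere).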
Where to look for text?
Let's formulate the problem: "where can text objects be found in a PDF document?" Everything here is simple and has been described more than once on various forums: look for objects with streams. Usually these are streams compressed with Flate, but a document may also contain uncompressed streams or, conversely, several filters at once (/Filter [/FlateDecode /ASCIIHexDecode]).
Now we need a working example: the poem "The Sail" by Mikhail Yuryevich Lermontov in PDF format (the document was created on Acrobat.com from the odt file used in the previous article).
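A note on the filter chains mentioned above: when /Filter lists several names, a reader undoes them in the listed order. Here is a minimal PHP sketch with the hypothetical name applyFilters; it covers only ASCIIHexDecode and FlateDecode (the latter through PHP's zlib extension, whose gzuncompress() reads zlib-wrapped Deflate data) and rejects every other filter.

```php
<?php
// Hypothetical helper: undo the filters named in /Filter, in the order they
// are listed, e.g. /Filter [/ASCIIHexDecode /FlateDecode].
function applyFilters(string $data, array $filters): string
{
    foreach ($filters as $filter) {
        switch ($filter) {
            case 'ASCIIHexDecode':
                $end = strpos($data, '>');           // '>' ends the hex data
                $hex = preg_replace('/\s+/', '', $end === false ? $data : substr($data, 0, $end));
                if (strlen($hex) % 2 === 1) {
                    $hex .= '0';                     // odd final digit: assume 0
                }
                $decoded = @hex2bin($hex);
                break;
            case 'FlateDecode':
                $decoded = @gzuncompress($data);     // zlib-wrapped Deflate data
                break;
            default:
                throw new InvalidArgumentException("Unsupported filter: $filter");
        }
        if ($decoded === false) {
            throw new RuntimeException("Corrupt data for filter $filter");
        }
        $data = $decoded;
    }
    return $data;
}
```

Hex-encoding Flate output and then decoding it with ['ASCIIHexDecode', 'FlateDecode'] gives back the original text.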
Let's find the objects in this document and start parsing them. With a little cheating, I will pick out the object that obviously contains the text data; this is only for the sake of the example, and the script itself needs no such hint.

First, let's use what we have learned about PDF data types to understand what we are looking at. The data stream is preceded by the object's property dictionary, which says that the stream is 681 bytes long (/Length 681) and is compressed (/Filter) with Flate (/FlateDecode). That is already enough to unpack the stream; gzuncompress will do the job:
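The unpacking step can be wrapped in a small helper that cuts the raw stream bytes out of one object before they are handed to gzuncompress(). This minimal sketch uses the hypothetical name extractStream. It trusts a direct /Length N; an indirect /Length N G R would require reading the referenced object, so in that case the sketch falls back to searching for endstream.

```php
<?php
// Hypothetical helper: return the raw bytes between "stream" and "endstream"
// of a single object, or null if the object has no stream.
function extractStream(string $object): ?string
{
    if (!preg_match('/\bstream\r?\n/', $object, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start = $m[0][1] + strlen($m[0][0]);   // data begins after the EOL

    // A direct length ("/Length 681") is used as is; an indirect one
    // ("/Length 9 2 R") is not resolved here.
    if (preg_match('#/Length\s+(\d+)(\s+\d+\s+R)?#', $object, $len) && empty($len[2])) {
        return substr($object, $start, (int) $len[1]);
    }
    $end = strpos($object, 'endstream', $start);
    if ($end === false) {
        return null;
    }
    // Drop the end-of-line marker that precedes "endstream".
    return preg_replace('/\r?\n$/D', '', substr($object, $start, $end - $start));
}
```

For a Flate stream, gzuncompress(extractStream($object)) then yields the content stream text.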
0.1 w
q 0 -0.1 612.1 792.1 re W* n
q 0 0 0 RG
0 0 0 rg
BT
2 Tr 0.59999 w
56.8 716.6 Td /F1 18 Tf [<01> 17 <02> 10 <03> 10 <04> 17 <05>] TJ
ET
Q
q 0 0 0 rg
BT
56.8 682.5 Td /F1 11 Tf [<06> 9 <07> 11 <08> 6 <07> 11 <07> 11 <09> 13 <0A> 4 <0B> 14 <0C> 11 <0D> 11 <0E> 9
<0F> 9 <0A> 4 <10> 11 <11> 10 <12> 23 <13> 6 <10> 11 <14> 10 <10> 11 <15>] TJ
ET
...a lot more text...
Let's step aside from the example for a moment and learn a little more about how text is displayed in PDF. A few things to remember:
- Text in a stream is contained between the markers BT (begin text) and ET (end text).
- PDF shows text with either the Tj operator (show a text string) or the TJ operator (show text with individual glyph positioning). These operators follow the string or array of strings they apply to, as in our case: [<01>17<02>10<03>10<04>17<05>]TJ.
- As mentioned above, PDF supports individual positioning of characters, i.e. you can set an arbitrary distance between each pair of characters. In a TJ array these adjustments are the numbers between the strings, expressed in thousandths of a unit of text space. More on this later.
This information is enough to pick out two lines from the example:
1. <01> 17 <02> 10 <03> 10 <04> 17 <05>
2. <06> 9 <07> 11 <08> 6 <07> 11 <07> 11 <09> 13 <0A> 4 <0B> 14 <0C> 11 <0D> 11 <0E> 9
<0F> 9 <0A> 4 <10> 11 <11> 10 <12> 23 <13> 6 <10> 11 <14> 10 <10> 11 <15>
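Picking out those lines can be done mechanically. The minimal sketch below, with the hypothetical name tjStrings, splits a TJ operand into the strings it shows and drops the positioning numbers. Hex strings are decoded to raw bytes; literal strings are returned as written, without processing their escapes.

```php
<?php
// Hypothetical helper: split a TJ operand such as [<01>17<02>10<03>] into the
// strings it shows, dropping the positioning numbers in between.
function tjStrings(string $array): array
{
    // Group 1: hex string digits; group 2: literal string body.
    preg_match_all('/<([0-9A-Fa-f\s]*)>|\(((?:\\\\.|[^\\\\()])*)\)/s', $array, $tokens, PREG_SET_ORDER);
    $strings = [];
    foreach ($tokens as $t) {
        if (isset($t[2])) {
            $strings[] = $t[2];                 // literal string, escapes untouched
        } else {
            $hex = preg_replace('/\s+/', '', $t[1]);
            $strings[] = (string) hex2bin(strlen($hex) % 2 ? $hex . '0' : $hex);
        }
    }
    return $strings;
}
```

Applied to the title line, it returns the five one-byte codes 01 to 05.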
A reader who has looked carefully at the example PDF may guess that these are the title (ПАРУС, "The Sail") and the first line of the poem (Белеет парус одинокой, "A lonely sail is gleaming white"). And that reader would be right! But the text is encoded with very strange hex codes:
ПАРУС is encoded as 01 02 03 04 05,
Белеет as 06 07 08 07 07 09
...
Looks like some kind of mapping table, doesn't it? Right again; let's take a look...
Conversion table
Most of the freely available functions for extracting text from PDF that can be found on the Internet break down on the previous example. Let's work out what is what. We are interested in ToUnicode CMaps, which Adobe's PDF specification describes in the subsection on extracting text content. Let's find them in the file. Once again I'll cheat and hand the reader a deliberately correct fragment:

Decoded:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo <<
/Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
45 beginbfchar
<01> <041F>
<02> <0410>
<03> <0420>
<04> <0423>
<05> <0421>
<06> <0411>
<07> <0435>
<08> <043B>
<09> <0442>
...many more conversion lines...
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
Familiar numbers, <01>, <02> and so on? Yes, we saw them a little earlier in the text line. Let's assume that 01 should be replaced with 041F and see what that number hides. Bingo! &#x041F; = П! Having found a mapping from one character to another, let's turn to the documentation and learn a little more.
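bfchar
The conversion between beginbfchar and endbfchar is the simplest one: it maps one character code to another. In the example above, for instance, code 01 corresponds to the character П.
This is, however, only a special case of the conversion: a single code can be mapped to a whole string of characters (the specification allows a destination string of up to 512 bytes).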
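Reading bfchar sections takes little more than a regular expression. In this minimal sketch (parseBfchar is a hypothetical name), source codes become raw-byte keys and the UTF-16BE destination strings are converted to UTF-8. It assumes the mbstring extension is available for that conversion.

```php
<?php
// Hypothetical helper: read every beginbfchar ... endbfchar section of a
// ToUnicode CMap into source code (raw bytes) => UTF-8 text.
function parseBfchar(string $cmap): array
{
    $map = [];
    preg_match_all('/beginbfchar(.*?)endbfchar/s', $cmap, $sections);
    foreach ($sections[1] as $section) {
        preg_match_all('/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>/', $section, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $p) {
            // The destination is UTF-16BE and may hold several characters.
            $map[hex2bin($p[1])] = mb_convert_encoding(hex2bin($p[2]), 'UTF-8', 'UTF-16BE');
        }
    }
    return $map;
}

$cmap = "5 beginbfchar\n<01> <041F>\n<02> <0410>\n<03> <0420>\n<04> <0423>\n<05> <0421>\nendbfchar";
echo strtr("\x01\x02\x03\x04\x05", parseBfchar($cmap)), "\n"; // ПАРУС
```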
bfrange
There is another, more complex conversion, enclosed between beginbfrange and endbfrange. It works with ranges of codes rather than individual characters, and it comes in two forms:
- <0000> <005E> <0020>: works on the range 0000 to 005E and converts each value into the range 0020 to 007E. See the principle? 0000 becomes 0020, 0001 becomes 0021, 0002 becomes 0022, and so on.
- <005F> <0061> [<00660066> <00660069> <00660066006C>]: each value in the range 005F to 0061 (that is, 0060 as well) is replaced by the corresponding sequence from the array in square brackets: 005F becomes 0066 0066 (ff), 0060 becomes fi, and 0061 becomes ffl.
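Both forms can be expanded into the same kind of lookup table as bfchar. The minimal sketch below uses the hypothetical name parseBfrange, assumes the mbstring extension, and for the first form increments the whole destination code (which matches the specification as long as a range does not overflow the last destination byte).

```php
<?php
// Hypothetical helper: expand beginbfrange ... endbfrange sections into
// source code (raw bytes) => UTF-8 text. Handles both forms:
//   <lo> <hi> <dst>            consecutive destination codes
//   <lo> <hi> [<d1> <d2> ...]  one destination string per source code
function parseBfrange(string $cmap): array
{
    $utf8 = fn(string $hex): string => mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
    $map = [];
    preg_match_all('/beginbfrange(.*?)endbfrange/s', $cmap, $sections);
    foreach ($sections[1] as $section) {
        preg_match_all('/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(?:<([0-9A-Fa-f]+)>|\[([^\]]*)\])/',
            $section, $ranges, PREG_SET_ORDER);
        foreach ($ranges as $r) {
            $width = strlen($r[1]);                 // hex digits per source code
            $lo = hexdec($r[1]);
            $hi = hexdec($r[2]);
            $list = [];
            if (isset($r[4])) {                     // array form
                preg_match_all('/<([0-9A-Fa-f]*)>/', $r[4], $items);
                $list = $items[1];
            }
            for ($code = $lo; $code <= $hi; $code++) {
                $src = hex2bin(str_pad(dechex($code), $width, '0', STR_PAD_LEFT));
                if (isset($r[4])) {
                    $dst = $list[$code - $lo] ?? null;
                    if ($dst === null) {
                        break;                      // array shorter than the range
                    }
                } else {
                    // The destination advances together with the source code.
                    $dst = str_pad(dechex(hexdec($r[3]) + $code - $lo), strlen($r[3]), '0', STR_PAD_LEFT);
                }
                $map[$src] = $utf8($dst);
            }
        }
    }
    return $map;
}
```

With the two ranges from the text, code 0021 maps to "A" and code 0060 maps to "fi".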
Algorithm and code
With this knowledge we can read the lovely poem about the sail. Now it is time to present the most interesting code and a link to the full source!
// getObjectOptions(), getDecodedStream(), getDirtyTexts(), getCharTransformations()
// and getTextUsingTransformations() are defined in the full source on GitHub.
function pdf2text($filename) {
    // Read the PDF file into a string, keeping in mind that the file
    // may contain binary streams.
    $infile = @file_get_contents($filename);
    if (empty($infile))
        return "";

    // First pass: get all the text data from the file.
    // In the first pass we collect only "dirty" data, with positioning,
    // hex inserts and so on.
    $transformations = array();
    $texts = array();

    // First, get the list of all objects in the PDF file.
    preg_match_all("#obj(.*)endobj#ismU", $infile, $objects);
    $objects = @$objects[1];

    // Go through what we found. Besides text, we may come across many
    // interesting things that are not always useful to us, such as fonts.
    for ($i = 0; $i < count($objects); $i++) {
        $currentObject = $objects[$i];

        // Check whether the current object has a data stream; in most cases
        // it is compressed with Flate.
        if (preg_match("#stream(.*)endstream#ismU", $currentObject, $stream)) {
            $stream = ltrim($stream[1]);

            // Read the parameters of this object. We only need text data,
            // so we filter minimally here to speed things up.
            $options = getObjectOptions($currentObject);
            if (!(empty($options["Length1"]) && empty($options["Type"]) && empty($options["Subtype"])))
                continue;

            // So there may be text in front of us: decode it from binary
            // form. After this step we deal with plain text only.
            $data = getDecodedStream($stream, $options);
            if (strlen($data)) {
                // Now look for text containers in the current stream. If we
                // succeed, the "dirty" text goes to the rest found earlier.
                if (preg_match_all("#BT(.*)ET#ismU", $data, $textContainers)) {
                    $textContainers = @$textContainers[1];
                    getDirtyTexts($texts, $textContainers);
                // Otherwise, try to find character transformations, which
                // we will use in the second step.
                } else
                    getCharTransformations($transformations, $data);
            }
        }
    }

    // After the initial parsing of the PDF document, process the collected
    // text blocks taking the character transformations into account, and
    // return the result.
    return getTextUsingTransformations($texts, $transformations);
}
The code with comments is available on GitHub.
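The second pass is where the ToUnicode table is applied to the "dirty" text. The following is a minimal sketch of that idea, not the author's getTextUsingTransformations(). The name decodeShownText is hypothetical; the sketch only looks at hex strings shown by TJ or Tj and assumes one-byte character codes, matching the <00> <FF> code space of the example.

```php
<?php
// Hypothetical second-pass sketch: collect the hex strings shown by TJ arrays
// and Tj operators inside one BT ... ET block and translate every byte through
// a ToUnicode map (source byte => UTF-8 text).
function decodeShownText(string $block, array $toUnicode): string
{
    $text = '';
    // Group 1: the body of a "[...] TJ" array; group 2: a "<...> Tj" operand.
    preg_match_all('/\[(.*?)\]\s*TJ|(<[0-9A-Fa-f\s]*>)\s*Tj/s', $block, $shows, PREG_SET_ORDER);
    foreach ($shows as $show) {
        $operand = $show[2] ?? $show[1];
        preg_match_all('/<([0-9A-Fa-f\s]*)>/', $operand, $hexes);
        foreach ($hexes[1] as $hex) {
            $hex = preg_replace('/\s+/', '', $hex);
            if (strlen($hex) % 2 === 1) {
                $hex .= '0';
            }
            $bytes = (string) hex2bin($hex);
            for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
                // Unmapped codes are passed through unchanged.
                $text .= $toUnicode[$bytes[$i]] ?? $bytes[$i];
            }
        }
    }
    return $text;
}

$map = ["\x01" => "П", "\x02" => "А", "\x03" => "Р", "\x04" => "У", "\x05" => "С"];
echo decodeShownText('[<01>17<02>10<03>10<04>17<05>]TJ', $map), "\n"; // ПАРУС
```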
In conclusion
Of course, this code is no panacea and will not parse every PDF file you give it. For example, there are documents that implement Russian fonts by drawing Russian glyphs in place of Latin characters.
The code does not handle individual character positioning either. The task is doable and not difficult; I leave its solution to the reader.
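For readers who want to take up that exercise, one common heuristic (an assumption here, not the author's solution) is to treat a large negative TJ adjustment as a word gap. A TJ number is subtracted from the horizontal position in thousandths of a unit of text space, so a negative number moves the next glyph to the right. The sketch below, with the hypothetical name tjToText and an arbitrary default threshold of 200, inserts a space in that case; literal strings are not unescaped.

```php
<?php
// Hypothetical sketch: turn a TJ array into text, inserting a space where the
// adjustment moves the next glyph right by more than $threshold thousandths
// of a text-space unit. The threshold is a guess, not a PDF rule.
function tjToText(string $array, float $threshold = 200.0): string
{
    // Group 1: literal string body; group 2: hex digits; group 3: a number.
    preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)|<([0-9A-Fa-f\s]*)>|(-?\d*\.?\d+)/s',
        $array, $tokens, PREG_SET_ORDER);
    $text = '';
    foreach ($tokens as $t) {
        if (isset($t[3])) {
            // Negative adjustment beyond the threshold: treat as a word gap.
            if ((float) $t[3] < -$threshold && $text !== '') {
                $text .= ' ';
            }
        } elseif (isset($t[2])) {
            $hex = preg_replace('/\s+/', '', $t[2]);
            $text .= (string) hex2bin(strlen($hex) % 2 ? $hex . '0' : $hex);
        } else {
            $text .= $t[1];                 // literal string, escapes untouched
        }
    }
    return $text;
}
```

Small kerning values such as the 17 and 10 in the title line stay below the threshold, so no spaces appear inside words.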
Nor is the code ideal for reading PDF files the way the format itself organizes information: it does not look for pages and does not handle document versions (PDF supports a change history), so it may not fully read even the information it is able to process.
Note that the simple route is still available:
$content = shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -');
(escapeshellarg keeps the shell from interpreting spaces or metacharacters in the file name). In this case, however, the task was to read PDF on any platform.
I would be glad if this article interested you. Its purpose is to introduce the community to how PDF is built, give it a way to read PDF in PHP, and provide a starting point for extracting data in complex cases.
Depending on activity and interest in the topic, I will either continue the story about PDF (internal document structure, positioning, fonts, internal links) or, using RTF as an example, return to getting text at any cost. Thank you for your attention!