Searching in PDF?

Joined
Apr 9, 2010
Messages
773 (0.28/day)
Likes
99
Location
Atlanta/Marietta, GA
System Name Norbert
Processor Intel Core i7 920
Motherboard Gigabyte X58A-UD5
Cooling Corsair H50 with 2x Scythe GT AP-14
Memory 3x 2gb G.Skill 1600Mhz C9 DDR3
Video Card(s) MSI Twin Frozr II GTX 465 GE & EVGA GTS 450 SC
Storage 2x 1TB Samsung Spinpoint F3 7200rpm
Display(s) Dell U3011, Dell 2408WFP, Samsung 2693HM
Case Lian Li V1020R
Audio Device(s) Creative X-Fi Titanium
Power Supply Seasonic X-750
Software Windows 7 Ultimate 64bit
#1
Hey guys, I need to check a large list of entries against a PDF file, basically just to see whether each entry in the list appears in the PDF. Is there some way to do this without checking them one by one?
 

erocker

Senior Moderator
Staff member
Joined
Jul 19, 2006
Messages
42,369 (10.18/day)
Likes
18,018
Processor Intel i7 8700k
Motherboard Gigabyte z370 AORUS Gaming 7
Cooling Water
Memory 16gb G.Skill 4000 MHz DDR4
Video Card(s) Evga GTX 1080
Storage 3 x Samsung Evo 850 500GB, 1 x 250GB, 2 x 2TB HDD
Display(s) Nixeus EDG27
Case Thermaltake X5
Power Supply Corsair HX1000i
Mouse Zowie EC1-B
Software Windows 10
#2
Ctrl + F
 
Joined
Apr 10, 2006
Messages
373 (0.09/day)
Likes
93
Location
Arizona, USA
Processor Intel Core i5 3450
Motherboard Asus P8H77-M Pro
Cooling Thermalright HR-02 Macho
Memory 2x2GB G-Skill DDR3-2000
Video Card(s) Asus GTX660ti 2GB
Storage 128GB Samsung 830, 1TB Western Digital Caviar Black
Display(s) 23" Dell U2311H
Case Silverstone FT03 (titanium color)
Audio Device(s) Onboard Realtek ALC892
Power Supply SeaSonic X series SS-460FL
Software Windows 8 Professional 64-bit
#3
That.
Or, if you need to check a bunch of text with a more advanced method, you could use the text selection tool in Adobe Reader to select the text, then copy and paste it into another application (like MS Word or Excel) that lets you search the way you want, for example with custom VBA macro code.
 
Joined
Jul 26, 2010
Messages
1,655 (0.61/day)
Likes
729
Location
Philly
System Name Primary Rig
Processor Phenom II X4 B50 @ 3.7GHz
Motherboard Biostar TA790GX 128M
Cooling Sunbeam CR-CCTF 120mm , 6x120mm, MOS-C1
Memory 2x2GB Kingston HyperX 1066 @ 800 4-4-4-12
Video Card(s) Sapphire HD 5830 800/1000 @ 885/1225
Storage 320GB, 400GB, 500GB, 1.5TB
Display(s) Hannspree HF259
Case CM 690
Power Supply OCZ 850W
Benchmark Scores 3Dmark06: 18545/5219 CPU Mark 7.0: 3911.2 Cinebench R10: 11826/3359 x264 HD 2.0: 75.6/23.9
#4
I'd do a slight modification on what gvblake22 said.

First I'd copy the data out of the PDF with the text selection tool. Then I'd make a copy of the list you're checking (called myData_test.txt) and paste the data from the PDF into it. After that, I'd run some basic command line tools like this (where myData.txt is the list you're checking):
Code:
sort myData_test.txt | uniq -d > matchingData.txt
sort myData.txt | diff matchingData.txt -
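The output of the second command will show only the entries that are missing from the PDF. This assumes one entry per line, that each entry matches a whole line of the pasted PDF text, and that neither file has duplicate lines of its own.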
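If the entries don't sit on lines of their own in the pasted text, a substring check might work better. Here's a rough sketch of that, not something I've run against a real PDF dump: missing_entries is a made-up helper name, and pdfText.txt (used in the usage line further down) is just an assumed name for the file holding the text pasted from the PDF.

```shell
# Hypothetical helper: print every line of the list file ($1) that does not
# appear anywhere in the text pasted from the PDF ($2).
missing_entries() {
    list_file=$1
    pdf_text=$2
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines in the list.
        [ -n "$entry" ] || continue
        # -F: fixed string (no regex), -q: exit status only,
        # -e: lets entries that start with a dash be used safely.
        if ! grep -qF -e "$entry" "$pdf_text"; then
            printf '%s\n' "$entry"
        fi
    done < "$list_file"
}
```

You'd call it with something like missing_entries myData.txt pdfText.txt > missingData.txt. It runs grep once per entry, so it's slower than the sort/uniq approach on very long lists, and if the files have Windows line endings you'd probably need to strip the carriage returns first.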
 
#5
That's a great idea. I'm assuming you just run that code as a batch file or from Windows > Run > 'cmd'?
 
#6
Those are actually GNU command line utilities (common to Linux/Unix). I run them in Windows using Cygwin. They can also be run without Cygwin using the GNU Utilities for Windows (though I've never tried it). Theoretically you should be able to use the GNU Utilities for Windows just like native DOS commands (in batch scripts or directly at the command prompt).
 
#7
Thanks for the responses, guys! I actually ended up finding a copy of the file in Excel, so all's good now :D