
Searching in PDF?

ebolamonkey3

Hey guys, I need to look up a large list of data from a PDF file, basically just to check if each entry of the list is in the pdf. Is there some way to do this without having to check one by one?
 
That.
Or, if you need to check a bunch of text with some other advanced method, you could use the text selection tool in Adobe Reader, select the text, copy and paste it into some other application (like MS Word or Excel) that will allow you to search the way you want (with custom VBA macro code).
 
I'd do a slight modification on what gvblake22 said.

First I'd copy the data out of the PDF with the text selection tool. Then I'd create a copy of the data you're looking for (called myData_test.txt) and paste the data from the PDF into it. Then I'd run some basic command line tools like this (where myData.txt is the original list of data you're looking for):
Code:
sort myData_test.txt | uniq -d > matchingData.txt
sort myData.txt | diff matchingData.txt -

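The output of the second command will show the data that's missing from the PDF (the lines diff marks with ">"). This assumes each entry sits on its own line in the pasted text, since uniq compares whole lines.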
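
If the pasted PDF text doesn't put each entry on its own line, a substring check might be more forgiving. Here's a minimal sketch of that idea, not a tested replacement for the commands above. The function name find_missing and the file name pdf_text.txt are made up for illustration; it assumes the list has one entry per line and that the PDF text was pasted into a separate file:

```shell
# find_missing LIST_FILE TEXT_FILE   (hypothetical helper, not a standard tool)
# Prints every non-empty line of LIST_FILE that does not occur anywhere
# (as a literal substring) in TEXT_FILE.
find_missing() {
    list_file=$1
    text_file=$2
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines in the list.
        [ -z "$entry" ] && continue
        # -F: treat the entry as fixed text, not a regex.
        # -q: print nothing, just report found/not found via exit status.
        # --: stop option parsing so entries starting with "-" are safe.
        if ! grep -qF -- "$entry" "$text_file"; then
            printf '%s\n' "$entry"
        fi
    done < "$list_file"
}
```

You'd call it as find_missing myData.txt pdf_text.txt and it prints the entries that weren't found. Keep in mind it matches substrings, so a short entry like "cat" would count as found if the PDF text contains "category".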
 
That's a great idea. I'm assuming you just run that code as a batch file or in the Windows > Run > 'cmd'?
 
Those are actually GNU command line utilities (common to Linux/Unix). I run them in Windows using Cygwin. They can also be run without Cygwin using the GNU Utilities for Windows (though I've never tried it). Theoretically you should be able to use the GNU Utilities for Windows just like native DOS commands (in batch scripts or directly in the command prompt).
 
Thanks for the response guys! I actually ended up finding a copy of the file in excel so all's good now :D
 