Mail clients, with easy search inside attached files

Frick

So, for various reasons I am mainly using our company's iCloud mail address for my mailing, which honestly doesn't feel right. My predecessor did it, and recently our main address (the one I use) got completely wiped (syncing error in our printer, somehow), and it has just been easier to keep using it. But after the summer holidays I intend to start using the "proper" address for all my mail, and I have been looking at some clients, but it seems Windows Mail and Mozilla Thunderbird do NOT do what I really need them to do, and what iCloud does really well: search within attached PDF files.

Any tips or ideas?
 
You want an email client program that lets you search for words or phrases in attached files, specifically, .pdf files - without opening the attached file?

Hmm, I am not aware of any email program that will do that. I know some email programs will let you search for emails that have attachments, but not within the attachment itself.

For one, the attachment would have to be readable (as opposed to an executable). And then the email program would have to know how to read the readable file format used by the file. For example, it would have to know how to read .pdf, .docx, .csv, .rtf or .txt files. That would be a challenge to program in while keeping bloat down. It might even present some security issues.

I will be interested if someone knows of a program for this. Good luck.
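
To make the "it would have to know how to read the file format" point concrete, here is a minimal sketch using only Python's standard `email` package. It is an illustration under stated assumptions, not how any particular mail client works: the names `attachment_text` and `message_matches` are made up for this example, and only `text/*` attachments are handled. A PDF or .docx branch would need a format-specific extractor, which the standard library does not provide.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def attachment_text(part):
    """Return searchable text for one attachment, or None if the format is unknown."""
    if part.get_content_type().startswith("text/"):
        return part.get_content()  # decoded str for text/* parts
    # Hypothetical hook: a PDF or .docx branch would call a format-specific
    # extractor here. Every supported format adds code (and attack surface).
    return None


def message_matches(raw_bytes, phrase):
    """True if any attachment of the raw message contains the phrase."""
    msg = message_from_bytes(raw_bytes, policy=default)
    needle = phrase.lower()
    for part in msg.iter_attachments():
        text = attachment_text(part)
        if text is not None and needle in text.lower():
            return True
    return False


# Demo message: a short body plus one plain-text attachment.
demo = EmailMessage()
demo["Subject"] = "Invoice"
demo.set_content("See attached.")
demo.add_attachment("Order 4711 for Frick", filename="order.txt")
raw = demo.as_bytes()

print(message_matches(raw, "4711"))          # phrase is in the attachment
print(message_matches(raw, "See attached"))  # phrase is only in the body
```

The search itself is trivial; the per-format extraction step is where the bloat and security concerns mentioned above would come in.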
 
You want an email client program that lets you search for words or phrases in attached files, specifically, .pdf files - without opening the attached file?

Hmm, I am not aware of any email program that will do that. I know some email programs will let you search for emails that have attachments, but not within the attachment itself.

For one, the attachment would have to be readable (as opposed to an executable). And then the email program would have to know how to read the readable file format used by the file. For example, it would have to know how to read .pdf, .docx, .csv, .rtf or .txt files. That would be a challenge to program in while keeping bloat down. It might even present some security issues.

I will be interested if someone knows of a program for this. Good luck.

Exactly this.
 
So, for various reasons I am mainly using our company's iCloud mail address for my mailing, which honestly doesn't feel right. My predecessor did it, and recently our main address (the one I use) got completely wiped (syncing error in our printer, somehow), and it has just been easier to keep using it. But after the summer holidays I intend to start using the "proper" address for all my mail, and I have been looking at some clients, but it seems Windows Mail and Mozilla Thunderbird do NOT do what I really need them to do, and what iCloud does really well: search within attached PDF files.

Any tips or ideas?

You can try a local or online document storage solution. Something like that should be able to search PDF files. Of course that means you need to save the emails and/or attachments to those systems in order for them to be searchable.
 
I agree that will work, but how is that different from just saving the attachment to some location, then searching through the file in the normal way? The OP is asking for a method of searching through attachments while they are still in his email client's inbox. I don't know of any way to do that.
 
I agree that will work, but how is that different from just saving the attachment to some location, then searching through the file in the normal way?
Some software has dedicated engines for indexing and searching documents, and you would probably use their interface instead. It doesn't fit the OP's case exactly and could be overkill, but if you have a business requirement for searching documents it might be an option to consider.
The OP is asking for a method of searching through attachments while they are still in his email client's inbox. I don't know of any way to do that.
I did a quick test, and it seems you can do it if you are using Gmail. Otherwise I don't have a recommendation.
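
For anyone curious what a "dedicated engine for indexing and searching" boils down to, here is a toy inverted index in Python. It is only a sketch of the general technique, not what any specific product or Gmail does. The class name `AttachmentIndex` and the message ids are invented for the example, and it assumes the attachment text has already been extracted.

```python
import re
from collections import defaultdict


class AttachmentIndex:
    """Toy inverted index: maps each word to the messages whose attachments contain it."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of message ids

    def add(self, message_id, text):
        """Index the already-extracted text of one attachment."""
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(message_id)

    def search(self, query):
        """Return ids of messages whose attachments contain every query word."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = AttachmentIndex()
index.add("msg-1", "Quarterly newsletter: printer sync issues")
index.add("msg-2", "Invoice for printer toner")

print(index.search("printer"))          # both messages
print(index.search("printer invoice"))  # only msg-2
```

Because the index stores message ids rather than file paths, a hit leads straight back to the mail conversation the attachment came from.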
 
I did a quick test, and it seems you can do it if you are using Gmail. Otherwise I don't have a recommendation.
Nice catch! I just sent myself an email with a .pdf file attached. Then, from my Gmail inbox, I searched all emails for a phrase I knew was in that attachment, and I can confirm A Computer Guy is correct: Gmail found the correct email.
 
I did a quick test, and it seems you can do it if you are using Gmail. Otherwise I don't have a recommendation.
We have a Gmail business account, and I'm constantly using the search function to look up names in spreadsheets or docs that I may have sent or that were sent to me. I must say their search function is a major time saver.
 
We have a Gmail business account, and I'm constantly using the search function to look up names in spreadsheets or docs that I may have sent or that were sent to me. I must say their search function is a major time saver.

This is exactly it. I can find the documents easily, but then I have to find the mail convo that spawned them.
 
One of the stupidly obvious features that everyone should, imo, copy from Apple's operating systems :( I've tried a bunch of other apps but have tied myself into Apple for reasons like this.

Spark takes advantage of that inline AI search but, last I checked, they basically read your messages with AI (like Apple does) but without e2e, storing that data (presumably rather than checking it against a database), and all to the end of organizing your mail into folders with their dumb AI without letting you know (it literally creates folders and rules), even on paid tiers.

eM Client exists, but I've never tried it because I honestly don't trust paid clients anymore.

I know I found some command-line clients on GitHub years ago, but they're CLI email clients, so :(

And yes, Gmail as well. I don't like checking my email in a browser, but there are a bunch of Electron wrappers if you don't mind that.
 
If I understand correctly, you need an email client that can perform OCR (optical character recognition) for all incoming emails with PDF files. I do not think there's an email client with such built-in capabilities. As the other members pointed out, it's best to use a separate app for storing/routing/organizing/searching the content of PDFs. At the office, we use filing system software to automate that and minimize the chance of a mistake. But, of course, we deal with heaps of files per day and when it's busy, we scan up to 30 PDFs in a single hour, so without a DMS we'll cave in very soon. Plus, some of the PDFs we then send to teams in other countries, and when they return the edited files via email, the system automatically finds the right folder within our network so we always know which PDF is the latest/final version (there's a lot more to it, but if it doesn't sound like overkill to you, check out this guide). Still, we have to download each PDF so that the software can find a spot for it - that way, nothing that ends up in the spam/junk pile ends up competing with the genuine stuff...
 
If I understand correctly, you need an email client that can perform OCR (optical character recognition) for all incoming emails with PDF files.
Huh? Nowhere did the OP mention anything about OCR. And TBH, I am not sure you understand what OCR is. It seems you misunderstood the OP's request, or you are hawking that software you linked to. :(

OCR software works with a document scanner - a piece of hardware that uses the same technologies as fax machines and copying machines to scan in printed documents and then send them electronically (fax) or make a copy of them.

The OCR is software that works with a piece of paper - NOT an emailed document - but an actual piece of paper with text already printed on it that has been scanned in with the scanner. The user puts that piece of paper in a flatbed scanner, and scans it in (like you would when faxing or photocopying something).

It is important to note that printed documents are essentially just like pictures or images. The OCR software attempts to "recognize" letters in those images and converts them into words in an editable document.

I note you said so yourself that you "scan" up to 30 PDFs per hour. And your link clearly talks about scanning "printed" documents (and includes an icon for a flatbed scanner).

The OP is not asking for that. He is asking for an email program that is able to "search" attached documents that are already in text form, in this case, .pdf documents.
 
Apple Mail on macOS does indeed do this. I checked by searching on a term that is frequently mentioned in a PDF newsletter I receive but never in the e-mail message body itself.

I then tried Mozilla Thunderbird which is connected to the same IMAP account. Zero search results.

So I can confirm one MUA that does, one that does not.

Macs do not need OCR to process PDFs. The macOS/OS X operating system itself has handled the PDF document format natively from the very beginning, probably something inherited from its NeXTSTEP ancestry.
 
Huh? Nowhere did the OP mention anything about OCR. And TBH, I am not sure you understand what OCR is.
The OCR is software that works with a piece of paper - NOT an emailed document - but an actual piece of paper with text already printed on it that has been scanned in with the scanner.
Apologies if my recommendation doesn't quite hit the mark, but I also don't think it's that far off. Definitely not a reason for the tone in your response. It's unwarranted. I only shared what I know, and besides, your criticism is not exactly on point. I haven't tried all software that implements OCR to search keywords within PDF files, but I'm pretty sure that such apps can work with all of the usual formats (.doc, .jpg, Excel, and even PowerPoint) and convert them so that the user can perform a search. So that indeed goes for emailed documents as well: as long as you have the file on your PC, you can convert it into an editable PDF, and then proceed.
I note you said so yourself that you "scan" up to 30 PDFs per hour. And your link clearly talks about scanning "printed" documents (and includes an icon for a flatbed scanner).
Obviously, I myself do not scan that many files per hour, I was referring to the entire team...
The OP is not asking for that. He is asking for an email program that is able to "search" attached documents that are already in text form, in this case, .pdf documents.
I explained that in the beginning of my post: I don't think there's an email client that can "comb" through incoming attachments. Again, it all sounds like OCR to me, and that's why I shared my office experience; I really, really didn't mean to upset anyone... Finally, there are tons of apps that can do that, and no one has to stick to my post when deciding...
 
Definitely not a reason for the tone in your response. It's unwarranted.
I am sorry you got your feelings hurt, but you cannot "hear" my tone or "see" my facial expressions or body language. So any "tone" you are interjecting is being put there by you based on your own "misperceptions" and obvious biases.

I was simply stating things in a matter-of-fact, frank manner, which IMO is appropriate for technical discussions. You appear to have gotten your feelings hurt by misconstruing what was said, and reading into the text words NOT said. :( Again, apologies your feelings were hurt, but that is what is unwarranted. So I would appreciate you not interjecting things not actually said, and just going by what is actually said. Thank you.

The OP stated what he wanted. To ensure I didn't make "unwarranted" assumptions, I rephrased his request to clarify and to confirm we were on the same page. He confirmed we were by replying, "Exactly this".

But you went on your "pitch" for OCR software - and you clearly are still trying to pitch it.

The OP does not need "OCR" software. PDF files already consist of characters that can be recognized, even if the file has been locked by the author and cannot be edited.

Frick stated he simply wants an email client that will allow him to search within email attachments. A Computer Guy correctly noted that Gmail allows this. Again, no OCR software required.
 
I’m glad leeantheone is so generous with their patience but yeesh

Anyway just want to add that all pdfs are not searchable. Adobe and other pdf readers, many AIs, like the ones used in Apple’s operating systems I mentioned above (and gmail as others have) or the ones used with digital images on traffic lights to read license plates, and many other technologies use OCR to extract text from all sorts of digital content, whether it’s a pdf or a digital image (in fact, even scanners and fax machines rely on a digital image to function — what else is the OCR reading from?). OCR has been an essential feature to how pdf software functions for a long time.

/OT
 
We are still not on the same page and I'll accept a big chunk of the blame here.

(in fact, even scanners and fax machines rely on a digital image to function — what else is the OCR reading from?).
Yes, scanners and fax machines rely on digital "images". That is 100% true.

But the purpose for OCR software is for the computer (not scanner, not fax machine) to view that scanned-in "image", look for and "recognize" images of "text characters" (letters and numbers) and convert those images (pictures of letters and words) into a text document that can then be edited by a word processor or similar program.

The OCR software is not reading from the scanner or fax. It is reading the scanned image in the computer's memory.

OCR has been an essential feature to how pdf software functions for a long time.
Not really. OCR is old. PDF is relatively new. OCR and PDF software are totally different and unrelated. You can use OCR software to look for text images in a .pdf file, but the point is, it must be scanned in first as an image. The characters must then be recognized, then converted into an editable text file.

Images of license plates are first recorded with a camera. That's an important fact.

OCR software was created at least 20 years before .pdf.

We are in the same chapter of the same book, but not the same page.

Anyway just want to add that all pdfs are not searchable.
That depends on how the .pdf file was created. "IF" the .pdf file was originally created by scanning in the document with a scanner/fax machine, or via a camera, then you need OCR software to convert the text "images" (bitmaps) into "real" text.

If the .pdf file was created using Adobe Acrobat, Google Docs, or Microsoft Word, then the file can be searched without using an OCR program.
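
The distinction above (text-based PDF versus scanned image) can be sketched as a simple fallback rule in Python. This is a hedged illustration only: `extract_text_layer` and `run_ocr` are hypothetical callables standing in for a real PDF text extractor and a real OCR engine, neither of which is in the standard library, and the demo uses fake stand-ins rather than real PDF data.

```python
def searchable_text(pdf_bytes, extract_text_layer, run_ocr):
    """Return (text, source), using OCR only when the PDF has no text layer.

    extract_text_layer and run_ocr are hypothetical callables supplied by the
    caller; each takes the raw PDF bytes and returns a string.
    """
    text = extract_text_layer(pdf_bytes)
    if text.strip():
        # Created by a word processor or similar: the characters are already there.
        return text, "text-layer"
    # Scanned or photographed pages: only bitmaps, so recognition is needed.
    return run_ocr(pdf_bytes), "ocr"


# Stand-ins for demonstration; they do not parse real PDFs.
def fake_layer(data):
    return "Invoice 4711" if data == b"exported-from-word" else ""


def fake_ocr(data):
    return "recognized text"


print(searchable_text(b"exported-from-word", fake_layer, fake_ocr))
print(searchable_text(b"scanned-page", fake_layer, fake_ocr))
```

Under this rule, a file exported from a word processor never reaches the OCR step, while a scanned file always does.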
 
I mean, this is all off-topic, but seems like OP has checked out, so… I dunno, not like this matters, either — really not sure what the point of this exercise is :oops:

The OCR software is not reading from the scanner or fax. It is reading the scanned image in the computer's memory.
This was true some time ago, but nowadays there are all sorts of portable document scanners and “pens”/“markers” that both scan and convert to text in one device. Even if we were to refer to things as far back as the 70’s, those devices were often computers, too.
Not really. OCR is old. PDF is relatively new. OCR and PDF software are totally different and unrelated. You can use OCR software to look for text images in a .pdf file, but the point is, it must be scanned in first as an image. The characters must then be recognized, then converted into an editable text file.
IDK what the point is here, and I wonder if companies like Adobe would disagree with you on their being unrelated. I used to scan a lot of books for high school and college debate teams, and it wasn't until the late 2000s (Acrobat 9?) that Adobe integrated OCR technology, which was a lifesaver for me and anyone who scanned documents.
Images of license plates are first recorded with a camera. That's an important fact.
I said as much? I think this is an important fact because AI can pull text from images using OCR. Why are you emphasizing this point?
OCR software was created at least 20 years before .pdf.
IDK about that, at least if you mean software as in software that runs on a PC, which isn’t a distinction you were making before.


Earlier you had said that OCR is not used to parse text from emails containing documents/images. My point was merely to say that this is untrue, AFAIK. In my understanding, apps like Gmail and Apple's "Live Text" feature use OCR in exactly this fashion. Adobe was more blunt about it, although admittedly not using AI: it calls the feature that converts PDFs whose text was not created by a text editor into real text "OCR text recognition".
 