Why I’ll always defend my superfluous use of `cat`
I’ve come across forum posts and ##linux heads talking about how using:
cat file | grep re
…is really excessive and slow and poor and I suck.
Today, I spent about 15 minutes working on a regex to search a debug log file from the Python script I recently wrote, which takes argus userdata records and places DNS data into a DB. It was crashing because I don’t handle absent records. The flows must be perfect (meaning both src>dst userdata and dst>src userdata must be present). If the flows aren’t perfect, I get some unexpected input back and, because I then ask for a string at an index that doesn’t exist (there’s no data), blammo. Granted, it can be fixed with a conditional, but I still wanted to understand what was going on.
After fiddling with regex and grep for 15 minutes, I realized that grep kept recognizing the file as binary. One DDG search later, I located a Stack Overflow thread with several answers… one of which was:
cat /var/log/radump2dnsdb.log | strings | grep '^.*\ s\.*$\|^.*\ d\.*$'
This will work fine for this instance (particularly because each line has a unique identifier), but David (the original asker) points out flaws related to the ability of `.*` to match binary data. The accepted answer is thorough and takes this case into account.
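For anyone who wants to reproduce the "binary file" behaviour, here is a minimal sketch. The temp file and its contents are invented for illustration and are not the real radump2dnsdb log format. The `-a` (`--text`) flag is the text-mode option I know from grep itself; I'm not claiming it is what the accepted answer recommends.

```shell
# Build a throwaway log with a NUL byte in it; a NUL is one of the things
# that makes grep classify a file as binary. Contents are made up.
tmp=$(mktemp)
printf 'flow 1 s. example.com\n\000junk\nflow 2 d. example.org\n' > "$tmp"

# Plain grep usually reports only that a binary file matches instead of
# printing the lines; the exact message depends on the grep version.
grep 'flow' "$tmp" || true

# Workaround 1: filter through strings first, as in the command above.
via_strings=$(cat "$tmp" | strings | grep 'flow')

# Workaround 2: -a (--text) tells grep to treat the file as text.
via_text=$(grep -a 'flow' "$tmp")

echo "$via_strings"
echo "$via_text"
rm -f "$tmp"
```

Both workarounds should print the two `flow` lines here. The `strings` route discards the unprintable bytes before grep ever sees them, while `-a` leaves them in the stream, which is where a greedy `.*` can end up matching binary data.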
So, never again will I not be piping my `cat` output into `grep`, and I don’t care what you say.
By the way… my script seems to tolerate absent destination user data, but doesn’t tolerate absent source user data. Yikes.
[update ca. August 7th, 2013]
Of course, I’ve fixed the problems with the script!