Dakons blog

Created: 21. 12. 2008, 22:36
Modified: 20. 6. 2011, 08:15

Use the tools, Luke^H^H^H^HJohan

I just read Johan's post about Qt environment variables and wondered if there was a better way to find them. Johan, this is really not meant as an attempt to blame you for anything. Doing things in an efficient way is just something of a hobby of mine, and you happened to cross my path ;) Really, no offense intended. His original attempt was:

grep getenv -r src/ | grep cpp\: | grep -o "\"[A-Z_0-9]*\"" | sed 's/"//g'

For those not familiar with the options of these tools: search recursively in the "src" directory for every line containing "getenv". Since grep then prints "filename:matching line", keep only the lines containing "cpp:", then cut out the quoted term and remove the quotation marks.
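To try the original pipeline without a Qt checkout, here is a minimal sketch that builds a tiny, made-up src tree (the file names and QT_* variable names are invented for illustration) and runs Johan's command on it. The header file is there to show that the grep cpp\: step drops hits outside .cpp files. LC_ALL=C is an assumption added so that [A-Z] means plain ASCII uppercase.

```shell
#!/bin/sh
export LC_ALL=C   # byte-based character ranges, so [A-Z] means ASCII uppercase
demo=$(mktemp -d)
mkdir -p "$demo/src"
# Made-up sample sources: two hits in a .cpp file, one in a header
printf '%s\n' 'const char *style = getenv("QT_STYLE_OVERRIDE");' \
              'if (getenv("QT_DEBUG_PLUGINS")) {}' > "$demo/src/main.cpp"
printf '%s\n' 'bool f() { return getenv("QT_HEADER_ONLY"); }' > "$demo/src/util.h"

# Johan's pipeline, run from the directory that contains src/
# (GNU grep accepts the -r option after the pattern)
result=$(cd "$demo" && grep getenv -r src/ | grep cpp\: | grep -o "\"[A-Z_0-9]*\"" | sed 's/"//g')
printf '%s\n' "$result"   # QT_STYLE_OVERRIDE and QT_DEBUG_PLUGINS; the header hit is gone
rm -rf "$demo"
```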

Ok, my first idea was: why not simply limit the search to cpp files?

find src -name \*.cpp -print0 | xargs -0 grep getenv | grep -o "\"[A-Z_0-9]*\"" | sed 's/"//g'

The "-print0" makes find end each file name with a NUL character instead of a newline, and "xargs -0" splits its input on exactly that character, so this also works for file names containing whitespace and other funny characters. Next point: sed is a very powerful text-editing tool, but here we only want to delete every occurrence of one particular character. tr comes to the rescue:
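A minimal sketch of why -print0 matters, assuming a made-up file called "my file.cpp": without it, xargs splits the name at the space and grep is handed two files that do not exist, so nothing is found.

```shell
#!/bin/sh
export LC_ALL=C
demo=$(mktemp -d)
mkdir -p "$demo/src"
# Made-up source file whose name contains a space
printf '%s\n' 'x = getenv("QT_SPACED");' > "$demo/src/my file.cpp"

# Newline-separated names: xargs splits "src/my file.cpp" into "src/my" and "file.cpp"
naive=$(cd "$demo" && find src -name \*.cpp | xargs grep getenv 2>/dev/null | grep -o '"[A-Z_0-9]*"' | sed 's/"//g')
# NUL-separated names: the file name reaches grep intact
safe=$(cd "$demo" && find src -name \*.cpp -print0 | xargs -0 grep getenv | grep -o '"[A-Z_0-9]*"' | sed 's/"//g')

printf 'naive: [%s]\nsafe: [%s]\n' "$naive" "$safe"   # naive is empty, safe is QT_SPACED
rm -rf "$demo"
```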

find src -name \*.cpp -print0 | xargs -0 grep getenv | grep -o "\"[A-Z_0-9]*\"" | tr -d '"'
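A small check that tr -d '"' and sed 's/"//g' give the same result at this step. The input simulates what xargs -0 grep getenv would print; the file and variable names are invented.

```shell
#!/bin/sh
export LC_ALL=C
# Simulated output of "xargs -0 grep getenv" (file and variable names are invented)
grep_out='src/a.cpp:  if (getenv("QT_DEBUG_PLUGINS")) {
src/b.cpp:  style = getenv("QT_STYLE_OVERRIDE");'

# sed replaces every " with nothing; tr simply deletes every " it sees
with_sed=$(printf '%s\n' "$grep_out" | grep -o "\"[A-Z_0-9]*\"" | sed 's/"//g')
with_tr=$(printf '%s\n' "$grep_out" | grep -o "\"[A-Z_0-9]*\"" | tr -d '"')

[ "$with_sed" = "$with_tr" ] && printf '%s\n' "$with_tr"   # QT_DEBUG_PLUGINS, QT_STYLE_OVERRIDE
```

tr works on single characters only, which is exactly what is needed here; sed would still be the tool for deleting whole strings or patterns.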

Ok, but wouldn't it be better to save one whole grep invocation? Let's try this:

find src -name \*.cpp -print0 | xargs -0 grep -o 'getenv.*"[A-Z_0-9]*"' | sed 's/[^"]*"//;s/"$//'

Now the getenv call and the variable name are matched in one go, and sed simply deletes everything up to and including the first quotation mark, plus the quotation mark at the end of the line. Two more things make life easier here: I used ' instead of " around the grep pattern, so the " inside it needs no escaping, and I fed two commands to sed at once, using ; as the command separator.
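The two sed commands can be followed step by step on a single line. The source line and the prefixed match below are made-up examples; when grep -o searches several files, it prefixes each match with the file name, which the first sed command removes along with the opening quotation mark. The -e form is just an equivalent way of passing the same two commands.

```shell
#!/bin/sh
export LC_ALL=C
# A made-up source line: grep -o keeps only the getenv call and the quoted name
src_line='  if (getenv("QT_DEBUG_PLUGINS")) {'
first=$(printf '%s\n' "$src_line" | grep -o 'getenv.*"[A-Z_0-9]*"')
printf '%s\n' "$first"   # getenv("QT_DEBUG_PLUGINS"

# With several files, grep -o prefixes each match with the file name (invented here)
match='src/a.cpp:getenv("QT_DEBUG_PLUGINS"'
# 1st command deletes up to and including the first "  ->  QT_DEBUG_PLUGINS"
# 2nd command deletes the " at the end of the line     ->  QT_DEBUG_PLUGINS
name=$(printf '%s\n' "$match" | sed 's/[^"]*"//;s/"$//')
# Equivalent: the same two commands passed as separate -e options
name_e=$(printf '%s\n' "$match" | sed -e 's/[^"]*"//' -e 's/"$//')
printf '%s\n' "$name" "$name_e"
```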

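If someone now suggests find -exec: in its usual form (find ... -exec grep getenv {} \;) it runs the given program once for every file, so grep would be started many, many times. The example above needs only four programs: find, xargs, grep, and sed (xargs starts grep more than once only if the file list exceeds the system's command-line length limit). That seems good enough for now. Does anyone have an even better idea?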
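To see the difference in the number of program starts, this sketch counts how often a dummy command (sh -c 'echo started') is launched for three made-up files. Worth knowing: POSIX find also has a -exec ... {} + form, which packs many file names into one command line, much like xargs does.

```shell
#!/bin/sh
demo=$(mktemp -d)
# Three made-up source files
for f in A B C; do
  printf 'getenv("QT_%s");\n' "$f" > "$demo/$f.cpp"
done

# Each "started" line marks one launch of the dummy command
per_file=$(find "$demo" -name \*.cpp -exec sh -c 'echo started' sh {} \; | grep -c started)
batched=$(find "$demo" -name \*.cpp -exec sh -c 'echo started' sh {} + | grep -c started)
via_xargs=$(find "$demo" -name \*.cpp -print0 | xargs -0 sh -c 'echo started' sh | grep -c started)

printf 'per file: %s, batched: %s, xargs: %s\n' "$per_file" "$batched" "$via_xargs"   # 3, 1, 1
rm -rf "$demo"
```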

So, writing this post probably took ten times longer than Johan's initial run, so I'm not sure who is really less efficient today ;) But maybe it will show a trick or two to someone reading this post.

Ah, one last idea: we can make the pattern matching in sed a bit simpler:

find src -name \*.cpp -print0 | xargs -0 grep -o 'getenv.*"[A-Z_0-9]*"' | sed 's/"$//;s/.*"//'

Now I remove the " at the end of the line first. After that, the last " left in the line is the one in front of the environment variable name, so I simply remove everything up to and including it; since .* is greedy, s/.*"// always reaches that last quotation mark. Before, I had to make sure to delete only up to the first quotation mark so I wouldn't delete the whole line.
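One caveat with the greedy .* in the grep pattern: a line with two getenv calls yields a single match spanning both, and the final sed version then keeps only the last variable name. The sketch below shows this on an invented line and tries a variant that is my suggestion, not from the post: [^"]* cannot run past a quotation mark, so grep -o reports each call as its own match.

```shell
#!/bin/sh
export LC_ALL=C
# A made-up source line with two getenv calls
line='  if (getenv("QT_A") && getenv("QT_B")) {'

# Pattern from the post: the greedy .* spans both calls, one match, last name only
greedy=$(printf '%s\n' "$line" | grep -o 'getenv.*"[A-Z_0-9]*"' | sed 's/"$//;s/.*"//')

# Suggested variant (not from the post): [^"]* cannot cross a ", so each call is its own match
each=$(printf '%s\n' "$line" | grep -o 'getenv[^"]*"[A-Z_0-9]*"' | sed 's/"$//;s/.*"//')

printf 'greedy: %s\n' "$greedy"   # greedy: QT_B
printf 'each:\n%s\n' "$each"      # QT_A and QT_B on separate lines
```

In the full pipeline only the grep pattern changes: find src -name \*.cpp -print0 | xargs -0 grep -o 'getenv[^"]*"[A-Z_0-9]*"' | sed 's/"$//;s/.*"//'. Like the original, it only finds names written as string literals.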
