miércoles, 16 de agosto de 2017

guerrilla csv and xlsx

I like to have a huge toolbox so that I can always find the right tool to do any task. But I'm also a big fan of composability, and orthogonality.  So it's a bit like vim vs emacs, or small languages vs big languages, or scheme vs CL, or Python vs Perl.

On the command line, I also like to find tools that compose.  Although pipes and xargs are the way to compose commands, the interfaces have to be compatble, by using stdin/stdout, or file names ( <() comes to the rescue by helping with the plumbing).

So today I had to count the appearances of a given word in different xlsx files. Each xlsx had many sheets, and we only want to count the appearances in column 9.  It was kind of a checksum to make sure that all appearances of  $KEYWORD were still there.  So, task the task is:

Aggregate counts of appearances of 'keyword' in the ninth column in all sheets of each one of those excels.  Get the sum per file.

Apparently, after 5 minutes of typing in trance, this did the trick.

for i in **/*xlsx ; do echo $i ; csvfix write_dsv -f 9  <(xlsx2csv.py --all $i ) G 'keyword' WC ; done


We can't get much further with debugging this. The pity with these kind of approaches is that they either solve your problem in the first shoot, or it gets exponentially difficult to treat for special cases, or add debugging info.

I got to, at least, compare the results themselves using vimdiff.

vimdiff <(csvfix write_dsv -f 9  <(xlsx2csv.py --all file1.xlsx ) G 'keyword') \
        <(csvfix write_dsv -f 9  <(xlsx2csv.py --all file2.xlsx ) G 'keyword')


This is totally not rocket science, but I love the feeling of power and accomplishment you get when this magic incantations work.  You run that, you get the result, you use the result, and you throw the whole thing away.

And you keep doing what you were doing.  Or go write a post about that.


No hay comentarios: