Analyse my telephone bills (in PDF)


OK, next. My mobile provider (starts with V, ends with fone, with oda in between) hands me the monthly bill in PDF format. That’s nice. But they give no summarized insight into my used minutes, used MBs, etc. Instead they show pages and pages and pages of MBs and KBs used per time slot.

Just a few lines:

3.906,00 KB14.09.2011 16:00:11          0,000                      high.vodafone.com
6.026,00 KB14.09.2011 16:15:11          0,000                      high.vodafone.com
3.951,00 KB14.09.2011 16:30:11          0,000                      high.vodafone.com
2.813,00 KB14.09.2011 16:40:26          0,000                      high.vodafone.com
T       235,00 KB15.09.2011 07:05:37          0,000                      live.vodafone.offnet
T       442,00 KB15.09.2011 15:59:34          0,000                      live.vodafone.offnet
T     1.708,00 KB16.09.2011 08:13:43          0,000                      live.vodafone.offnet
1.357,00 KB16.09.2011 08:28:43          0,000                      high.vodafone.com
T       293,00 KB16.09.2011 16:44:31          0,000                      live.vodafone.offnet
T       231,00 KB16.09.2011 20:44:44          0,000                      live.vodafone.offnet
T       246,00 KB16.09.2011 22:36:17          0,000                      live.vodafone.offnet
T       267,00 KB19.09.2011 11:15:20          0,000                      live.vodafone.offnet
T       312,00 KB20.09.2011 18:03:01          0,000                      live.vodafone.offnet

And this goes on and on for pages. The same goes for the call minutes. They call this transparency. OK. As we say in Dutch, ‘Ik ben niet voor een gat te vangen’: I’m never at a loss for a way out.

So I fire up the command line. First I convert the PDF to a plain text file with ps2txt, a tool installed by default on Ubuntu.
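Once the bill is plain text, the data lines shown earlier can be totalled per access point. The bill itself never provides that total. This is a minimal awk sketch, not the author's method. It assumes the data lines keep the layout of the sample:

- the size is in Dutch notation ("." for thousands, "," for decimals);
- the size sits directly before the field that starts with KB and the date;
- the last column is the access point.

The function name sum_kb_per_apn is hypothetical. The MB figure assumes 1 MB = 1024 KB.

```shell
# Hypothetical helper: total the data volume per access point (last column).
# Assumed data-line layout, taken from the sample in the bill:
#   [T] <size, '.' = thousands, ',' = decimals> KB<dd.mm.yyyy> <time> <cost> <apn>
# The size is the field just before the one starting with "KB<digit>", so the
# optional leading T does not matter. Lines without such a field are ignored.
sum_kb_per_apn() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^KB[0-9]/) {
                kb = $(i - 1)
                gsub(/\./, "", kb)   # 3.906,00 -> 3906,00 (drop thousands separator)
                sub(/,/, ".", kb)    # 3906,00  -> 3906.00 (decimal comma to point)
                total[$NF] += kb
                break
            }
        }
    }
    END {
        # Assumption: 1 MB = 1024 KB. The order of for-in output is unspecified.
        for (apn in total)
            printf "%s %.2f KB (%.2f MB)\n", apn, total[apn], total[apn] / 1024
    }' "$@"
}

# Usage (on the converted bill):
#   sum_kb_per_apn 24-10-2011_349243385_00655797027.txt | sort
```

The sketch searches for the KB field instead of counting columns. That keeps it independent of the optional T prefix and of how many spaces the conversion left between the columns.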

This text file is the input for the rest of the analysis. Let’s sum the call minutes. Call lines can be identified by this format:

02.09.2011 18:03:09 31999999999 00:04:11          0,000 T
05.09.2011 12:11:55 31999999999 00:00:52          0,000 T
05.09.2011 17:57:51 31999999999 00:00:07          0,000 T
I need those durations. Let’s grep them.

grep -E '^ *[0-9]{2}\.[0-9]{2}\.[0-9]{4}.*[0-9]{10}' 24-10-2011_349243385_00655797027.txt | sed 's/  */ /g' | cut -f 5 -d ' '
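That pipeline prints one HH:MM:SS duration per call. To actually get the total call time, those values still have to be added up.

Below is a minimal awk sketch that does this. It assumes every duration has exactly that HH:MM:SS shape. sum_durations is a hypothetical helper name, not part of any tool, and the totals are exact seconds as listed on the bill.

```shell
# Hypothetical helper: add up call durations read one per line as HH:MM:SS,
# e.g. the output of the grep | sed | cut pipeline. Lines that are not a
# duration (such as a stray 0,000 cost field) are skipped.
sum_durations() {
    awk '
    /^[0-9]+:[0-5][0-9]:[0-5][0-9]$/ {
        split($0, t, ":")
        total += t[1] * 3600 + t[2] * 60 + t[3]   # hours, minutes, seconds
        n++
    }
    END {
        h = int(total / 3600)
        m = int((total % 3600) / 60)
        s = total % 60
        printf "%d calls, %d seconds = %02d:%02d:%02d (%.1f minutes)\n", n, total, h, m, s, total / 60
    }' "$@"
}

# Usage: append it to the pipeline above, e.g.
#   ... | cut -f 5 -d ' ' | sum_durations
```

For the three sample call lines above (00:04:11, 00:00:52 and 00:00:07), this reports 3 calls and 310 seconds, which is 00:05:10.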