OK, next. My mobile provider (starts with V, ends with fone, and has oda in between) hands me the monthly bill in PDF format. That's nice. But they give no summarized insight into my used minutes, used MBs, etc. Instead they show pages and pages and pages of used MBs and KBs, one line per time interval.
Just a few lines:
3.906,00 KB14.09.2011 16:00:11 0,000 high.vodafone.com
6.026,00 KB14.09.2011 16:15:11 0,000 high.vodafone.com
3.951,00 KB14.09.2011 16:30:11 0,000 high.vodafone.com
2.813,00 KB14.09.2011 16:40:26 0,000 high.vodafone.com
T 235,00 KB15.09.2011 07:05:37 0,000 live.vodafone.offnet
T 442,00 KB15.09.2011 15:59:34 0,000 live.vodafone.offnet
T 1.708,00 KB16.09.2011 08:13:43 0,000 live.vodafone.offnet
1.357,00 KB16.09.2011 08:28:43 0,000 high.vodafone.com
T 293,00 KB16.09.2011 16:44:31 0,000 live.vodafone.offnet
T 231,00 KB16.09.2011 20:44:44 0,000 live.vodafone.offnet
T 246,00 KB16.09.2011 22:36:17 0,000 live.vodafone.offnet
T 267,00 KB19.09.2011 11:15:20 0,000 live.vodafone.offnet
T 312,00 KB20.09.2011 18:03:01 0,000 live.vodafone.offnet
And this goes on and on for pages. Same for the used call minutes. They call this transparency. OK. In Dutch we say 'Ik ben niet voor een gat te vangen', which means roughly: I am not that easily stumped.
So I pick up my command line. First I convert this PDF to an ordinary text file with ps2txt, a tool that is installed by default on Ubuntu.
This text file is now used for further analysis. Let's sum the call minutes. The call lines can be identified by this format:
02.09.2011 18:03:09 31999999999 00:04:11 0,000 T
05.09.2011 12:11:55 31999999999 00:00:52 0,000 T
05.09.2011 17:57:51 31999999999 00:00:07 0,000 T
I need those times. Let's grep them.
grep -E '^ *[0-9]{2}\.[0-9]{2}\.[0-9]{4}.*[0-9]{10}' 24-10-2011_349243385_00655797027.txt | sed -E 's/ +/ /g' | cut -f 5 -d ' '
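With the durations isolated, the remaining step is adding them up. The following is a minimal sketch of one way to do that, not necessarily how the rest of this analysis went. The function name sum_call_durations is made up for this example, and it assumes every call line looks like the samples above: a date, a time of day, a phone number of at least 10 digits, then the duration as HH:MM:SS. Instead of cutting out a fixed column, it picks the second HH:MM:SS field on each line, so it does not matter whether the line starts with spaces.

```shell
#!/bin/sh
# Sum the HH:MM:SS call durations found on stdin and print the total.
# Assumption: a call line starts with a DD.MM.YYYY date and contains a
# phone number of at least 10 digits, as in the sample lines above.
sum_call_durations() {
    grep -E '^ *[0-9]{2}\.[0-9]{2}\.[0-9]{4}.*[0-9]{10}' |
    awk '{
        # The first HH:MM:SS field on a line is the time of day,
        # the second one is the call duration.
        seen = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && ++seen == 2) {
                split($i, t, ":")
                total += t[1] * 3600 + t[2] * 60 + t[3]
            }
        }
    }
    END {
        # Convert the total number of seconds back to HH:MM:SS.
        printf "%02d:%02d:%02d\n", total / 3600, (total % 3600) / 60, total % 60
    }'
}

# Try it on the three sample lines from above.
sum_call_durations <<'EOF'
02.09.2011 18:03:09 31999999999 00:04:11 0,000 T
05.09.2011 12:11:55 31999999999 00:00:52 0,000 T
05.09.2011 17:57:51 31999999999 00:00:07 0,000 T
EOF
# prints 00:05:10
```

To run it on the real bill, feed the converted text file to the function: sum_call_durations < 24-10-2011_349243385_00655797027.txt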