Thursday, September 1, 2011

The greatness of CSV

CSV - comma-separated values - is a simple file format you all should know. It just consists of values separated by commas (or tabs, or whatever), with line breaks separating, well, the different lines (or records, or whatever your data may look like). You probably already knew that.

Now, let's suppose you have some data in a CSV table. Let's also suppose the first line contains the names of the columns, like this:

"No.", "Name", "Price"
1, "Chocolade", 0.99
2, "Noodles", 0.39

and so on. As you can see, really simple. Actually, why do I even feel like writing about this stuff? It looks so simple that you wouldn't think it's interesting to talk about at all. I mean, there are things like JSON nowadays...

The reason is that I found you can do an incredible lot with these simple CSV tables:

1. easy to generate, easy to parse, and thus also easy to transform into other formats (see the awk sketch after this list)
2. easy to combine, if you have several matching tables. A quick sed '1d' file1 >> file2 does the job (see the sed/sort sketch after this list)
3. after stripping the header, it is also easy to sort the lines using the sort command (see its manpage and the sed/sort sketch after this list)
4. import into programs like Excel, OpenOffice Calc, etc.
5. direct plotting with gnuplot (see the gnuplot sketch after this list): http://www.gnuplot.info/docs_4.2/gnuplot.html#x1-17200043.14
6. many database programs support CSV as a data source, so SQL querying is also an option. I haven't found a command-line tool to integrate SQL queries into a scripting workflow, but writing such a tool is not the biggest effort (there are open-source CSV database drivers or similar out there; for example, h2database has commands to read from and write to CSV files). The SQL sketch after this list shows the idea.
7. as I wrote here, there are also tools to directly transform a CSV table into a LaTeX table.
8. data-mining applications like Weka...
9. you name it ;-)
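
To make item 1 concrete, here is a minimal awk sketch for pulling a column out of a CSV file and for converting it to a tab-separated file. The file name prices.csv is just a stand-in for the little example table above, and the naive split on commas only works as long as no field contains an embedded comma:

# Print the "Price" column (third field), skipping the header line.
awk -F',' 'NR > 1 { print $3 }' prices.csv

# Rewrite the same file with tabs instead of commas.
awk -F',' 'BEGIN { OFS="\t" } { $1 = $1; print }' prices.csv > prices.tsv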
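
For items 2 and 3, combining and sorting stays a one-liner affair. This is a sketch under the assumption that file1.csv and file2.csv share the same header; the sort keeps the header on top by handling the first line separately:

# Append the data lines of file2.csv to file1.csv, dropping the
# duplicate header of file2.csv first.
sed '1d' file2.csv >> file1.csv

# Sort numerically by the third column (Price); the header stays on top.
head -n 1 file1.csv > sorted.csv
tail -n +2 file1.csv | sort -t',' -k3 -n >> sorted.csv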
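
For item 5, gnuplot can read the comma-separated file directly once you tell it about the separator. Again prices.csv is the hypothetical file from above; "every ::1" just skips the header line:

gnuplot <<'EOF'
set datafile separator ","
set terminal png
set output "prices.png"
# Plot Price (column 3) over No. (column 1), skipping the header line.
plot "prices.csv" every ::1 using 1:3 with linespoints title "Price"
EOF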
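
And for item 6, a rough sketch of how the h2database route could look from the command line, using H2's CSVREAD function. The jar name h2.jar and the options of the bundled Shell tool are assumptions here, so check them against your H2 version:

# Run an ad-hoc SQL query directly against the CSV file via H2.
java -cp h2.jar org.h2.tools.Shell \
  -url "jdbc:h2:mem:" \
  -sql "SELECT * FROM CSVREAD('prices.csv')"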
