I’ve been trying to drag useful data out of MS Excel files without having to actually install Windows somewhere – and preferably without having to work with Excel. I discovered that Excel will export files as XML. The schema is painful, but, after a bit of working with XML::Simple, I was able to get the data into a more useable format.
This isn’t a finished product by any means, but I thought it might be useful to someone as a starting point, so that you don’t have to plough through the XML schema yourself:
#!/usr/bin/perl
use XML::Simple;
my @cols;
my $x = XMLin('AcademicTables.xml'); # Name of file
my $w = $x->{'Worksheet'};
foreach my $worksheet (@{$x->{'Worksheet'}}) {
next unless $worksheet->{'ss:Name'} eq 'Academics_Depts'; # Name of worksheet
@rows = @{$worksheet->{'Table'}->{'Row'}};
# First row - Get the column names
$firstrow = shift(@rows);
foreach my $cell (@{$firstrow->{'Cell'}}) {
push @cols, $cell->{'Data'}->{'content'};
}
# The the rest of the rows
foreach my $row (@rows) {
@values = @{$row->{'Cell'}};
foreach $col (@cols) {
$c = shift @values;
$val = $c->{'Data'}->{'content'};
$val =~ s/n{2,}/<br />/g; # Remove multiple newlines
print $col . " -> " . $val . "n";
}
}
}