Convert an html table to excel spreadsheet

A quick and dirty post for a quick and dirty technique that I used to get the Oracle CPU Database Risk Matrix into a spreadsheet. This won’t always work (the html table I was interested in happened to have the data in the same line of html as the tag), but it’s a start.

  • Do ‘View Source’ to get the html for the table.
  • Find the table
  • Cut and paste the table into a text file. I called mine “risk_matrix_raw.txt”
  • Create an awk file (mine was called risk_matrix.awk) with the following lines:
/<tr>/{printf("\n")}
/<td/{printf("%s~", $2)}
  • awk the text file, then pipe into sed to remove some of the tags
awk95 -F">" -f risk_matrix.awk risk_matrix_raw.txt |^ sed -e "s/<\/tr//g" -e "s/<\/td//g" \> risk_matrix.txt
  • Open the .txt file in Excel. When I did this, Excel asks whether its a fixed width or a delimited file. I specified that the file was delimited with a ‘~’ and Excel opened it.

Note (April 2020): for some reason in the original version of this post I had the code in .jpg files. Mea culpa. I’ve typed the code in, but not been able to test as yet, so, just for reference here are the original .jpgs

Awk commands Sed command

Contents