Getting the S&P 500 From Wikipedia with Python

Most of my strategies in the market are pretty basic – I don’t really like technical indicators but will use them to rank possible positions here and there. One of the best ways to reduce risk is to only look at the “good ones” so that even bad calls aren’t as bad as they could be.

My favorite way to do this is to only pull from the S&P 500, since the index does a lot of the legwork for me. It’s nice having a large, revolving list of winners that get pruned as they start to falter. You won’t find any secret winners this way, but you can be pretty sure you won’t find any major duds either. Being in this group also promises quite a bit of liquidity, which is always helpful.

Wikipedia

Back in the day (lol) I was pulling all the symbols from the NASDAQ FTP server and had a bit of trouble finding this list, until I stumbled on Wikipedia’s page. They maintain a list of the companies that make up the S&P 500 with some extra details like category and date added, all in a nice table format that plays nicely with a pandas DataFrame.

[Screenshot: Wikipedia with the list in a nice table format]

I’m sure there are easier ways to get this list now, but the function I wrote a couple of years ago still works and I haven’t needed to update it. Reading the HTML with pandas produces a list of DataFrames, one per table on the page, and the first table is the one we care about.
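Here is a minimal sketch of that idea, not a definitive implementation. It assumes pandas plus an HTML parser such as lxml are installed. The URL, the function names, and the User-Agent string are my own placeholders, so check the URL against the live page. The download is separated from the parsing because Wikipedia may reject requests that don't send a descriptive User-Agent.

```python
import io
import urllib.request

import pandas as pd

# Assumed URL of the Wikipedia list page; verify before relying on it.
SP500_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def parse_sp500_table(html: str) -> pd.DataFrame:
    """Return the first table in the given HTML as a DataFrame."""
    # read_html returns one DataFrame per <table> it finds.
    tables = pd.read_html(io.StringIO(html))
    return tables[0]


def fetch_sp500(url: str = SP500_URL) -> pd.DataFrame:
    """Download the page and return its first table (hypothetical helper)."""
    # Placeholder User-Agent; swap in something that identifies your script.
    request = urllib.request.Request(
        url, headers={"User-Agent": "sp500-list-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8")
    return parse_sp500_table(html)
```

Calling `fetch_sp500()` should hand back the constituents table. The ticker column is probably named something like `Symbol`, but that depends on how the page is laid out on the day you read it, so print `df.columns` before hard-coding anything.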

Super straightforward, and I hope it helps if you want to stop using an API service. The page can obviously change at any point, but the result is easy enough to cache and put a check on. Day-old data probably isn’t the end of the world for this kind of list. If it is, well, you get what you pay for.
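The cache-and-check idea could look something like the sketch below. Everything in it is an assumption on my part: the JSON cache file, the one-day freshness window, and the `min_count` threshold of 400 used as a crude "does this still look like the S&P 500" check. The `fetch` argument is any callable that returns the list of symbols.

```python
import json
import time
from pathlib import Path


def load_symbols(cache_path, fetch, max_age_seconds=24 * 60 * 60, min_count=400):
    """Return symbols from a fresh cache, refreshing via fetch() when stale."""
    cache = Path(cache_path)

    # Use the cache as-is while it is younger than max_age_seconds.
    if cache.exists() and time.time() - cache.stat().st_mtime < max_age_seconds:
        return json.loads(cache.read_text())

    try:
        symbols = list(fetch())
        # Sanity check: a very short list suggests the page layout changed.
        if len(symbols) < min_count:
            raise ValueError(f"only {len(symbols)} symbols returned")
    except Exception:
        # Fall back to stale data rather than failing outright, if we have any.
        if cache.exists():
            return json.loads(cache.read_text())
        raise

    cache.write_text(json.dumps(symbols))
    return symbols
```

Falling back to the stale file is a design choice that suits a slow-moving list like this one. If you would rather know immediately when the page breaks, re-raise instead of returning the old data.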
