MA5840 Data Science and Strategic Decision Making
Efficient Markets Hypothesis
Background Information
The Efficient Markets Hypothesis (EMH) claims that future stock prices cannot be predicted from past stock prices, which would imply that the asset management industry is a scam.
There are three versions of the EMH:
Weak form: future stock prices cannot be predicted from publicly available information, but perhaps secret information can help predict future prices (for example, insider trading might be profitable).
Semi-strong form: stock prices adjust rapidly when new public information becomes available, so that if you are faster than everyone else at processing new information, perhaps you can predict the future stock price.
Strong form: even secret, non-public information cannot help you to predict future stock prices, because so many other people with this supposedly secret information are doing illegal insider trading that prices react instantly.
In this assignment, you will use real-world price data to investigate the weak form of the EMH. Price data is available for several companies whose shares trade on stock exchanges in the United States. The idea is for each student to pick a different company, and then investigate whether the weak form of the EMH looks plausible for the company they have picked.
Data
Each company has a separate file. Each row is a date, and each column has price information at the 4pm close of the stock exchanges in New York on that date.
The first column is the date of the closing prices.
The last column (furthest to the right) has the future change in price of the company for that file, showing how the price changed overnight, from the 4pm close on the date of the row to the 9:30am opening price on the next trading day.
All the other columns between the first and last represent price information for many financial instruments, as they were at the 4pm closing time, including:
Currency exchange rates for the AUD, EUR, GBP, JPY, and CAD.
Interest rates on U.S. Treasury bonds, whose column names mention the word "year", with maturities between 1 year and 30 years.
Commodity prices, including oil, minerals such as gold, and agricultural commodities such as sugar and corn.
The share price of the company for the file, as well as the share prices of other related companies, such as competitors or suppliers.
A few others, such as the Baltic Dry Index (which measures the cost of shipping commodities).
The dates go back a little over a year, and some dates are excluded, such as long weekends, the week between Christmas and New Year, and a few others.
All this price data has been normalized.
If the column represents a price (or the product of two different prices), then 0 means the lowest price and +1 means the highest price during the time period that the file covers.
If the column represents a change in price, then -1 means the biggest decrease and +1 means the biggest increase, with 0 meaning the price did not change.
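To make the two scales concrete, here is a minimal Python sketch of one normalization that is consistent with the description above. It is an illustration only: the assignment does not say exactly how the files were normalized, and the function names and sample numbers are made up. In particular, scaling increases and decreases separately in normalize_change is an assumption, chosen so that 0 stays at 0 while both extremes reach -1 and +1.

```python
def normalize_price(values):
    """Min-max scale a price column: lowest price -> 0, highest price -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_change(values):
    """Scale a change column: biggest decrease -> -1, no change -> 0, biggest increase -> +1.

    Assumption: increases and decreases are scaled separately.
    """
    biggest_up = max(max(values), 0.0)
    biggest_down = min(min(values), 0.0)
    scaled = []
    for v in values:
        if v > 0:
            scaled.append(v / biggest_up)
        elif v < 0:
            scaled.append(-(v / biggest_down))
        else:
            scaled.append(0.0)
    return scaled


# Made-up numbers, purely for illustration.
prices = [10.0, 12.5, 15.0, 11.0]
changes = [-2.0, 0.0, 1.0, 4.0, -0.5]
print(normalize_price(prices))
print(normalize_change(changes))
```

Because every column has already been normalized in the files, none of this needs to be done for the assignment; the sketch is only meant to help with reading the numbers.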
Procedure
First, sort the data by the last column (the future change in price). If there are any days where the price change was zero, we can delete those entire rows.
Put in a few empty rows between the days when the price change was positive and the days when the price change was negative, so that we have two groups of data: one group where all days have the price going up, the other group where all days have the price going down.
Then copy-and-paste the first row with the column descriptions so that both groups have the column descriptions.
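The same sorting and splitting can be sketched in Python, for anyone who wants to check their spreadsheet against an independent calculation. This is a rough sketch, not a required method: the column names and the handful of rows are made up for illustration, and a real file has many more columns.

```python
# Made-up rows for illustration only; the column names are hypothetical.
header = ["date", "gold", "future_change"]
rows = [
    ["2020-01-02", 0.40, 0.30],
    ["2020-01-03", 0.55, -0.20],
    ["2020-01-06", 0.35, 0.00],
    ["2020-01-07", 0.60, -0.75],
    ["2020-01-08", 0.45, 0.10],
]


def split_up_down(rows):
    """Sort by the last column, drop zero-change rows, return (up_days, down_days)."""
    ordered = sorted(rows, key=lambda row: row[-1])
    down_days = [row for row in ordered if row[-1] < 0]
    up_days = [row for row in ordered if row[-1] > 0]
    return up_days, down_days


up_days, down_days = split_up_down(rows)

# Each group is printed under its own copy of the header row.
for label, group in (("Up days", up_days), ("Down days", down_days)):
    print(label)
    print(header)
    for row in group:
        print(row)
```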
Now we look for differences in the input columns of the two groups. If the Efficient Markets Hypothesis is really true, then the inputs should be randomly scattered for both groups, with similar means and similar standard deviations, and their graphs should look similar. This might include:
Summary statistics for each half-column of inputs: are the mean and median of each input about the same for both up days and down days?
Even if the means and medians are the same, what about measures of spread, such as standard deviation?
How about looking at the distributions? For each input column, does the distribution of price information look the same for both up days and down days? Do some columns have different distributions, or do they all look like the same bell curve shape?
If there is an input where the mean of the up days looks different from the mean of the down days, try doing a 2-sample hypothesis test. Does it conclude that there is a significant difference?
And otherwise sniff around for clues to see if things are random like the Efficient Markets Hypothesis claims, or if something is not random.
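As a cross-check on the spreadsheet work, the comparisons above can be sketched in Python using only the standard library. Several things here are assumptions: the sample values are made up, the function names are hypothetical, and Welch's unequal-variance test is just one reasonable choice of 2-sample test. The p-value uses a normal approximation, which is rough for the tiny sample shown and more reasonable for groups of a hundred or more rows.

```python
import math
import statistics


def summarize(values):
    """Return count, mean, median and sample standard deviation for one half-column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def bin_counts(values, bins=5):
    """Count values in equal-width bins over 0 to 1 (price columns only)."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # a value of exactly 1 goes in the last bin
        counts[index] += 1
    return counts


def welch_t_test(a, b):
    """Welch's two-sample t statistic, degrees of freedom, and an approximate p-value."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom, reported for reference.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    # Two-sided p-value from a normal approximation (an assumption, see above).
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Made-up values of one input column, split into up days and down days.
up_gold = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69]
down_gold = [0.41, 0.47, 0.39, 0.52, 0.44, 0.50]

print("up days:  ", summarize(up_gold), bin_counts(up_gold))
print("down days:", summarize(down_gold), bin_counts(down_gold))
print("t, df, approx p:", welch_t_test(up_gold, down_gold))
```

A small p-value for one input is a clue rather than a verdict: with dozens of input columns, a few will look significant by chance alone, so it is worth noting how many columns were tested.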
Based on the evidence that you discover, come to a verdict that is (in your judgement) the most likely truth, given the balance of the evidence:
If it all looks completely random on every input, then maybe the Efficient Markets Hypothesis is correct. But then how can we explain that Warren Buffett got so rich by predicting which companies would have their stocks go up? Or is that just a coincidence, and maybe asset management is a scam?
If something looks non-random, then maybe the Efficient Markets Hypothesis is not entirely correct (i.e., maybe it's usually correct, but on some days for some companies not everything is random). For the non-random thing that you have found, on what days did it happen, and/or for what inputs? (The words in the input header say what it measures.)
The idea is that you're like a detective, looking for clues, and if you find a clue then that's evidence that will be put before a judge.
Deliverables
Each student is required to submit two files:
A Microsoft Excel file, which should include all the summary statistics, graphs, and other calculations that you made; and
A document file, with your description of what you discovered. Please copy-and-paste any relevant graphs or other pieces of the spreadsheet into your document, so that it's easy to follow. Include references if you refer to anything other than the price data.