We compute MAE as the average absolute distance between estimated temperatures and their corresponding ground-truth values. Finally, we evaluate the efficacy of smoothing the training data prior to performing regression. We investigate rolling-mean, minimum, and median smoothing methods. In our experiments, rolling mean produces the smallest error for the datasets we investigate; for brevity, we therefore report results using only this smoothing technique. To implement rolling mean, we use a window of size w and replace each element with the mean of the previous w elements, including the current element.

In our experiments, we use four RPi-based single-board computers deployed outdoors as described in Section 4.3.1. We denote the processor temperature measurements from each as CPU-1, CPU-2, CPU-3, and CPU-4. We refer to the outdoor temperature measurements from a nearby Weather Underground station as WU-T. The goal of this evaluation is to illustrate the degree to which it is possible to make an accurate prediction of outdoor temperature based on a combination of CPU temperature measurements and temperature measurements from the Weather Underground station. In this study, “ground truth” – the true outdoor temperature – comes from DHT22 sensors connected externally to each RPi. We do not use the measurements from the DHT22 sensors in any prediction. However, we use them to compute the mean absolute error between a prediction based on CPU and WU-T values and the ground truth established by the DHT value, and thereby determine our prediction accuracy. Our RPis are equipped with a 1GHz ARMv7 processor, 512MB of memory, 32GB of SSD storage, and WiFi communication.
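A minimal Python sketch of the two ingredients described above – the MAE metric and trailing rolling-mean smoothing with a window of size w – is given below. The function names and the use of the pandas library are illustrative assumptions rather than a record of our implementation.

```python
import numpy as np
import pandas as pd

def mae(predicted, ground_truth):
    # Average absolute distance between estimated temperatures
    # and the corresponding ground-truth (DHT22) readings.
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(ground_truth))))

def rolling_mean(temps, w):
    # Replace each element with the mean of the previous w elements,
    # including the current one (a trailing window of size w).
    return pd.Series(temps).rolling(window=w, min_periods=1).mean()
```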
All temperature readings in the experiments are reported in degrees Fahrenheit. Since the matrix of comparisons is symmetric, we show only the values in its upper triangle. For frost prevention, the application attempts to determine when a small difference in temperature between warm air aloft and colder air near the ground will result in frost avoidance if the air is mixed. Specifically, large wind machines move the warm air downwards to raise the temperature near the ground enough to prevent frost from forming. The temperature differences are on the order of a few degrees Fahrenheit, putting a premium on accurate measurement. The baseline in Table 4.4 shows the errors that result when each temperature sensor is used directly to predict another. That is, it is the “worst-case” prediction in the sense that it includes no prediction mechanism – only the raw data. To provide a more accurate prediction of local temperature based solely on the devices’ CPU temperatures and the nearby weather station, we combine multiple linear regression with smoothing. We hypothesize that the relationship between outdoor temperature and nearby CPU temperatures measured at the same time is linear. Further, particularly if one or more of the CPUs are loaded, we use one-dimensional smoothing of the CPU temperature series to improve the “signal” from the CPU temperature sensor. For the regressions, the explanatory variables are a subset of the CPU temperatures and the weather station temperature, as indicated at the top of each results table. When smoothing is performed, we also indicate this in the table header. In each case, we separate the experimental period under study into a “training” period followed immediately by a “testing” period. We fit the regression coefficients on the training period and then use them, unchanged, for the entire duration of the testing period. Table 4.5 shows the MAE between the temperature that our method predicts and the outdoor temperature for two “ground truth” sensors – DHT-1 and DHT-3 – using two separate subsets of explanatory variables for each. On the left-hand side of the table, we show the MAE when predicting DHT-1 using CPU-1 alone and also when using all CPUs and WU-T.
On the right-hand side of the table, we show the same results for DHT-3, using CPU-3 in the univariate case. The experiment start date is August 25th. For all experiments, we use a training window of 72 hours. As mentioned in Section 4.3.2, we use MAE as our measure of accuracy since it captures the “distance” between the predicted temperature and the DHT-measured temperature. It is this distance that concerns farmers who are deciding whether to trust their crops to the methodology. Note that the CPU-1 and CPU-3 columns under the Original heading show results based on univariate linear regression. Note also that we highlight the minimum and maximum MAE in each column using boldface type. When predicting DHT-1, we observe that errors from univariate regression using only the CPU temperature from Pi1 range from 0.45°F to 0.85°F. The MAE for multiple linear regression with CPU temperatures from all four devices and nearby weather station data ranges from 0.32°F to 0.81°F. When predicting DHT-3 from Pi3’s CPU sensor, deployed in a similar manner, we observe MAE values between 0.32°F and 1.68°F. The MAE decreases to a range of 0.32°F to 1.26°F when we introduce multiple linear regression. Note that even though the setup is similar, the readings are influenced by other environmental factors. We find that multiple linear regression that includes CPU and nearby weather station temperatures as its predictors reduces prediction error. For DHT-1, the minimum error decreases from 0.45°F to 0.32°F while the maximum error decreases from 0.85°F to 0.81°F. For DHT-3, the minimum error is 0.32°F for both columns while the maximum error decreases from 1.68°F for CPU-3 to 1.26°F for All. Comparing errors per test-window length, we note that for DHT-1 all errors except for the two-week test window were reduced, and for DHT-3 all errors except for the one-hour test window were reduced. These results indicate that it is possible to make predictions with an average absolute error of under 1°F that require infrequent model refitting, using a combination of CPU and weather station data. Indeed, the accuracy of DHT22 sensors is approximately 0.5°F, so this methodology is approaching the limit of accuracy that is possible using DHT22 sensors as ground truth. An error under 1°F is acceptable for frost prevention, where current manual methods rely on measurements in the 3°F range.
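As a concrete illustration of the fit-and-predict procedure described above, the sketch below fits ordinary-least-squares coefficients on the training window and then applies them, unchanged, over the test window. The array and function names are hypothetical; in our notation, the columns of X would be the selected CPU temperatures (and WU-T) and y the DHT ground-truth temperatures.

```python
import numpy as np

def fit_ols(X_train, y_train):
    # X_train: (n, k) matrix of explanatory temperatures, e.g. CPU-1..CPU-4 and WU-T,
    # over the 72-hour training window; y_train: the DHT ground-truth temperatures.
    A = np.column_stack([np.ones(len(X_train)), X_train])  # prepend an intercept column
    coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coeffs

def predict(coeffs, X_test):
    # The training-period coefficients are reused, unchanged, for the whole test period.
    A = np.column_stack([np.ones(len(X_test)), X_test])
    return A @ coeffs
```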
For the smoothing results in Table 4.5, each value in the training period is replaced by the average of the 6 values preceding it in the period. When comparing the All column under the Original and Smoothed headings, we observe that smoothing decreases the mean absolute error from the range of 0.32°F to 0.81°F to the range of 0.28°F to 0.69°F. Similarly, for the DHT-3 prediction, the MAE goes from the range of 0.32°F to 1.26°F to the range of 0.20°F to 1.24°F. CPU temperatures are correlated with CPU load (Moore et al.; Haywood et al.), and while the CPUs are idle for much of the time in our setting, temporary computational load at the time of temperature recording may influence the prediction error. We next analyze the effect of CPU load on the temperature prediction error. Out of the four devices that we consider, we keep Pi2 and Pi4 unloaded and add hourly jobs to Pi1 and Pi3, which increase the CPU load by encrypting and copying a 1GB file on Pi1 and a 512MB file on Pi3 (a sketch of such a job appears below). Figure 4.4 illustrates CPU temperature measurements from Pi1 with hourly spikes due to the load. The load testing for Pi1 and Pi3 started in mid-September, and we use September 20th as the test start date. Note that Pi2 and Pi4 have no artificial load and are kept idle for comparison. We observe that, compared to the August test, all four Pis show smaller errors on average; we omit these averages for brevity. Table 4.6 shows the MAE for predicting DHT-1 and DHT-3 based on different sets of explanatory variables and different durations of the test window, while both Pi1 and Pi3 are loaded. For predicting DHT-1 based on CPU-1, we observe MAE in the range of 0.71°F to 0.85°F, and for DHT-3, in the range of 0.65°F to 0.78°F. The effect of the CPU load is more pronounced in univariate prediction. Moreover, this effect is mitigated when we include nearby devices’ CPU temperature measurements. Including nearby devices in the prediction results in MAE in the range of 0.49°F to 0.58°F for DHT-1 and 0.39°F to 0.53°F for DHT-3. Similar to the results for the unloaded experiments, when the CPUs are loaded we also observe an improvement in prediction error when we apply smoothing, as shown in Table 4.6. The two Smoothed columns show the MAE for DHT-1 and DHT-3 temperature prediction with the same smoothing technique explained earlier. Note that this type of smoothing is computationally simple enough to be performed on each device.
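The hourly load job can be approximated by a short script like the following sketch, which encrypts a large file and then copies it. The file paths and the use of the openssl command-line tool are assumptions made for illustration, not a record of the exact job we deployed.

```python
import shutil
import subprocess

# Hypothetical paths: roughly a 1GB file on Pi1 and a 512MB file on Pi3.
SRC = "/home/pi/load_file.bin"
ENC = SRC + ".enc"
CPY = SRC + ".copy"

def hourly_load_job():
    # Encrypt the file (CPU-intensive) with a streaming openssl invocation,
    # then copy it; the job is scheduled hourly, e.g. via cron.
    subprocess.run(
        ["openssl", "enc", "-aes-256-cbc", "-pbkdf2",
         "-pass", "pass:loadtest", "-in", SRC, "-out", ENC],
        check=True,
    )
    shutil.copyfile(SRC, CPY)

if __name__ == "__main__":
    hourly_load_job()
```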
We observe that, for every test-window length, the error when all predictors are used is smaller than when the single-predictor counterpart is used: CPU-1 for DHT-1 and CPU-3 for DHT-3. With smoothing, the prediction MAE decreases from the range of 0.71°F to 0.85°F to the range of 0.36°F to 0.54°F for DHT-1, and from the range of 0.65°F to 0.78°F to the range of 0.32°F to 0.50°F for DHT-3. While not strictly lower or higher, these results are similar to the results for the unloaded case. We conclude that, using a combination of multivariate regression and smoothing, it is possible to obtain a high degree of prediction accuracy regardless of whether the CPU is loaded. To account for the possibility that the specified time frame may have influenced the results, we show comparative results for the September time frame for loaded and unloaded experiments in Figure 4.5. The data shown in this figure is taken during the same period as the results shown in Table 4.6. That is, we use the 72-hour period ending on September 20th, 2018 as a training period and the remaining time as a test period. The bars in the figure corresponding to CPU-1 and CPU-3 show the same data as the Smoothed All columns in Table 4.6. For comparison, we show data for two other CPUs – CPU-2 and CPU-4 – taken at the same time, again using smoothing and all explanatory variables in each regression. Figure 4.5a shows the comparison when only the CPU of the device to which the DHT is attached is used as the single explanatory variable. In Figure 4.5b, we show the results when all explanatory variables are used to predict each DHT; in this case, the maximum MAE observed in any experiment does not exceed 0.54°F across all CPUs, DHTs, and load patterns. These results indicate that the methodology is robust with respect to typical loads that the CPUs may experience in our IoT setting. Comparing Figure 4.5a to Figure 4.5b shows that multivariate regression improves accuracy across all DHTs and load patterns. In addition to the two dates in August and September, we observed very similar error rates when testing during different seasons. This is illustrated in Figure 4.6, where we predict the DHT-1 temperature for different days from April to December. April 20th has a higher error because Pi3 and Pi4 were not yet deployed, and thus their CPU values were not available as features. December 7th had variable weather conditions with alternating rainy and sunny days, which may have contributed to a somewhat higher MAE. Even so, the MAE for most of the days was less than 1.25°F. We also tested the accuracy of the model when there were changes in precipitation. From a time-series perspective, precipitation could constitute a change-point in each temperature series. Table 4.7 shows the comparison of errors when the training and testing periods had different levels of precipitation. For each column, the training period was 3 days and the listed test periods range from 1 hour to 3 days. In the first column, both training and testing days were without any precipitation. In the second column, we show the effects of training on rainy days to predict the temperatures during sunny days: December 4th, 5th, and 6th were rainy days with 2.54, 1.27, and 1.27 inches of rain respectively, followed by three days without precipitation that were used for testing the model.