However, functional potential remained stable in the rhizosphere along the chronosequence, evidence that the soybean root system selects microbial taxa via trade-offs to maintain functions in the rhizosphere microbiome over time. Further studies deciphering the ecological processes regulating plant–microbiome assembly and the causality nexus across the plant–microbiome–soil continuum may enable researchers to gain insights into plant bioengineering and soil microbiome modulation, with consequences for clean food production and the resilience of ecosystem services.
Recent technological advances have driven down the cost of off-the-shelf compute power, storage, and network bandwidth, and have simplified large-scale resource use. As a result, applications have become data-centric, and the resulting data resources and products have grown explosively in both number and size. Moreover, the environment in which we live has become increasingly accessible via sensor, observation, and monitoring systems that produce and collect vast amounts of data from ordinary objects, which comprise the “Internet of Things” (IoT). IoT applications attempt to extract actionable insights from this data to drive innovation, forge new industries, and facilitate new scientific discovery across society and our economy, e.g. in health care, manufacturing, education, transportation, the military, agriculture, and the energy sector, among others. To provide actionable insights in time, IoT developers rely on increasing connectivity and on the vast amounts of data that IoT devices upload in real time, making it available to data analytics services in a cost-effective cloud. At the same time, a dynamically changing and heterogeneous set of IoT devices, together with their communication protocols and power/battery requirements, makes software integration, deployment, and maintenance more challenging given the scale and energy requirements.
Moreover, because IoT applications often exhibit significantly better locality than their e-commerce or social networking counterparts, they leverage “multi-tier” execution environments to achieve low latency response. Specifically, devices increasingly communicate with “edge computing” infrastructure that is “nearby” in terms of network transfer latency. Between these two tiers, Content Distribution Networks, regional clouds, private clouds, etc., provide computing, networking, and storage services for IoT applications. This tiered hierarchy facilitates efficient and cost-effective network use, localized decision making, controlled data sharing, and real-time and low-latency response for edge devices and services (Elias et al., Krintz et al., Foukas et al., Satyanarayanan). Particularly exciting is the potential for multi-tier IoT systems to have large-scale societal impact by providing the next generation of digital technologies for optimizing agricultural and food security outcomes. Increasing yields sustainably while using fewer resources is essential to ensure that food production is not outpaced by population growth. However, smallholder agriculturists and their rural communities are strikingly under-served by technology, with few solutions becoming commonplace (USDA). Thus, new advances in IoT systems research are critical for IoT in general, and agriculture in particular, if we are to achieve the transformational societal impact that IoT promises. The goal of our work is to provide such advances using a problem-driven approach that is motivated by the needs of growers and agricultural operations.
Current precision agriculture technologies fall short in three key ways that have severely limited their impact and widespread use: they fail to provide growers with easy management of and control over their data; they lock growers into proprietary, closed, inflexible, and potentially costly technologies in order to extract actionable insights from their farm and sensor data; and few of them facilitate cross-vendor sensor and data integration.
In addition, many farms have slow, intermittent, or no connectivity to the cloud, precluding the use of cloud-only solutions. As such, growers and farm consultants require new technological advances that provide them with automated farm data management, data-driven recommendations and decision support, and low-cost, integrated sensing in a unified, easy-to-use platform that works with or without Internet connectivity.
To address these needs, we investigate the design, implementation, and deployment of an end-to-end, multi-tier system for IoT applications in agriculture. Our system, called Hypatia, bridges the gap between sensors and cloud to provide automatic, low-latency deployment of IoT applications across edge and cloud tiers. In addition, it can operate at the edge when the connection to the cloud is intermittent or non-existent. Hypatia automates ingress of sensors and data sets, exports services for common machine learning and data analytics capabilities, enables users to tailor the system to their analytics and visualization preferences, and integrates scoring metrics which it uses to automate model selection and to provide decision support and a recommendation system for users. To enable this, we investigate three novel advances that target key challenges from the agriculture sector. First, we investigate the cloud service support required for clustering multivariate and highly correlated data. We use the system to study farm soil electrical conductivity (EC). EC is fast and inexpensive to measure repeatedly (Lund et al.) and can be used to identify management zone boundaries (Moral et al., Fortes et al., Corwin & Lesch) and to estimate a number of different soil metrics including salinity, water holding capacity, and texture (Bell et al., Kitchen et al., Adamchuk et al.).
The service simplifies the selection of analytics parameters from which growers choose, and recommends the best variant in a form that experts and novices alike can easily understand. Moreover, the service enables users to visualize the data and results in multiple ways. Second, we investigate how to extend the capabilities of on-farm sensing. To do so, we use integrated, on-farm sensors to estimate the measurements of other sensors, a technique that we call sensor synthesis.
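As a rough illustration of the idea, the following sketch estimates one sensor's readings from two others by smoothing each series with a moving average and fitting a multiple linear regression. Everything here is an assumption made for illustration and is not taken from the Hypatia implementation: the function names (`moving_average`, `solve`, `fit`, `predict`), the window size, the normal-equations solver, and the readings themselves, which are synthetic values generated from made-up coefficients.

```python
from statistics import mean


def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    return [mean(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]


def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit(predictors, target):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0, *obs] for obs in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, inputs):
    """Estimate the target reading from one set of predictor readings."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], inputs))


# Synthetic, illustrative readings (NOT measurements from the original work).
cpu_temp = [40.0, 42.0, 45.0, 50.0, 48.0, 44.0, 41.0, 39.0]      # degrees C
station_temp = [10.0, 12.0, 11.0, 15.0, 18.0, 16.0, 13.0, 9.0]   # degrees C
# Hypothetical "true" relation used only to generate the target series.
outdoor = [2.0 + 0.5 * c + 0.25 * s for c, s in zip(cpu_temp, station_temp)]

smoothed = [moving_average(series) for series in (cpu_temp, station_temp)]
coeffs = fit(smoothed, moving_average(outdoor))
```

Once `coeffs` has been fitted, `predict(coeffs, [cpu_reading, station_reading])` stands in for the sensor that is no longer physically present. The normal equations are used here only to keep the sketch dependency-free; a numerical library's least-squares routine would be the more robust choice on real, noisy data.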
Since the number of sensors per device is generally bounded by design constraints, sensor synthesis makes it possible to free up resources in IoT devices for other sensors, particularly those that are less amenable to synthesis, and to reduce the monetary cost of sensing. We apply sensor synthesis to measure microclimates across a farm. Microclimate data is useful for more precise application of water and for frost control. To enable this, we estimate outdoor temperature using the processor temperature of simple and low-cost single board computers (SBCs) deployed in outdoor settings. We combine data smoothing techniques and multiple linear regression methods, which we apply to nearby SBC processor and weather station data. We empirically evaluate this approach using a wide range of experiments and we investigate its accuracy with and without computational load on the SBCs. We find that we can accurately estimate microclimate temperature from combinations of nearby devices on-farm, thereby reducing the number of temperature sensors required to capture temperature variation across a field.
Finally, we develop a new scheduling system that automates distributed deployment of data analytics applications across IoT tiers. The scheduler accounts for both the computation and the communication of the applications and automatically splits the execution between edge systems and cloud computing systems, to minimize time to completion and to prioritize edge use. The scheduler uses execution histories to estimate time to completion and uses these predictions to automatically place application “jobs”. We find that by doing so, the scheduler is able to significantly reduce time to completion over always using just the edge or just the cloud. In addition, the scheduler simplifies IoT deployment by automatically executing and auto-scaling jobs on any system on which it is deployed.
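The placement decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheduler's actual algorithm: the function names (`estimate_seconds`, `place_job`) are hypothetical, the predictor is a plain mean of past run times, data is assumed to originate at the edge so that only the cloud incurs a transfer cost, and ties go to the edge to reflect the stated preference for edge use.

```python
import math
from statistics import mean


def estimate_seconds(compute_history, data_mb=0.0, uplink_mbps=None):
    """Predicted completion time: mean past compute time plus data transfer.

    compute_history: past run times of this job on the tier, in seconds.
    data_mb:         input size that must be moved to the tier, in megabytes.
    uplink_mbps:     available bandwidth to the tier, in megabits per second.
    """
    compute = mean(compute_history)
    if data_mb == 0.0:
        return compute            # data is already local to this tier
    if not uplink_mbps:
        return math.inf           # no connectivity: the tier is unreachable
    return compute + data_mb * 8.0 / uplink_mbps


def place_job(edge_history, cloud_history, data_mb, uplink_mbps):
    """Return (tier, predicted_seconds), preferring the edge on ties."""
    edge = estimate_seconds(edge_history)
    cloud = estimate_seconds(cloud_history, data_mb, uplink_mbps)
    return ("edge", edge) if edge <= cloud else ("cloud", cloud)
```

For example, a job that runs slowly at the edge but needs little input data would be sent to the cloud when the uplink is fast, and the same job would stay at the edge when `uplink_mbps` is zero, which mirrors the requirement that the system keep working without Internet connectivity.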
We combine each of these advances into a single scalable end-to-end system called Hypatia, which is available through an intuitive web browser interface. The result is an open-source, end-to-end system that enables users to collect data from multiple sensor sources, including users’ files, web APIs, and other publicly available datasets. Hypatia provides abstractions for data management algorithms and implements multiple variants of two frequently used ones, clustering and linear regression, while allowing other algorithms to be easily “plugged in” following the same abstractions. For the given algorithms, Hypatia provides scoring and model selection. To facilitate model selection and to better understand data coming from various sources, Hypatia provides different visualization solutions. Hypatia implements the scheduler described above, which minimizes the time to completion of algorithms while considering the type of the algorithm, data transfer and computation time requirements, and whether most of the time will be spent on model training or on its use for inference/analysis. We design Hypatia as a distributed system that executes on any virtualized system over which IoT applications can be deployed without modification. We evaluate Hypatia using a number of different IoT analytics applications and show that it enables low latency, reliability, machine learning model selection, error analysis, data visualization, and scheduling, in a unified scalable system.
In the chapters that follow, we provide background on existing technologies and research that is relevant to our work. In Chapter 3, we present our advances for automating clustering and the efficacy of its use on correlated data and soil EC analysis. In Chapter 4, we present “sensor synthesis” and show how it can be used to predict outdoor temperature from the CPU temperature of SBCs.
We further extend the system to ingress multiple sensors and to use multiple linear regression, providing a score for each model to support model selection. Chapter 5 details the overall Hypatia system, which we integrate with each of our advances, including sensor data ingress, statistical clustering and scoring, multiple linear regression and model selection, and scalable scheduling and automated deployment of IoT applications across edge and cloud resources.
Finally, in Chapter 6, we present our conclusions and plans for future work.
Today, IoT developers increasingly combine IoT devices with the scale, data and analytics services, and cost-effectiveness of the cloud. However, at present, the heterogeneous, asynchronous, highly scalable, dynamically changing, and geographically distributed nature of IoT-cloud applications makes their infrastructure complex and difficult to provision, program, and optimize for high performance, energy efficiency, and scale. In an attempt to overcome this challenge, cloud providers are investing heavily in new cloud services. Unfortunately, while effective as platforms for auto-scaling web services, extant cloud offerings have not been able to ameliorate the complexities facing reliable and pervasive IoT application deployment, which must be overcome if we are to achieve the transformational societal impact that IoT promises. First, the volume and velocity of data produced by IoT systems (Int) have forced a movement from a centralized model of computing to an “edge”-connected model for IoT, in which computation and analysis must be performed near where data is generated. The centralized approach imposes significant request latency and power consumption on remote devices at the network “edge”, prohibiting real-time, data-driven response. Instead, co-location of processing infrastructure and IoT devices significantly reduces the latency between data acquisition and device actuation, enables the extension of device capability via local offloading, and alleviates the cost, power consumption, and congestion of network use of the cloud-direct model (Floyer, Bonomi et al., Satyanarayanan et al., Satyanarayanan, Verbelen et al.). Some cloud vendors offer restricted versions of cloud services for edge devices (AWS IoT Core, AWS IoT Greengrass, Azure IoT Hub, Azure IoT Edge, Cloud IoT Core, Edge TPU, Bosch IoT Suite, General Electric IoT).
However, these solutions are not portable across cloud vendors, they do not allow arbitrary computations and data analytics at the edge, they are hard to use due to complex configuration, and they are not open source, which precludes extension and reproducibility.
Despite the many advances in cloud services and cloud-based data analytics, few advances have made their way to the agriculture community. Such techniques are critical for lowering the cost of farm operations, reducing labor needs via automation, and increasing yields sustainably. Yet smallholder agriculturists and their rural communities are strikingly under-served by technology, with few solutions becoming commonplace (USDA). In this dissertation, we focus on the scalable analytics building blocks that are key for a wide range of applications. We then tailor the system and solutions to agricultural problems and settings so that they may provide growers with decision support as well as data-driven actuation and control for precision agriculture. Precision agriculture (Committee on Assessing Crop Yield: Site-Specific Farming, Information Systems, and Research Opportunities, Board on Agriculture, National Research Council) is a set of farm management techniques that use data from environmental sensors, historical records, models, and farming operations to provide decision support to growers and farm consultants.