Environmental data arriving constantly from satellites and weather
stations are used to compute weather coefficients that are essential
for agriculture and viticulture. For example, the reference evapotranspiration
(ET0) coefficient, overlaid on regional maps, is provided each day
by the California Department of Water Resources to local farmers
and turf managers to plan daily water use. Scaling out single-processor,
compute- and data-intensive applications that operate on real-time data to
support more users and higher-resolution data poses data engineering
challenges. Cloud computing helps data providers expand resource
capacity to meet growing needs and also supports scientific needs
such as reprocessing historical data with new models. In this article,
we examine the migration of a legacy script used for daily ET0
computation by the California Irrigation Management Information System
(CIMIS) to a workflow model that eases deployment to
and scaling on the Windows Azure Cloud. Our architecture incorporates
a direct streaming model into Cloud virtual machines (VMs) that improves
our workflow's performance by 130% to 160% over the commonly used approach
of staging data in Cloud storage. The streaming workflows
achieve runtimes comparable to desktop execution on single VMs and
a linear speed-up when using multiple VMs, thus allowing computation
of environmental coefficients at a much higher resolution than is done
at present.
%0 Conference Paper
%1 Zinn:works:2010
%A Zinn, Daniel
%A Hart, Quinn
%A Ludascher, Bertram
%A Simmhan, Yogesh
%B Workshop on Workflows in Support of Large-Scale Science (WORKS)
%D 2010
%I IEEE
%K cloud, escience, peer reviewed, streaming, usc, workflow
%P 1-8
%R 10.1109/WORKS.2010.5671841
%T Streaming satellite data to cloud workflows for on-demand computing
of environmental data products
%U http://ceng.usc.edu/~simmhan/pubs/zinn-works-2010.pdf
@inproceedings{Zinn:works:2010,
added-at = {2014-08-13T04:08:36.000+0200},
author = {Zinn, Daniel and Hart, Quinn and Ludascher, Bertram and Simmhan, Yogesh},
biburl = {https://www.bibsonomy.org/bibtex/292accf1d9497df3b4f5780dbb018eb0c/simmhan},
booktitle = {Workshop on Workflows in Support of Large-Scale Science (WORKS)},
doi = {10.1109/WORKS.2010.5671841},
interhash = {0fee12da111927c918cf0ea452fbfea3},
intrahash = {92accf1d9497df3b4f5780dbb018eb0c},
keywords = {cloud, escience, peer reviewed, streaming, usc, workflow},
month = {November},
owner = {Simmhan},
pages = {1--8},
publisher = {IEEE},
timestamp = {2014-08-13T04:08:36.000+0200},
title = {Streaming satellite data to cloud workflows for on-demand computing
of environmental data products},
url = {http://ceng.usc.edu/~simmhan/pubs/zinn-works-2010.pdf},
year = 2010
}