GW should be able to access Azure blobs registered in CDMIProxy directly


A blob can be registered in CDMI-Proxy and stored in an Azure backend. As an optimization, it makes sense for the GW to first ask for the backend URL and, if the blob is in Azure, try to access it directly.
Closed Jul 10, 2012 at 9:59 AM by ilja
Outdated, closing.


iblanque wrote Mar 30, 2012 at 11:28 AM

This is an important and blocking issue for large-scale production. With the way the CDMI proxy and the GW are currently integrated, we get timeouts when multiple connections for large files are requested simultaneously.

Having the GW nodes fetch the input data directly from Azure storage, using the URI returned by the CDMI proxy, instead of having the CDMI proxy fetch it, will also dramatically improve performance.

I suggest raising the impact of this issue (I do not know how to do that myself).

ilja wrote Mar 30, 2012 at 12:08 PM

A bit of functionality is not yet exposed on the CDMI proxy side (returning only the relevant URL); I'm taking over this task until it's done.

From the GW's point of view, the download process could then look like this:
  1. get the actual URL of the data
  2. if the URL is in the Azure namespace and the GW has the credentials for that storage, switch to native streaming.
However, step 2 has a catch: the GW must know the storage credentials, and I'm not sure what the best way to approach that is.
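The two-step decision above could be sketched roughly as follows. This is only an illustration, not the GW's actual code. It assumes that step 1 (the CDMI proxy lookup) has already returned a plain backend URL, that Azure blob URLs use the standard `<account>.blob.core.windows.net` host, and that the GW keeps its credentials in a dictionary keyed by storage account name. The function names `azure_account` and `choose_download_route` are hypothetical. How the GW obtains those credentials is exactly the open question above, so the sketch simply takes them as given.

```python
from urllib.parse import urlparse

# Assumption: Azure blob endpoints look like <account>.blob.core.windows.net
AZURE_BLOB_SUFFIX = ".blob.core.windows.net"


def azure_account(url):
    """Return the storage account name if url points at Azure blob storage, else None."""
    # urlparse(...).hostname is already lowercased; None if the URL has no host
    host = urlparse(url).hostname or ""
    if host.endswith(AZURE_BLOB_SUFFIX):
        account = host[: -len(AZURE_BLOB_SUFFIX)]
        return account or None
    return None


def choose_download_route(backend_url, credentials):
    """Hypothetical GW decision for step 2.

    backend_url: URL returned by the CDMI proxy lookup (step 1).
    credentials: dict mapping Azure storage account name -> key known to the GW.
    Returns ("native", account) to stream directly from Azure,
    or ("proxy", None) to keep downloading through the CDMI proxy.
    """
    account = azure_account(backend_url)
    if account is not None and account in credentials:
        return ("native", account)
    return ("proxy", None)
```

For example, with `credentials = {"mydata": "<key>"}`, a backend URL on `mydata.blob.core.windows.net` takes the native route, while a blob in another account, or any non-Azure URL, falls back to the proxy. The fallback means the optimization never breaks downloads the GW cannot serve directly.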

ameick wrote Apr 4, 2012 at 3:05 PM

As this is more of an optimization, we (EMIC) decided not to implement this feature.
Regarding the timeouts, these should be fixed inside the CDMI Proxy. When the CDMI Proxy accesses, e.g., Azure blob storage, it can set the timeout to a higher value.

ilja wrote Apr 4, 2012 at 3:27 PM

@ameick: but it's a very reasonable optimization. If my application already sits in the cloud and I know that the data is in the same cloud, why can't I download it directly if I have the credentials?