Creating datasets with units

Attaching units

Usually, when loading data from disk we get a Dataset or DataArray with units in attributes:

In [1]: ds = xr.Dataset(
   ...:     {
   ...:         "a": (("lon", "lat"), [[11.84, 3.12, 9.7], [7.8, 9.3, 14.72]]),
   ...:         "b": (("lon", "lat"), [[13, 2, 7], [5, 4, 9]], {"units": "m"}),
   ...:     },
   ...:     coords={"lat": [10, 20, 30], "lon": [74, 76]},
   ...: )
   ...: ds
   ...: 
Out[1]: 
<xarray.Dataset> Size: 136B
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76
Data variables:
    a        (lon, lat) float64 48B 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) int64 48B 13 2 7 5 4 9

In [2]: da = ds.b
   ...: da
   ...: 
Out[2]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)> Size: 48B
array([[13,  2,  7],
       [ 5,  4,  9]])
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76
Attributes:
    units:    m

In order to get astropy.Quantity instances, we can use the Dataset.astropy.quantify() or DataArray.astropy.quantify() methods:

In [3]: ds.astropy.quantify()
Out[3]: 
<xarray.Dataset> Size: 136B
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76
Data variables:
    a        (lon, lat) float64 48B 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) float64 48B [m] 13.0 2.0 7.0 5.0 4.0 9.0

We can also override the units of a variable:

In [4]: ds.astropy.quantify(b="km")
Out[4]: 
<xarray.Dataset> Size: 136B
Dimensions:  (lon: 2, lat: 3)
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76
Data variables:
    a        (lon, lat) float64 48B 11.84 3.12 9.7 7.8 9.3 14.72
    b        (lon, lat) float64 48B [km] 13.0 2.0 7.0 5.0 4.0 9.0

In [5]: da.astropy.quantify("degree")
Out[5]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)> Size: 48B
<Quantity [[13.,  2.,  7.],
           [ 5.,  4.,  9.]] deg>
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76

Overriding works even if there is no units attribute, so we could use this to attach units to a normal Dataset:

In [6]: temporary_ds = xr.Dataset({"a": ("x", [0, 5, 10])}, coords={"x": [1, 2, 3]})
   ...: temporary_ds.astropy.quantify({"a": "m"})
   ...: 
Out[6]: 
<xarray.Dataset> Size: 48B
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 24B 1 2 3
Data variables:
    a        (x) float64 24B [m] 0.0 5.0 10.0

Of course, we could use astropy.Unit instances instead of strings to specify units, too.

Note

Unit objects tied to different registries cannot interact with each other. In order to avoid this, DataArray.astropy.quantify() and Dataset.astropy.quantify() will make sure only a single registry is used per xarray object.

If we wanted to change the units of the data of a DataArray, we could do so using the DataArray.name attribute:

In [7]: da.astropy.quantify({da.name: "J", "lat": "degree", "lon": "degree"})
Out[7]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)> Size: 48B
<Quantity [[13.,  2.,  7.],
           [ 5.,  4.,  9.]] J>
Coordinates:
  * lat      (lat) float64 24B [°] 10.0 20.0 30.0
  * lon      (lon) float64 16B [°] 74.0 76.0
Indexes:
    lat      AstropyIndex(PandasIndex)
    lon      AstropyIndex(PandasIndex)

However, xarray currently doesn’t support units in indexes, so the new units were set as attributes. To really observe the changes the quantify methods make, we have to first swap the dimensions:

In [8]: ds_with_units = ds.swap_dims({"lon": "x", "lat": "y"}).astropy.quantify(
   ...:     {"lat": "degree", "lon": "degree"}
   ...: )
   ...: ds_with_units
   ...: 
Out[8]: 
<xarray.Dataset> Size: 136B
Dimensions:  (x: 2, y: 3)
Coordinates:
    lat      (y) float64 24B [°] 10.0 20.0 30.0
    lon      (x) float64 16B [°] 74.0 76.0
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 48B 11.84 3.12 9.7 7.8 9.3 14.72
    b        (x, y) float64 48B [m] 13.0 2.0 7.0 5.0 4.0 9.0

In [9]: da_with_units = da.swap_dims({"lon": "x", "lat": "y"}).astropy.quantify(
   ...:     {"lat": "degree", "lon": "degree"}
   ...: )
   ...: da_with_units
   ...: 
Out[9]: 
<xarray.DataArray 'b' (x: 2, y: 3)> Size: 48B
<Quantity [[13.,  2.,  7.],
           [ 5.,  4.,  9.]] m>
Coordinates:
    lat      (y) float64 24B [°] 10.0 20.0 30.0
    lon      (x) float64 16B [°] 74.0 76.0
Dimensions without coordinates: x, y

By default, Dataset.astropy.quantify() and DataArray.astropy.quantify() will use the unit registry at astropy_xarray.unit_registry (the application registry). If we want a different registry, we can either pass it as the unit_registry parameter:

In [10]: import astropy.units as u

In [11]: da.astropy.quantify("degree")
Out[11]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)> Size: 48B
<Quantity [[13.,  2.,  7.],
           [ 5.,  4.,  9.]] deg>
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76

or overwrite the default registry:

In [12]: da.astropy.quantify("degree")
Out[12]: 
<xarray.DataArray 'b' (lon: 2, lat: 3)> Size: 48B
<Quantity [[13.,  2.,  7.],
           [ 5.,  4.,  9.]] deg>
Coordinates:
  * lat      (lat) int64 24B 10 20 30
  * lon      (lon) int64 16B 74 76

Note

To properly work with xarray, the force_ndarray_like or force_ndarray options have to be enabled on the custom registry.

Without it, python scalars wrapped by astropy.Quantity may raise errors or have their units stripped.

Saving with units

In order to not lose the units when saving to disk, we first have to call the Dataset.astropy.dequantify() and DataArray.astropy.dequantify() methods:

In [13]: ds_with_units.astropy.dequantify()
Out[13]: 
<xarray.Dataset> Size: 136B
Dimensions:  (x: 2, y: 3)
Coordinates:
    lat      (y) float64 24B 10.0 20.0 30.0
    lon      (x) float64 16B 74.0 76.0
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 48B 11.84 3.12 9.7 7.8 9.3 14.72
    b        (x, y) float64 48B 13.0 2.0 7.0 5.0 4.0 9.0

In [14]: da_with_units.astropy.dequantify()
Out[14]: 
<xarray.DataArray 'b' (x: 2, y: 3)> Size: 48B
array([[13.,  2.,  7.],
       [ 5.,  4.,  9.]])
Coordinates:
    lat      (y) float64 24B 10.0 20.0 30.0
    lon      (x) float64 16B 74.0 76.0
Dimensions without coordinates: x, y
Attributes:
    units:    m

This will get the string representation of a astropy.Unit instance and attach it as a units attribute. The data of the variable will now be whatever astropy wrapped.