PyPi
Here I’ll show you how to use some of HydroBr’s functions. If you have any doubts about the available functions or methods, you can use Python’s help function or read the docs page.
ANA database
Here is an example of how to use the functions that provide data from the Brazilian National Water Agency (Agência Nacional de Águas - ANA) database.
Importing the package
import hydrobr
Getting data from ANA
List of stations
HydroBr has two functions for listing stations from ANA:
- hydrobr.get_data.ANA.list_prec_stations: to get the list of precipitation stations
- hydrobr.get_data.ANA.list_flow_stations: to get the list of flow/stage stations
Both functions can fetch the list of stations directly from the ANA database (source=’ANA’). That list includes many stations with no recorded data at all, which wastes time when you are searching for data.
I highly recommend using the functions’ default source, ‘ANAF’, which returns a filtered list of stations that already includes a description of each one. The description includes:
- date of first measured data (StartDate)
- date of last measured data (EndDate)
- number of years with data (NYD)
- missing data percentage between the start and the last date (MD)
- number of years without any missing data (N_YWOMD)
- percentage of years with missing data (YWMD).
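To make these descriptors concrete, the sketch below computes them for a small synthetic daily series using only the standard library. It is one plausible reading of the definitions above, not HydroBr's actual implementation: the helper `describe_series`, the rounding to one decimal place, and the rule for deciding when a year is complete are all assumptions made for illustration.

```python
from datetime import date, timedelta


def describe_series(records):
    """Summarise a daily series given as {date: value or None}.

    Hypothetical helper: one plausible reading of the descriptors,
    not HydroBr's own code.
    """
    measured = {day for day, value in records.items() if value is not None}
    if not measured:
        raise ValueError("the series has no measured data")
    start, end = min(measured), max(measured)
    total_days = (end - start).days + 1

    years_with_data = set()
    missing_by_year = {}
    for offset in range(total_days):
        day = start + timedelta(days=offset)
        if day in measured:
            years_with_data.add(day.year)
        else:
            missing_by_year[day.year] = missing_by_year.get(day.year, 0) + 1

    nyd = len(years_with_data)
    # Assumption: a year is complete when no day between the first and
    # last measurement is missing in that year.
    n_ywomd = sum(1 for year in years_with_data if year not in missing_by_year)
    missing_total = sum(missing_by_year.values())
    return {
        "StartDate": start,
        "EndDate": end,
        "NYD": nyd,
        "MD": round(100 * missing_total / total_days, 1),
        "N_YWOMD": n_ywomd,
        "YWMD": round(100 * (nyd - n_ywomd) / nyd, 1),
    }


# Synthetic series: 2000-01-01 to 2002-12-31 with a 10-day gap in February 2001
records = {}
day = date(2000, 1, 1)
while day <= date(2002, 12, 31):
    in_gap = date(2001, 2, 1) <= day <= date(2001, 2, 10)
    records[day] = None if in_gap else 1.0
    day += timedelta(days=1)

stats = describe_series(records)
print(stats)
```

In this synthetic case, three years have data and only 2001 contains missing days, so two of the three years count as complete.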
# To get the list of prec stations - source='ANAF' is the default
list_stations = hydrobr.get_data.ANA.list_prec_stations()
# To show the first five rows of the data
list_stations.head()
 | Name | Code | Type | SubBasin | City | State | Responsible | Latitude | Longitude | StartDate | EndDate | NYD | MD | N_YWOMD | YWMD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SALINÓPOLIS | 47000 | 2 | 32 | SALINÓPOLIS | PARÁ | INMET | -0.6500 | -47.5500 | 1958/01/01 | 1964/12/31 | 7 | 25.0 | 0 | 100.0 |
1 | SALINÓPOLIS | 47002 | 2 | 32 | SALINÓPOLIS | PARÁ | ANA | -0.6231 | -47.3536 | 1977/12/09 | 2019/08/31 | 43 | 3.5 | 35 | 18.6 |
2 | CURUÇA | 47003 | 2 | 32 | CURUÇA | PARÁ | ANA | -0.7375 | -47.8536 | 1981/07/01 | 2019/07/31 | 39 | 2.4 | 29 | 25.6 |
3 | PRIMAVERA | 47004 | 2 | 32 | PRIMAVERA | PARÁ | ANA | -0.9294 | -47.0994 | 1982/02/18 | 2019/08/31 | 38 | 0.0 | 35 | 7.9 |
4 | MARUDA | 47005 | 2 | 32 | MARAPANIM | PARÁ | ANA | -0.6336 | -47.6583 | 1989/08/21 | 2019/07/31 | 31 | 5.0 | 20 | 35.5 |
# Get the codes of the first five stations as a list
stations_code = list_stations.Code.to_list()[:5]
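Because the filtered list is a pandas DataFrame, the descriptor columns can also be used to choose stations before downloading anything. The sketch below rebuilds a few columns of the five rows shown above by hand, so it runs without network access. The thresholds (at least 30 years of data, at most 5% missing) are arbitrary illustrative choices, and storing the codes as strings is an assumption about the real list.

```python
import pandas as pd

# Hand-copied subset of the table above, standing in for the result of
# hydrobr.get_data.ANA.list_prec_stations(). Codes as strings is an assumption.
sample_stations = pd.DataFrame(
    {
        "Name": ["SALINÓPOLIS", "SALINÓPOLIS", "CURUÇA", "PRIMAVERA", "MARUDA"],
        "Code": ["47000", "47002", "47003", "47004", "47005"],
        "NYD": [7, 43, 39, 38, 31],
        "MD": [25.0, 3.5, 2.4, 0.0, 5.0],
        "N_YWOMD": [0, 35, 29, 35, 20],
    }
)

# Keep stations with a long record and few gaps (illustrative thresholds)
long_record = sample_stations["NYD"] >= 30
few_gaps = sample_stations["MD"] <= 5.0
selected = sample_stations[long_record & few_gaps]

selected_codes = selected["Code"].to_list()
print(selected_codes)
```

The resulting list of codes can then be passed to the data functions in the same way as stations_code.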
If we need to know the spatial distribution of a list of stations, we can use the hydrobr.Plot.spatial_stations function.
- To use this function you’ll need a Mapbox access token, which you can get from your Mapbox account.
- The functions from hydrobr.Plot return a plotly figure, so to show it we need to import the plotly library. This also makes it possible to configure the layout of the plot.
In the example below, I will plot the flow station distribution in the state of Rio de Janeiro.
from plotly.offline import plot
map_box_access_token = 'your-mapbox-token-access-here'
flow_stations_list = hydrobr.get_data.ANA.list_flow_stations(
state='RIO DE JANEIRO')
# Get the spatial stations plot figure
spatial_fig = hydrobr.Plot.spatial_stations(flow_stations_list, map_box_access_token)
# If you want to update the plot zoom
spatial_fig.update_layout(mapbox=dict(zoom=7))
# Plot and save the map as html
plot(spatial_fig, filename='spatial.html')
The .html figure output:
Stations’ Data
HydroBr has three functions to get data from the ANA database:
- hydrobr.get_data.ANA.prec_data: to get the precipitation data from a list of precipitation stations
- hydrobr.get_data.ANA.flow_data: to get the flow data from a list of flow/stage stations
- hydrobr.get_data.ANA.stage_data: to get the stage data from a list of flow/stage stations
# Getting the data
data_stations = hydrobr.get_data.ANA.prec_data(stations_code)
100%|██████████████████████████████████| 5/5 [00:13<00:00, 2.62s/it]
data_stations.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 22523 entries, 1958-01-01 to 2019-08-31
Freq: D
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 00047000 1918 non-null float64
1 00047002 14516 non-null float64
2 00047003 13397 non-null float64
3 00047004 13524 non-null float64
4 00047005 10212 non-null float64
dtypes: float64(5)
memory usage: 1.0 MB
In the last command, we used the pandas info function, which returns basic information about the data we got.
For instance, the station with code 00047003 has 13397 daily records (Freq: D) within the DataFrame’s index, which spans 1958-01-01 to 2019-08-31 for all stations together.
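Since the date range printed by info describes the shared index rather than any single station, each station's own period can be checked separately. A minimal sketch, using a small synthetic DataFrame in place of data_stations; the column names are assumed to be station codes zero-padded to eight characters, as in the output above.

```python
import pandas as pd

nan = float("nan")
index = pd.date_range("2000-01-01", "2000-01-10", freq="D")

# Synthetic stand-in for data_stations: two stations with different periods
sample_data = pd.DataFrame(
    {
        "00047000": [1.0, 0.0, 2.5, nan, nan, nan, nan, nan, nan, nan],
        "00047002": [nan, nan, 0.0, 3.2, nan, 1.1, 0.0, 0.0, 4.0, 0.0],
    },
    index=index,
)

# A code from the station list can be padded to match the column names
column = str(47002).zfill(8)

# First and last measured day, and number of records, for each station
rows = {}
for code in sample_data.columns:
    series = sample_data[code]
    rows[code] = {
        "first": series.first_valid_index(),
        "last": series.last_valid_index(),
        "count": int(series.count()),
    }
summary = pd.DataFrame.from_dict(rows, orient="index")

print(summary)
print(sample_data[column].dropna())
```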
We can also plot the temporal data availability using a Gantt plot.
As before, the functions from hydrobr.Plot return a plotly figure, so we need to import the plotly library to show it, and the layout of the plot can be configured.
from plotly.offline import plot
gantt_fig = hydrobr.Plot.gantt(data_stations) #Get the Gantt Fig
#Updating the layout
gantt_fig.update_layout(
autosize=False,
width=1000,
height=500,
xaxis_title = 'Year',
yaxis_title = 'Station Code',
font=dict(family="Courier New, monospace", size=12))
#To plot and save the gantt plot as html
plot(gantt_fig,filename='gantt' + '.html')
The .html figure output:
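Conceptually, a Gantt plot of data availability draws one bar per continuous stretch of measured days for each station. The sketch below illustrates that idea with the standard library only. The helper `availability_runs` is hypothetical and is not necessarily how hydrobr.Plot.gantt is implemented.

```python
from datetime import date, timedelta


def availability_runs(measured_days):
    """Group measured days into (start, end) runs of consecutive days."""
    runs = []
    for day in sorted(set(measured_days)):
        if runs and day - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = day  # extend the current run by one day
        else:
            runs.append([day, day])  # a gap was found: start a new run
    return [tuple(run) for run in runs]


# Synthetic example: measurements on 1-3, 7-8 and 15 January 2001
days = [date(2001, 1, d) for d in (1, 2, 3, 7, 8, 15)]
runs = availability_runs(days)
for start, end in runs:
    print(start, "to", end)
```

Each (start, end) pair corresponds to one bar on the time axis, and the gaps between bars are the periods without data.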