Read both California energy datasets. Make sure the datetime variable is in an appropriate data type (i.e., not character).
Merge the two datasets and then melt the resulting dataframe/datatable to make it tidy.
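As a rough sketch of these first steps, the example below uses two small made-up data frames in place of the real files, since the file and column names are not given here. The names generation, imports, source and output, and all values, are hypothetical. It also assumes tidyr's pivot_longer as a stand-in for melt; data.table::melt does the equivalent reshaping for data.table users.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical stand-ins for the two energy datasets (made-up values):
# one row per hour, one column per source, datetime stored as character.
generation <- tibble(
  datetime = c("2019-09-03 00:00:00", "2019-09-03 01:00:00"),
  biogas   = c(10, 12),
  solar    = c(0, 5)
)
imports <- tibble(
  datetime = c("2019-09-03 00:00:00", "2019-09-03 01:00:00"),
  imports  = c(100, 90)
)

# Convert the character column to a date-time (POSIXct) before merging.
generation <- generation %>% mutate(datetime = ymd_hms(datetime))
imports    <- imports %>% mutate(datetime = ymd_hms(datetime))

# Merge on the shared datetime column, then reshape to long (tidy) form:
# one row per datetime-source pair.
energy_long <- generation %>%
  inner_join(imports, by = "datetime") %>%
  pivot_longer(-datetime, names_to = "source", values_to = "output")
```

If the real files store datetime in a different layout, a different lubridate parser (for example ymd_hm) may be needed in place of ymd_hms.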
Create a series of new variables:
- day, which is the year-month-day, without the hour. The lubridate function as_date will do this.
- log_output, which is the natural log of the output.
- per_output, which is the percent of daily output represented by each observation. You will need to use group_by and then create a new variable with the total output for the day. (Make sure to use ungroup() after this!)
Bonus: If you are using dplyr, try to do this all in one pipe!
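One possible shape for the single-pipe version is sketched below on a tiny made-up data frame. The input name energy_long, its columns, the helper column daily_output, and all values are assumptions for illustration. The sketch also assumes "percent" means a value on a 0 to 100 scale.

```r
library(dplyr)
library(lubridate)

# Hypothetical tidy input (made-up values): two sources over two days.
energy_long <- tibble(
  datetime = ymd_hms(c("2019-09-03 00:00:00", "2019-09-03 01:00:00",
                       "2019-09-04 00:00:00", "2019-09-04 01:00:00")),
  source   = c("solar", "wind", "solar", "wind"),
  output   = c(10, 30, 20, 20)
)

energy_vars <- energy_long %>%
  mutate(
    day        = as_date(datetime),  # drop the time of day
    log_output = log(output)         # log() is the natural log in R
  ) %>%
  group_by(day) %>%
  mutate(
    daily_output = sum(output, na.rm = TRUE),    # total output for the day
    per_output   = 100 * output / daily_output   # share of that day's total
  ) %>%
  ungroup()                          # later steps should not stay grouped
```

Note that log(0) returns -Inf in R, so observations with zero output will need a decision about how to treat them.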
(Use the dplyr verb arrange(desc(variable)) to order the data frame so that the largest value of variable is first. Without desc, it arranges in ascending order. The data.table function is setorder.) Which has the least?
The dataset regroup.csv has information about which sources are considered renewable by the state of California. Use this dataset, along with your data manipulation skills, to explore the use of renewable and non-renewable sources. Annotate your decisions for the analysis.
Hint: Use your merge skills to merge the CA energy data with the regroup data. Which variable should you join by?
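The join and the ordering step might look roughly like the sketch below. It does not read regroup.csv; instead it builds a small made-up lookup table, and the column names source and group, along with all values, are hypothetical, so the real key column may well be named differently.

```r
library(dplyr)

# Hypothetical tidy energy data (made-up values).
energy_long <- tibble(
  source = c("solar", "wind", "natural_gas", "nuclear"),
  output = c(40, 25, 60, 20)
)

# Hypothetical stand-in for regroup.csv; the real column names may differ.
regroup <- tibble(
  source = c("solar", "wind", "natural_gas", "nuclear"),
  group  = c("renewable", "renewable", "non-renewable", "non-renewable")
)

# Join on the shared key, total the output per group, largest total first.
by_group <- energy_long %>%
  left_join(regroup, by = "source") %>%
  group_by(group) %>%
  summarize(total_output = sum(output), .groups = "drop") %>%
  arrange(desc(total_output))
```

A left_join keeps every energy observation even when a source has no match in the lookup table, which makes unmatched sources visible as NA in group. With data.table, setorder(dt, -total_output) gives the same descending order.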