Geographic Information Systems (GIS) are used across many industries and in everyday life (e.g., Google Maps). It’s a large field with many complications, including projecting a 3D surface onto a 2D map, managing the data, and combining spatial data layers. The main data models are raster (continuous scalar fields), vector (points or vertices), and image (a graphic with geo-reference attributes). One resource for a brief presentation on these can be found here.
Geographic data is particularly important in energy systems. Regulations differ by state, the sun shines in some areas more than others, transmission lines connect geographically separated areas, and people’s incomes vary across districts. All of these factors have a geographic component, and together they affect where power plants, transmission and distribution (T&D) lines, and other resources of the electric grid are built.
In this post, I’ll be focusing on the siting of energy storage. The goal is to investigate the connection between energy storage sites and solar sites. I’ll use many simplifications and keep things fairly basic. The first step is getting and manipulating the data. Then, I’ll visualize it on a map. Next, I’ll look for a correlation between solar and storage sites. Finally, I’ll write the output to a shapefile, the common file type in GIS.
The first step is to investigate the data. The solar site data come from NREL’s Open PV Project. This database contains voluntarily submitted information, primarily from state-run incentive programs, utilities, and large organizations. The only pertinent field is the zipcode, which is as granular as the locations get, since there is no latitude/longitude data. Next, I count the number of sites in each unique zipcode to get a distribution of sites per zipcode. In order to map the sites, I then merge the zipcodes with a database of latitudes/longitudes. The result is a table that contains the zipcodes of solar sites, the number of projects in each zipcode, and the latitude/longitude of each zipcode.
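The counting and merging steps might look something like the following sketch in pandas. The column names (`zipcode`, `solar_sites`, `lat`, `lon`) and the tiny inline tables are assumptions for illustration only; the real Open PV export and zipcode lookup table have their own column names.

```python
import pandas as pd

# Hypothetical stand-in for the Open PV data: one row per solar installation.
# Zipcodes are kept as strings so leading zeros are not lost.
solar = pd.DataFrame(
    {"zipcode": ["80202", "80202", "94103", "10001", "94103", "80202"]}
)

# Hypothetical stand-in for a zipcode -> coordinates lookup table.
zip_coords = pd.DataFrame({
    "zipcode": ["80202", "94103", "10001"],
    "lat": [39.75, 37.77, 40.75],
    "lon": [-104.99, -122.41, -73.99],
})

# Count installations per zipcode.
solar_counts = solar.groupby("zipcode").size().reset_index(name="solar_sites")

# Attach coordinates. A left merge keeps zipcodes that have no match in the
# lookup table (their lat/lon would be NaN) instead of silently dropping them.
solar_geo = solar_counts.merge(zip_coords, on="zipcode", how="left")
print(solar_geo)
```

Using a left merge here is a design choice: it makes unmatched zipcodes visible so they can be checked before mapping.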
The storage project data come from the DOE’s Global Energy Storage Database. While the dataset contains a myriad of information on each project, the most important fields for this project were the zipcode and coordinates of the sites. I use the latitude/longitude of each site for more accurate mapping, but use the zipcodes for the analysis against the solar data so that the granularity matches. One problem I came across was missing zipcode data. To solve this, I used the Geopy module to look up zipcodes from the more abundant coordinate data. Since this is done with API calls to Nominatim, too many calls can lead to requests being rejected. For this reason, I run the lookup once to get the zipcodes, save them to a file, and work from that file for the rest of the development.
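A minimal sketch of this look-up-once-then-cache pattern is shown below. The function names, the JSON cache format, and the user agent string are all assumptions, not the code used for this post. The Geopy import sits inside the lookup function so the caching logic can be tried with any stand-in lookup.

```python
import json
import os
import time

_geocoder = None


def nominatim_zip(lat, lon):
    """Reverse-geocode one coordinate pair to a postcode via Nominatim."""
    global _geocoder
    if _geocoder is None:
        from geopy.geocoders import Nominatim  # needs geopy and network access

        # Hypothetical user agent; Nominatim expects one that identifies your app.
        _geocoder = Nominatim(user_agent="storage-siting-example")
    location = _geocoder.reverse((lat, lon), exactly_one=True)
    if location is None:
        return None
    return location.raw.get("address", {}).get("postcode")


def zipcodes_with_cache(coords, cache_path, lookup=nominatim_zip, pause=1.0):
    """Return a zipcode per (lat, lon), reusing results saved in cache_path."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    for lat, lon in coords:
        key = f"{lat},{lon}"
        if key not in cache:
            cache[key] = lookup(lat, lon)
            time.sleep(pause)  # Nominatim's policy allows about one request per second

    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return [cache[f"{lat},{lon}"] for lat, lon in coords]


# Example (makes real API calls on the first run only):
# zips = zipcodes_with_cache([(39.7392, -104.9903)], "storage_zips.json")
```

On later runs every coordinate is already in the cache file, so no API calls are made.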
The first thing I do is plot the points on a map. I use the Basemap module in Python to accomplish this. The projection used is the Miller Cylindrical Projection because it makes for a clean rectangular map. Once again, the solar sites are at zipcode scale, while the storage sites are individual coordinates. This can lose information, especially in bigger zipcode areas, where aggregation may make it look like there are fewer solar sites than there really are. Even so, there are still so many more solar sites that this is a moot point. For the same reason, I map the storage sites second, on top of the solar sites.
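For reference, the Miller projection itself is a small formula: longitude maps straight to x, and latitude is compressed before a Mercator-style stretch, which keeps the poles at a finite height. The sketch below computes it on a unit sphere with the standard library only. The function name is made up for this example, and a mapping library would additionally scale by the Earth's radius and shift the origin.

```python
import math


def miller_xy(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to unit-sphere Miller coordinates."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = lon
    # Mercator would use ln(tan(pi/4 + lat/2)); Miller scales the latitude
    # by 4/5 first and then multiplies the result by 5/4.
    y = 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * lat))
    return x, y


# Denver, roughly (39.74 N, 104.99 W)
print(miller_xy(39.74, -104.99))
```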
On the map, the two appear to go together: storage sites are abundant on the East Coast, the West Coast, and around Denver, where there are also more solar sites. There are some anomalies where there are more solar sites than I’d expect, especially in Wisconsin, Indiana, and Tennessee. I question the data and how it was reported; it may be that these areas are simply more heavily reported. The covered areas also look bigger than they are, because one dot is scaled pretty large for a map the size of the United States. You have to be careful about the conclusions you draw from this map.
Solar and Storage Correlation
To compare storage and solar, I look at a scatterplot of site counts per zipcode, that is, how many solar and storage sites are in each zipcode. For this I need to merge the two datasets on zipcode. I do an inner merge, which means that I’m only looking at zipcodes where there is at least one solar site and one storage site. This is why no point has a count of zero on either axis. While this plot isn’t what I was expecting (I thought I’d have a nice positively correlated plot from which I could draw a best fit line), it is telling. First of all, the two scales are very disproportionate. There are up to 600 solar sites in some zipcodes, while the maximum number of storage sites in a zipcode is 11. This makes it very hard to draw a relationship. Nevertheless, the overall trend is that a zipcode has either a large number of solar sites or a large number of storage sites, but not both.
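The effect of the inner merge can be seen in a small pandas sketch. The counts and column names below are made up for illustration and are not values from the actual datasets.

```python
import pandas as pd

# Hypothetical per-zipcode site counts.
solar_counts = pd.DataFrame({
    "zipcode": ["80202", "94103", "10001", "02139"],
    "solar_sites": [120, 600, 15, 40],
})
storage_counts = pd.DataFrame({
    "zipcode": ["94103", "10001", "60601", "02139"],
    "storage_sites": [2, 11, 1, 3],
})

# Inner merge: only zipcodes present in BOTH tables survive,
# so every remaining row has at least one site of each kind.
both = solar_counts.merge(storage_counts, on="zipcode", how="inner")

# Pearson correlation between the two count columns.
r = both["solar_sites"].corr(both["storage_sites"])

# For contrast, an outer merge keeps every zipcode and fills the gaps with 0.
either = solar_counts.merge(storage_counts, on="zipcode", how="outer").fillna(0)

print(both)
print("Pearson r:", r)
```

An outer merge like `either` would add points along both axes of the scatterplot, which is one way to check how much the inner merge hides.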
Finally, I want to save these points as a shapefile. This is to get experience working with Python’s Shapely and Fiona packages. As I learned following this tutorial, Shapely manipulates and analyzes geometric data, while Fiona handles reading and writing file formats. I end up saving the solar points as one layer and the storage points as another. This outputs two separate .shp files along with their corresponding .dbf (database of attributes) and .shx (index positions) files.
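Writing one such layer might look roughly like the sketch below. The schema, field names, and file name are assumptions for illustration, and passing the CRS as the string "EPSG:4326" assumes a reasonably recent Fiona release. The geometry is built as a plain GeoJSON-like dict, which is the same structure that `shapely.geometry.mapping(Point(lon, lat))` returns.

```python
def point_records(rows):
    """Turn (zipcode, lat, lon, count) rows into GeoJSON-like feature dicts."""
    records = []
    for zipcode, lat, lon, count in rows:
        records.append({
            # Note the order: x is longitude, y is latitude.
            "geometry": {"type": "Point", "coordinates": (lon, lat)},
            "properties": {"zipcode": zipcode, "sites": count},
        })
    return records


# Hypothetical schema: point geometry plus two attribute columns.
SCHEMA = {"geometry": "Point", "properties": {"zipcode": "str", "sites": "int"}}


def write_layer(path, records):
    """Write point records to an ESRI Shapefile (.shp plus .shx and .dbf)."""
    import fiona  # imported here so building records does not require Fiona

    with fiona.open(
        path, "w", driver="ESRI Shapefile", crs="EPSG:4326", schema=SCHEMA
    ) as dst:
        dst.writerecords(records)


solar_records = point_records([("80202", 39.75, -104.99, 3)])
# write_layer("solar_sites.shp", solar_records)  # uncomment to write the files
```

Calling `write_layer` once for the solar records and once for the storage records would give the two layers described above.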
All in all, this was a great experience in mapping clean energy in the United States. The most interesting part of the project was seeing the distribution of sites across the U.S. It’s no surprise that the coasts have a majority of the sites, likely due to more progressive policies. Also expected is that there are very few storage projects, especially in comparison to solar sites. There are some anomalies, with more solar sites in the Midwest than expected. It would be very tough to predict the location of a storage site based on current solar sites, even at the relatively coarse granularity of zipcodes. Other factors that might help prediction include the state policy environment, household income, and the current energy mix of the area.