Background


Summary: This vignette explores spatial hotspots and seasonality of Flickr images in the northeastern United States, using the spatstat package.

# load some core packages
library(maptools)
library(spatstat)
library(ggmap)
library(rgdal)

Social media data contains a trove of rich information about human activity and behavior. Since the first appearance of popular social media websites, researchers have been keen to develop ways to exploit these Internet data archives for a variety of academic and commercial purposes.

Image sharing platforms that host geotagged photographs, such as Instagram and Flickr, are unique in that they present opportunities to examine spatial and thematic patterns in human activity. Every public image posted to these websites contains (1) coordinates and (2) content, which combined provide numerous opportunities, for better or worse, to explore human interactions in the modern world. For a while, these data were freely available from open source API programs. But this has changed in the last few years with more and more platforms charging fees for data downloads.

Over the last decade, a new wave of research by ecologists and social scientists has examined the capacity of social media images for understanding human-environmental interactions. From estimating park visitation in protected areas (Kuehn et al. 2020; Wood et al. 2013) to monitoring insect populations (Medhat et al. 2017), these data offer a wide variety of applications for conservation. For an in depth recent review, see Ghermandi and Sinclair (2019)https://doi.org/10.1016/j.gloenvcha.2019.02.003.

In this document, I showcase some basic spatial and temporal data summary and visualization techniques with Flickr images from the Northern Forest, a mostly rural region extending from New York to Maine. This project is part of an ongoing study with the State University of New York College of Environmental Science and Forestry.

More information is available at https://www.esf.edu/socialmediastudy/.

 

Data prep


These Flickr data for the Northern Forest Region (NFR) were prepared in advance, so I won’t be demonstrating how they were mined from the Flickr API.

Preparing data for spatspat is straightforward. If you have a shapefile already, you simply load your the file(s), via readOGR or using one of the tidier sf methods.

# load rural Flickr point and gridded shapefiles
flickr.rural <- readOGR("flickr_rural.shp", verbose=FALSE)
flickr.grid <- readOGR("flickr_grid_counts_042518.shp", verbose=FALSE)
NFR.shp <- readOGR("NFR_boundary.shp", verbose=FALSE)

Then you reclassify your point files as ppp objects, which is what spatstat works with.

# create a spatstat spatial planar point pattern (ppp) object
flickr.pts.ppp <- as(flickr.rural, "ppp")
flickr.grid.ppp <- as(flickr.grid, "ppp")

The last step is assigning your points to a window, or an owin object.

NFR <- as.owin(NFR.shp)
class(NFR)
## [1] "owin"

Prior to analyzing the image data and generating maps, duplicate images (i.e., pictures taken by the same photographer on the same day and at the same coordinates) were omitted to reduce redundancies.

Ultimately, 116,360 images were used to identify regional hotspots. Of these, 89,231 (77%) images were taken in rural areas (i.e., outside of US census designated urban areas). We focus on the spatial and temporal patterns of these rural images in this document, using the following data fields.

colnames(flickr.pts.ppp$marks)
##  [1] "OBJECTID"   "caption"    "url"        "isUrban"    "latitude"  
##  [6] "longitude"  "id"         "owner"      "hometown"   "datetaken" 
## [11] "dateupload" "tags1"      "tags2"      "tags3"      "tags4"     
## [16] "tags5"      "tags6"      "tags7"      "tags8"      "tags9"     
## [21] "tags10"     "uniqueID"   "STUSPS"     "month"      "date"

Quick note: the tags were generated from neural networks run in the Clarifai image classification software to identify image content for all the photographs in our database. You can find more information on the project website for this study.

Images


Below are all of the original image locations, clustered by their density in the landscape.

# project rural points to web mercator
flickr.rural.NFR <- spTransform(flickr.rural, CRS("+proj=longlat +datum=WGS84"))
NFR.shp <- spTransform(NFR.shp, CRS("+proj=longlat +datum=WGS84"))

library(leaflet)
flickr.view <- 
  leaflet(data = flickr.rural.NFR) %>%
  addTiles() %>%
  addMarkers(clusterOptions = markerClusterOptions(), 
             popup = paste0("<a href='", flickr.rural.NFR$url, 
                            "' target='_blank'>", 
                            "Click Here to View Image</a>")) %>%
  addPolygons(data = NFR.shp, color = "#444444", weight = 1, smoothFactor = 0.5,
              opacity = 1, fillOpacity = 0.2) %>%  
  addTiles(group = "OSM (default)") %>%
  addProviderTiles(providers$CartoDB.Positron, group = "B/W") %>% 
  addProviderTiles(providers$Esri.WorldTopoMap, group = "Topo") %>%  
  addLayersControl(baseGroups = c("OSM (default)", "B/W", "Topo"),
                   options = layersControlOptions(collapsed = FALSE))

flickr.view

Hotspots


To visualize areas of high photographic activity, we’ll kernel density estimation to generate a series of “hotspot” maps. Instead of the raw point data, we’ll use a version of the Flickr data set to a 1 km2 resolution grid (this was for future geospatial analyses, not included in this vignette).

These hotspots were delineated with individual boundaries for further analyses that assessed visitor movements throughout the region.

Regional hotspots

# image locations
col1 <- rgb(0,0,0.5, alpha = 0.3) 
plot(flickr.grid.NFR[flickr.grid.NFR$marks$flickr_rur > 0,], 
     use.marks = TRUE, which.marks = "flickr_rur", cols = col1, 
     main = "Rural photos", pch = 20, markscale = 0.005, legend = TRUE)
The above map shows rural image locations in the Northern Forest. Each point is sized proportionally to the number of images recorded per square kilometer.

The above map shows rural image locations in the Northern Forest. Each point is sized proportionally to the number of images recorded per square kilometer.

Kernel density estimation is accomplished in spatstat with the density.ppp function (you don’t really need the “ppp” there, as the function will automatically identify ppp objects from the spatstat package.

The key parameter in kernel density estimation is the search bandwidth, or sigma, which you could think of as your density resolution. Smaller input values for sigma mean smaller search radii, and thus, more local/high-resolution densities. Larger values produce wider/low-resolution density surfaces. There are various strategies to identifying a proper value for sigma, including cross-validation algorithms (via “bw.ppl” or “bw.diggle”), but ultimately it’s up to you for the needs of your map. In this instance, I used a 7.5 km value for sigma, after a bit of trial and error.

Some other arguments in density.ppp are:

  • diggle: for making edge corrections if you have a complex boundary you are calculating density over
  • weights: if you want to weight certain points differently in your density calculation (i.e., if you have values associated with individual points)
  • eps: to specify pixel size (resolution for the density surface, based on your current unit)
par(mar=c(1,0,4,0))
# kernel density
library(scico)
img.pal <- scico(30, palette = "batlow")
NFR.rural.den <- density.ppp(flickr.grid.NFR, diggle = TRUE, 
                             sigma = 7.5, eps = 1, 
                             weights = flickr.grid.NFR$marks$flickr_rur)

plot(NFR.rural.den, main="Rural image density", col = img.pal)
The above map shows the kernel density of rural images (higher values/brighter colors indicate areas of greater image density). A search bandwidth of 7.5 km was used to generate density values.

The above map shows the kernel density of rural images (higher values/brighter colors indicate areas of greater image density). A search bandwidth of 7.5 km was used to generate density values.

We see several clusters across the Northern Forest Region from this kernel density surface. The White Mountains of New Hampshire and Acadia National Park particularly stand out.

We can also discretize these kernel density surfaces to density categories, using the quantile function to create custom breaks and the cut and tess functions to adjust the density raster.

par(mar=c(1,0,4,0))

# kernel density quantiles
NFR.breaks<- quantile(NFR.rural.den, probs = (0:3)/3)
NFR.cuts <- cut(NFR.rural.den, breaks = NFR.breaks, 
                labels = c("Low", "Med", "High"))
NFR.quantiles <- tess(image = NFR.cuts) 
plot(NFR.quantiles, main = "Rural image quantiles", col = img.pal[c(1,10,20)])
The above map groups kernel density image values into three categories: low (0 - 33% percentile), medium (33 - 66% percentile), and high (66 - 100% percentile) areas of rural photography.

The above map groups kernel density image values into three categories: low (0 - 33% percentile), medium (33 - 66% percentile), and high (66 - 100% percentile) areas of rural photography.

We can further isolate hotspot regions from this density surface, using a variety of approaches.

The simplest approach would be with as before, with percentiles, extracting all areas above a certain threshold (e.g., top ten percent).

NFR.breaks<- quantile(NFR.rural.den, probs = (0:10)/10)
NFR.cuts <- cut(NFR.rural.den, breaks = NFR.breaks, 
                       labels = c("10", "20", "30", "40", "50", 
                                  "60", "70", "80", "90", "100"))
NFR.quantiles <- tess(image = NFR.cuts) 
NFR.quantiles <- NFR.quantiles$image
NFR.quantiles <- rescale(NFR.quantiles, 0.001, "m")
NFR.quantiles$v[NFR.quantiles$v != "100"] <- NA
hotspot_mask <- as.owin(NFR.quantiles, fatal = TRUE)

plot(hotspot_mask, show.window = FALSE, "Rural image hotspots (percentiles)", 
     axes = FALSE, legend = FALSE)
plot(NFR, add = TRUE, box = TRUE)
Top ten percent of rural Flickr images in the Northern Forest.

Top ten percent of rural Flickr images in the Northern Forest.

We see a range of unique hotspots across the Northern Forest with this approach, from the Thousand Islands region in New York to Baxter State Park in Maine.

Another option to identify hotspots is with a thresholding algorithm. One such package that achieves this is the autothresholdr package, which provides various different algorithms to define high density areas. This package is essentially an R wrapper for the “Auto Threshold” plugin feature in the ImageJ software. More information can be found here: http://imagej.net/Auto_Threshold.

We will use Huang’s threshold algorithm as an example for the Northern Forest.

library(raster)
library(autothresholdr)

flickr.den.mask <- rescale(NFR.rural.den, 0.001, "m")
flickr.den.mask$v <- as.integer(flickr.den.mask$v)
hotspot_mask <- auto_thresh_mask(flickr.den.mask$v, 
                                 method = "Huang", ignore_na = TRUE)
flickr.den.mask$v <- matrix(flickr.den.mask$v*hotspot_mask, 
                            nrow = 626, ncol = 713)
flickr.den.mask$v[flickr.den.mask$v > 0] <- 1
flickr.den.mask$v[flickr.den.mask$v == 0] <- NA
NFR_density_huang <- raster(flickr.den.mask)

plot(NFR_density_huang, axes = FALSE, legend = FALSE, 
     col = "black", main = "Rural image hotspots (Huang)")
plot(NFR, add = TRUE)

Our results are fairly similar to the percentile approach.

I followed the same procedure as above with Huang’s threshold to define distinct hotspots for the Northern Forest for further analyses, but included both urban and rural images to capture key demographic areas. The shapefiles were exported to ArcGIS, where some large contiguous hotspots (e.g., the Champlain Valley-Green Mountains region) were manually divided to reflect economically and geographically distinct recreation/cultural areas.

In total, 29 individual hotspots were mapped as follows.

hotspot_tess <- readOGR("hotspot_tess.shp", verbose=FALSE)
hotspot_poly <- readOGR("hotspot_poly.shp", verbose=FALSE)
NFR.shp <- readOGR("NFR_boundary.shp", verbose=FALSE)

# plot hotspots and NFR boundary
plot(NFR.shp, border = "black")
plot(hotspot_poly, col = "grey", border = "white", add = TRUE)
plot(NFR.shp, border = "darkgrey", add = TRUE)
pointLabel(hotspot_poly$X, hotspot_poly$Y, cex = 1.3, 
           labels = hotspot_poly$Name, font = 2)

I used these discrete hotspots to divide the entire Northern Forest into 29 “hotspot regions” using Thiessen polygons. These hotspots/polygons were used to analyze visitor traces throughout the Northern Forest.

plot(NFR.shp, border = "black")
plot(hotspot_tess, col = "grey", border = "white", add = TRUE)
plot(NFR.shp, border = "darkgrey", add = TRUE)
pointLabel(hotspot_tess$X, hotspot_tess$Y, cex = 1.3, labels = hotspot_tess$Hotspot, font = 2)

State hotspots

These next maps apply the same kernel density estimation and hotspot identification strategy, but to the individual states within the Northern Forest (New York, Vermont, New Hampshire, and Maine) to see more in depth perspectives of social media activity in each state.

We’ll use a smaller search radius of 5km for kernel densities, since we are working with smaller areas. And this time I’ll use percentiles to delineate state hotspot zones.

# load state NFR boundaries
NY.shp <- readOGR("NY_NFR_boundary.shp", verbose=FALSE)
VT.shp <- readOGR("VT_NFR_boundary.shp", verbose=FALSE)
NH.shp <- readOGR("NH_NFR_boundary.shp", verbose=FALSE)
ME.shp <- readOGR("ME_NFR_boundary.shp", verbose=FALSE)

# project data (points) and windows to UTMs
flickr.grid.NYVT <- spTransform(flickr.grid, 
                                CRS("+proj=utm +zone=18+datum=WGS84"))
flickr.grid.NHME <- spTransform(flickr.grid, 
                                CRS("+proj=utm +zone=19+datum=WGS84"))
NY.shp <- spTransform(NY.shp, CRS("+proj=utm +zone=18+datum=WGS84"))
VT.shp <- spTransform(VT.shp, CRS("+proj=utm +zone=18+datum=WGS84"))
NH.shp <- spTransform(NH.shp, CRS("+proj=utm +zone=19+datum=WGS84"))
ME.shp <- spTransform(ME.shp, CRS("+proj=utm +zone=19+datum=WGS84"))

# convert shapefiles to windows
NY <- as.owin(NY.shp)
VT <- as.owin(VT.shp)
NH <- as.owin(NH.shp)
ME <- as.owin(ME.shp)

# create point pattern processes
flickr.grid.NYVT <- as(flickr.grid.NYVT, "ppp")
flickr.grid.NHME <- as(flickr.grid.NHME, "ppp")

flickr.grid.NYVT$marks$flickr_tot <- 
  as.numeric(as.character(flickr.grid.NYVT$marks$flickr_tot))
flickr.grid.NYVT$marks$flickr_rur <- 
  as.numeric(as.character(flickr.grid.NYVT$marks$flickr_rur))
flickr.grid.NHME$marks$flickr_tot <- 
  as.numeric(as.character(flickr.grid.NHME$marks$flickr_tot))
flickr.grid.NHME$marks$flickr_rur <- 
  as.numeric(as.character(flickr.grid.NHME$marks$flickr_rur))

## NY
flickr.grid.NY <- flickr.grid.NYVT[NY]
flickr.grid.NY <- rescale.ppp(flickr.grid.NY, 1000, "km")
## VT
flickr.grid.VT <- flickr.grid.NYVT[VT]
flickr.grid.VT <- rescale.ppp(flickr.grid.VT, 1000, "km")
## NH
flickr.grid.NH <- flickr.grid.NHME[NH]
flickr.grid.NH <- rescale.ppp(flickr.grid.NH, 1000, "km")
## ME
flickr.grid.ME <- flickr.grid.NHME[ME]
flickr.grid.ME <- rescale.ppp(flickr.grid.ME, 1000, "km")

New York

Let’s now make four different maps for rural Flickr images from New York and other states:

  1. Kernel density
  2. Quantiles (three density categories)
  3. Hotspots (top 10%)
  4. Actual point locations of images
par(mfrow = c(2,2), mar=c(1,0,4,0))
# NY density
flickr.NY.den <- density.ppp(flickr.grid.NY, diggle=T, eps = 1, sigma = 5, 
                             weights = flickr.grid.NY$marks$flickr_rur)
plot(flickr.NY.den, main = "NY-NFR Flickr densities", col = img.pal)

# NY quantiles
NY.map.breaks <- quantile(flickr.NY.den, probs = (0:3)/3) 
NY.map.cuts <- cut(flickr.NY.den, breaks = NY.map.breaks, 
                   labels = c("Low", "Med", "High"))
NY.map.quartile <- tess(image = NY.map.cuts)
plot(NY.map.quartile, main="NY-NFR Flickr quantiles", col = img.pal[c(1,10,20)])

# 3. NY hotspots (90%)
NY.map.breaks<- quantile(flickr.NY.den, probs = (0:10)/10)
NY.map.cuts <- cut(flickr.NY.den, breaks = NY.map.breaks, 
                       labels = c("10", "20", "30", "40", "50", 
                                  "60", "70", "80", "90", "100"))
NY.quantiles <- tess(image = NY.map.cuts) 
NY.quantiles <- NY.quantiles$image
NY.quantiles <- rescale(NY.quantiles, 0.001, "m")
NY.quantiles$v[NY.quantiles$v != "100"] <- NA
hotspot_mask <- as.owin(NY.quantiles, fatal = TRUE)
plot(hotspot_mask, show.window = FALSE, "NY-NFR Flickr hotspots")
plot(NY, add = TRUE, box = TRUE)

# 4. NY points, with proportional symbols
col1 <- rgb(0,0,0.5, alpha = 0.3) 
plot(flickr.grid.NY[flickr.grid.NY$marks$flickr_rur > 0,], 
     use.marks = TRUE, which.marks = "flickr_rur", cols = col1, 
     main = "NY photos", pch = 20, markscale = 0.008, legend = TRUE)
New York social media hotspots in the Northern Forest.

New York social media hotspots in the Northern Forest.

Vermont

I’ll hide the code for these other states, since it is just repetitive of the code for New York.

Vermont social media hotspots in the Northern Forest.

Vermont social media hotspots in the Northern Forest.

New Hampshire

New Hampshire social media hotspots in the Northern Forest.

New Hampshire social media hotspots in the Northern Forest.

Maine

Maine social media hotspots in the Northern Forest.

Maine social media hotspots in the Northern Forest.

Tag patterns

In this section, I’ll dive into the actual image content from Clarifai’s classification output.

Here I’ll compare what visitors are actually taking pictures of in heavily trafficked areas (i.e., our hotspots) versus low-use areas (everywhere else) in the Northern Forest. First we need to link our images to new windows that define the hotspot and “coldspot” areas.

hotspots.shp <- readOGR("hotspot_poly.shp", verbose=FALSE)
coldspots.shp <- hotspot_poly <- readOGR("NFR_coldspots.shp", verbose=FALSE)
hotspots <- as.owin(hotspots.shp)
coldspots <- as.owin(coldspots.shp)
hot.ppp <- flickr.pts.ppp[hotspots]
cold.ppp <- flickr.pts.ppp[coldspots]

Next we’ll identify the top ten tags from images within hotspots versus images outside of hotspots, ranked by their frequency.

par(mfrow = c(2,1))
## hot activity
hot_tags <- list(hot.ppp$marks$tags1, hot.ppp$marks$tags2, hot.ppp$marks$tags3, 
                 hot.ppp$marks$tags4, hot.ppp$marks$tags5, hot.ppp$marks$tags6, 
                 hot.ppp$marks$tags7, hot.ppp$marks$tags8, hot.ppp$marks$tags9, 
                 hot.ppp$marks$tags10)

hot_tags <- na.omit(hot_tags)
hot_tag_freq <- as.data.frame(table(unlist(hot_tags)))
colnames(hot_tag_freq) <- c("tag", "Freq")
hot_tag_freq <- hot_tag_freq[order(hot_tag_freq$Freq, decreasing = TRUE),]
hot_tag_freq$prop <- hot_tag_freq$Freq/hot.ppp$n

percentilerank<-function(x){
  rx<-rle(sort(x))
  smaller<-cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  larger<-rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr<-smaller/(smaller+larger)
  rxpr[match(x, rx$values)]
}

hot_tag_freq$percentile <- percentilerank(hot_tag_freq$Freq)

hot <- barplot(hot_tag_freq$prop[1:10], names.arg = "", 
               ylim = c(0,max(hot_tag_freq$prop)), xlab = "", 
               ylab = "Frequency", main = "Images in hotspots")
text(hot, par("usr")[3], labels = hot_tag_freq$tag[1:10], 
     srt = 25, adj = c(1.1,1.1), xpd = TRUE)
axis(2)

## cold activity
cold_tags <- list(cold.ppp$marks$tags1, cold.ppp$marks$tags2, 
                  cold.ppp$marks$tags3, cold.ppp$marks$tags4, 
                  cold.ppp$marks$tags5, cold.ppp$marks$tags6, 
                  cold.ppp$marks$tags7, cold.ppp$marks$tags8,
                  cold.ppp$marks$tags9, cold.ppp$marks$tags10)

cold_tags <- na.omit(cold_tags)
cold_tag_freq <- as.data.frame(table(unlist(cold_tags)))
colnames(cold_tag_freq) <- c("tag", "Freq")
cold_tag_freq <- cold_tag_freq[order(cold_tag_freq$Freq, decreasing = TRUE),]
cold_tag_freq$prop <- cold_tag_freq$Freq/cold.ppp$n
cold_tag_freq$percentile <- percentilerank(cold_tag_freq$Freq)

cold <- barplot(cold_tag_freq$prop[1:10], names.arg = "", 
                ylim = c(0,max(cold_tag_freq$prop)), xlab = "Tags", 
                ylab = "Frequency", main = "Images outside hotspots")
text(cold, par("usr")[3], labels = cold_tag_freq$tag[1:10], 
     srt = 25, adj = c(1.1,1.1), xpd = TRUE)
axis(2)

Fairly similar imaging targets within high and low use areas. The only major difference is the greater appearance of mountain-related images in hotspots, which don’t appear as frequently in other parts of the Northern Forest, suggesting that topography is a major natural feature triggering engagement by visitors.

I explore more spatial patterns in Northern Forest image content on our project website for this study.

Seasonality and diurnality


Finally, we’ll look at some temporal patterns of social media activity in the Northern Forest.

Seasonal photography

In ArcGIS, I categorized images as belonging to one of the four seasons based on their date, and added a new field to our gridded shapefile with total images for winter, spring, summer, and fall.

Here are seasonal kernel density maps for rural photography in the entire region.

par(mfrow = c(2,2), mar=c(1,0,4,0))
# winter
plot(density.ppp(flickr.grid.NFR, sigma = 10, edge=T, eps = 1, 
                 weights = flickr.grid.NFR$marks$winter), 
     main = "Winter", col = img.pal)
# spring
plot(density.ppp(flickr.grid.NFR, sigma = 10, edge=T, eps = 1, 
                 weights = flickr.grid.NFR$marks$spring), 
     main = "Spring", col = img.pal)
# summer
plot(density.ppp(flickr.grid.NFR, sigma = 10, edge=T, eps = 1, 
                 weights = flickr.grid.NFR$marks$summer), 
     main = "Summer", col = img.pal)
# fall
plot(density.ppp(flickr.grid.NFR, sigma = 10, edge=T, eps = 1, 
                 weights = flickr.grid.NFR$marks$fall), 
     main = "Fall", col = img.pal)

A few things to take notice of here. Except for the Bigelow Preserve area, there is relatively little Flickr activity in Maine in the winter compared to the other Northern Forest regions. Flickr activity seems to be most widespread in the spring, with many local and regional hotspots appearing in each state.


Monthly photography

Let’s look at monthly patterns of Flickr photography between states in the Northern Forest.

# import cleaned Flickr data frame with time fields
flickr_tidy <- read.csv("flickr_tidy.csv")

NY.monthly <- data.frame(table(flickr_tidy$month[flickr_tidy$STUSPS == "NY"]))
colnames(NY.monthly) <- c("Month", "Freq")

VT.monthly <- data.frame(table(flickr_tidy$month[flickr_tidy$STUSPS == "VT"]))
colnames(VT.monthly) <- c("Month", "Freq")

NH.monthly <- data.frame(table(flickr_tidy$month[flickr_tidy$STUSPS == "NH"]))
colnames(NH.monthly) <- c("Month", "Freq")

ME.monthly <- data.frame(table(flickr_tidy$month[flickr_tidy$STUSPS == "ME"]))
colnames(ME.monthly) <- c("Month", "Freq")

# combine state-based data frames of image quantities
states.monthly <- rbind(NY.monthly, VT.monthly, NH.monthly, ME.monthly)
states.monthly$State <- c(rep("NY", 12), rep("VT", 12), 
                          rep("NH", 12), rep("ME", 12))
states.monthly$Month <- as.integer(states.monthly$Month)
states.monthly$State = factor(states.monthly$State, 
                              levels=c("NY", "VT", "NH", "ME"))

library(ggplot2)
# bar plots separated by state
ggplot(states.monthly, aes(x = Month, y = Freq)) + 
  geom_col() + 
  facet_grid(State ~.) +
  theme_classic() + 
  ylab("Number of images") + xlab("Month (1 = January)") + 
  scale_x_continuous(breaks = c(1:12), labels = c(1:12), limits = c(0,13))

# stacked line graphs
ggplot(states.monthly, aes(x = Month, y = Freq, color = State)) + 
  geom_line(lwd = 1.2) + 
  geom_point() +
  theme_classic() + 
  ylab("Number of images") + xlab("Month (1 = January)") + 
  scale_x_continuous(breaks = c(1:12), labels = c(1:12), limits = c(0,13))

We see that photography is greatest from June to October for all states. People take fewer photos during the winter, and this is especially true in Maine, where the number of images drops precipitously in November and doesn’t pick up until March.


Hourly photography

Finally, how does photography vary over the course of a single day? Here we use the timestamps associated with each Flickr image in the dataset to show these hourly photography rates for each state.

NY.hourly <- data.frame(table(flickr_tidy$hour[flickr_tidy$STUSPS == "NY"]))
colnames(NY.hourly) <- c("Hour", "Freq")
VT.hourly <- data.frame(table(flickr_tidy$hour[flickr_tidy$STUSPS == "VT"]))
colnames(VT.hourly) <- c("Hour", "Freq")
NH.hourly <- data.frame(table(flickr_tidy$hour[flickr_tidy$STUSPS == "NH"]))
colnames(NH.hourly) <- c("Hour", "Freq")
ME.hourly <- data.frame(table(flickr_tidy$hour[flickr_tidy$STUSPS == "ME"]))
colnames(ME.hourly) <- c("Hour", "Freq")

states.hourly <- rbind(NY.hourly, VT.hourly, NH.hourly, ME.hourly)
states.hourly$State <- c(rep("NY", 24), rep("VT", 24), rep("NH", 24), rep("ME", 24))
states.hourly$Hour <- as.integer(states.hourly$Hour)
states.hourly$State = factor(states.hourly$State, levels=c("NY", "VT", "NH", "ME"))

# bar plots separated by state
ggplot(states.hourly, aes(x = Hour, y = Freq)) + geom_col() + 
  facet_grid(State ~.) + theme_classic() + 
  scale_x_continuous(breaks = c(1:24), labels = c(1:24), limits = c(0,25)) + 
  ggplot2::annotate("rect", xmin = 0, xmax = 6, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "midnightblue") + 
  ggplot2::annotate("rect", xmin = 6, xmax = 18, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "orange") + 
  ggplot2::annotate("rect", xmin = 18, xmax = 25, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "midnightblue") + ylab("Number of images")

# stacked line graphs
ggplot(states.hourly, aes(x = Hour, y = Freq, color = State)) + 
  geom_point() + geom_line(lwd = 1.2) + theme_classic() +
  scale_x_continuous(breaks = c(1:24), labels = c(1:24), limits = c(1,24)) + 
  ggplot2::annotate("rect", xmin = 1, xmax = 6, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "midnightblue") + 
  ggplot2::annotate("rect", xmin = 6, xmax = 18, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "orange") + 
  ggplot2::annotate("rect", xmin = 18, xmax = 24, 
                    ymin = 0, ymax = max(states.hourly$Freq), 
                    alpha = 0.2, fill = "midnightblue")+ ylab("Number of images")

In the above figures, nighttime periods (6:00PM - 6:00AM) are shaded blue and daytime periods (6:00AM - 6:00PM) are shaded yellow.

We see a similar pattern for each state, with Flickr activity greatest during daylight to early evening.

 


Session info

sessionInfo() # R packages/versions 
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] autothresholdr_1.3.6 raster_3.1-5         scico_1.1.0         
##  [4] leaflet_2.0.3        rgdal_1.4-8          ggmap_3.0.0         
##  [7] ggplot2_3.3.0        spatstat_1.63-3      rpart_4.1-15        
## [10] nlme_3.1-147         spatstat.data_1.4-3  maptools_0.9-9      
## [13] sp_1.4-1             knitr_1.28          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6            lattice_0.20-41         tidyr_1.0.2            
##  [4] deldir_0.1-25           ps_1.3.2                png_0.1-7              
##  [7] leaflet.providers_1.9.0 assertthat_0.2.1        digest_0.6.25          
## [10] R6_2.4.1                plyr_1.8.6              filesstrings_3.1.5     
## [13] backports_1.1.6         evaluate_0.14           httr_1.4.1             
## [16] tensor_1.5              highr_0.8               pillar_1.4.3           
## [19] RgoogleMaps_1.4.5.3     rlang_0.4.5             Matrix_1.2-18          
## [22] goftest_1.2-2           checkmate_2.0.0         rmarkdown_2.1          
## [25] labeling_0.3            splines_4.0.0           stringr_1.4.0          
## [28] foreign_0.8-78          htmlwidgets_1.5.1       polyclip_1.10-0        
## [31] munsell_0.5.0           compiler_4.0.0          xfun_0.13              
## [34] pkgconfig_2.0.3         mgcv_1.8-31             htmltools_0.4.0        
## [37] tidyselect_1.0.0        tibble_3.0.1            codetools_0.2-16       
## [40] crayon_1.3.4            dplyr_0.8.5             withr_2.2.0            
## [43] bitops_1.0-6            grid_4.0.0              jsonlite_1.6.1         
## [46] gtable_0.3.0            lifecycle_0.2.0         magrittr_1.5           
## [49] scales_1.1.0            stringi_1.4.6           farver_2.0.3           
## [52] ellipsis_0.3.0          vctrs_0.2.4             spatstat.utils_1.17-0  
## [55] rjson_0.2.20            tools_4.0.0             glue_1.4.0             
## [58] ijtiff_2.0.5            purrr_0.3.4             crosstalk_1.1.0.1      
## [61] jpeg_0.1-8.1            processx_3.4.2          abind_1.4-5            
## [64] yaml_2.2.1              colorspace_1.4-1        strex_1.2.0

Ghermandi, Andrea, and Michael Sinclair. 2019. “Passive crowdsourcing of social media in environmental research: A systematic map.” Global Environmental Change 55 (September 2018): 36–47. https://doi.org/10.1016/j.gloenvcha.2019.02.003.

Kuehn, Diane, James Gibbs, Harrison Goldspiel, Brannon Barr, Alden Sampson, Marshall Moutenot, Joshua Badding, and Lilly Stradtman. 2020. “Using Social Media Data and Park Characteristics to Understand Park Visitation.” The Journal of Park and Recreation Administration, 1–12. https://doi.org/10.18666/jpra-2019-10035.

Medhat, Moataz, Alan Dorin, Adrian Dyer, Martin Burd, Zoë Bukovac, and Mani Shrestha. 2017. “Mapping species distributions with social media geo-tagged images: Case studies of bees and flowering plants in Australia Ecological Informatics Mapping species distributions with social media geo-tagged images : Case studies of bees and fl owering plants.” Ecological Informatics 39 (February): 23–31. https://doi.org/10.1016/j.ecoinf.2017.02.006.

Wood, Spencer A., Anne D. Guerry, Jessica M. Silver, and Martin Lacayo. 2013. “Using social media to quantify nature-based tourism and recreation.” Scientific Reports 3. https://doi.org/10.1038/srep02976.