Tourism in Nepal

Analysis Visualization Rstats

In the last blog, I extracted data about tourism in Nepal from Wikipedia using the rvest package. In this blog, I will analyse the extracted data and create visualizations using ggplot2, plotly and gganimate.

Author

Diwash Shrestha

Published

Nov. 22, 2019

Introduction

Nepal is celebrating 2020 as its “Tourism Year”, targeting 2 million international tourist arrivals. You can learn more about it under #VisitNepal2020. I wanted to see the history and trend of tourism in Nepal, so I extracted the data from Wikipedia in the Scraping Data with R blog post.

In this post, we will work on the scraped data from the Scraping Data with R blog post, perform some analysis, create visualizations and understand the trend of tourism in Nepal.

Let's Start

I will load the required packages for this blog and rebuild the data frame from the last blog.

library(ggplot2)
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(lubridate)
library(gganimate)
library(plotly)
library(tibble)

# Read the Wikipedia page and parse every table with class "wikitable"
wikipage <- read_html("https://en.wikipedia.org/wiki/Tourism_in_Nepal")
tables <- wikipage %>%
  html_nodes("table.wikitable") %>%
  html_table(header = TRUE, fill = TRUE)

# The first table holds the yearly arrivals; store it as a data frame
tourist_df <- as.data.frame(tables[[1]])
names(tourist_df) <- c("year", "tourist_number", "per_change")

# Remove every thousands separator (values above one million contain two commas)
tourist_df$tourist_number <- str_remove_all(tourist_df$tourist_number, ",")
tourist_df$per_change <- str_remove(tourist_df$per_change, "%")

# Convert the cleaned strings to numbers
tourist_df$tourist_number <- as.integer(tourist_df$tourist_number)
tourist_df$per_change <- as.numeric(tourist_df$per_change)
tourist_df$year <- as.integer(tourist_df$year)
year   tourist_number   per_change
<int>  <int>            <dbl>
1993   293567           -12.2
1994   326531            11.2
1995   363395            11.3
1996   393613             8.3
1997   421857             7.2
1998   463684             9.9
1999   491504             6.0
2000   463646            -5.7
2001   361237           -22.1
2002   275468           -23.7

This is the data frame I got after the extraction and cleaning steps from the last blog. Now I will create visualizations from this data using ggplot2 and plotly.
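The comma-removal step in the cleaning code is easy to get wrong, so here is a small sketch of how it behaves. The value "1,234,567" is a made-up example, not a figure from the table; str_remove() and str_remove_all() are the stringr functions.

```r
library(stringr)

# Hypothetical value with two thousands separators (not taken from the data)
raw_value <- "1,234,567"

# str_remove() deletes only the first match, so one comma is left behind
first_only <- str_remove(raw_value, ",")

# str_remove_all() deletes every match in a single call
all_removed <- str_remove_all(raw_value, ",")

# Only the fully cleaned string converts to an integer without producing NA
arrivals <- as.integer(all_removed)
```

Any year with arrivals above one million has two commas, so the conversion to integer only works once both are gone.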

Visuals

gplot <- ggplot(tourist_df, aes(
  x = year,
  y = tourist_number, fill = tourist_number
)) +
  geom_col() +
  labs(
    y = "Number of Tourist",
    title = "Number of Tourist Arrival from 1993 to 2018"
  ) +
  scale_fill_gradient(low = "#fedb81", high = "red") +
  theme_minimal()
ggplotly(gplot) %>% config(displayModeBar = FALSE)
[Figure: interactive bar chart, "Number of Tourist Arrival from 1993 to 2018"; x-axis: year; y-axis: Number of Tourist; fill colour: tourist_number]

This bar plot shows the number of international tourist arrivals from 1993 to 2018. Arrivals grew every year from 1993 to 1999 and dipped slightly in 2000. From 2001, the flow of international tourists fell sharply, as this was the period when the civil war was at its height and the country was under a state of emergency.

In 2015 there was an earthquake that damaged or destroyed many of the historic sites in the Kathmandu valley and caused heavy physical damage and loss of life. This decreased the number of tourists visiting Nepal. From 2016 onward, the number of tourist arrivals increased every year, and in 2018 it crossed 1 million for the first time.

gplot <- ggplot(tourist_df, aes(
  x = year,
  y = per_change, fill = per_change
)) +
  geom_col() +
  labs(
    y = "Percent Change of Tourist",
    title = "Percentage Change of Tourist Arrival from 1993 to 2018"
  ) +
  scale_fill_gradient(low = "#fedb81", high = "red") +
  theme_minimal()
ggplotly(gplot)
[Figure: interactive bar chart, "Percentage Change of Tourist Arrival from 1993 to 2018"; x-axis: year; y-axis: Percent Change of Tourist; fill colour: per_change]

This plot shows the year-over-year percentage change in tourist arrivals.
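The per_change column comes straight from Wikipedia, but it can also be recomputed from the arrival counts as a sanity check. The sketch below uses four rows copied from the table above; the column name per_change_calc is my own and is not part of the original data.

```r
library(dplyr)
library(tibble)

# Four rows copied from the arrivals table shown earlier
arrivals <- tibble(
  year = 1993:1996,
  tourist_number = c(293567L, 326531L, 363395L, 393613L)
)

# Percentage change = (this year - previous year) / previous year * 100
arrivals <- arrivals %>%
  arrange(year) %>%
  mutate(
    previous = lag(tourist_number),
    per_change_calc = round((tourist_number - previous) / previous * 100, 1)
  )
```

The first row has no previous year in this subset, so its computed change is NA. The remaining rows can be compared against the per_change values in the table.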

Top 10 Countries

In this section, I will find the top 10 countries that send the most tourists to Nepal.

Rank   Country          2013     2014    2015     2016     2017     2018
<chr>  <chr>            <chr>    <chr>   <chr>    <chr>    <chr>    <chr>
1      India            135343   75124   118249   160832   194323   254150
2      China            123805   66984   104005   104664   153633   169543
3      United States    49830    42687   53645    79146    91895    93218
4      United Kingdom   36759    29730   46295    51058    63466    61144
5      Sri Lanka        37546    44367   57521    45361    69490    55869
6      Thailand         33422    32338   26722    39154    52429    41653
7      South Korea      23205    18112   25171    34301    37218    29680
8      Australia        24516    16619   25507    33371    38429    38972
9      Myanmar          N/A      21631   25769    30852    41402    36274
10     Germany          18028    16405   23812    29918    36879    36641

This is the data I got from the last blog. Now I need to extract the top 10 countries from this data frame.

con_tour_df <- head(con_tour_df, 10)

This data frame is in wide format, with a separate column for each year. I reshape it to long format, so that all the years sit in a single year column next to their tourist arrival values.

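# Gather the year columns into two columns: year (the old column name) and value (the count)
con_tour_df <- pivot_longer(con_tour_df, -c(Country, Rank), names_to = "year")
con_tour_df$year <- as.numeric(con_tour_df$year)
# "N/A" entries (Myanmar in 2013) become NA here, with a coercion warning
con_tour_df$value <- as.numeric(con_tour_df$value)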
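To make the reshaping step concrete, here is a minimal sketch on a two-country, two-year subset of the table above. The object names wide and long are only for illustration.

```r
library(tidyr)
library(tibble)

# Small wide-format subset copied from the top 10 table (all columns are character)
wide <- tibble(
  Rank = c("1", "2"),
  Country = c("India", "China"),
  `2013` = c("135343", "123805"),
  `2014` = c("75124", "66984")
)

# Every column except Country and Rank is stacked into a year/value pair
long <- pivot_longer(wide, -c(Country, Rank), names_to = "year")
long$year <- as.numeric(long$year)
long$value <- as.numeric(long$value)
```

Each country now has one row per year, which is the shape that ggplot2 and gganimate expect when the year drives the animation.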

Now, I need to rank the countries within each year by their number of tourist arrivals. I group the data by year and use mutate() to create a new column, “rank”, along with a text label for each bar.

con_tour_df <- con_tour_df %>%
  group_by(year) %>%
  # rank 1 = most arrivals in that year; Value_lbl is the label drawn beside each bar
  mutate(rank = rank(-value), Value_lbl = paste0(" ", value)) %>%
  filter(rank <= 10)
con_tour_df$value <- as.integer(con_tour_df$value)
Rank   Country   year    value    rank    Value_lbl
<chr>  <chr>     <dbl>   <int>    <dbl>   <chr>
1      India     2013    135343   1       135343
1      India     2014    75124    1       75124
1      India     2015    118249   1       118249
1      India     2016    160832   1       160832
1      India     2017    194323   1       194323
1      India     2018    254150   1       254150
2      China     2013    123805   2       123805
2      China     2014    66984    2       66984
2      China     2015    104005   2       104005
2      China     2016    104664   2       104664
con_tour_df is now in its final form after cleaning. Now, it is time to create some visualizations and animations.

I am using the ggplot2 package to create the static plot and gganimate to turn it into an animation.

colors <- c(
  "India" = "#FF7F24", "China" = "#E31A1C", "Sri Lanka" = "#FFB90F",
  "United States" = "#4876FF","Thailand" = "#BF3EFF", 
  "United Kingdom" = "#FF4040", "South Korea" = "#EE8262",
  "Germany" = "#8B7E66","Australia" = "#2E8B57", "Myanmar" = "gray",
  "Bangladesh" = "#006400","France"="#104E8B", "Japan" ="#CDB5CD"
)

In this visualization, I need a distinct colour for each of the 10 countries in the animation. The named vector colors above maps each country name to a colour (it also includes a few extra countries: Bangladesh, France and Japan) and is passed to scale_fill_manual() in the plot below.
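Since scale_fill_manual() matches colours by name, it can help to confirm that every country in the data has an entry in the palette before plotting. This is a small sketch of such a check; the vector top10 is typed out by hand from the table above, and missing_colors and unused_colors are illustrative names.

```r
# Palette copied from the post
colors <- c(
  "India" = "#FF7F24", "China" = "#E31A1C", "Sri Lanka" = "#FFB90F",
  "United States" = "#4876FF", "Thailand" = "#BF3EFF",
  "United Kingdom" = "#FF4040", "South Korea" = "#EE8262",
  "Germany" = "#8B7E66", "Australia" = "#2E8B57", "Myanmar" = "gray",
  "Bangladesh" = "#006400", "France" = "#104E8B", "Japan" = "#CDB5CD"
)

# Top 10 countries, typed from the table above
top10 <- c(
  "India", "China", "United States", "United Kingdom", "Sri Lanka",
  "Thailand", "South Korea", "Australia", "Myanmar", "Germany"
)

# Countries in the data with no colour assigned (should be empty)
missing_colors <- setdiff(top10, names(colors))

# Palette entries that the top 10 never use
unused_colors <- setdiff(names(colors), top10)
```

With the real data, top10 could be replaced by unique(con_tour_df$Country).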

anim <- ggplot(con_tour_df, aes(x = rank, y = value, group = Country)) +
  geom_bar(stat = "identity", aes(fill = Country)) +
  geom_text(aes(y = 0, label = paste(Country, " ")), vjust = 0.2, hjust = 1, size = 4.5) +
  geom_text(aes(y = value, label = Value_lbl, hjust = 0), size = 4.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_reverse() +
  coord_flip(clip = "off", expand = FALSE) +
  scale_fill_manual(values = colors) +
  theme_minimal() +
  theme(
    axis.line = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none",
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_line(size = .1, color = "grey"),
    panel.grid.minor.x = element_line(size = .1, color = "grey"),
    plot.title = element_text(size = 25, hjust = 0.5, face = "bold", vjust = 1),
    plot.subtitle = element_text(size = 18, hjust = 0.5, face = "italic"),
    plot.caption = element_text(size = 8, hjust = 0.5, face = "italic", color = "grey"),
    plot.background = element_blank(),
    plot.margin = margin(2, 2, 2, 4, "cm")
  ) +
  transition_states(year, transition_length = 3, state_length = 1) +
  enter_drift(x_mod = -2) +
  exit_drift(x_mod = 2) +
  ease_aes("cubic-in") +
  view_follow(fixed_x = TRUE) +
  labs(
    title = "Tourist Arrival by country : {closest_state}",
    subtitle = "Top 10 Countries"
  )
animate(anim, fps = 10, width = 1000, height = 550)
anim_save("simulations.gif")

Conclusion

In this blog, I showed interactive visualizations made with ggplot2 and plotly. I also made an animation showing the number of tourist arrivals by country from 2013 to 2018.

Reference

AbdulMajedRaja Rs. gganimate ggplot2
