Week 5 - BALT 4363 - Descriptive Statistics and Probability Distributions
As can be seen in the Iris dataset example below, this week’s reading focused
on the practical and real-world applications of both descriptive statistics and
probability distributions. It made me think about how Python and AI could have
streamlined the data analysis portion of my economics research project.
For some background, my research examines the concept of
logistics sprawl in rural areas. Logistics sprawl is a phenomenon characterized
by the geographic dispersal of distribution centers, warehouses, and other
logistics facilities. In other words, these logistics campuses are relocating
to various areas, including, but not limited to, metropolitan, suburban,
exurban, and rural regions. This is primarily motivated by proximity to key
transportation thoroughfares such as highways, airports, and railways, as well
as the growing need for larger warehousing centers on larger and relatively
cheaper tracts of land. Arguably, these factors have contributed to the
presence of warehousing and distribution centers in rural areas. By utilizing
data from the U.S. Department of Agriculture’s Economic Research Service, the
U.S. Census Bureau, and the U.S. Bureau of Labor Statistics, my partner and I were
able to construct a scatter plot and a table of descriptive statistics to
demonstrate the growing presence of logistics centers in rural areas.
Long story short, we had over 250,000 data elements to manipulate. Excel
struggled to perform statistical and graphical functions because the
dataset was so large. Knowing what I know now about Python and AI, I would have
leveraged these resources for their flexibility and their ability to store and
process large datasets. My struggles with Excel would have been
replaced by a few lines of simple code. NumPy, pandas,
and Matplotlib would have easily generated the desired mean, median, standard
deviation, and skewness in tabular format while simultaneously providing a
means of creating a comprehensible scatter plot. Furthermore, our project lacked
transportation-related variables because the U.S. Department of Transportation’s
data was inconsistently structured and difficult to access. That data could have
been read directly into Python, cleaned, and, consequently, transformed into
actionable insights. Integrating transportation variables into our data analysis
would have ultimately enriched our research argument.
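
To make the "few lines of simple code" concrete, here is a minimal sketch of what that workflow might have looked like. It is not our actual analysis: the column names (population_density, warehouse_count) are hypothetical, and the data is a randomly generated placeholder of roughly the same size as our dataset, used only so the example runs on its own.

```python
import numpy as np
import pandas as pd


def make_placeholder_data(n_rows=250_000, seed=0):
    """Build a synthetic stand-in dataset. The values are random, not real data."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # Hypothetical column names chosen for illustration only.
        "population_density": rng.lognormal(mean=3.0, sigma=1.0, size=n_rows),
        "warehouse_count": rng.poisson(lam=2.0, size=n_rows),
    })


def describe_columns(df):
    """Return mean, median, standard deviation, and skewness per column.

    Rows are the original columns; columns are the four statistics.
    """
    return df.agg(["mean", "median", "std", "skew"]).T


def scatter_plot(df, x, y, path):
    """Save a scatter plot of column x against column y to an image file."""
    # Imported here so the summary table works even without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(df[x], df[y], s=2, alpha=0.3)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    fig.savefig(path, dpi=150)
    plt.close(fig)


if __name__ == "__main__":
    data = make_placeholder_data()
    print(describe_columns(data))
```

With a real dataset, the placeholder step would presumably be replaced by something like pd.read_csv on the downloaded file, and calling scatter_plot(data, "population_density", "warehouse_count", "scatter.png") would write the chart to disk.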