Week 5 - BALT 4363 - Descriptive Statistics and Probability Distributions

April 16, 2026

As the Iris dataset example below shows, this week’s reading focused on the practical, real-world applications of descriptive statistics and probability distributions. It made me think about how Python and AI could have streamlined the data analysis portion of my economics research project. For some background, my research examines logistics sprawl in rural areas. Logistics sprawl is the geographic dispersal of distribution centers, warehouses, and other logistics facilities. In other words, these logistics campuses are relocating to a variety of areas, including metropolitan, suburban, exurban, and rural regions. The shift is primarily motivated by proximity to key transportation infrastructure such as highways, airports, and railways, as well as the growing need for larger warehousing centers on bigger and relatively cheaper tracts of land. Arguably, these factors have contributed to the presence of warehousing and distribution centers in rural areas. Using data from the U.S. Department of Agriculture’s Economic Research Service, the U.S. Census Bureau, and the U.S. Bureau of Labor Statistics, my partner and I constructed a scatter plot and a table of descriptive statistics to demonstrate the growing presence of logistics centers in rural areas.

Long story short, we had over 250,000 data elements to manipulate. Excel struggled to perform statistical and graphical functions because the dataset was so large. Knowing what I know now about Python and AI, I would have leveraged these tools for their flexibility and their ability to store and process large datasets. My struggles with Excel would have been replaced by a few lines of simple code. NumPy, Pandas, and Matplotlib could have readily generated the desired mean, median, standard deviation, and skewness in tabular format and also produced a readable scatter plot. Furthermore, our project lacked transportation-related variables because the U.S. Department of Transportation’s data was inconsistently structured and difficult to access. That data could have been read into Python directly from its link and, consequently, transformed into actionable insights. Integrating transportation variables into our analysis would have ultimately enriched our research argument.
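To make the "few lines of simple code" idea concrete, here is a minimal sketch of what that workflow might look like. It is not the code or the data from my project: the column names (population_density, warehouse_count), the helper function names, and the randomly generated values are all hypothetical stand-ins, sized at 250,000 rows only to mirror the scale described above.

```python
import numpy as np
import pandas as pd


def make_example_data(n_rows=250_000, seed=0):
    """Build a synthetic stand-in dataset (NOT the actual research data)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # Hypothetical variables, chosen only for illustration.
        "population_density": rng.lognormal(mean=3.0, sigma=1.0, size=n_rows),
        "warehouse_count": rng.poisson(lam=2.0, size=n_rows),
    })


def describe_columns(df):
    """Return a table of mean, median, sample std, and skewness per numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "std": numeric.std(),        # sample standard deviation (ddof=1)
        "skewness": numeric.skew(),
    })


def scatter_plot(df, x, y):
    """Draw a scatter plot of two columns and return the Matplotlib figure."""
    # Imported here so the summary table works even without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; runs without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(df[x], df[y], s=2, alpha=0.3)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return fig


if __name__ == "__main__":
    data = make_example_data()
    print(describe_columns(data).round(2))
```

Calling scatter_plot(data, "population_density", "warehouse_count").savefig("scatter.png") would write the chart to an image file. With real data, make_example_data would presumably be swapped for something like pd.read_csv pointed at the actual file, and the rest would stay largely the same.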
