Data Critique

Overview

     We chose to analyze data from the Public Libraries Survey in 2022. Data collection was funded by the U.S. Institute of Museum and Library Services (IMLS), a federal agency supporting library infrastructure nationwide. A total of 9,248 public libraries in the 50 states, the District of Columbia, and outlying American territories were asked to complete the survey, which is a census: a collection method that covers the entire intended population rather than a sample. Local public libraries reported their data to state library agencies, which then forwarded it to the IMLS, which coordinated the nationwide survey process and compiled the results. The final dataset includes a broad range of variables that capture how libraries are funded, staffed, and used.

     With these variables, we can analyze critical patterns in public library infrastructure that underscore broader issues of equity, inclusion, and resource distribution in children’s library services. By comparing library income per capita with metrics on children’s circulation materials and programs, we can analyze local and state-level disparities in both access to and investment in the younger generation. Together, these insights prompt humanistic questions: Do children get to benefit from this public knowledge space? How does spatial neglect reinforce social exclusion? And what kinds of community needs and stories are left out when data only measures what can be counted? In this way, the dataset serves as a reflection of structural inequalities that can guide us toward more inclusive public planning, especially with regard to child-centric infrastructure.

Data Processing

     We filtered for libraries that met all the criteria in the Federal-State Cooperative System (FSCS) Public Library definition (Figure 1), as this was also the standard adhered to in the original data file documentation. This introduces silences into our data analysis: libraries that do not meet these criteria are not represented in our results. Notably, the libraries that fall outside the FSCS definition are scattered across high- and low-income as well as rural and urban states.

     After selecting variables relevant to children’s library services, we created metrics like children’s circulation rate and children’s program rate to measure the proportion of child-centered resources relative to the general resources provided by each library. We also adjusted variables like income and operating expense by dividing them by the population of the library’s service area. This gave us per capita metrics that allow a fair comparison across different locales and states.
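     The derived metrics described above can be sketched roughly as follows. This is a minimal illustration, not our actual processing script: the field names (population, total_income, total_circulation, children_circulation) and the two example libraries are hypothetical stand-ins, and the values are made up.

```python
# Hypothetical field names and made-up values, for illustration only.
libraries = [
    {"name": "Library A", "population": 20000, "total_income": 500000.0,
     "total_circulation": 80000, "children_circulation": 32000},
    {"name": "Library B", "population": 5000, "total_income": 200000.0,
     "total_circulation": 10000, "children_circulation": 2500},
]


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


def add_derived_metrics(record):
    """Return a copy of the record with per capita and children's-share metrics."""
    enriched = dict(record)
    # Per capita adjustment: divide by the service area population.
    enriched["income_per_capita"] = safe_ratio(
        record["total_income"], record["population"])
    # Children's circulation rate: children's share of all circulation.
    enriched["children_circulation_rate"] = safe_ratio(
        record["children_circulation"], record["total_circulation"])
    return enriched


metrics = [add_derived_metrics(r) for r in libraries]
for row in metrics:
    print(row["name"], row["income_per_capita"], row["children_circulation_rate"])
```

     In this made-up example, Library B has the smaller budget but the higher income per capita, which is the kind of comparison raw totals would hide. Returning None for a zero or missing denominator is one possible design choice; it keeps an undefined ratio from being mistaken for a real value.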

 

     Figure 1 shows the official FSCS definition of a public library, which determined which entries we included or excluded from our analysis.

Figure 1: Public library as defined by FSCS

Limitations

     The data contains numerous missing values, indicated by the placeholders -1, -3, or -9. To ensure the integrity of our analysis, we removed any data points containing these values. Retaining them would bias results downward, because these negative codes do not represent actual measurements and would pull sums and averages toward lower values.
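     A small sketch of this filtering step, and of the downward bias it avoids, is shown here. The placeholder codes come from the text above; the field names and sample records are hypothetical and the numbers are invented for illustration.

```python
# Placeholder codes that mark missing values (from the text above).
MISSING_CODES = {-1, -3, -9}


def has_missing(record, fields):
    """True if any of the given fields holds a missing-value placeholder."""
    return any(record[field] in MISSING_CODES for field in fields)


def drop_missing(records, fields):
    """Keep only records with real measurements in every listed field."""
    return [r for r in records if not has_missing(r, fields)]


# Hypothetical records with made-up values.
sample = [
    {"name": "Library A", "children_circulation": 32000, "children_programs": 120},
    {"name": "Library B", "children_circulation": -1, "children_programs": 45},
    {"name": "Library C", "children_circulation": 2500, "children_programs": -3},
]
fields = ["children_circulation", "children_programs"]
clean = drop_missing(sample, fields)

# Why the codes cannot be kept: treating -1 as a number lowers the average.
values = [r["children_circulation"] for r in sample]
naive_mean = sum(values) / len(values)
valid = [v for v in values if v not in MISSING_CODES]
valid_mean = sum(valid) / len(valid)

print([r["name"] for r in clean])
print(naive_mean < valid_mean)
```

     Dropping a record when any field is missing mirrors the approach described above. It is stricter than dropping values one variable at a time, so it trades sample size for consistency across metrics.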

     Variables measuring children’s program attendance, especially for programs targeted at ages 0-11, count all attendees, including adult caregivers. As a result, the attendance metrics may overestimate actual child participation. A higher attendance figure does not necessarily reflect a larger number of children; it could instead indicate that more adults accompanied their children to the event. Since no additional information was available to distinguish child attendees from adult attendees, we were unable to isolate statistics on children’s attendance alone. This limitation introduces uncertainty when interpreting the reach of children’s programming based on attendance data.

     The original column names were often abbreviated for brevity and not immediately interpretable. To address this, we referred to the data documentation to understand what each variable means. We then renamed key columns and created derived metrics in our final dataset so that it can be read and interpreted at first glance.

     Most prominently, the data lacks qualitative and contextual depth. The variables were selected according to institutional priorities, emphasizing numerical outputs like attendance counts, funding, and resource circulation. These choices reflect power: they narrow the success of a public library to measures that do not necessarily capture lived experiences and needs.

     People-centered aspects of public libraries, such as experiential quality (how welcoming the library feels), community needs (how library resources equip local residents), and granular user demographics (race, income, disability, language spoken), are key dimensions of equitable public service, yet the dataset records none of them. As a result, its technocratic approach cannot reveal these social inequalities. For example, a high proportion of children’s circulation materials does not mean that all community members benefit equally from them. Marginalized users might face language barriers, unwelcoming environments, or systemic access limitations that the data does not record.
