Data Critique

Data Set 1: Global Refugee & Asylum-Seeker

The dataset that our group chose, titled Global Refugee and Asylum-Seeker Data (2019-2024), is sourced primarily from the United Nations High Commissioner for Refugees (UNHCR) and is hosted on Kaggle. It contains information about refugees, asylum seekers, internally displaced persons (IDPs), stateless persons, and more. The data is also broken down by country of origin and country of asylum. This information will be extremely helpful in answering our research question: how have asylum-seeking patterns in the United States, specifically among Mexican citizens, changed over the years, and which global events may help explain these trends?

The dataset draws on multiple sources. In addition to the UNHCR, the Internal Displacement Monitoring Centre (IDMC) and the United Nations Relief and Works Agency for Palestine Refugees in the Near East (UNRWA) provided information that was included in this dataset. The Expert Group on Refugee and IDP Statistics (EGRIS) also played a pivotal role in shaping this dataset by coordinating the collection of data from the various sources. The data covers refugees under UNHCR’s mandate, Palestine refugees under UNRWA’s mandate, asylum-seekers, other people in need of international protection, and IDPs (as reported by the IDMC).

According to the dataset’s metadata, the data was extracted from the UNHCR Data Portal using the provided data finder tool. The data finder was configured to include the following parameters:

Data Group: Displacement
Dataset: Population
Display Type: Totals
Population Types: REF (Refugees), ASY (Asylum-seekers), IDP (Internally Displaced Persons), OIP (Other People in Need of International Protection), STA (Stateless Persons), HST (Host Community), OOC (Others of Concern)
Year Range: 2019 to 2024
Country of Origin: All
Country of Asylum: All

Data Download: The configured data finder query was submitted, and the resulting data was downloaded in a structured format. The data was saved as a CSV file for further processing and analysis.

Data Transformation: Cleaning: The downloaded CSV file was cleaned to remove any irrelevant or duplicate entries.
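The cleaning step described in the metadata could look roughly like the sketch below. This is only an illustration in Python, not the dataset author’s actual procedure. The column names (year, origin, asylum, asylum_seekers), the sample rows, and the choice to treat rows with no reported count as "irrelevant" are all our own assumptions.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV. The column
# names and values are placeholders, not the real file's contents.
SAMPLE_CSV = """year,origin,asylum,asylum_seekers
2019,AAA,BBB,10
2019,AAA,BBB,10
2018,AAA,BBB,7
2020,AAA,CCC,
2021,AAA,BBB,12
"""

def clean_rows(text, first_year=2019, last_year=2024):
    """Drop exact duplicates, rows outside the year range, and rows with no count."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row.values())
        if key in seen:
            continue  # duplicate entry
        seen.add(key)
        if not first_year <= int(row["year"]) <= last_year:
            continue  # outside the configured year range
        if row["asylum_seekers"] == "":
            continue  # assumed "irrelevant": no count reported
        row["asylum_seekers"] = int(row["asylum_seekers"])
        cleaned.append(row)
    return cleaned

rows = clean_rows(SAMPLE_CSV)
```

With the placeholder sample, two rows survive: the first 2019 row and the 2021 row. What counts as an "irrelevant" entry is not defined in the metadata, so that rule would need to be confirmed before relying on it.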

The overall collection of the data is largely funded by governments across the globe. Governments are UNHCR’s largest source of funding and account for ~80% of its annual income. In 2024, the United States donated approximately 2 billion dollars, while the next highest donors, Germany and the European Union, donated 333 million and 270 million, respectively. The rest of its funding comes from the private sector and 11 national partners.

As noted above, the dataset categorizes refugees, asylum seekers, internally displaced persons, and other groups by country of origin and country of asylum, which makes it useful for uncovering trends in migration. For our analysis of Mexican citizens seeking asylum in the United States, it will help us trace how those numbers changed over the years. For example, we will be able to find the yearly changes in the number of individuals seeking asylum and the number of returned refugees, giving us insight into patterns of migration. In addition, because the data spans 2019 to 2024, we can also see how global events and policies, such as the COVID-19 pandemic, affected these trends.
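A yearly-change calculation of this kind might be sketched as follows. The counts are made-up placeholders for a single origin-asylum pair, not figures from the dataset, and the function name yearly_changes is our own.

```python
# Hypothetical yearly counts of asylum-seekers for one origin-asylum pair.
# These numbers are placeholders for illustration, not figures from the dataset.
counts_by_year = {2019: 100, 2020: 80, 2021: 120}

def yearly_changes(counts):
    """Return {year: (absolute change, percent change)} relative to the prior year."""
    changes = {}
    years = sorted(counts)
    for prev, curr in zip(years, years[1:]):
        diff = counts[curr] - counts[prev]
        # Percent change is undefined when the prior year's count is zero.
        pct = diff / counts[prev] * 100 if counts[prev] else None
        changes[curr] = (diff, pct)
    return changes

changes = yearly_changes(counts_by_year)
```

Reporting both the absolute and the percent change matters here because a small absolute change can look dramatic in percentage terms when the prior year’s count is low.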

The ideological effects of this dataset reflect how the United Nations structures and defines human displacement. By dividing people into categories such as “refugees,” “asylum-seekers,” and “stateless persons,” it simplifies people’s complex situations and experiences and also limits how we understand migration.

While this dataset is informative, it fails to capture migration patterns on a smaller scale. It contains only yearly, country-level data, so we will not be able to examine patterns over shorter time spans or in specific areas within either country. To do so, our group will need to find additional datasets to supplement our analysis of the global asylum-seeker data.

The way our source has been divided into data also has ideological effects through what it leaves out: people’s names, wealth, gender, age, and religious affiliation. If our dataset were the only source, there would be no data on people’s concerns and no data on the duration of their travel. Additionally, we would lack data on which mandates immigrants fall under. Information on concerns is crucial to our narrative, because it would help us understand how immigrants are affected by their concerns and how those concerns have shaped asylum seekers’ reasons to immigrate over the years. It would also allow us to look at how these concerns may affect their ability to seek refuge, along with other limitations they may encounter. Likewise, without data on the duration of travel, we cannot tell which policies may affect how long an immigrant’s journey takes.

Another way to look at this dataset is through the ethics of representation. Refugees and asylum seekers often have little control over how their identities are recorded or categorized. The UNHCR’s mandate necessitates data collection for protection and resource allocation, but this process can also reinforce narratives of vulnerability or dependence. By quantifying displacement, the dataset risks reducing human suffering to statistics, potentially overlooking the agency and resilience of displaced individuals.

Ultimately, this dataset illustrates both the potential and limitations of digital methods in humanitarian research. Quantitative data enables large-scale pattern analysis, but it must also be paired with qualitative perspectives to capture the lived experiences behind the numbers.

No AI Used.

Data Set 2: Profiles on Lawful Permanent Residents

LPR stands for Lawful Permanent Resident. It is the official U.S. immigration status for a person who has been given the right to live and work permanently in the United States of America. A few other common names for this status are the following: Green Card holder, permanent resident, resident alien, and legal immigrant.

This OHSS (Office of Homeland Security Statistics) dataset on Lawful Permanent Residents Population Estimates is based on administrative records. Tables and population estimates come from the USCIS (United States Citizenship and Immigration Services) administrative records and the DHS (Department of Homeland Security) records.

The OHSS validates and processes this data to produce reports, building its datasets from data provided by several federal government agencies. The U.S. Citizenship and Immigration Services (USCIS) provides information on naturalization and green card cases. It serves as the primary source for many characteristics seen in the data, including age, sex, and class of admission. The U.S. Department of State (Visa Office) counts the number of immigrant visas issued abroad and the nationality of visa holders. The U.S. Census Bureau’s census, American Community Survey (ACS), and other survey data, together with adjustments for mortality and emigration, are used in additional calculations to estimate the immigrant population.

Since the Lawful Permanent Residents Population Estimates dataset is published by a government agency, its funding comes from federal appropriations provided to the agency by the government. It is not an independent academic dataset.

This dataset is not perfect. Undocumented migrants are not datafied in this dataset. Data on the living conditions of LPRs is absent, both prior to and after admission. We have occupations but no socioeconomic conditions. The data also relies on estimates. Although the agency has stated that it matches and validates the data, it is difficult to be sure that duplicate entries for the same individual are entirely absent. We do not have information on the total number of entries, only those accepted.

The data presents immigration as a process of admission and compliance rather than as a social, economic, or human-rights phenomenon. We don’t know the lived experience of LPRs. The data we have is only what fits in the frame the agency has chosen. We know how many people are in specific broad occupations but not their working conditions. Human trafficking is completely absent from this data, as are the debts people take on to come. The categories chosen to classify people are morally favorable ones: occupations are listed along with a class of admission, such as whether new LPRs were family-sponsored or gained their status through employment-based opportunities. Undocumented migrants are not datafied, so if policies were made strictly from this data, they would be ignored. The aura of authority makes the data seem precise and accurate, yet aspects of it are calculated. Algorithms were used to match entries, estimate emigration and deaths, and fill gaps in the data. The estimates carry a margin of error that the agency deemed acceptable.

This dataset can be used in conjunction with others to form a fuller picture of immigration into the United States. By itself, it lacks information on unauthorized migrant populations and on visa overstays. As a result, our population estimates for certain nationalities in this country will always carry significant error. Demographic data such as family composition, education, income, and household employment is unknown. We can’t tell how the average LPR is living in the country compared to someone who was born here, and their socioeconomic impact on the country cannot be determined. Neither the process of admission nor the length of time people took to become an LPR is given to us. We don’t know the number of delays, denials, and appeals processed by the department. Perhaps most importantly, we don’t see the experiences of people or the motivations for which they sought a change in status. The dataset fails to capture human trafficking, which is frequently associated with immigration. We don’t know the debts people incurred for the opportunity to become an LPR or how those debts might affect their livelihood in the U.S. We do not have information on the outcomes of their time in the U.S.

No AI Used.