Download and clean up Census 2020 block data for US States, DC, PR, or Island Areas (for EJAM)

Usage

census2020_get_data(
  mystates = c(state.abb, "DC", "PR"),
  folder = NULL,
  folderout = NULL,
  do_download = TRUE,
  do_unzip = TRUE,
  do_read = TRUE,
  do_clean = TRUE,
  overwrite = TRUE,
  sumlev = 750,
  cols_to_keep
)

Arguments

mystates

Default is the 50 states plus DC and PR. This omits the Island Areas c('VI','GU','MP','AS'), but census2020_get_data() can in some cases handle a mix of States/DC/PR and Island Areas via the helper function census2020_get_data_islandareas(), returning either type of data, or a combined data.table if both are requested. Block resolution is not available from these files for Island Areas, so the default for those is block groups, which would not make sense to mix with blocks for states.

However, note this from the Census Bureau: "With this release of the 2020 IAC Demographic and Housing Characteristics Summary File, the Census Bureau provides additional demographic and housing characteristics for the Island Areas down to the block, block group, and census tract levels." Despite this, it appears that H1 (housing) table data are provided at block resolution, while P1 (population count) is available only at block group, tract, and coarser levels, according to page 3 of the Island Areas technical documentation.

folder

Folder for downloaded files. Defaults to a temporary directory. The folder is created if it does not exist.

folderout

Path for assembled results files. Defaults to the value of folder.

do_download

Whether to run census2020_download(). Set to FALSE to run only the subsequent steps when the download was already done. That depends on the earlier files still being in the (temporary) folder, so it is usually easier to just download again (the default).

do_unzip

whether to do census2020_unzip()

do_read

whether to do census2020_read()

do_clean

whether to do census2020_clean()

overwrite

passed to census2020_download()

sumlev

Generally should not be changed from the default. A value of 750 means blocks, the only option likely to work here. 150 means block groups, which is what is used for Island Areas since those files seemed to lack block data. 140 is tracts, 50 is counties, and 40 is states. If mystates are Island Areas, this function uses 150 instead of 750. A mix of resolutions in one table would not really make sense.

cols_to_keep

Omit to use the default in census2020_clean(). Otherwise, a vector of column names as they appear in the output of census2020_get_data(do_clean = FALSE), which keeps all columns and does not rename them (renaming is usually done via the census_col_names_map table). cols_to_keep = "all" keeps all columns.
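
The summary level codes described for sumlev can be written out as a small lookup. This is only an illustrative sketch: the names sumlev_lookup and sumlev_label are hypothetical and are not part of the package.

```r
# Hypothetical lookup of the Census summary level codes listed under sumlev.
# These names are for illustration only and do not exist in the package.
sumlev_lookup <- c(
  "40"  = "state",
  "50"  = "county",
  "140" = "tract",
  "150" = "block group",
  "750" = "block"
)

# Translate one or more numeric summary levels into geography names
sumlev_label <- function(sumlev) {
  unname(sumlev_lookup[as.character(sumlev)])
}

sumlev_label(750)          # default resolution for States/DC/PR
sumlev_label(c(150, 140))  # resolutions coarser than blocks
```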

Value

Invisibly returns a data.table of US Census blocks with columns like blockid, lat, lon, pop, and area (area in square meters), or just intermediate results, depending on do_read, do_clean, etc.

Details

  To create certain data tables used by the EJAM package,
  which provides reports for EJSCREEN,
  EJAM relied on the census2020_download package, and
  used scripts like EJAM/data-raw/datacreate_ . . . .R
  to do something like this:

  blocks <- census2020_get_data() # default excludes Island Areas
  mylist <- census2020_save_datasets(blocks)

  bgid2fips    <- mylist$bgid2fips
  blockid2fips <- mylist$blockid2fips
  blockpoints  <- mylist$blockpoints
  blockwts     <- mylist$blockwts
  quaddata     <- mylist$quaddata

For technical details on the files downloaded and tables and variables, see the detailed references in the help for census2020_read().

See also

census2020_save_datasets() creates individual data.tables, after census2020_get_data() has run these steps: census2020_download(), census2020_unzip(), census2020_read(), census2020_clean()

Examples

 if (FALSE) { # \dontrun{
 # Get race/ethnicity counts by block:
 x <- census2020_get_data(c('DE', 'CT'), cols_to_keep = 'all')
 # Check that the race/ethnicity counts add up to the total population of each block:
 table(x$pop == rowSums(x[, c('hisp', 'nhwa', 'nhba', 'nhaiana',
                              'nhaa', 'nhnhpia', 'nhotheralone', 'nhmulti')]))

 y <- census2020_get_data() # All States/DC/PR at block resolution
 z <- census2020_get_data_islandareas() # VI, GU, MP, AS at block group resolution
 } # }
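
The consistency check in the first example can be tried without downloading anything. The sketch below uses a plain data.frame with made-up counts and only borrows the column names from the example above. It is not real Census data and does not call the package.

```r
# Made-up block counts for illustration only (not real Census data).
# Column names follow those used in the example above.
x <- data.frame(
  pop          = c(10, 25, 0),
  hisp         = c(2, 5, 0),
  nhwa         = c(5, 10, 0),
  nhba         = c(1, 4, 0),
  nhaiana      = c(0, 1, 0),
  nhaa         = c(1, 2, 0),
  nhnhpia      = c(0, 0, 0),
  nhotheralone = c(0, 1, 0),
  nhmulti      = c(1, 2, 0)
)

race_cols <- c('hisp', 'nhwa', 'nhba', 'nhaiana',
               'nhaa', 'nhnhpia', 'nhotheralone', 'nhmulti')

# TRUE where the race/ethnicity counts add up to the block's total population
sums_match <- x$pop == rowSums(x[, race_cols])
table(sums_match)
```

A FALSE count in the resulting table would flag blocks where the categories do not sum to the total, which would suggest the wrong columns were kept or renamed.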