Reading US population data from Excel and computing simple statistics

Reposted article (2018-10-08 02:51). Topic: Python file operations

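Suppose you have a spreadsheet of data from the 2010 US Census, and you have the tedious task of going through its thousands of rows to count both the total population and the number of census tracts for each county. (A census tract is a geographic area defined for the purposes of the census.) Each row represents one census tract. We will call the spreadsheet file censuspopdata.xlsx; its contents are shown in the figure below. As the script further down assumes, column B holds the state, column C the county, and column D the tract's population. Even though Excel can sum a selection of cells, you would still have to select the cells for each of the more than 3,000 counties. Even if working out one county's population by hand took only a few seconds, the whole spreadsheet would take hours.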

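In this project, you will write a script that reads the census spreadsheet file and computes the statistics for every county within seconds. The program needs to do the following:

• Read the data from the Excel spreadsheet.

• Count the number of census tracts in each county.

• Compute the total population of each county.

• Print the results.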

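This means the code needs to carry out the following tasks:

• Open the Excel document with the openpyxl module and read its cells.

• Tally all the census tract and population data and store it in a data structure.

• Use the pprint module, which can turn a dictionary into a string, to write that data structure to a text file with the .py extension.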
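
The data structure mentioned in the second task can be sketched without any spreadsheet at all. The example below is a minimal illustration, assuming a handful of made-up (state, county, population) rows; the sample values and the names sample_rows and tally are hypothetical and are not taken from the census file.

```python
# Made-up rows standing in for spreadsheet rows: (state, county, tract population).
sample_rows = [
    ('AK', 'Anchorage', 1000),
    ('AK', 'Anchorage', 2500),
    ('AK', 'Juneau', 800),
    ('AL', 'Autauga', 1200),
]


def tally(rows):
    """Group rows into {state: {county: {'tracts': n, 'pop': total}}}."""
    data = {}
    for state, county, pop in rows:
        # setdefault() returns the existing value, or stores and returns the default.
        counties = data.setdefault(state, {})
        stats = counties.setdefault(county, {'tracts': 0, 'pop': 0})
        stats['tracts'] += 1      # one row is one census tract
        stats['pop'] += int(pop)  # add this tract's population
    return data


result = tally(sample_rows)
print(result['AK']['Anchorage'])  # {'tracts': 2, 'pop': 3500}
```

A nested dictionary keyed first by state and then by county keeps each lookup to two key accesses, and setdefault() avoids a separate "is this key already there?" check.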

import pprint

import openpyxl

# Read the spreadsheet data.
print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.active

countyData = {}

# Fill in countyData with each county's population and tracts.
# Row 1 is the header, so the data starts at row 2.
for row in range(2, sheet.max_row + 1):
    # Each row in the spreadsheet has data for one census tract.
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value

    # Make sure the key for this state exists.
    countyData.setdefault(state, {})
    # Make sure the key for this county in this state exists.
    countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})
    # Each row represents one census tract, so increment by one.
    countyData[state][county]['tracts'] += 1
    # Increase the county pop by the pop in this census tract.
    countyData[state][county]['pop'] += int(pop)

# Open a new text file and write the contents of countyData to it.
# The with block closes the file, so the data is flushed to disk.
print('Writing results...')
with open('census2010.py', 'w') as resultFile:
    resultFile.write('allData = ' + pprint.pformat(countyData))
print('Done.')
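
As an alternative to building a coordinate string such as 'B2' for every cell, openpyxl worksheets also provide iter_rows(), which walks the sheet row by row. The sketch below assumes the same column layout as the script above (state in B, county in C, population in D). The function names tract_rows and load_tract_rows are hypothetical, and openpyxl is imported lazily so that tract_rows itself can be tried on any object that offers iter_rows().

```python
def tract_rows(sheet):
    """Yield (state, county, pop) from columns B to D, skipping the header row."""
    rows = sheet.iter_rows(min_row=2, min_col=2, max_col=4, values_only=True)
    for state, county, pop in rows:
        if state is None or pop is None:
            continue  # skip blank rows instead of failing on int(None)
        yield state, county, int(pop)


def load_tract_rows(path):
    """Open a workbook with openpyxl and return its tract rows as a list."""
    import openpyxl  # third-party package; only needed when reading a real file

    wb = openpyxl.load_workbook(path, read_only=True)
    try:
        return list(tract_rows(wb.active))
    finally:
        wb.close()
```

Calling load_tract_rows('censuspopdata.xlsx') would give a list of tuples that can be fed to the same setdefault() bookkeeping as in the loop above. The read_only=True argument is optional; it is intended for large files.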

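After the script runs, census2010.py contains a single assignment, allData = {...}, holding the pretty-printed dictionary keyed first by state and then by county.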
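
To give a feel for the shape of that file, the sketch below runs pprint.pformat() on a tiny dictionary with made-up numbers (they are not real census figures) and checks that the resulting text is valid Python by parsing it back with ast.literal_eval(). The variable names are hypothetical.

```python
import ast
import pprint

# Made-up values for illustration only; not real census figures.
demo = {'AK': {'Anchorage': {'tracts': 2, 'pop': 3500},
               'Juneau': {'tracts': 1, 'pop': 800}}}

text = 'allData = ' + pprint.pformat(demo)
print(text)

# Everything after 'allData = ' is a Python literal, so it parses back unchanged.
literal = text[len('allData = '):]
restored = ast.literal_eval(literal)
print(restored == demo)  # True
```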

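Why use pprint and generate a file ending in .py? Because the output is then valid Python source, so any other file can import it directly and use the data: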

import census2010

print(census2010.allData["AK"])
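
The same import trick can be tried end to end without the real spreadsheet. The sketch below is an illustration under stated assumptions: it writes a small module named census_demo.py (a hypothetical name, with made-up numbers) into a temporary directory, imports it, and then sums one state's population from the imported dictionary.

```python
import importlib
import pprint
import sys
import tempfile
from pathlib import Path

# Made-up values for illustration only; not real census figures.
demo = {'AK': {'Anchorage': {'tracts': 2, 'pop': 3500},
               'Juneau': {'tracts': 1, 'pop': 800}}}

with tempfile.TemporaryDirectory() as tmp:
    # Write the dictionary as Python source, as the script does for census2010.py.
    Path(tmp, 'census_demo.py').write_text('allData = ' + pprint.pformat(demo))

    sys.path.insert(0, tmp)  # make the temporary directory importable
    try:
        importlib.invalidate_caches()
        census_demo = importlib.import_module('census_demo')
    finally:
        sys.path.remove(tmp)

# The imported module exposes the data as an ordinary dictionary.
ak_pop = sum(county['pop'] for county in census_demo.allData['AK'].values())
print(ak_pop)  # 4300
```

Because the data arrives as a plain dictionary, further statistics such as per-state totals need no spreadsheet access at all.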
